Browsing by Author "Erzin, E."
Now showing 1 - 7 of 7
Article (Metadata only): Formant position based weighted spectral features for emotion recognition (Elsevier, 2011)
Bozkurt, E.; Erzin, E.; Eroğlu Erdem, Ç.; Erdem, Tanju; Computer Science; ERDEM, Arif Tanju
In this paper, we propose novel spectrally weighted mel-frequency cepstral coefficient (WMFCC) features for emotion recognition from speech. The idea is based on the fact that formant locations carry emotion-related information, so the critical spectral bands around formant locations can be emphasized during the calculation of MFCC features. The spectral weighting is derived from the normalized inverse harmonic mean function of the line spectral frequency (LSF) features, which are known to be localized around formant frequencies. This approach can be considered an early data fusion of spectral content and formant location information. We also investigate methods for late decision fusion of unimodal classifiers. We evaluate the proposed WMFCC features together with the standard spectral and prosody features using HMM-based classifiers on the spontaneous FAU Aibo emotional speech corpus. The results show that unimodal classifiers with the WMFCC features perform significantly better than classifiers with standard spectral features. Late decision fusion of classifiers provides further significant performance improvements.

Conference Object (Metadata only): Improving automatic emotion recognition from speech signals (International Speech Communication Association, 2009)
Bozkurt, E.; Erzin, E.; Eroğlu Erdem, Ç.; Erdem, Tanju; Computer Science; ERDEM, Arif Tanju
We present a speech-signal-driven emotion recognition system. Our system is trained and tested with the INTERSPEECH 2009 Emotion Challenge corpus, which includes spontaneous and emotionally rich recordings. The challenge includes classifier and feature sub-challenges with five-class and two-class classification problems.
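The LSF-derived spectral weighting behind the WMFCC features described above can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: the specific weighting function (inverse harmonic mean of the distances from each frequency bin to its two nearest LSFs) and the normalization are assumptions made for the sketch.

```python
import numpy as np

def lsf_weights(lsf, n_bins, fs):
    """Hypothetical sketch: weight each spectral bin by the inverse harmonic
    mean of its distances to the two nearest line spectral frequencies, so
    bins near formants (where LSFs cluster) are emphasized."""
    freqs = np.linspace(0.0, fs / 2.0, n_bins)
    w = np.empty(n_bins)
    for i, f in enumerate(freqs):
        d = np.abs(lsf - f) + 1e-6          # distances (Hz) to every LSF
        d1, d2 = np.sort(d)[:2]             # two nearest LSFs
        w[i] = 0.5 * (1.0 / d1 + 1.0 / d2)  # inverse harmonic mean: large near formants
    return w * (n_bins / w.sum())           # normalize so weights average to 1

def weighted_spectrum(frame, lsf, fs):
    """Apply the LSF-based weighting to one frame's power spectrum; a mel
    filterbank and DCT applied to this would yield WMFCC-like features."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    return spec * lsf_weights(lsf, spec.size, fs)
```

With LSFs clustered around 500-550 Hz (a formant), the weight of a bin near 525 Hz comes out much larger than that of a bin near 2 kHz, which is the intended emphasis.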
We investigate prosody-related, spectral and HMM-based features for emotion recognition with Gaussian mixture model (GMM) based classifiers. Spectral features consist of mel-scale cepstral coefficients (MFCC), line spectral frequency (LSF) features and their derivatives, whereas prosody-related features consist of mean-normalized values of pitch, the first derivative of pitch, and intensity. Unsupervised training of HMM structures is employed to define prosody-related temporal features for the emotion recognition problem. We also investigate data fusion of different features and decision fusion of different classifiers, which are not well studied in the emotion recognition framework. Experimental results of automatic emotion recognition with the INTERSPEECH 2009 Emotion Challenge corpus are presented.

Conference Object (Metadata only): INTERSPEECH 2009 duygu tanıma yarışması değerlendirmesi [Evaluation of the INTERSPEECH 2009 Emotion Challenge] (IEEE, 2010)
Bozkurt, E.; Erzin, E.; Eroğlu Erdem, Ç.; Erdem, Tanju; Computer Science; ERDEM, Arif Tanju
In this paper, we evaluate the results of the INTERSPEECH 2009 Emotion Challenge. The problem posed by the challenge is to classify the spontaneous and emotionally rich FAU Aibo speech recordings into five and two emotion classes as accurately as possible. To solve this problem, we investigate prosody-related, spectral and HMM-based (hidden Markov model) features with Gaussian mixture model (GMM) classifiers. Spectral features include mel-frequency cepstral coefficients (MFCC), line spectral frequency (LSF) coefficients and their derivatives, whereas the prosody features consist of pitch, the first derivative of pitch, and energy. The HMM features, which describe the temporal variation of the prosody-related features, are obtained with unsupervised-trained HMM structures. We also investigate data fusion of different features and decision fusion of different classifiers to improve speech-based emotion recognition results.
Our two-stage decision fusion method achieved recall rates of 41.59% and 67.90% for the five-class and two-class problems, respectively, placing 2nd and 4th among all challenge submissions.

Conference Object (Metadata only): RANSAC-based training data selection for emotion recognition from spontaneous speech (ACM, 2010)
Eroğlu Erdem, Ç.; Bozkurt, E.; Erzin, E.; Erdem, Tanju; Computer Science; ERDEM, Arif Tanju
Training datasets containing spontaneous emotional expressions are often imperfect due to the ambiguities and difficulties of labeling such data by human observers. In this paper, we present a Random Sampling Consensus (RANSAC) based training approach for the problem of emotion recognition from spontaneous speech recordings. Our motivation is to insert a data cleaning process into the training phase of the Hidden Markov Models (HMMs) in order to remove suspicious instances of labels that may exist in the training dataset. Our experiments using HMMs with various numbers of states and Gaussian mixtures per state indicate that utilization of RANSAC in the training phase provides an improvement of up to 2.84% in the unweighted recall rates on the test set. This improvement in the accuracy of the classifier is shown to be statistically significant using McNemar's test.

Conference Object (Open Access): RANSAC-based training data selection for speaker state recognition (International Speech Communication Association, 2011)
Bozkurt, E.; Erzin, E.; Erdem, Ç. E.; Erdem, Tanju; Computer Science; ERDEM, Arif Tanju
We present a Random Sampling Consensus (RANSAC) based training approach for the problem of speaker state recognition from spontaneous speech. Our system is trained and tested with the INTERSPEECH 2011 Speaker State Challenge corpora, which include the Intoxication and the Sleepiness Sub-challenges, where each sub-challenge defines a two-class classification task.
We perform RANSAC-based training data selection coupled with Support Vector Machine (SVM) based classification to prune possible outliers in the training data. Our experimental evaluations indicate that RANSAC-based training data selection provides 66.32% and 65.38% unweighted average (UA) recall rates on the development and test sets for the Sleepiness Sub-challenge, respectively, and a slight improvement in the Intoxication Sub-challenge performance.

Book Part (Metadata only): RANSAC-based training data selection on spectral features for emotion recognition from spontaneous speech (Springer International Publishing, 2011)
Bozkurt, E.; Erzin, E.; Erdem, Tanju; Eroğlu Erdem, Ç.; Computer Science; ERDEM, Arif Tanju
Training datasets containing spontaneous emotional speech are often imperfect due to the ambiguities and difficulties of labeling such data by human observers. In this paper, we present a Random Sampling Consensus (RANSAC) based training approach for the problem of emotion recognition from spontaneous speech recordings. Our motivation is to insert a data cleaning process into the training phase of the Hidden Markov Models (HMMs) in order to remove suspicious instances of labels that may exist in the training dataset. Our experiments using HMMs with Mel Frequency Cepstral Coefficient (MFCC) and Line Spectral Frequency (LSF) features indicate that utilization of RANSAC in the training phase improves the unweighted recall rates on the test set.
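The RANSAC-style training data selection used in the entries above can be sketched roughly like this. The nearest-centroid classifier, the sampling fraction, and the consensus criterion here are illustrative stand-ins, not the HMM/SVM setup of the papers: repeatedly train on a random subset, count an instance as an inlier when the trained model agrees with its label, and keep the largest consensus set for final training.

```python
import numpy as np

def fit_centroids(X, y):
    """Toy stand-in for the papers' HMM/SVM classifiers: one centroid per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(model, X):
    """Assign each instance to the class of its nearest centroid."""
    classes = sorted(model)
    dists = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]

def ransac_select(X, y, n_iter=50, sample_frac=0.3, seed=0):
    """RANSAC-style data cleaning sketch: return a boolean mask of training
    instances in the largest consensus set found over n_iter random subsets."""
    rng = np.random.default_rng(seed)
    best_inliers = np.ones(len(y), dtype=bool)
    best_count = -1
    for _ in range(n_iter):
        idx = rng.choice(len(y), size=max(2, int(sample_frac * len(y))), replace=False)
        model = fit_centroids(X[idx], y[idx])          # train on a random subset
        inliers = predict(model, X) == y               # consensus: label agreement
        if inliers.sum() > best_count:
            best_count, best_inliers = int(inliers.sum()), inliers
    return best_inliers   # retrain the final classifier on X[best_inliers]
```

On a toy set of two well-separated classes with a couple of deliberately mislabeled instances, the mask returned by `ransac_select` excludes exactly the mislabeled points, which mirrors the suspicious-label pruning the papers describe.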
Experimental studies performed over the FAU Aibo Emotion Corpus demonstrate that decision fusion configurations with LSF- and MFCC-based classifiers provide further significant performance improvements.

Conference Object (Open Access): Use of line spectral frequencies for emotion recognition from speech (IEEE, 2010)
Bozkurt, E.; Erzin, E.; Eroğlu Erdem, Ç.; Erdem, Tanju; Computer Science; ERDEM, Arif Tanju
We propose the use of line spectral frequency (LSF) features for emotion recognition from speech, which, to the best of our knowledge, have not previously been employed for emotion recognition. Spectral features such as mel-scaled cepstral coefficients have already been used successfully to parameterize speech signals for emotion recognition. The LSF features also offer a spectral representation of speech; moreover, they carry intrinsic information on the formant structure, which is related to the emotional state of the speaker. We use the Gaussian mixture model (GMM) classifier architecture, which captures the static color of the spectral features. Experimental studies performed over the Berlin Emotional Speech Database and the FAU Aibo Emotion Corpus demonstrate that decision fusion configurations with LSF features bring a consistent improvement over the MFCC-based emotion classification rates.
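The late decision fusion of LSF- and MFCC-based classifiers recurring in these abstracts amounts to combining per-class scores from the unimodal classifiers before picking a winner. A minimal sketch, assuming each classifier emits per-class log-likelihoods; the convex fusion weight `alpha` is a hypothetical parameter, typically tuned on a development set:

```python
import numpy as np

def decision_fusion(scores_a, scores_b, alpha=0.5):
    """Late decision fusion sketch: convex combination of per-class
    log-likelihood scores from two unimodal classifiers (e.g. one trained
    on LSF features, one on MFCC), then argmax over classes."""
    fused = alpha * np.asarray(scores_a) + (1.0 - alpha) * np.asarray(scores_b)
    return fused.argmax(axis=-1)   # predicted class index per utterance
```

For example, when one classifier is confident about class 1 and the other mildly prefers class 2 for the same utterance, the fused score can still select class 1; this kind of complementarity between spectral feature sets is what the reported fusion gains rely on.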