Computer Science
Permanent URI for this collection: https://hdl.handle.net/10679/43
Conference paper | Publication | Metadata only
Improving automatic emotion recognition from speech signals (International Speech Communications Association, 2009)
Bozkurt, E.; Erzin, E.; Eroğlu Erdem, Ç.; Erdem, Tanju; Computer Science; ERDEM, Arif Tanju
We present a speech signal driven emotion recognition system. Our system is trained and tested with the INTERSPEECH 2009 Emotion Challenge corpus, which includes spontaneous and emotionally rich recordings. The challenge includes classifier and feature sub-challenges with five-class and two-class classification problems. We investigate prosody-related, spectral and HMM-based features for the evaluation of emotion recognition with Gaussian mixture model (GMM) based classifiers. Spectral features consist of mel-scale cepstral coefficients (MFCC), line spectral frequency (LSF) features and their derivatives, whereas prosody-related features consist of mean normalized values of pitch, first derivative of pitch and intensity. Unsupervised training of HMM structures is employed to define prosody-related temporal features for the emotion recognition problem. We also investigate data fusion of different features and decision fusion of different classifiers, which are not well studied in the emotion recognition framework. Experimental results of automatic emotion recognition with the INTERSPEECH 2009 Emotion Challenge corpus are presented.

Conference paper | Publication | Metadata only
Gauss karışım modeli tabanlı konuşmacı belirleme sistemlerinde klasik MAP uyarlanması yönteminin performans analizi [Performance analysis of the classical MAP adaptation method in Gaussian mixture model based speaker identification systems] (IEEE, 2010)
Erdoğan, A.; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk
The Gaussian mixture model (GMM) is one of the most commonly used methods in text-independent speaker identification systems. In this paper, the performance of the GMM approach has been measured with different parameters and settings. The voice activity detection (VAD) component has been found to have a significant impact on the performance.
Therefore, VAD algorithms that are robust to background noise have been proposed. Significant differences in performance have been observed between male and female speakers and between GSM and PSTN channels. Moreover, the single-stream GMM approach has been found to perform significantly better than the multi-stream GMM approach. It has been observed under all conditions that data duration is critical for good performance.

Conference paper | Publication | Metadata only
INTERSPEECH 2009 duygu tanıma yarışması değerlendirmesi [Evaluation of the INTERSPEECH 2009 Emotion Recognition Challenge] (IEEE, 2010)
Bozkurt, E.; Erzin, E.; Eroğlu Erdem, Ç.; Erdem, Tanju; Computer Science; ERDEM, Arif Tanju
In this paper, we evaluate the results of the INTERSPEECH 2009 Emotion Challenge. The problem posed by the challenge is to classify the spontaneous and emotionally rich FAU Aibo speech recordings into five and two emotion classes as accurately as possible. To address this problem, we investigate prosody-related, spectral and HMM-based (hidden Markov model) features with Gaussian mixture model (GMM) classifiers. The spectral features comprise mel-frequency cepstral coefficients (MFCC), line spectral frequency (LSF) coefficients and their derivatives, whereas the prosody features consist of pitch, the first derivative of pitch and energy. We obtain the HMM features, which describe the temporal variation of the prosody-related features, from HMM structures trained without supervision. In addition, to improve speech emotion recognition results, we investigate data fusion of different features and decision fusion of different classifiers. Our two-stage decision fusion method achieved recognition rates of 41.59% and 67.90% on the five-class and two-class problems, respectively, ranking 2nd and 4th among all challenge results.

Conference paper | Publication | Metadata only
E-Cube: multi-dimensional event sequence processing using concept and pattern hierarchies (IEEE, 2010)
Liu, M.; Rundensteiner, E. A.; Greenfield, K.; Gupta, C.; Wang, S.; Arı, İsmail; Mehta, A.; Computer Science; ARI, Ismail
Many modern applications, including tag-based mass transit systems, RFID-based supply chain management systems and online financial feeds, require special-purpose event stream processing technology to analyze vast amounts of sequential multi-dimensional data available in real-time data feeds. Traditional online analytical processing (OLAP) systems are not designed for real-time pattern-based operations, while Complex Event Processing (CEP) systems are designed for sequence detection and do not support OLAP operations. We will demonstrate a novel E-Cube model that combines CEP and OLAP techniques for multi-dimensional event pattern analysis at different abstraction levels. A London transit scenario will be given to demonstrate the utility and performance of this proposed technology.

Conference paper | Publication | Metadata only
Authoring and presentation tools for distance learning over interactive TV (ACM, 2010)
Gürel, T. C.; Erdem, Tanju; Kermen, A.; Özkan, M. K.; Eroğlu Erdem, Ç.; Computer Science; ERDEM, Arif Tanju
We present a complete system for distance learning over interactive TV with novel tools for authoring and presentation of lectures and exams, and evaluation of student and system performance.
The main technological contributions of the paper include the development of plug-in software so that PowerPoint can be used to prepare presentations for the set-top box, a software tool to convert PDF documents containing multiple-choice questions into interactive exams, and a virtual teacher whose facial animation is automatically generated from speech.

Conference paper | Publication | Metadata only
RANSAC-based training data selection for emotion recognition from spontaneous speech (ACM, 2010)
Eroğlu Erdem, Ç.; Bozkurt, E.; Erzin, E.; Erdem, Tanju; Computer Science; ERDEM, Arif Tanju
Training datasets containing spontaneous emotional expressions are often imperfect due to the ambiguities and difficulties of labeling such data by human observers. In this paper, we present a Random Sampling Consensus (RANSAC) based training approach for the problem of emotion recognition from spontaneous speech recordings. Our motivation is to insert a data cleaning process into the training phase of the Hidden Markov Models (HMMs) for the purpose of removing some suspicious instances of labels that may exist in the training dataset. Our experiments using HMMs with various numbers of states and Gaussian mixtures per state indicate that utilization of RANSAC in the training phase provides an improvement of up to 2.84% in the unweighted recall rates on the test set. This improvement in the accuracy of the classifier is shown to be statistically significant using McNemar’s test.

Conference paper | Publication | Open Access
Use of line spectral frequencies for emotion recognition from speech (IEEE, 2010)
Bozkurt, E.; Erzin, E.; Eroğlu Erdem, Ç.; Erdem, Tanju; Computer Science; ERDEM, Arif Tanju
We propose the use of the line spectral frequency (LSF) features for emotion recognition from speech, which, to the best of our knowledge, have not been previously employed for emotion recognition.
Spectral features such as mel-scaled cepstral coefficients have already been successfully used for the parameterization of speech signals for emotion recognition. The LSF features also offer a spectral representation for speech; moreover, they carry intrinsic information on the formant structure, which is related to the emotional state of the speaker. We use the Gaussian mixture model (GMM) classifier architecture, which captures the static color of the spectral features. Experimental studies performed over the Berlin Emotional Speech Database and the FAU Aibo Emotion Corpus demonstrate that decision fusion configurations with LSF features bring a consistent improvement over the MFCC-based emotion classification rates.

Conference paper | Publication | Open Access
Processing nested complex sequence pattern queries over event streams (ACM, 2010)
Liu, M.; Ray, M.; Rundensteiner, E. A.; Dougherty, D. J.; Gupta, C.; Wang, S.; Arı, İsmail; Mehta, A.; Computer Science; ARI, Ismail
Complex event processing (CEP) has become increasingly important for tracking and monitoring applications ranging from healthcare and supply chain management to surveillance. These monitoring applications submit complex event queries to track sequences of events that match a given pattern. As these systems mature, the need for increasingly complex nested sequence queries arises, while the state-of-the-art CEP systems mostly focus on the execution of flat sequence queries only. In this paper, we introduce an iterative execution strategy for nested CEP queries composed of sequence, negation, AND and OR operators. Lastly, we consider the promise of applying selective caching of intermediate results to optimize the execution.
Our experimental study using real-world stock trades evaluates the performance of our proposed iterative execution strategy for different query types.

Technical report | Publication | Open Access
Relating Staged Computation to the Record Calculus (Özyeğin University, 2010-09-06)
Aktemur, Tankut Barış; Choi, W.; Computer Science; AKTEMUR, Tankut Bariş
It has been previously shown that there is a close relation between record calculus and program generation (e.g. Lisp-like quasiquotations): a translation has been defined to convert staged expressions to record calculus expressions, and it has been shown that the call-by-value semantics of the staged and the record calculi are equivalent modulo the translation and admin reductions. In this work, we investigate the relation further. The contributions are twofold: (1) We fine-tune the previously shown relation between the two operational semantics, and obtain more precise results. In particular, we show that only two kinds of admin reductions suffice, and these reductions can be applied exhaustively. (2) We define a reverse translation that converts record calculus expressions back to the staged calculus, allowing us to go back and forth between the two calculi. We believe that these results provide an important step towards reusing already-existing record calculus static analyses to reason about staged expressions.

Book Chapter | Publication | Metadata only
RANSAC-based training data selection on spectral features for emotion recognition from spontaneous speech (Springer International Publishing, 2011)
Bozkurt, E.; Erzin, E.; Erdem, Tanju; Eroğlu Erdem, Ç.; Computer Science; ERDEM, Arif Tanju
Training datasets containing spontaneous emotional speech are often imperfect due to the ambiguities and difficulties of labeling such data by human observers. In this paper, we present a Random Sampling Consensus (RANSAC) based training approach for the problem of emotion recognition from spontaneous speech recordings.
Our motivation is to insert a data cleaning process into the training phase of the Hidden Markov Models (HMMs) for the purpose of removing some suspicious instances of labels that may exist in the training dataset. Our experiments using HMMs with Mel Frequency Cepstral Coefficients (MFCC) and Line Spectral Frequency (LSF) features indicate that utilization of RANSAC in the training phase provides an improvement in the unweighted recall rates on the test set. Experimental studies performed over the FAU Aibo Emotion Corpus demonstrate that decision fusion configurations with LSF and MFCC based classifiers provide further significant performance improvements.

Book Chapter | Publication | Metadata only
Runtime verification of component-based embedded software (Springer, 2011)
Sözer, Hasan; Hofmann, C.; Tekinerdoğan, B.; Akşit, M.; Computer Science; SÖZER, Hasan
To deal with increasing size and complexity, component-based software development has been employed in embedded systems. Due to several faults, components can make wrong assumptions about the working mode of the system and the working modes of the other components. To detect mode inconsistencies at runtime, we propose a “lightweight” error detection mechanism, which can be integrated with component-based embedded systems. We define links among three levels of abstraction: the runtime behavior of components, the working mode specifications of components and the specification of the working modes of the system. This allows us to detect the user-observable runtime errors. The effectiveness of the approach is demonstrated by implementing a software monitor integrated into a TV system.

Conference paper | Publication | Metadata only
Combining haar feature and skin color based classifiers for face detection (IEEE, 2011)
Eroğlu Erdem, Ç.; Ulukaya, S.; Karaali, A.; Erdem, Tanju; Computer Science; ERDEM, Arif Tanju
This paper presents a hybrid method for face detection in color images.
The well-known Haar feature-based face detector developed by Viola and Jones (VJ), which was designed for gray-scale images, is combined with a skin-color filter, which provides complementary information in color images. The image is first passed through a Haar feature-based face detector, which is adjusted to operate at a point on its ROC curve that has a low number of missed faces but a high number of false detections. Then, using the proposed skin-color post-filtering method, many of these false detections can be eliminated easily. We also use a color compensation algorithm to reduce the effects of lighting. Our experimental results on the Bao color face database show that, in terms of precision, the proposed method is superior to the original VJ algorithm and also to other skin-color based pre-filtering methods in the literature.

Conference paper | Publication | Open Access
RANSAC-based training data selection for speaker state recognition (The International Speech Communications Association, 2011)
Bozkurt, E.; Erzin, E.; Erdem, Ç. E.; Erdem, Tanju; Computer Science; ERDEM, Arif Tanju
We present a Random Sampling Consensus (RANSAC) based training approach for the problem of speaker state recognition from spontaneous speech. Our system is trained and tested with the INTERSPEECH 2011 Speaker State Challenge corpora, which include the Intoxication and the Sleepiness Sub-challenges, where each sub-challenge defines a two-class classification task. We aim to perform RANSAC-based training data selection coupled with Support Vector Machine (SVM) based classification to prune possible outliers that exist in the training data.
Our experimental evaluations indicate that utilization of RANSAC-based training data selection provides 66.32% and 65.38% unweighted average (UA) recall rates on the development and test sets for the Sleepiness Sub-challenge, respectively, and a slight improvement on the Intoxication Sub-challenge performance.

Conference paper | Publication | Metadata only
Towards subtyped program generation in F# (ACM, 2011)
Aktemur, Tankut Barış; Computer Science; AKTEMUR, Tankut Bariş
Program generation is the technique of combining code fragments to construct a program. In this work, we report on our progress in extending F# with program generation constructs. Our prototype implementation uses a translation that allows simulating program generators by regular programs. The translation enables fast implementation and experimentation. We state how a further extension with subtyping can be integrated by benefiting from the translation.

Conference paper | Publication | Metadata only
High-performance nested CEP query processing over event streams (IEEE, 2011)
Liu, M.; Rundensteiner, E.; Dougherty, D.; Gupta, C.; Wang, S.; Arı, İsmail; Mehta, A.; Computer Science; ARI, Ismail
Complex event processing (CEP) over event streams has become increasingly important for real-time applications ranging from health care and supply chain management to business intelligence. These monitoring applications submit complex queries to track sequences of events that match a given pattern. As these systems mature, the need for increasingly complex nested sequence query support arises, while the state-of-the-art CEP systems mostly support the execution of flat sequence queries only. To assure real-time responsiveness and scalability for pattern detection even on huge-volume, high-speed streams, efficient processing techniques must be designed. In this paper, we first analyze the prevailing nested pattern query processing strategy and identify several serious shortcomings.
Not only are substantial subsequences first constructed just to be subsequently discarded, but also opportunities for shared execution of nested subexpressions are overlooked. As a foundation, we introduce NEEL, a CEP query language for expressing nested CEP pattern queries composed of sequence, negation, AND and OR operators. To overcome these deficiencies, we design rewriting rules for pushing negation into inner subexpressions. Next, we devise a normalization procedure that employs these rules for flattening a nested complex event expression. To reduce CPU and memory consumption, we propose several strategies for efficient shared processing of groups of normalized NEEL subexpressions. These strategies include prefix caching, suffix clustering and customized “bit-marking” execution strategies. We design an optimizer to partition the set of all CEP subexpressions in a NEEL normal form into groups, each of which can then be mapped to one of our shared execution operators. Lastly, we evaluate our technologies by conducting a performance study to assess the CPU processing time using real-world stock trades data. Our results confirm that our NEEL execution in many cases performs 100-fold faster than the traditional iterative nested execution strategy for real stock market query workloads.

Conference paper | Publication | Metadata only
Static analysis of multi-staged programs via unstaging translation (ACM, 2011)
Choi, W.; Aktemur, Tankut Barış; Yi, K.; Tatsuta, M.; Computer Science; AKTEMUR, Tankut Bariş
Static analysis of multi-staged programs is challenging because the basic assumption of conventional static analysis no longer holds: the program text itself is no longer a fixed static entity, but rather a dynamically constructed value. This article presents a semantic-preserving translation of multi-staged call-by-value programs into unstaged programs and a static analysis framework based on this translation.
The translation is semantic-preserving in that every small-step reduction of a multi-staged program is simulated by the evaluation of its unstaged version. Thanks to this translation, we can analyze multi-staged programs with existing static analysis techniques that have been developed for conventional unstaged programs: we first apply the unstaging translation, then we apply conventional static analysis to the unstaged version, and finally we cast the analysis results back in terms of the original staged program. Our translation handles staging constructs that have been evolved to be useful in practice (typified in Lisp’s quasi-quotation): open code as values, unrestricted operations on references and intentional variable-capturing substitutions. This article omits references, for which we refer the reader to our companion technical report.

Conference paper | Publication | Metadata only
ÖZÜ konuşmacı doğrulama sisteminin çok sınıflı senaryoda NIST 2010 veritabanı ile başarımı [Performance of the ÖZÜ speaker verification system in the multi-class scenario with the NIST 2010 database] (IEEE, 2011)
Yeşil, Fatih; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Yeşil, Fatih
The performance of speaker verification systems is typically measured based on their binary decision accuracy. However, in speaker verification applications where close to 100% accuracy is required, such as the systems used in the call centers of finance companies, it is not possible to rely on the binary decisions of the existing verification systems. Still, in such cases, multi-class verification outputs (for example, high, medium and low verification scores) returned by the speaker verification systems can be used by a human agent to reduce the verification time and/or increase the verification accuracy compared to a human-only scenario. In this work, we compare such multi-class output performance of some of the most popular speaker verification systems when a human agent is assumed to be in the verification loop.
Performance is measured by the reduction in the number of questions used by the human agent for verifying the identity of the caller without compromising security. Experiments are performed using the NIST 2010 database for the condition with 8 conversation sides (5 minutes each) of enrollment data and 10 seconds of verification data.

Conference paper | Publication | Metadata only
Konuşmacı aradeğerlemeli SMM tabanlı metinden konuşma sentezleme sistemi [HMM-based text-to-speech synthesis system with speaker interpolation] (IEEE, 2011)
Orhan, Mustafa Cem; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Orhan, Mustafa Cem
Hidden Markov Model (HMM) based text-to-speech (TTS) systems offer many advantages compared to the concatenative approach. One of those advantages is the ability to interpolate between different speakers to generate new voices. In this paper, speaker interpolation for HMM-based TTS (HTS) is described and listening test results for the interpolation of English and Turkish voices are presented. Similar to English, we obtained Turkish speech that strongly reflects the interpolation ratio in perceptual similarity. Some insight into the interpolation process is also provided by analysing the spectra of the reference and final voices.

Conference paper | Publication | Metadata only
NEEL: The nested complex event language for real-time event analytics (Springer International Publishing, 2011)
Liu, M.; Rundensteiner, E. A.; Dougherty, D.; Gupta, C.; Wang, S.; Arı, İsmail; Mehta, A.; Computer Science; ARI, Ismail
Complex event processing (CEP) over event streams has become increasingly important for real-time applications ranging from health care and supply chain management to business intelligence. These monitoring applications submit complex event queries to track sequences of events that match a given pattern. As these systems mature, the need for increasingly complex nested sequence query support arises, while the state-of-the-art CEP systems mostly support the execution of only flat sequence queries.
In this paper, we introduce our nested CEP query language NEEL for expressing nested queries composed of sequence, negation, AND and OR operators. Thereafter, we also define its formal semantics. Subtle issues with negation and predicates within the nested sequence context are discussed. An E-Analytics system for processing nested CEP queries expressed in the NEEL language has been developed. Lastly, we demonstrate the utility of this technology by describing a case study of applying it to a real-world application in health care.

Conference paper | Publication | Metadata only
Vurgu ve söyleyiş modelleme hatalarının SMM tabanlı Türkçe MKS sistemindeki etkileri [Effects of stress and pronunciation modeling errors in an HMM-based Turkish TTS system] (IEEE, 2011)
Güner, Ekrem; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Güner, Ekrem
The HMM-based text-to-speech synthesis (HMM-TTS) method is attracting the interest of more and more researchers. One of its most important advantages is the absence of the distortion effects seen in unit-selection systems. In this paper, the performance of the first academic Turkish HMM-TTS system is reported. Although Turkish is a language that is read as it is written, this mapping is not always one-to-one. Moreover, although stress in Turkish follows certain rules, it may not always be possible to use them for correct stress marking. Therefore, in addition to the quality of our baseline system, its sensitivity to pronunciation and stress-marking errors is also examined and presented. We observed that using complex pronunciation and stress models affects phoneme durations the most, which in turn improves quality, but that their contributions are not additive when the two are used together.