Browsing by Author "Güner, Ekrem"

Conference Object | Publication | Metadata only
Analysis of speaker similarity in the statistical speech synthesis systems using a hybrid approach (IEEE, 2012)
Güner, Ekrem; Mohammadi, A.; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Güner, Ekrem
The statistical speech synthesis (SSS) approach has become one of the most popular and successful methods in the speech synthesis field. Smooth speech transitions, without the spurious errors that are observed in unit selection systems, can be generated with the SSS approach. However, a well-known issue with SSS is the lack of voice similarity to the target speaker. The issue arises both in speaker-dependent models and in models that are adapted from average voices. Moreover, in speaker adaptation, similarity to the target speaker does not increase significantly after around one minute of adaptation data, which potentially indicates inherent bottleneck(s) in the system. Here, we propose using the hybrid speech synthesis approach to understand the key factors behind the speaker similarity problem. To that end, we try to answer the following question: which segments and parameters of speech, if generated/synthesized better, would yield a substantial improvement in speaker similarity? In this work, our hybrid methods are described and listening test results are presented and discussed.

Master Thesis | Publication | Restricted
A hybrid statistical/unit-selection text-to-speech synthesis system for morphologically rich languages (2013-06)
Güner, Ekrem; Demiroğlu, Cenk; Demiroğlu, Cenk; Erdem, Tanju; Bozkurt, B.; Department of Electrical and Electronics Engineering; Güner, Ekrem
The two most prominent types of text-to-speech (TTS) systems are unit selection based TTS (UTTS) and hidden Markov model (HMM) based TTS (HTTS). UTTS has been the dominant approach of the last decade, while HTTS has been attracting increasing attention from the TTS research community. Both systems have distinct pros and cons.
Despite its success, UTTS has some disadvantages, such as sudden discontinuities in speech that distract listeners, whereas HTTS is free of those artifacts. However, UTTS systems offer high-quality speech given a large unit database, provided that storage is not a constraint. On the other hand, the small memory footprint requirement of HTTS systems makes them attractive for embedded devices. Here, a novel hybrid statistical/unit selection TTS system for morphologically rich languages is proposed. The proposed hybrid system aims at improving the quality of the baseline HTTS system while keeping the memory footprint small. First, the motivation for the proposed hybrid system is given after a comparison of the two systems. Then the proposed hybrid system is presented along with the details of the baseline HTTS system. To assess the performance of the proposed and baseline systems, subjective and objective tests are conducted. Intelligibility and quality scores of the baseline system are comparable to the MOS scores reported for English in the Blizzard Challenge tests. Results of the AB preference tests revealed the listeners' preference for the hybrid system over the baseline system.

Article | Publication | Open Access
Hybrid statistical/unit-selection Turkish speech synthesis using suffix units (Springer International Publishing, 2016-12)
Demiroğlu, Cenk; Güner, Ekrem; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Güner, Ekrem
Unit selection based text-to-speech synthesis (TTS) has been the dominant TTS approach of the last decade. Despite its success, the unit selection approach has its disadvantages. One of the most significant is the sudden discontinuities in speech that distract listeners (Speech Commun 51:1039-1064, 2009). The second disadvantage is that significant expertise and large amounts of data are needed to build a high-quality synthesis system, which is costly and time-consuming.
The statistical speech synthesis (SSS) approach is a promising alternative synthesis technique. Not only are the spurious errors observed in unit selection systems mostly absent in SSS, but building voice models is also far less expensive and faster than with unit selection. However, the resulting speech is typically not as natural-sounding as speech that is synthesized with a high-quality unit selection system. There are hybrid methods that attempt to take advantage of both SSS and unit selection systems. However, existing hybrid methods still require the development of a high-quality unit selection system. Here, we propose a novel hybrid statistical/unit selection system for Turkish that aims at improving the quality of the baseline SSS system by improving prosodic parameters such as intonation and stress. Commonly occurring suffixes in Turkish are stored in the unit selection database and used in the proposed system. As opposed to existing hybrid systems, the proposed system was developed without building a complete unit selection synthesis system. Therefore, the proposed method can be used without collecting large amounts of data and without the substantial expertise or time-consuming tuning that is typically required in building unit selection systems. Listeners preferred the hybrid system over the baseline system in the AB preference tests.

Book Part | Publication | Metadata only
A small footprint hybrid statistical and unit selection text-to-speech synthesis system for Turkish (Springer Science+Business Media, 2012)
Güner, Ekrem; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Güner, Ekrem
Unit selection based text-to-speech synthesis (TTS) can generate high-quality speech. However, HMM-based text-to-speech (HTS) also has advantages, such as the lack of the spurious errors that are observed in the unit selection scheme. Another advantage is the small memory footprint requirement.
Here, we propose a novel hybrid statistical/unit selection TTS system for agglutinative languages that aims at improving the quality of the baseline HTS system while keeping the memory footprint small. Listeners preferred the hybrid system over a state-of-the-art HTS baseline system in the A/B preference tests.

Conference Object | Publication | Metadata only
A small footprint hybrid statistical/unit selection text-to-speech synthesis system for agglutinative languages (IEEE, 2012)
Güner, Ekrem; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Güner, Ekrem
Despite its success, unit selection based text-to-speech synthesis (TTS) has some disadvantages, such as sudden discontinuities in speech that distract listeners. The HMM-based TTS (HTS) approach has been attracting increasing attention from the TTS research community. One of its advantages is the lack of the spurious errors that are observed in the unit selection scheme. Another advantage of the HTS system is the small memory footprint requirement, which makes it attractive for embedded devices. Here, we propose a novel hybrid statistical/unit selection TTS system for agglutinative languages that aims at improving the quality of the baseline HTS system while keeping the memory footprint small. The intelligibility and quality scores of the baseline system are comparable to the MOS scores reported for English in the Blizzard Challenge tests. Listeners preferred the hybrid system over the baseline system in the A/B preference tests.

Conference Object | Publication | Metadata only
Vurgu ve söyleyiş modelleme hatalarının SMM tabanlı Türkçe MKS sistemindeki etkileri [Effects of stress and pronunciation modeling errors in an HMM-based Turkish TTS system] (IEEE, 2011)
Güner, Ekrem; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Güner, Ekrem
The HMM-based text-to-speech synthesis (HMM-TTS) method is attracting the interest of more and more researchers. One of the most important advantages of this method is the absence of the distortion effects seen in unit selection systems.
In this paper, the performance of the first academic Turkish HMM-TTS system is reported. Although Turkish is a language that is read as it is written, this mapping is not always one-to-one. Moreover, although stress in Turkish follows certain rules, it may not always be possible to use them for correct stress marking. Therefore, in addition to the quality of our baseline system, its sensitivity to pronunciation and stress-marking errors is examined and presented. We observed that using complex pronunciation and stress models mostly affects phoneme durations, which in turn improves quality, but that their contributions are not additive when the two are used together.