A hybrid statistical/unit-selection text-to-speech synthesis system for morphologically rich languages
Type : Master's thesis
Publication Status : unpublished
Access : restrictedAccess
The two most prominent approaches to text-to-speech (TTS) synthesis are unit-selection-based TTS (UTTS) and hidden Markov model (HMM) based TTS (HTTS). UTTS has been the dominant approach over the last decade, while HTTS has been attracting increasing attention from the TTS research community. Each has distinct strengths and weaknesses. Despite its success, UTTS suffers from sudden discontinuities in the synthesized speech that distract listeners, whereas HTTS is free of such artifacts. However, UTTS systems offer high-quality speech when a large unit database is available and storage is not a constraint. The small memory footprint of HTTS systems, on the other hand, makes them attractive for embedded devices. Here, a novel hybrid statistical/unit-selection TTS system for morphologically rich languages is proposed. The proposed hybrid system aims to improve the quality of the baseline HTTS system while keeping the memory footprint small. First, the two approaches are compared and the motivation for the hybrid system is given. Then the proposed hybrid system is presented along with the details of the baseline HTTS system. Subjective and objective tests are conducted to assess the performance of the proposed and baseline systems. The intelligibility and quality scores of the baseline system are comparable to the mean opinion scores (MOS) reported for English in the Blizzard Challenge tests. The AB preference tests show that listeners prefer the hybrid system over the baseline system.
Date : 2013-06