Browsing by Author "Sarfjoo, Seyyed Saeed"
Now showing 1 - 4 of 4
Conference Object | Metadata only
Cross-lingual speaker adaptation for statistical speech synthesis using limited data (Interspeech, 2016)
Sarfjoo, Seyyed Saeed; Demiroğlu, Cenk (Electrical & Electronics Engineering)
Abstract: Cross-lingual speaker adaptation with limited adaptation data has many applications, such as use in speech-to-speech translation systems. Here, we focus on cross-lingual adaptation for statistical speech synthesis (SSS) systems using limited adaptation data. To that end, we propose two techniques exploiting a bilingual Turkish-English speech database that we collected. In the first approach, speaker-specific state-mapping is proposed for cross-lingual adaptation; it performed significantly better than the baseline state-mapping algorithm in adapting the excitation parameter in both objective and subjective tests. In the second approach, eigenvoice adaptation is done in the input language and the resulting weights are then used to estimate the eigenvoice weights in the output language with weighted linear regression. The second approach performed significantly better than the baseline system in adapting the spectral envelope parameters in both objective and subjective tests.

Article | Metadata only
Eigenvoice speaker adaptation with minimal data for statistical speech synthesis systems using a MAP approach and nearest-neighbors (IEEE, 2014-12)
Mohammadi, Amir; Sarfjoo, Seyyed Saeed; Demiroğlu, Cenk (Electrical & Electronics Engineering)
Abstract: Statistical speech synthesis (SSS) systems can adapt to a target speaker with a couple of minutes of adaptation data. Developing adaptation algorithms that further reduce the required data to a few seconds can have a substantial effect on the deployment of the technology in real-life applications such as consumer electronics devices. The traditional way to achieve such rapid adaptation is the eigenvoice technique, which works well in speech recognition but is known to generate perceptual artifacts in statistical speech synthesis. Here, we propose three methods to alleviate the quality problems of the baseline eigenvoice adaptation algorithm while allowing speaker adaptation with minimal data. The first method uses a Bayesian eigenvoice approach to constrain the adaptation algorithm to move in realistic directions in the speaker space and thus reduce artifacts. The second method finds pre-trained reference speakers that are close to the target speaker and uses only those reference speaker models in a second eigenvoice adaptation iteration. Both techniques performed significantly better than the baseline eigenvoice method in objective tests, and both improved speech quality over the baseline in subjective tests. In the third method, tandem use of the proposed eigenvoice method with a state-of-the-art linear-regression-based adaptation technique is found to improve the adaptation of excitation features.
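A minimal sketch of the nearest-neighbor idea described in the abstract above: after a first adaptation pass, keep only the reference speakers closest to the target and rebuild the eigenvoice basis from them before re-estimating the weights. The function name, the Euclidean distance over mean supervectors, and the least-squares projection (standing in for the likelihood-based estimation used in the paper) are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def nearest_neighbor_eigenvoice(ref_supervectors, target_estimate, k=20, n_eig=10):
    """Hypothetical second-pass eigenvoice adaptation restricted to nearby reference speakers.

    ref_supervectors: (N, D) mean supervectors of the pre-trained reference speakers.
    target_estimate:  (D,) rough target supervector from the first adaptation pass.
    """
    # 1) Distance of every reference speaker to the current target estimate.
    dists = np.linalg.norm(ref_supervectors - target_estimate, axis=1)
    neighbors = ref_supervectors[np.argsort(dists)[:k]]

    # 2) Rebuild the eigenvoice space (PCA via SVD) from the k nearest neighbors only.
    mean = neighbors.mean(axis=0)
    _, _, vt = np.linalg.svd(neighbors - mean, full_matrices=False)
    eigenvoices = vt[:n_eig]                      # (n_eig, D) orthonormal basis

    # 3) Re-estimate the eigenvoice weights for the target by projecting onto the new basis
    #    (a stand-in for maximum-likelihood estimation against the adaptation data).
    weights = eigenvoices @ (target_estimate - mean)
    adapted_supervector = mean + weights @ eigenvoices
    return adapted_supervector, weights
```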
Article | Metadata only
Using eigenvoices and nearest-neighbors in HMM-based cross-lingual speaker adaptation with limited data (IEEE, 2017-04)
Sarfjoo, Seyyed Saeed; Demiroğlu, Cenk; King, S. (Electrical & Electronics Engineering)
Abstract: Cross-lingual speaker adaptation for speech synthesis has many applications, such as use in speech-to-speech translation systems. Here, we focus on cross-lingual adaptation for statistical speech synthesis systems using limited adaptation data. To that end, we propose two eigenvoice adaptation approaches exploiting a bilingual Turkish-English speech database that we collected. In one approach, eigenvoice weights extracted using Turkish adaptation data and Turkish voice models are transformed into eigenvoice weights for the English voice models using linear regression. Weighting the samples according to the distance of the reference speakers from the target speaker during linear regression was found to improve performance, and importance-weighting the elements of the eigenvectors during regression improved it further. The second approach proposed here is speaker-specific state-mapping, which performed significantly better than the baseline state-mapping algorithm in both objective and subjective tests. Performance of the proposed state-mapping algorithm was further improved when it was used with the intralingual eigenvoice approach instead of the linear-regression-based algorithms used in the baseline system.

PhD Dissertation | Metadata only
Using eigenvoices and nearest-neighbours in HMM-based cross-lingual speaker adaptation with limited data (2017-08)
Sarfjoo, Seyyed Saeed; Demiroğlu, Cenk; Şensoy, Murat; Uğurdağ, Hasan Fatih; Saraçlar, M.; Güz, Ü. (Department of Computer Science)
Thesis abstract: Cross-lingual speaker adaptation for speech synthesis has many applications, such as use in speech-to-speech translation systems. Here, we focus on cross-lingual adaptation for statistical speech synthesis systems using limited adaptation data, and we propose new methods for both HMM-based and DNN-based speech synthesis. For HMM-based synthesis, we propose two eigenvoice adaptation approaches exploiting a bilingual Turkish-English speech database that we collected. In one approach, eigenvoice weights extracted using Turkish adaptation data and Turkish voice models are transformed into eigenvoice weights for the English voice models using linear regression. Weighting the samples according to the distance of the reference speakers from the target speaker during linear regression was found to improve performance, and importance-weighting the elements of the eigenvectors during regression improved it further. The second approach is speaker-specific state-mapping, which performed significantly better than the baseline state-mapping algorithm in both objective and subjective tests. Performance of the proposed state-mapping algorithm was further improved when it was used with the intralingual eigenvoice approach instead of the linear-regression-based algorithms used in the baseline system. For DNN-based synthesis, we propose a new unsupervised adaptation method: using the sequence of acoustic features from the target speaker, we estimate continuous linguistic features for the unlabeled data. In objective and subjective experiments, the adapted model outperformed the gender-dependent average voice models in terms of quality and similarity.
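The weighted linear regression used in the article and thesis abstracts above to carry eigenvoice weights from the Turkish voice models to the English voice models can be sketched roughly as follows, assuming the bilingual reference speakers have eigenvoice weights estimated in both languages. The exponential distance-based sample weighting and the function name are illustrative assumptions rather than the papers' exact formulation, and the eigenvector-element importance weighting is omitted for brevity.

```python
import numpy as np

def map_eigenvoice_weights(w_tr_refs, w_en_refs, w_tr_target, tau=1.0):
    """Hypothetical Turkish-to-English eigenvoice weight mapping via weighted linear regression.

    w_tr_refs:   (N, K) Turkish eigenvoice weights of the bilingual reference speakers.
    w_en_refs:   (N, K) English eigenvoice weights of the same reference speakers.
    w_tr_target: (K,)  Turkish weights estimated from the target speaker's adaptation data.
    """
    # Sample weights: reference speakers closer to the target in the Turkish
    # eigenvoice space influence the regression more (one possible weighting).
    d = np.linalg.norm(w_tr_refs - w_tr_target, axis=1)
    s = np.exp(-d / tau)                                        # (N,)

    # Weighted least squares for W_en ≈ [1, W_tr] @ A (bias term included).
    X = np.hstack([np.ones((len(w_tr_refs), 1)), w_tr_refs])    # (N, K+1) design matrix
    Xs = X * s[:, None]                                         # rows scaled by sample weight
    # Solve the weighted normal equations (X^T S X) A = X^T S W_en.
    A = np.linalg.lstsq(Xs.T @ X, Xs.T @ w_en_refs, rcond=None)[0]   # (K+1, K)

    # Apply the learned mapping to the target speaker's Turkish weights.
    return np.concatenate([[1.0], w_tr_target]) @ A
```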