Publication: Hybrid nearest-neighbor/cluster adaptive training for rapid speaker adaptation in statistical speech synthesis systems
dc.contributor.author | Mohammadi, Amir | |
dc.contributor.author | Demiroğlu, Cenk | |
dc.contributor.department | Electrical & Electronics Engineering | |
dc.contributor.ozuauthor | DEMİROĞLU, Cenk | |
dc.contributor.ozugradstudent | Mohammadi, Amir | |
dc.date.accessioned | 2016-02-15T13:38:34Z | |
dc.date.available | 2016-02-15T13:38:34Z | |
dc.date.issued | 2013 | |
dc.description | Due to copyright restrictions, access to the full text of this article is available only via subscription. | |
dc.description.abstract | The statistical speech synthesis (SSS) approach has become one of the most popular methods in the speech synthesis field. An advantage of the SSS approach is the ability to adapt to a target speaker with a couple of minutes of adaptation data. However, many applications, especially in consumer electronics, require adaptation with only a few seconds of data, which can be done using eigenvoice adaptation techniques. Although such techniques work well in speech recognition, they are known to generate perceptual artifacts in statistical speech synthesis. Here, we propose two methods to both alleviate those quality problems and improve the speaker similarity obtained with the baseline eigenvoice adaptation algorithm. Our first method uses a Bayesian approach to constrain the eigenvoice adaptation algorithm to move in realistic directions in the speaker space so as to reduce artifacts. Our second method finds a reference speaker that is close to the target speaker and uses that reference speaker as the seed model in a second eigenvoice adaptation step. Both techniques performed significantly better than the baseline eigenvoice method in the subjective quality and similarity tests. | |
dc.description.sponsorship | European Commission ; TÜBİTAK | |
dc.identifier.endpage | 1081 | |
dc.identifier.isbn | 9781629934433 | |
dc.identifier.scopus | 2-s2.0-84906278451 | |
dc.identifier.startpage | 1077 | |
dc.identifier.uri | http://hdl.handle.net/10679/2375 | |
dc.identifier.wos | 000395050000228 | |
dc.language.iso | eng | en_US |
dc.peerreviewed | yes | |
dc.publicationstatus | published | en_US |
dc.publisher | International Speech Communication Association | |
dc.relation | info:eu-repo/grantAgreement/TUBITAK/1001 - Araştırma | |
dc.relation | info:eu-repo/grantAgreement/EC/FP7 | |
dc.relation.ispartof | Interspeech 2013 | |
dc.relation.publicationcategory | International | |
dc.rights | restrictedAccess | |
dc.subject.keywords | Statistical speech synthesis | |
dc.subject.keywords | Speaker adaptation | |
dc.subject.keywords | Cluster adaptive training | |
dc.subject.keywords | Eigenvoice adaptation | |
dc.title | Hybrid nearest-neighbor/cluster adaptive training for rapid speaker adaptation in statistical speech synthesis systems | en_US |
dc.type | conferenceObject | en_US |
dspace.entity.type | Publication | |
relation.isOrgUnitOfPublication | 7b58c5c4-dccc-40a3-aaf2-9b209113b763 | |
relation.isOrgUnitOfPublication.latestForDiscovery | 7b58c5c4-dccc-40a3-aaf2-9b209113b763 |