Show simple item record

dc.contributor.author: Mohammadi, Amir
dc.contributor.author: Sarfjoo, Seyyed Saeed
dc.contributor.author: Demiroğlu, Cenk
dc.date.accessioned: 2015-12-17T10:41:50Z
dc.date.available: 2015-12-17T10:41:50Z
dc.date.issued: 2014-12
dc.identifier.issn: 2329-9290
dc.identifier.uri: http://hdl.handle.net/10679/1321
dc.identifier.uri: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6918404&tag=1
dc.description: Due to copyright restrictions, access to the full text of this article is available only via subscription.
dc.description.abstract: Statistical speech synthesis (SSS) systems can adapt to a target speaker with a couple of minutes of adaptation data. Developing adaptation algorithms that further reduce the required adaptation data to a few seconds of speech can have a substantial effect on the deployment of the technology in real-life applications such as consumer electronics devices. The traditional way to achieve such rapid adaptation is the eigenvoice technique, which works well in speech recognition but is known to generate perceptual artifacts in statistical speech synthesis. Here, we propose three methods to alleviate the quality problems of the baseline eigenvoice adaptation algorithm while allowing speaker adaptation with minimal data. Our first method uses a Bayesian eigenvoice approach to constrain the adaptation algorithm to move in realistic directions in the speaker space, thereby reducing artifacts. Our second method finds pre-trained reference speakers that are close to the target speaker and uses only those reference speaker models in a second eigenvoice adaptation iteration. Both techniques performed significantly better than the baseline eigenvoice method in objective tests, and both improved speech quality over the baseline in subjective tests. In the third method, tandem use of the proposed eigenvoice method with a state-of-the-art linear regression based adaptation technique is found to improve adaptation of excitation features. [en_US] (An illustrative sketch of the first two methods follows the record below.)
dc.description.sponsorship: TÜBİTAK; European Commission
dc.language.iso: eng [en_US]
dc.publisher: IEEE [en_US]
dc.relation: info:turkey/grantAgreement/TUBITAK/109E281 [en_US]
dc.relation: info:eu-repo/grantAgreement/EC/FP7/268409 [en_US]
dc.relation.ispartof: IEEE/ACM Transactions on Audio, Speech, and Language Processing
dc.rights: restrictedAccess
dc.title: Eigenvoice speaker adaptation with minimal data for statistical speech synthesis systems using a MAP approach and nearest-neighbors [en_US]
dc.type: Article [en_US]
dc.peerreviewed: yes [en_US]
dc.publicationstatus: published [en_US]
dc.contributor.department: Özyeğin University
dc.contributor.authorID: (ORCID 0000-0002-6160-3169 & YÖK ID 144947) Demiroğlu, Cenk
dc.contributor.ozuauthor: Demiroğlu, Cenk
dc.identifier.volume: 22
dc.identifier.issue: 12
dc.identifier.startpage: 2146
dc.identifier.endpage: 2157
dc.identifier.wos: WOS:000344459700019
dc.identifier.doi: 10.1109/TASLP.2014.2362009
dc.subject.keywords: Cluster adaptive training [en_US]
dc.subject.keywords: Eigenvoice adaptation [en_US]
dc.subject.keywords: Nearest neighbor [en_US]
dc.subject.keywords: Speaker adaptation [en_US]
dc.subject.keywords: Statistical speech synthesis [en_US]
dc.identifier.scopus: SCOPUS:2-s2.0-84921805734
dc.contributor.ozugradstudent: Mohammadi, Amir
dc.contributor.ozugradstudent: Sarfjoo, Seyyed Saeed
dc.contributor.authorMale: 3
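
Illustrative sketch (editor's addition, not part of the published record): the abstract describes a Bayesian (MAP-constrained) eigenvoice adaptation followed by a nearest-neighbor refinement over pre-trained reference speakers. The Python sketch below illustrates those two ideas on toy speaker "supervectors"; all function names, the supervector representation, and the closed-form ridge/MAP solution are assumptions made for illustration, not the authors' implementation.

import numpy as np

def train_eigenvoices(ref_supervectors, n_eigenvoices):
    # PCA over reference-speaker supervectors: a mean voice plus the
    # principal directions ("eigenvoices") of inter-speaker variation.
    mean_voice = ref_supervectors.mean(axis=0)
    _, _, vt = np.linalg.svd(ref_supervectors - mean_voice, full_matrices=False)
    return mean_voice, vt[:n_eigenvoices]

def map_adapt(target_supervector, mean_voice, eigenvoices, tau=10.0):
    # MAP estimate of the eigenvoice weights: a zero-mean Gaussian prior
    # (ridge term tau) shrinks the weights toward the mean voice, keeping
    # the adapted model in realistic regions of the speaker space when
    # only seconds of adaptation data are available.
    E = eigenvoices                           # (K, D)
    r = target_supervector - mean_voice       # (D,)
    w = np.linalg.solve(E @ E.T + tau * np.eye(len(E)), E @ r)
    return mean_voice + w @ E

def nearest_neighbor_readapt(first_pass, refs, n_neighbors, n_eigenvoices, tau=10.0):
    # Second pass: keep only the reference speakers closest to the
    # first-pass estimate, rebuild the eigenvoice space from them, and
    # re-run the MAP adaptation in that smaller, better-matched space.
    dists = np.linalg.norm(refs - first_pass, axis=1)
    nearest = refs[np.argsort(dists)[:n_neighbors]]
    mean_voice, eigenvoices = train_eigenvoices(nearest, n_eigenvoices)
    return map_adapt(first_pass, mean_voice, eigenvoices, tau)

# Toy usage; in a real SSS system the supervectors would be stacked
# HMM-state means and the target vector would be estimated from a few
# seconds of adaptation utterances.
rng = np.random.default_rng(0)
refs = rng.normal(size=(50, 300))     # 50 hypothetical reference speakers
target = rng.normal(size=300)         # hypothetical target statistics
mv, ev = train_eigenvoices(refs, n_eigenvoices=10)
adapted = nearest_neighbor_readapt(map_adapt(target, mv, ev), refs,
                                   n_neighbors=15, n_eigenvoices=10)

The paper's third method, tandem use with a linear regression based adaptation technique for excitation features, is not sketched here.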


Files in this item


There are no files associated with this item.
