Publication:
DNN-based speaker-adaptive postfiltering with limited adaptation data for statistical speech synthesis systems

dc.contributor.author: Öztürk, M. G.
dc.contributor.author: Ulusoy, O.
dc.contributor.author: Demiroğlu, Cenk
dc.contributor.department: Electrical & Electronics Engineering
dc.contributor.ozuauthor: DEMİROĞLU, Cenk
dc.date.accessioned: 2020-08-27T14:39:00Z
dc.date.available: 2020-08-27T14:39:00Z
dc.date.issued: 2019
dc.description.abstract: Deep neural networks (DNNs) have been successfully deployed for acoustic modelling in statistical parametric speech synthesis (SPSS) systems. Moreover, DNN-based postfilters (PF) have also been shown to outperform conventional postfilters that are widely used in SPSS systems for increasing the quality of synthesized speech. However, existing DNN-based postfilters are trained with speaker-dependent databases. Given that SPSS systems can rapidly adapt to new speakers from generic models, there is a need for DNN-based postfilters that can adapt to new speakers with minimal adaptation data. Here, we compare DNN-, RNN-, and CNN-based postfilters together with adversarial (GAN) training and cluster-based initialization (CI) for rapid adaptation. Results indicate that the feedforward (FF) DNN, together with GAN and CI, significantly outperforms the other recently proposed postfilters. [en_US]
dc.description.sponsorship: TÜBİTAK
dc.identifier.doi: 10.1109/ICASSP.2019.8683714 [en_US]
dc.identifier.endpage: 7034 [en_US]
dc.identifier.isbn: 978-1-4799-8131-1
dc.identifier.issn: 1520-6149 [en_US]
dc.identifier.scopus: 2-s2.0-85069005473
dc.identifier.startpage: 7030 [en_US]
dc.identifier.uri: http://hdl.handle.net/10679/6846
dc.identifier.uri: https://doi.org/10.1109/ICASSP.2019.8683714
dc.identifier.wos: 000482554007053
dc.language.iso: eng [en_US]
dc.publicationstatus: Published [en_US]
dc.publisher: IEEE [en_US]
dc.relation: info:eu-repo/grantAgreement/TUBITAK/1001 - Araştırma/115E922
dc.relation.ispartof: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
dc.relation.publicationcategory: International
dc.rights: restrictedAccess
dc.subject.keywords: Speaker adaptation [en_US]
dc.subject.keywords: Speech synthesis [en_US]
dc.subject.keywords: Postfilter [en_US]
dc.subject.keywords: Deep learning [en_US]
dc.title: DNN-based speaker-adaptive postfiltering with limited adaptation data for statistical speech synthesis systems [en_US]
dc.type: conferenceObject [en_US]
dspace.entity.type: Publication
relation.isOrgUnitOfPublication: 7b58c5c4-dccc-40a3-aaf2-9b209113b763
relation.isOrgUnitOfPublication.latestForDiscovery: 7b58c5c4-dccc-40a3-aaf2-9b209113b763
