DNN-based speaker-adaptive postfiltering with limited adaptation data for statistical speech synthesis systems

Deep neural networks (DNNs) have been successfully deployed for acoustic modelling in statistical parametric speech synthesis (SPSS) systems. Moreover, DNN-based postfilters (PF) have also been shown to outperform conventional postfilters that are widely used in SPSS systems for increasing the quality of synthesized speech. However, existing DNN-based postfilters are trained with speaker-dependent databases. Given that SPSS systems can rapidly adapt to new speakers from generic models, there is a need for DNN-based postfilters that can adapt to new speakers with minimal adaptation data. Here, we compare DNN-, RNN-, and CNN-based postfilters together with adversarial (GAN) training and cluster-based initialization (CI) for rapid adaptation. Results indicate that the feedforward (FF) DNN, together with GAN and CI, significantly outperforms the other recently proposed postfilters.

Date

2019

Publisher

IEEE

URI

http://hdl.handle.net/10679/6846
https://doi.org/10.1109/ICASSP.2019.8683714

Collections

Computer Science
