Publication:
Finding relevant features for statistical speech synthesis adaptation

Type

Conference paper

Access

info:eu-repo/semantics/restrictedAccess

Publication Status

published

Abstract

Statistical speech synthesis (SSS) models typically lie in a very high-dimensional space. They can be used to enable speech synthesis on digital devices from only a few sentences of user input. However, the adaptation algorithms for such weakly trained models suffer from the high dimensionality of the feature space. Because creating new voices is easy with the SSS approach, thousands of voices can be trained, and a nearest-neighbor algorithm can then be used to obtain better speaker similarity in these limited-data cases. Nearest-neighbor methods require good distance measures that correlate well with human perception. This paper investigates the problem of finding good low-cost metrics, i.e. simple functions of feature values that agree with objective signal quality metrics. To this end, we use high-dimensional data visualization and dimensionality reduction techniques. Data mining principles are also applied to formulate a tractable view of the problem and to propose tentative solutions. Our results are promising: the performance index improves by 36% with respect to a naive solution, while using only 0.77% of the features that solution uses. Finally, perspectives on new adaptation algorithms and on a tighter integration of data mining and visualization principles are given.
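The abstract does not state which distance, feature subset, or performance index the authors use. The following is only a hypothetical sketch of the two ingredients it describes, assuming each pre-trained voice is summarized by a fixed-length feature vector. The first is a low-cost distance computed over a small subset of feature indices, used for nearest-neighbor voice selection. The second is a score for how well that cheap distance tracks an objective quality metric measured on the same voice pairs. Pearson correlation is an assumed choice here, not necessarily the paper's index. All function names (`subset_distance`, `nearest_voice`, `pearson`, `score_subset`) are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def subset_distance(a: Vector, b: Vector, features: Sequence[int]) -> float:
    """Euclidean distance computed only over the selected feature indices."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in features))


def nearest_voice(target: Vector, voices: List[Vector], features: Sequence[int]) -> int:
    """Index of the pre-trained voice closest to the target speaker's features."""
    return min(range(len(voices)),
               key=lambda k: subset_distance(target, voices[k], features))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance
    (an assumed convention so that degenerate subsets score as uninformative)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def score_subset(pairs: List[Tuple[Vector, Vector]],
                 objective: Sequence[float],
                 features: Sequence[int]) -> float:
    """Hypothetical quality score for a feature subset: correlation between the
    cheap subset distance and an objective metric measured on the same voice pairs."""
    dists = [subset_distance(a, b, features) for a, b in pairs]
    return pearson(dists, objective)


if __name__ == "__main__":
    voices = [[0.0, 0.0, 0.0], [1.0, 1.0, 5.0], [3.0, 0.0, 0.0]]
    target = [1.0, 1.0, 0.0]
    print(nearest_voice(target, voices, [0, 1]))     # 1 (features 0 and 1 only)
    print(nearest_voice(target, voices, [0, 1, 2]))  # 0 (all features)
```

The two calls at the end show that restricting the distance to a feature subset can change which voice is selected. Under these assumptions, a search over candidate subsets would keep the one whose `score_subset` value is highest.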

Date

2014-05

Publisher

European Language Resources Association