Publication:
Finding relevant features for statistical speech synthesis adaptation

dc.contributor.author: Bruneau, P.
dc.contributor.author: Parisot, O.
dc.contributor.author: Mohammadi, Amir
dc.contributor.author: Demiroğlu, Cenk
dc.contributor.author: Ghoniem, M.
dc.contributor.author: Tamisier, T.
dc.contributor.department: Electrical & Electronics Engineering
dc.contributor.ozuauthor: DEMİROĞLU, Cenk
dc.contributor.ozugradstudent: Mohammadi, Amir
dc.date.accessioned: 2016-02-15T13:38:33Z
dc.date.available: 2016-02-15T13:38:33Z
dc.date.issued: 2014-05
dc.description.abstract: Statistical speech synthesis (SSS) models typically lie in a very high-dimensional space. They can be used to allow speech synthesis on digital devices, using only a few sentences of input from the user. However, the adaptation algorithms for such weakly trained models suffer from the high dimensionality of the feature space. Because creating new voices is easy with the SSS approach, thousands of voices can be trained and a nearest-neighbor algorithm can be used to obtain better speaker similarity in those limited-data cases. Nearest-neighbor methods require good distance measures that correlate well with human perception. This paper investigates the problem of finding good low-cost metrics, i.e., simple functions of feature values that map to objective signal quality metrics. To this end, we use high-dimensional data visualization and dimensionality reduction techniques. Data mining principles are also applied to formulate a tractable view of the problem and to propose tentative solutions. With a performance index improved by 36% with respect to a naive solution, while using only 0.77% of the respective number of features, our results are promising. Finally, perspectives on new adaptation algorithms and on tighter integration of data mining and visualization principles are given.
dc.identifier.isbn: 978-2-9517408-8-4
dc.identifier.uri: http://hdl.handle.net/10679/2372
dc.identifier.wos: 000355611001083
dc.language.iso: eng [en_US]
dc.publicationstatus: published [en_US]
dc.publisher: European Language Resources Association
dc.relation.ispartof: 9th International Conference on Language Resources and Evaluation (LREC)
dc.relation.publicationcategory: International
dc.rights: restrictedAccess
dc.subject.keywords: Speech synthesis
dc.subject.keywords: Speaker adaptation
dc.subject.keywords: Feature selection
dc.subject.keywords: Visual analytics
dc.title: Finding relevant features for statistical speech synthesis adaptation [en_US]
dc.type: conferenceObject [en_US]
dspace.entity.type: Publication
relation.isOrgUnitOfPublication: 7b58c5c4-dccc-40a3-aaf2-9b209113b763
relation.isOrgUnitOfPublication.latestForDiscovery: 7b58c5c4-dccc-40a3-aaf2-9b209113b763