Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources

Barakat, Huda Mohammed Mohammed; Turk, O.; Demiroğlu, Cenk


dc.contributor.author	Barakat, Huda Mohammed Mohammed
dc.contributor.author	Turk, O.
dc.contributor.author	Demiroğlu, Cenk
dc.contributor.department	Electrical & Electronics Engineering
dc.contributor.ozuauthor	DEMİROĞLU, Cenk
dc.contributor.ozugradstudent	Barakat, Huda Mohammed Mohammed
dc.date.accessioned	2024-02-26T08:07:40Z
dc.date.available	2024-02-26T08:07:40Z
dc.date.issued	2024-02-12
dc.description.abstract	Speech synthesis has made significant strides thanks to the transition from machine learning to deep learning models. Contemporary text-to-speech (TTS) models possess the capability to generate speech of exceptionally high quality, closely mimicking human speech. Nevertheless, given the wide array of applications now employing TTS models, mere high-quality speech generation is no longer sufficient. Present-day TTS models must also excel at producing expressive speech that can convey various speaking styles and emotions, akin to human speech. Consequently, researchers have concentrated their efforts on developing more efficient models for expressive speech synthesis in recent years. This paper presents a systematic review of the literature on expressive speech synthesis models published within the last 5 years, with a particular emphasis on approaches based on deep learning. We offer a comprehensive classification scheme for these models and provide concise descriptions of models falling into each category. Additionally, we summarize the principal challenges encountered in this research domain and outline the strategies employed to tackle these challenges as documented in the literature. In Section 8, we pinpoint some research gaps in this field that necessitate further exploration. Our objective with this work is to give an all-encompassing overview of this hot research area to offer guidance to interested researchers and future endeavors in this field.
dc.description.version	Publisher version
dc.identifier.doi	10.1186/s13636-024-00329-7
dc.identifier.issn	1687-4722
dc.identifier.issue	1
dc.identifier.uri	http://hdl.handle.net/10679/9221
dc.identifier.uri	https://doi.org/10.1186/s13636-024-00329-7
dc.identifier.volume	2024
dc.identifier.wos	001160004600001
dc.language.iso	eng
dc.publicationstatus	Published
dc.publisher	Springer
dc.relation.ispartof	EURASIP Journal on Audio, Speech, and Music Processing
dc.rights	openAccess
dc.rights	Attribution 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject.keywords	Speech synthesis
dc.subject.keywords	Expressive speech
dc.subject.keywords	Emotional speech
dc.subject.keywords	Deep learning
dc.title	Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources
dc.type	review
dspace.entity.type	Publication
relation.isOrgUnitOfPublication	7b58c5c4-dccc-40a3-aaf2-9b209113b763
relation.isOrgUnitOfPublication.latestForDiscovery	7b58c5c4-dccc-40a3-aaf2-9b209113b763

Files

Original bundle

Name: Deep learning-based expressive speech synthesis a systematic review of approaches, challenges, and resources.pdf
Size: 2.19 MB
Format: Adobe Portable Document Format