Publication:
Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources

dc.contributor.author: Barakat, Huda Mohammed Mohammed
dc.contributor.author: Turk, O.
dc.contributor.author: Demiroğlu, Cenk
dc.contributor.department: Electrical & Electronics Engineering
dc.contributor.ozuauthor: DEMİROĞLU, Cenk
dc.contributor.ozugradstudent: Barakat, Huda Mohammed Mohammed
dc.date.accessioned: 2024-02-26T08:07:40Z
dc.date.available: 2024-02-26T08:07:40Z
dc.date.issued: 2024-02-12
dc.description.abstract: Speech synthesis has made significant strides thanks to the transition from machine learning to deep learning models. Contemporary text-to-speech (TTS) models possess the capability to generate speech of exceptionally high quality, closely mimicking human speech. Nevertheless, given the wide array of applications now employing TTS models, mere high-quality speech generation is no longer sufficient. Present-day TTS models must also excel at producing expressive speech that can convey various speaking styles and emotions, akin to human speech. Consequently, researchers have concentrated their efforts on developing more efficient models for expressive speech synthesis in recent years. This paper presents a systematic review of the literature on expressive speech synthesis models published within the last 5 years, with a particular emphasis on approaches based on deep learning. We offer a comprehensive classification scheme for these models and provide concise descriptions of models falling into each category. Additionally, we summarize the principal challenges encountered in this research domain and outline the strategies employed to tackle these challenges as documented in the literature. In Section 8, we pinpoint some research gaps in this field that necessitate further exploration. Our objective with this work is to give an all-encompassing overview of this hot research area to offer guidance to interested researchers and future endeavors in this field. [en_US]
dc.description.version: Publisher version
dc.identifier.doi: 10.1186/s13636-024-00329-7 [en_US]
dc.identifier.issn: 1687-4722 [en_US]
dc.identifier.issue: 1 [en_US]
dc.identifier.uri: http://hdl.handle.net/10679/9221
dc.identifier.uri: https://doi.org/10.1186/s13636-024-00329-7
dc.identifier.volume: 2024 [en_US]
dc.identifier.wos: 001160004600001
dc.language.iso: eng [en_US]
dc.publicationstatus: Published [en_US]
dc.publisher: Springer [en_US]
dc.relation.ispartof: EURASIP Journal on Audio, Speech, and Music Processing
dc.rights: info:eu-repo/semantics/openAccess
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject.keywords: Speech synthesis [en_US]
dc.subject.keywords: Expressive speech [en_US]
dc.subject.keywords: Emotional speech [en_US]
dc.subject.keywords: Deep learning [en_US]
dc.title: Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources [en_US]
dc.type: Review [en_US]
dspace.entity.type: Publication
relation.isOrgUnitOfPublication: 7b58c5c4-dccc-40a3-aaf2-9b209113b763
relation.isOrgUnitOfPublication.latestForDiscovery: 7b58c5c4-dccc-40a3-aaf2-9b209113b763

Files

Original bundle

Name: Deep learning-based expressive speech synthesis a systematic review of approaches, challenges, and resources.pdf
Size: 2.19 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.45 KB
Format: Item-specific license agreed upon to submission