Publication:
NatiQ: An end-to-end text-to-speech system for Arabic

dc.contributor.author Abdelali, A.
dc.contributor.author Durrani, N.
dc.contributor.author Demiroğlu, Cenk
dc.contributor.author Dalvi, F.
dc.contributor.author Mubarak, H.
dc.contributor.author Darwish, K.
dc.contributor.department Electrical & Electronics Engineering
dc.contributor.ozuauthor DEMİROĞLU, Cenk
dc.date.accessioned 2023-08-03T12:13:30Z
dc.date.available 2023-08-03T12:13:30Z
dc.date.issued 2022
dc.description.abstract NatiQ is an end-to-end text-to-speech system for Arabic. Our speech synthesizer uses an encoder-decoder architecture with attention. We used both Tacotron-based models (Tacotron-1 and Tacotron-2) and the faster Transformer model for generating mel-spectrograms from characters. We concatenated Tacotron-1 with the WaveRNN vocoder, Tacotron-2 with the WaveGlow vocoder, and the ESPnet Transformer with the Parallel WaveGAN vocoder to synthesize waveforms from the spectrograms. We trained our models on in-house speech data for two voices: 1) neutral male “Hamza”, narrating general content and news, and 2) expressive female “Amina”, narrating children's story books. Our best systems achieve an average Mean Opinion Score (MOS) of 4.21 and 4.40 for Amina and Hamza, respectively. The objective evaluation of the systems using word and character error rate (WER and CER), as well as the response time measured by real-time factor, favored the end-to-end architecture ESPnet. The NatiQ demo is available online at https://tts.qcri.org. en_US
dc.identifier.endpage 398 en_US
dc.identifier.isbn 978-195942927-2
dc.identifier.scopus 2-s2.0-85152915467
dc.identifier.scopus SCOPUS:2-s2.0-85152915467
dc.identifier.startpage 394 en_US
dc.identifier.uri http://hdl.handle.net/10679/8557
dc.language.iso eng en_US
dc.publicationstatus Published en_US
dc.publisher Association for Computational Linguistics (ACL) en_US
dc.relation.ispartof WANLP 2022 - 7th Arabic Natural Language Processing - Proceedings of the Workshop
dc.relation.publicationcategory International
dc.rights restrictedAccess
dc.title NatiQ: An end-to-end text-to-speech system for Arabic en_US
dc.type conferenceObject en_US
dc.type.subtype Conference paper
dspace.entity.type Publication
relation.isOrgUnitOfPublication 7b58c5c4-dccc-40a3-aaf2-9b209113b763
relation.isOrgUnitOfPublication.latestForDiscovery 7b58c5c4-dccc-40a3-aaf2-9b209113b763
