SCHOLARLY PUBLICATION

Permanent URI for this community: https://hdl.handle.net/10679/9871

Open Access
Arabic offensive language on Twitter: Analysis and experiments
(Association for Computational Linguistics (ACL), 2021) Mubarak, H.; Rashed, Ammar; Darwish, K.; Samih, Y.; Abdelali, A.
Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization. In this paper, we focus on building a large Arabic offensive tweet dataset. We introduce a method for building a dataset that is not biased by topic, dialect, or target. We produce the largest Arabic dataset to date with special tags for vulgarity and hate speech. We thoroughly analyze the dataset to determine which topics, dialects, and gender are most associated with offensive tweets and how Arabic speakers use offensive language. Lastly, we conduct many experiments to produce strong results (F1 = 83.2) on the dataset using SOTA techniques.
Metadata only
NatiQ: An end-to-end text-to-speech system for Arabic
(Association for Computational Linguistics (ACL), 2022) Abdelali, A.; Durrani, N.; Demiroğlu, Cenk; Dalvi, F.; Mubarak, H.; Darwish, K.; Electrical & Electronics Engineering
NatiQ is an end-to-end text-to-speech system for Arabic. Our speech synthesizer uses an encoder-decoder architecture with attention. We used both Tacotron-based models (Tacotron-1 and Tacotron-2) and the faster transformer model for generating mel-spectrograms from characters. We concatenated Tacotron-1 with the WaveRNN vocoder, Tacotron-2 with the WaveGlow vocoder, and the ESPnet transformer with the Parallel WaveGAN vocoder to synthesize waveforms from the spectrograms. To train our models, we used in-house speech data for two voices: 1) neutral male “Hamza”, narrating general content and news, and 2) expressive female “Amina”, narrating children's storybooks. Our best systems achieve an average Mean Opinion Score (MOS) of 4.21 and 4.40 for Amina and Hamza, respectively. The objective evaluation of the systems using word and character error rate (WER and CER), as well as the response time measured by real-time factor, favored the end-to-end architecture ESPnet. The NatiQ demo is available online at https://tts.qcri.org.

Browsing by Author "Abdelali, A."