Graduate School of Engineering and Science
Permanent URI for this community: https://hdl.handle.net/10679/8952
Browsing by Institution Author "DEMİROĞLU, Cenk"
Review Publication, Open Access
Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources (Springer, 2024-02-12)
Barakat, Huda Mohammed Mohammed; Turk, O.; Demiroğlu, Cenk; Electrical & Electronics Engineering

Speech synthesis has made significant strides thanks to the transition from machine learning to deep learning models. Contemporary text-to-speech (TTS) models possess the capability to generate speech of exceptionally high quality, closely mimicking human speech. Nevertheless, given the wide array of applications now employing TTS models, mere high-quality speech generation is no longer sufficient. Present-day TTS models must also excel at producing expressive speech that can convey various speaking styles and emotions, akin to human speech. Consequently, researchers have concentrated their efforts on developing more efficient models for expressive speech synthesis in recent years. This paper presents a systematic review of the literature on expressive speech synthesis models published within the last 5 years, with a particular emphasis on approaches based on deep learning. We offer a comprehensive classification scheme for these models and provide concise descriptions of models falling into each category. Additionally, we summarize the principal challenges encountered in this research domain and outline the strategies employed to tackle these challenges as documented in the literature. In Section 8, we pinpoint some research gaps in this field that necessitate further exploration. Our objective with this work is to give an all-encompassing overview of this hot research area to offer guidance to interested researchers and future endeavors in this field.

Article Publication, Open Access
Depression screening from voice samples of patients affected by Parkinson’s disease (S. Karger AG, 2019-05-01)
Özkanca, Yasin Serdar; Öztürk, M. G.; Ekmekci, Merve Nur; Atkins, D. C.; Demiroğlu, Cenk; Ghomi, R. H.; Electrical & Electronics Engineering

Depression is a common mental health problem leading to significant disability worldwide. It is not only common but also commonly co-occurs with other mental and neurological illnesses. Parkinson's disease (PD) gives rise to symptoms directly impairing a person's ability to function. Early diagnosis and detection of depression can aid in treatment, but diagnosis typically requires an interview with a health provider or a structured diagnostic questionnaire. Thus, unobtrusive measures to monitor depression symptoms in daily life could have great utility in screening depression for clinical treatment. Vocal biomarkers of depression are a potentially effective method of assessing depression symptoms in daily life, which is the focus of the current research. We have a database of 921 unique PD patients and their self-assessment of whether they felt depressed or not. Voice recordings from these patients were used to extract paralinguistic features, which served as inputs to machine learning and deep learning techniques to predict depression. The results are presented here, and the limitations are discussed given the nature of the recordings, which lack language content. Our models achieved accuracies as high as 0.77 in classifying depressed and nondepressed subjects using their voice features and PD severity. We found depression and severity of PD had a correlation coefficient of 0.3936, providing a valuable feature when predicting depression from voice. Our results indicate a clear correlation between feeling depressed and PD severity. Voice may be an effective digital biomarker to screen for depression among PD patients.