Person: DEMİROĞLU, Cenk
First Name: Cenk
Last Name: DEMİROĞLU
42 results
Publication Search Results
Now showing 1 - 10 of 42

Article Publication, Metadata only
Uncertainty assessment for detection of spoofing attacks to speaker verification systems using a Bayesian approach (Elsevier, 2022-02)
Süslü, Çağıl; Eren, Eray; Demiroğlu, Cenk
Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Süslü, Çağıl; Eren, Eray
There has been tremendous progress in automatic speaker verification systems over the last decade. Still, spoofing attacks pose a significant challenge to their deployment. Even though there are various attack techniques such as voice conversion and speech synthesis, replay attacks are one of the most important types since they can be carried out without significant expertise in speech technology. Moreover, replay attacks are hard to detect because they are done with a simple replay of the original audio. The problem has gained more attention since the introduction of the ASVspoof 2017 challenge, which included a well-designed database with realistic replay attack conditions. Even though many different deep network types and acoustic features have been proposed since the challenge, one key issue, model uncertainty around the neural networks' decisions, is largely ignored. This is a result of using the softmax function with the cross-entropy loss, which is widely used in many domains. Here, we propose using evidential deep learning, a recently proposed method that is rapidly gaining popularity, for assessing the model uncertainty around the network's decision. Experimental results show that the investigated network architectures perform better in terms of equal error rate (EER) with the new loss function.
Moreover, the reliability of the measured uncertainty is shown by filtering samples out of the test set using the Bayesian uncertainty measure, which resulted in a consistent decrease in EER with decreasing threshold.

Article Publication, Metadata only
Disentangling human trafficking types and the identification of pathways to forced labor and sex: an explainable analytics approach (Springer, 2023-07)
Eryarsoy, E.; Topuz, K.; Demiroğlu, Cenk
Electrical & Electronics Engineering; DEMİROĞLU, Cenk
Terms such as human trafficking and modern-day slavery are ephemeral but reflect manifestations of oppression, servitude, and captivity that have perpetually threatened the basic rights of all humans. Operations research and analytical tools offering practical wisdom have paid scant attention to this overarching problem. Motivated by this lacuna, this study considers two of the most prevalent categories of human trafficking: forced labor and forced sex. Using one of the largest available datasets, due to the Counter-Trafficking Data Collective (CTDC), we examine patterns related to forced sex and forced labor. Our study uses a two-phase approach focusing on explainability: Phase 1 involves logistic regression (LR) segueing to association rules analysis, and Phase 2 employs Bayesian Belief Networks (BBNs) to uncover intricate pathways leading to human trafficking. This combined approach provides a comprehensive understanding of the factors contributing to human trafficking, effectively addressing the limitations of conventional methods. We confirm and challenge some of the key findings in the extant literature and call for better prevention strategies.
Our study goes beyond the pretext of analytics usage by prescribing how to incorporate our results in combating human trafficking.

Article Publication, Open Access
Depression-level assessment from multi-lingual conversational speech data using acoustic and text features (Springer Nature, 2020-11-17)
Demiroğlu, Cenk; Besirli, A.; Özkanca, Yasin Serdar; Celik, S.
Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Özkanca, Yasin Serdar
Depression is a widespread mental health problem around the world with a significant burden on economies. Its early diagnosis and treatment are critical to reduce the costs and even save lives. One key aspect of achieving that goal is to use technology to monitor depression remotely and relatively inexpensively using automated agents. There have been numerous efforts to automatically assess depression levels using audiovisual features as well as text analysis of conversational speech transcriptions. However, difficulty in data collection and the limited amounts of data available for research present challenges that are hampering the success of the algorithms. One of the two novel contributions in this paper is to exploit databases from multiple languages for acoustic feature selection. Since a large number of features can be extracted from speech, given the small amounts of training data available, effective data selection is critical for success. Our proposed multi-lingual method was effective at selecting better features than the baseline algorithms, which significantly improved the depression assessment accuracy. The second contribution of the paper is to extract text-based features for depression assessment and use a novel algorithm to fuse the text- and speech-based classifiers, which further boosted the performance.

Article Publication, Open Access
Depression screening from voice samples of patients affected by parkinson’s disease (S. Karger AG, 2019-05-01)
Özkanca, Yasin Serdar; Öztürk, M. G.; Ekmekci, Merve Nur; Atkins, D. C.; Demiroğlu, Cenk; Ghomi, R. H.
Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Özkanca, Yasin Serdar; Ekmekci, Merve Nur
Depression is a common mental health problem leading to significant disability worldwide. It is not only common but also commonly co-occurs with other mental and neurological illnesses. Parkinson's disease (PD) gives rise to symptoms directly impairing a person's ability to function. Early diagnosis and detection of depression can aid in treatment, but diagnosis typically requires an interview with a health provider or a structured diagnostic questionnaire. Thus, unobtrusive measures to monitor depression symptoms in daily life could have great utility in screening depression for clinical treatment. Vocal biomarkers of depression are a potentially effective method of assessing depression symptoms in daily life, which is the focus of the current research. We have a database of 921 unique PD patients and their self-assessment of whether they felt depressed or not. Voice recordings from these patients were used to extract paralinguistic features, which served as inputs to machine learning and deep learning techniques to predict depression. The results are presented here, and the limitations are discussed given the nature of the recordings, which lack language content. Our models achieved accuracies as high as 0.77 in classifying depressed and nondepressed subjects using their voice features and PD severity. We found depression and severity of PD had a correlation coefficient of 0.3936, providing a valuable feature when predicting depression from voice. Our results indicate a clear correlation between feeling depressed and PD severity.
Voice may be an effective digital biomarker to screen for depression among PD patients.

Conference Object Publication, Metadata only
Natural language features for detection of Alzheimer's disease in conversational speech (IEEE, 2014)
Khodabakhsh, Ali; Kuşçuoğlu, Serhan; Demiroğlu, Cenk
Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Khodabakhsh, Ali; Kuşçuoğlu, Serhan
Automatic monitoring of patients with Alzheimer's disease and diagnosis of the disease in its early stages can have a significant impact on society. Here, we investigate an automatic diagnosis approach through the use of features derived from transcriptions of conversations with the subjects. As opposed to standard tests that are mostly focused on memory recall, spontaneous conversations are carried out with the subjects in informal settings. Features extracted from the transcriptions of the conversations could discriminate between healthy people and patients with high reliability. Although the results are preliminary and patients were in later stages of Alzheimer's disease, results indicate the potential use of the proposed natural language based features in the early stages of the disease as well. Moreover, the data collection process employed here can be done inexpensively by call center agents in a real-life application using automatic speech recognition (ASR) systems, which are known to have reached very high accuracies in recent years. Thus, the investigated features hold the potential to make it low-cost and convenient to diagnose the disease and monitor the diagnosed patients over time.

Conference Object Publication, Metadata only
NatiQ: An end-to-end text-to-speech system for arabic (Association for Computational Linguistics (ACL), 2022)
Abdelali, A.; Durrani, N.; Demiroğlu, Cenk; Dalvi, F.; Mubarak, H.; Darwish, K.
Electrical & Electronics Engineering; DEMİROĞLU, Cenk
NatiQ is an end-to-end text-to-speech system for Arabic. Our speech synthesizer uses an encoder-decoder architecture with attention.
We used both Tacotron-based models (Tacotron-1 and Tacotron-2) and the faster transformer model for generating mel-spectrograms from characters. We concatenated Tacotron-1 with the WaveRNN vocoder, Tacotron-2 with the WaveGlow vocoder, and the ESPnet transformer with the Parallel WaveGAN vocoder to synthesize waveforms from the spectrograms. To train our models, we used in-house speech data for two voices: 1) a neutral male voice, “Hamza”, narrating general content and news, and 2) an expressive female voice, “Amina”, narrating children's story books. Our best systems achieve an average Mean Opinion Score (MOS) of 4.21 and 4.40 for Amina and Hamza, respectively. The objective evaluation of the systems using word and character error rates (WER and CER), as well as the response time measured by real-time factor, favored the end-to-end architecture ESPnet. The NatiQ demo is available online at https://tts.qcri.org.

Conference Object Publication, Open Access
Multi-lingual depression-level assessment from conversational speech using acoustic and text features (International Speech Communication Association, 2018)
Özkanca, Yasin Serdar; Demiroğlu, Cenk; Besirli, A.; Çelik, S.
Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Özkanca, Yasin Serdar
Depression is a common mental health problem around the world with a large burden on economies and on the well-being, and hence productivity, of individuals. Its early diagnosis and treatment are critical to reduce the costs and even save lives. One key aspect of achieving that goal is to use voice technologies to monitor depression remotely and relatively inexpensively using automated agents. Although there have been efforts to automatically assess depression levels from audiovisual features, the use of transcriptions along with the acoustic features has emerged as a more recent research avenue. Moreover, difficulty in data collection and the limited amounts of data available for research are also challenges that are hampering the success of the algorithms.
One of the novel contributions in this paper is to exploit databases from multiple languages for feature selection. Since a large number of features can be extracted from speech, and given the small amounts of training data available, effective data selection is critical for success. Our proposed multi-lingual method was effective at selecting better features and significantly improved the depression assessment accuracy. We also use text-based features for assessment and propose a novel strategy to fuse the text- and speech-based classifiers, which further boosted the performance.

Conference Object Publication, Metadata only
Gauss karışım modeli tabanlı konuşmacı belirleme sistemlerinde klasik MAP uyarlanması yönteminin performans analizi [Performance analysis of the classical MAP adaptation method in Gaussian mixture model based speaker identification systems] (IEEE, 2010)
Erdoğan, A.; Demiroğlu, Cenk
Electrical & Electronics Engineering; DEMİROĞLU, Cenk
Gaussian mixture models (GMMs) are among the most commonly used methods in text-independent speaker identification systems. In this paper, the performance of the GMM approach has been measured with different parameters and settings. The voice activity detection (VAD) component has been found to have a significant impact on performance. Therefore, VAD algorithms that are robust to background noise have been proposed. Significant differences in performance have been observed between male and female speakers and between GSM and PSTN channels. Moreover, the single-stream GMM approach has been found to perform significantly better than the multi-stream GMM approach.
It has been observed under all conditions that data duration is critical for good performance.

Conference Object Publication, Metadata only
Gauss karışım modeli tabanlı konuşmacı doğrulama sistemlerinde kişiye ve kanala uyarlanmada klasik MAP tabanlı yöntemlerin performans analizi [Performance analysis of classical MAP-based methods for speaker and channel adaptation in Gaussian mixture model based speaker verification systems] (IEEE, 2011)
Koşunda, Serol; Yeşil, Fatih; Ayazoğlu, Yaprak; Demiroğlu, Cenk
Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Koşunda, Serol; Yeşil, Fatih; Ayazoğlu, Yaprak
In this paper, the performance of Gaussian mixture model (GMM) based algorithms implemented in the Speech Processing Laboratory at Ozyegin University is reported on the NIST SRE 2004 and 2006 databases. GMMs are among the most commonly used methods in text-independent speaker verification systems. The performance of the GMM approach has been measured with different parameters and settings. It has also been observed that the eigenchannel-MAP and JFA methods both increase the performance of the system against session variability, which is one of the most challenging problems in text-independent speaker verification systems.

Conference Object Publication, Metadata only
Parkinson’s disease diagnosis using machine learning and voice (IEEE, 2018)
Wroge, T. J.; Özkanca, Yasin Serdar; Demiroğlu, Cenk; Si, D.; Atkins, D. C.; Ghomi, R. H.
Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Özkanca, Yasin Serdar
Biomarkers derived from human voice can offer insight into neurological disorders, such as Parkinson's disease (PD), because of their underlying cognitive and neuromuscular function. PD is a progressive neurodegenerative disorder that affects about one million people in the United States, with approximately sixty thousand new clinical diagnoses made each year [1]. Historically, PD has been difficult to quantify, and doctors have tended to focus on some symptoms while ignoring others, relying primarily on subjective rating scales [2].
Due to the decrease in motor control that is the hallmark of the disease, voice can be used as a means to detect and diagnose PD. With advancements in technology and the prevalence of audio collecting devices in daily life, reliable models that can translate this audio data into a diagnostic tool for healthcare professionals would potentially provide diagnoses that are cheaper and more accurate. We provide evidence to validate this concept here using a voice dataset collected from people with and without PD. This paper explores the effectiveness of using supervised classification algorithms, such as deep neural networks, to accurately diagnose individuals with the disease. Our peak accuracy of 85%, provided by the machine learning models, exceeds the average clinical diagnosis accuracy of non-experts (73.8%) and the average accuracy of movement disorder specialists (79.6% without follow-up, 83.9% after follow-up), with pathological post-mortem examination as ground truth [3].