Electrical & Electronics Engineering
Permanent URI for this collection: https://hdl.handle.net/10679/44
Browsing by Institution Author "DEMİROĞLU, Cenk"
Review | Publication | Open Access
Automatic detection of attachment style in married couples through conversation analysis (Springer, 2023-05-31)
Koçak, Tuğçe Melike; Dibek, B. Ç.; Polat, Esma Nafiye; Kafesçioğlu, Nilüfer; Demiroğlu, Cenk; Electrical & Electronics Engineering; Psychology; KAFESCİOĞLU, Nilüfer; DEMİROĞLU, Cenk; Koçak, Tuğçe Melike; Polat, Esma Nafiye
Analysis of couple interactions using speech processing techniques is an increasingly active multi-disciplinary field that poses challenges such as automatic relationship quality assessment and behavioral coding. Here, we focused on the prediction of individuals’ attachment style using interactions of recently married (1–15 months) couples. For low-level acoustic feature extraction, in addition to frame-based acoustic features such as mel-frequency cepstral coefficients (MFCCs) and pitch, we used the turn-based i-vector features that are commonly used in speaker verification systems. Sentiments, positive and negative, of the dialog turns were also automatically generated from transcribed text and used as features. Feature and score fusion algorithms were used for low-level acoustic features and text features. Even though score and feature fusion algorithms performed similarly, predictions with score fusion were more consistent when couples have known each other for a longer period of time.

Article | Publication | Metadata only
Deep learning-based speaker-adaptive postfiltering with limited adaptation data for embedded text-to-speech synthesis systems (Elsevier, 2023-06)
Eren, Eray; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Eren, Eray
End-to-end (e2e) speech synthesis systems have become popular with the recent introduction of text-to-spectrogram conversion systems, such as Tacotron, that use encoder–decoder-based neural architectures.
Even though those sequence-to-sequence systems can produce mel-spectrograms from the letters without a text processing frontend, they require substantial amounts of well-manipulated, labeled audio data that have high SNR and minimal artifacts. These data requirements make it difficult to build end-to-end systems from scratch, especially for low-resource languages. Moreover, most of the e2e systems are not designed for devices with tiny memory and CPU resources. Here, we investigate using a traditional deep neural network (DNN) for acoustic modeling together with a postfilter that improves the speech features produced by the network. The proposed architectures were trained with the relatively noisy, multi-speaker Wall Street Journal (WSJ) database and tested with unseen speakers. The thin postfilter layer was adapted with minimal data to the target speaker for testing. We investigated several postfilter architectures and compared them with both objective and subjective tests. Fully-connected and transformer-based architectures performed the best in subjective tests. The novel adversarial transformer-based architecture with adaptive discriminator loss performed the best in the objective tests. Moreover, it was faster than the other architectures both in training and inference. Thus, our proposed lightweight transformer-based postfilter architecture significantly improved speech quality and efficiently adapted to new speakers with few shots of data and a hundred training iterations, making it computationally efficient and suitable for scalability.

Article | Publication | Open Access
Depression screening from voice samples of patients affected by Parkinson’s disease (S. Karger AG, 2019-05-01)
Özkanca, Yasin Serdar; Öztürk, M. G.; Ekmekci, Merve Nur; Atkins, D. C.; Demiroğlu, Cenk; Ghomi, R. H.; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Özkanca, Yasin Serdar; Ekmekci, Merve Nur
Depression is a common mental health problem leading to significant disability worldwide. It is not only common but also commonly co-occurs with other mental and neurological illnesses. Parkinson's disease (PD) gives rise to symptoms directly impairing a person's ability to function. Early diagnosis and detection of depression can aid in treatment, but diagnosis typically requires an interview with a health provider or a structured diagnostic questionnaire. Thus, unobtrusive measures to monitor depression symptoms in daily life could have great utility in screening depression for clinical treatment. Vocal biomarkers of depression are a potentially effective method of assessing depression symptoms in daily life, which is the focus of the current research. We have a database of 921 unique PD patients and their self-assessment of whether they felt depressed or not. Voice recordings from these patients were used to extract paralinguistic features, which served as inputs to machine learning and deep learning techniques to predict depression. The results are presented here, and the limitations are discussed given the nature of the recordings, which lack language content. Our models achieved accuracies as high as 0.77 in classifying depressed and nondepressed subjects using their voice features and PD severity. We found depression and severity of PD had a correlation coefficient of 0.3936, providing a valuable feature when predicting depression from voice. Our results indicate a clear correlation between feeling depressed and PD severity.
Voice may be an effective digital biomarker to screen for depression among PD patients.

Article | Publication | Open Access
Depression-level assessment from multi-lingual conversational speech data using acoustic and text features (Springer Nature, 2020-11-17)
Demiroğlu, Cenk; Besirli, A.; Özkanca, Yasin Sedar; Celik, S.; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Özkanca, Yasin Sedar
Depression is a widespread mental health problem around the world with a significant burden on economies. Its early diagnosis and treatment are critical to reduce the costs and even save lives. One key aspect to achieve that goal is to use technology and monitor depression remotely and relatively inexpensively using automated agents. There have been numerous efforts to automatically assess depression levels using audiovisual features as well as text analysis of conversational speech transcriptions. However, difficulty in data collection and the limited amounts of data available for research present challenges that are hampering the success of the algorithms. One of the two novel contributions in this paper is to exploit databases from multiple languages for acoustic feature selection. Since a large number of features can be extracted from speech, given the small amounts of training data available, effective data selection is critical for success. Our proposed multi-lingual method was effective at selecting better features than the baseline algorithms, which significantly improved the depression assessment accuracy. The second contribution of the paper is to extract text-based features for depression assessment and use a novel algorithm to fuse the text- and speech-based classifiers, which further boosted the performance.

Conference Object | Publication | Metadata only
Developing session-based personalized accommodation recommender system by using LSTM (IEEE, 2022)
Can, Y. S.; Erkut, H.; Giritli, E. B.; Kutluay, H.; Buyukoguz, K.; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk
The tourism sector has been transformed by advances in Internet technology. Users can search for information and select their destination from various alternatives by themselves, which brings the need for personal recommender methods. Personalized recommender system development is a complex topic. Demographic information, series of user clicks and interactions, and hotel features are examined to offer the appropriate set of hotels. Since user interactions, clicks, and hotel history form time-series data, Long Short-Term Memory (LSTM) models are well suited to recommending a set of hotels from this data. In this study, we proposed a session-based accommodation recommender system that uses LSTM and achieved promising results.

Conference Object | Publication | Metadata only
Parkinson’s disease diagnosis using machine learning and voice (IEEE, 2018)
Wroge, T. J.; Özkanca, Yasin Serdar; Demiroğlu, Cenk; Si, D.; Atkins, D. C.; Ghomi, R. H.; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Özkanca, Yasin Serdar
Biomarkers derived from human voice can offer insight into neurological disorders, such as Parkinson's disease (PD), because of their underlying cognitive and neuromuscular function. PD is a progressive neurodegenerative disorder that affects about one million people in the United States, with approximately sixty thousand new clinical diagnoses made each year [1]. Historically, PD has been difficult to quantify and doctors have tended to focus on some symptoms while ignoring others, relying primarily on subjective rating scales [2]. Due to the decrease in motor control that is the hallmark of the disease, voice can be used as a means to detect and diagnose PD.
With advancements in technology and the prevalence of audio collecting devices in daily lives, reliable models that can translate this audio data into a diagnostic tool for healthcare professionals would potentially provide diagnoses that are cheaper and more accurate. We provide evidence to validate this concept here using a voice dataset collected from people with and without PD. This paper explores the effectiveness of using supervised classification algorithms, such as deep neural networks, to accurately diagnose individuals with the disease. Our peak accuracy of 85% provided by the machine learning models exceeds the average clinical diagnosis accuracy of non-experts (73.8%) and the average accuracy of movement disorder specialists (79.6% without follow-up, 83.9% after follow-up) with pathological post-mortem examination as ground truth [3].