Person:
DEMİROĞLU, Cenk

Name

First Name: Cenk
Last Name: DEMİROĞLU

Publication Search Results

Now showing 1 - 10 of 42
  • Conference paper
    LIG at MediaEval 2015 multimodal person discovery in broadcast TV task
    (CEUR-WS, 2015) Budnik, M.; Safadi, B.; Besacier, L.; Quénot, G.; Khodabakhsh, Ali; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Khodabakhsh, Ali
    In this working notes paper the contribution of the LIG team (partnership between Univ. Grenoble Alpes and Ozyegin University) to the Multimodal Person Discovery in Broadcast TV task in MediaEval 2015 is presented. The task focused on unsupervised learning techniques. Two different approaches were submitted by the team. In the first one, new features for face and speech modalities were tested. In the second one, an alternative way to calculate the distance between face tracks and speech segments is presented. It also had a competitive MAP score and was able to beat the baseline.
  • Article, Open Access
    Hybrid statistical/unit-selection Turkish speech synthesis using suffix units
    (Springer International Publishing, 2016-12) Demiroğlu, Cenk; Güner, Ekrem; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Güner, Ekrem
    Unit selection based text-to-speech synthesis (TTS) has been the dominant TTS approach of the last decade. Despite its success, the unit selection approach has disadvantages. One of the most significant is the sudden discontinuities in speech that distract listeners (Speech Commun 51:1039-1064, 2009). A second is that building a high-quality synthesis system requires significant expertise and large amounts of data, which is costly and time-consuming. The statistical speech synthesis (SSS) approach is a promising alternative. Not only are the spurious errors observed in unit selection systems mostly absent in SSS, but building voice models is also far less expensive and faster than building a unit selection system. However, the resulting speech is typically not as natural-sounding as speech synthesized with a high-quality unit selection system. Hybrid methods attempt to take advantage of both SSS and unit selection systems, but existing hybrid methods still require the development of a high-quality unit selection system. Here, we propose a novel hybrid statistical/unit selection system for Turkish that aims to improve the quality of the baseline SSS system by improving prosodic parameters such as intonation and stress. Commonly occurring suffixes in Turkish are stored in the unit selection database and used in the proposed system. As opposed to existing hybrid systems, the proposed system was developed without building a complete unit selection synthesis system. Therefore, the proposed method can be used without the large amounts of data, substantial expertise, or time-consuming tuning typically required to build unit selection systems. Listeners preferred the hybrid system over the baseline system in the AB preference tests.
  • Conference paper
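    Gauss karışım modeli tabanlı konuşmacı belirleme sistemlerinde klasik MAP uyarlanması yönteminin performans analizi [Performance analysis of the classical MAP adaptation method in Gaussian mixture model based speaker identification systems]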
    (IEEE, 2010) Erdoğan, A.; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk
    The Gaussian mixture model (GMM) is one of the most commonly used methods in text-independent speaker identification systems. In this paper, the performance of the GMM approach has been measured with different parameters and settings. The voice activity detection (VAD) component has been found to have a significant impact on performance. Therefore, VAD algorithms that are robust to background noise have been proposed. Significant differences in performance have been observed between male and female speakers and between GSM and PSTN channels. Moreover, the single-stream GMM approach has been found to perform significantly better than the multi-stream GMM approach. It has been observed under all conditions that data duration is critical for good performance.
  • Book Chapter
    Analysis of speech-based measures for detecting and monitoring Alzheimer’s disease
    (Springer Science+Business Media, 2014) Khodabakhsh, Ali; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Khodabakhsh, Ali
    Automatic diagnosis of Alzheimer’s disease, as well as monitoring of diagnosed patients, can have a significant economic impact on societies. We investigated an automatic diagnosis approach based on speech features. As opposed to standard tests, spontaneous conversations are carried out and recorded with the subjects. Speech features could discriminate between healthy people and patients with high reliability. Although the patients were in later stages of Alzheimer’s disease, the results indicate the potential of speech-based automated solutions for Alzheimer’s disease diagnosis. Moreover, the data collection process employed here can be done inexpensively by call center agents in a real-life application. Thus, the investigated techniques hold the potential to significantly reduce the financial burden on governments and Alzheimer’s patients.
  • Conference paper
    Eklemeli diller için düşük bellekli melez istatistiksel/birim seçmeli MKS sistemi [A low-memory hybrid statistical/unit selection TTS system for agglutinative languages]
    (IEEE, 2012) Guner, Ekrem; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Guner, Ekrem
    The HMM-based TTS (HTS) approach has been attracting increasing attention from the TTS research community. One of its advantages is the lack of the spurious errors that are observed in the unit selection scheme. Another advantage of the HTS system is its small memory footprint, which makes it attractive for embedded devices. Here, we propose a novel hybrid statistical/unit selection TTS system for agglutinative languages that aims to improve the quality of the baseline HTS system while keeping the memory footprint small. The intelligibility and quality scores of the baseline system are comparable to the MOS scores for English reported in the Blizzard Challenge tests. Listeners preferred the hybrid system over the baseline system in the A/B preference tests.
  • Article
    Eigenvoice speaker adaptation with minimal data for statistical speech synthesis systems using a MAP approach and nearest-neighbors
    (IEEE, 2014-12) Mohammadi, Amir; Sarfjoo, Seyyed Saeed; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Mohammadi, Amir; Sarfjoo, Seyyed Saeed
    Statistical speech synthesis (SSS) systems can adapt to a target speaker with a couple of minutes of adaptation data. Developing adaptation algorithms that further reduce the required adaptation data to a few seconds can have a substantial effect on the deployment of the technology in real-life applications such as consumer electronics devices. The traditional way to achieve such rapid adaptation is the eigenvoice technique, which works well in speech recognition but is known to generate perceptual artifacts in statistical speech synthesis. Here, we propose three methods to alleviate the quality problems of the baseline eigenvoice adaptation algorithm while allowing speaker adaptation with minimal data. Our first method uses a Bayesian eigenvoice approach to constrain the adaptation algorithm to move in realistic directions in the speaker space, which reduces artifacts. Our second method finds pre-trained reference speakers that are close to the target speaker and uses only those reference speaker models in a second eigenvoice adaptation iteration. Both techniques performed significantly better than the baseline eigenvoice method in the objective tests. Similarly, both improved speech quality in subjective tests compared to the baseline eigenvoice method. In the third method, tandem use of the proposed eigenvoice method with a state-of-the-art linear regression based adaptation technique is found to improve adaptation of excitation features.
  • Conference paper
    Sesli yanıt sistemi çağrı akışında dilbilgisi tabanlı Türkçe konuşma tanıma sistemi tanıtımı [Introduction of a grammar-based Turkish speech recognition system in a voice response system call flow]
    (IEEE, 2012) Karagöz, Gün; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Karagöz, Gün
    This paper describes a grammar-based Turkish speech recognition system within a voice response system used for call centers. In the study, the call center system of a telecommunications company was modeled as an example.
  • Conference paper, Open Access
    Multi-lingual depression-level assessment from conversational speech using acoustic and text features
    (International Speech Communication Association, 2018) Özkanca, Yasin Serdar; Demiroğlu, Cenk; Besirli, A.; Çelik, S.; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Özkanca, Yasin Serdar
    Depression is a common mental health problem around the world that places a large burden on economies and on the well-being, and hence the productivity, of individuals. Its early diagnosis and treatment are critical to reduce costs and even save lives. One key way to achieve that goal is to use voice technologies to monitor depression remotely and relatively inexpensively with automated agents. Although there have been efforts to automatically assess depression levels from audiovisual features, the use of transcriptions along with acoustic features has emerged as a more recent research avenue. Moreover, the difficulty of data collection and the limited amount of data available for research also hamper the success of the algorithms. One of the novel contributions of this paper is to exploit databases from multiple languages for feature selection. Since a large number of features can be extracted from speech, and given the small amounts of training data available, effective data selection is critical for success. Our proposed multi-lingual method was effective at selecting better features and significantly improved the depression assessment accuracy. We also use text-based features for assessment and propose a novel strategy to fuse the text- and speech-based classifiers, which further boosted performance.
  • Review, Open Access
    Automatic detection of attachment style in married couples through conversation analysis
    (Springer, 2023-05-31) Koçak, Tuğçe Melike; Dibek, B. Ç.; Polat, Esma Nafiye; Kafesçioğlu, Nilüfer; Demiroğlu, Cenk; Electrical & Electronics Engineering; Psychology; KAFESCİOĞLU, Nilüfer; DEMİROĞLU, Cenk; Koçak, Tuğçe Melike; Polat, Esma Nafiye
    Analysis of couple interactions using speech processing techniques is an increasingly active multi-disciplinary field that poses challenges such as automatic relationship quality assessment and behavioral coding. Here, we focused on predicting individuals’ attachment style using interactions of recently married (1–15 months) couples. For low-level acoustic feature extraction, in addition to frame-based acoustic features such as mel-frequency cepstral coefficients (MFCCs) and pitch, we used turn-based i-vector features, which are commonly used in speaker verification systems. Positive and negative sentiments of the dialog turns were also automatically generated from the transcribed text and used as features. Feature and score fusion algorithms were used to combine the low-level acoustic features and the text features. Even though the score and feature fusion algorithms performed similarly, predictions with score fusion were more consistent when couples have known each other for a longer period of time.
  • Article
    Spoofing attacks to i-vector based voice verification systems using statistical speech synthesis with additive noise and countermeasure
    (IEEE, 2016) Özbay, Mustafa Caner; Khodabakhsh, Ali; Mohammadi, Amir; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Özbay, Mustafa Caner; Khodabakhsh, Ali; Mohammadi, Amir
    Even though improvements in speaker verification (SV) technology with i-vectors have increased its real-life deployment, the vulnerability of SV systems to spoofing attacks is a major concern. Here, we investigated the effectiveness of spoofing attacks with statistical speech synthesis systems using a limited amount of adaptation data and additive noise. Experimental results show that effective spoofing is possible using limited adaptation data. Moreover, the attacks become substantially more effective when noise is intentionally added to the synthetic speech. Training the SV system with matched noise conditions does not alleviate the problem. We propose a synthetic speech detector (SSD) that uses session differences in i-vectors for counterspoofing. The proposed SSD had a total error rate of less than 0.5% in most cases for the matched noise conditions. For the mismatched noise conditions, the missed detection rate decreased further but the total error increased, which indicates that some calibration is needed for mismatched noise conditions.