Browsing by Author "Khodabakhsh, Ali"

Metadata only
Analysis of speech-based measures for detecting and monitoring Alzheimer’s disease
(Springer Science+Business Media, 2014) Khodabakhsh, Ali; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Khodabakhsh, Ali
Automatic diagnosis of Alzheimer’s disease, as well as monitoring of diagnosed patients, can have a significant economic impact on societies. We investigated an automatic diagnosis approach that uses speech-based features. As opposed to standard tests, spontaneous conversations with the subjects are carried out and recorded. Speech features could discriminate between healthy people and the patients with high reliability. Although the patients were in later stages of Alzheimer’s disease, results indicate the potential of speech-based automated solutions for Alzheimer’s disease diagnosis. Moreover, the data collection process employed here can be done inexpensively by call center agents in a real-life application. Thus, the investigated techniques hold the potential to significantly reduce the financial burden on governments and Alzheimer’s patients.
Metadata only
Anti-spoofing for text-independent speaker verification: An initial database, comparison of countermeasures, and human performance
(IEEE, 2016-04) Wu, Z.; Leon, P. L. de; Demiroğlu, Cenk; Khodabakhsh, Ali; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Khodabakhsh, Ali
In this paper, we present a systematic study of the vulnerability of automatic speaker verification to a diverse range of spoofing attacks. We start with a thorough analysis of the spoofing effects of five speech synthesis and eight voice conversion systems, and the vulnerability of three speaker verification systems under those attacks. We then introduce a number of countermeasures to prevent spoofing attacks from both known and unknown attackers. Known attackers are spoofing systems whose output was used to train the countermeasures, while an unknown attacker is a spoofing system whose output was not available to the countermeasures during training. Finally, we benchmark automatic systems against human performance on both speaker verification and spoofing detection tasks.
Metadata only
Detection of Alzheimer's disease using prosodic cues in conversational speech
(IEEE, 2014) Khodabakhsh, Ali; Kuşçuoğlu, Serhan; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Khodabakhsh, Ali; Kuşçuoğlu, Serhan
Automatic diagnosis of Alzheimer's disease, as well as monitoring of diagnosed patients, can have a significant economic impact on societies. We investigated an automatic diagnosis approach that uses speech-based features. As opposed to standard tests that are mostly focused on memory recall, spontaneous conversations are carried out with the subjects in informal settings. Prosodic features extracted from speech could discriminate between healthy people and the patients with high reliability. Although the patients were in later stages of Alzheimer's disease, results indicate the potential of speech-based automated solutions for Alzheimer's disease diagnosis. Moreover, the data collection process employed here can be done inexpensively by call center agents in a real-life application. Thus, the investigated techniques hold the potential to significantly reduce the financial burden on governments and Alzheimer's patients.
Open Access
Evaluation of linguistic and prosodic features for detection of Alzheimer’s disease in Turkish conversational speech
(Springer Science+Business Media, 2015-12) Khodabakhsh, Ali; Yesil, Fatih; Guner, Ekrem; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Khodabakhsh, Ali; Yesil, Fatih; Guner, Ekrem
Automatic diagnosis and monitoring of Alzheimer’s disease can have a significant impact on society as well as the well-being of patients. The part of the brain cortex that processes language abilities is one of the earliest parts to be affected by the disease. Therefore, detection of Alzheimer’s disease using speech-based features is gaining increasing attention. Here, we investigated an extensive set of features based on speech prosody as well as linguistic features derived from transcriptions of Turkish conversations with subjects with and without Alzheimer’s disease. Unlike most standardized tests that focus on memory recall or structured conversations, spontaneous unstructured conversations are conducted with the subjects in informal settings. Age-, education-, and gender-controlled experiments are performed to eliminate the effects of those three variables. Experimental results show that the proposed features extracted from the speech signal can be used to discriminate between the control group and the patients with Alzheimer’s disease. Prosodic features performed significantly better than the linguistic features. Classification accuracy over 80% was obtained with three of the prosodic features, but experiments with feature fusion did not further improve the classification performance.
Metadata only
LIG at MediaEval 2015 multimodal person discovery in broadcast TV task
(CEUR-WS, 2015) Budnik, M.; Safadi, B.; Besacier, L.; Quénot, G.; Khodabakhsh, Ali; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Khodabakhsh, Ali
In this working notes paper the contribution of the LIG team (partnership between Univ. Grenoble Alpes and Ozyegin University) to the Multimodal Person Discovery in Broadcast TV task in MediaEval 2015 is presented. The task focused on unsupervised learning techniques. Two different approaches were submitted by the team. In the first one, new features for face and speech modalities were tested. In the second one, an alternative way to calculate the distance between face tracks and speech segments is presented. It also had a competitive MAP score and was able to beat the baseline.
Open Access
Medical podcasting in Iran; pilot, implementation and attitude evaluation
(Tehran University of Medical Sciences, 2013) Heydarpour, P.; Hafezi-Nejad, N.; Khodabakhsh, Ali; Khosravi, M.; Khoshkish, S.; Sadeghian, M.; Samavat, B.; Faturechi, A.; Pasalar, P.; Dehpour, A. R.; Khodabakhsh, Ali
Podcasting has become a popular means of transferring knowledge in higher education by making lecture contents available to students at their convenience. Accessing courses on media players provides students with enhanced learning opportunities. Developing teaching methods that can cope with the ever-changing nature of medicine is crucial for training millennial students. Pharmacology education at Tehran University of Medical Sciences has been based on lectures so far; our aim was to implement a pilot study to evaluate the advantages and disadvantages of offering the course contents as podcasts, as well as to evaluate whether such a program is feasible in our educational program. According to our download center, 46% of students downloaded the podcast. 48% favored using both the internet and DVD-ROM concurrently. Overall, 96% of students perceived that podcasting had a positive impact on their learning in the pharmacology course. Our results indicate that most attendants reported positive effects of podcasting despite its low usage, mainly as a pre-class preparation tool.
Metadata only
Natural language features for detection of Alzheimer's disease in conversational speech
(IEEE, 2014) Khodabakhsh, Ali; Kuşçuoğlu, Serhan; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Khodabakhsh, Ali; Kuşçuoğlu, Serhan
Automatic monitoring of patients with Alzheimer's disease and diagnosis of the disease in its early stages can have a significant impact on society. Here, we investigate an automatic diagnosis approach that uses features derived from transcriptions of conversations with the subjects. As opposed to standard tests that are mostly focused on memory recall, spontaneous conversations are carried out with the subjects in informal settings. Features extracted from the transcriptions of the conversations could discriminate between healthy people and patients with high reliability. Although the results are preliminary and the patients were in later stages of Alzheimer's disease, results indicate the potential use of the proposed natural language based features in the early stages of the disease as well. Moreover, the data collection process employed here can be done inexpensively by call center agents in a real-life application using automatic speech recognition (ASR) systems, which have reached very high accuracy in recent years. Thus, the investigated features hold the potential to make it low-cost and convenient to diagnose the disease and monitor the diagnosed patients over time.
Metadata only
OCR-aided person annotation and label propagation for speaker modeling in TV shows
(IEEE, 2016) Budnik, M.; Besacier, L.; Khodabakhsh, Ali; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Khodabakhsh, Ali
In this paper, we present an approach for minimizing human effort in manual speaker annotation. Label propagation is used at each iteration of an active learning cycle. More precisely, a selection strategy for choosing the most suitable speech track to be labeled is proposed. Four different selection strategies are evaluated and all the tracks in a corresponding cluster are gathered using agglomerative clustering in order to propagate human annotations. To further reduce the manual labor required, an optical character recognition system is used to bootstrap annotations. At each step of the cycle, annotations are used to build speaker models. The quality of the generated speaker models is evaluated at each step using an i-vector based speaker identification system. The presented approach shows promising results on the REPERE corpus with a minimum amount of human effort for annotation.
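The propagation step described in this abstract can be illustrated with a minimal Python sketch. It assumes that cluster assignments are already available (for example from an agglomerative clustering step that is not shown) and that conflicting human labels inside one cluster are resolved by majority vote; the function name `propagate_labels`, the data layout, and the voting rule are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def propagate_labels(cluster_of, human_labels):
    """Spread human annotations to every unlabeled track in the same cluster.

    cluster_of:   dict mapping track id -> cluster id (assumed precomputed).
    human_labels: dict mapping track id -> speaker name for annotated tracks.
    Returns a dict with the human labels plus the propagated ones.
    """
    # Gather the human labels observed in each cluster.
    labels_in_cluster = defaultdict(list)
    for track, speaker in human_labels.items():
        labels_in_cluster[cluster_of[track]].append(speaker)

    # Human annotations are kept as they are.
    propagated = dict(human_labels)
    for track, cluster in cluster_of.items():
        if track in propagated:
            continue
        votes = labels_in_cluster.get(cluster)
        if votes:
            # Assumption: majority vote when a cluster holds several labels.
            propagated[track] = Counter(votes).most_common(1)[0][0]
    return propagated


clusters = {"t1": 0, "t2": 0, "t3": 1}
print(propagate_labels(clusters, {"t1": "alice"}))
```

Tracks in clusters without any annotation stay unlabeled, which is where a selection strategy would pick the next track to annotate.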
Metadata only
Persica: A Persian corpus for multi-purpose text mining and natural language processing
(IEEE, 2012) Eghbalzadeh, H.; Hosseini, B.; Khadivi, S.; Khodabakhsh, Ali; Khodabakhsh, Ali
The lack of a multi-application text corpus, despite the surge in text data, is a serious bottleneck in text mining and natural language processing, especially for the Persian language. This paper presents a new corpus for news article analysis in Persian called Persica. News analysis includes news classification, topic discovery and classification, trend discovery, category classification, and many more procedures. Dealing with news has special requirements. First of all, it needs a valid corpus rich in news content to perform the experiments. Our approach is based on a modified category classification and data normalization over Persian news articles, which has led to the creation of a multipurpose Persian corpus that shows reasonable results in text mining outcomes. To our knowledge, there are few Persian corpora in the literature, and none of them has Persian news time-trend characteristics. Empirical results on our benchmark indicate that, in addition to reducing the problem dimensions and useless content, Persica maintains acceptable validity and reliability in comparison with standard corpora in the literature.
Metadata only
Postprocessing synthetic speech with a complex cepstrum vocoder for spoofing phase-based synthetic speech detectors
(IEEE, 2017-06) Demiroğlu, Cenk; Buyuk, O.; Khodabakhsh, Ali; Maia, R.; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Khodabakhsh, Ali
State-of-the-art speaker verification systems are vulnerable to spoofing attacks. To address the issue, high-performance synthetic speech detectors (SSDs) for existing spoofing methods have been proposed. Phase-based SSDs that exploit the fact that most of the parametric speech coders use minimum-phase filters are particularly successful when synthetic speech is generated with a parametric vocoder. Here, we propose a new attack strategy to spoof phase-based SSDs with the objective of increasing the security of voice verification systems by enabling the development of more generalized SSDs. As opposed to other parametric vocoders, the complex cepstrum approach uses mixed-phase filters, which makes it an ideal candidate for spoofing the phase-based SSDs. We propose using a complex cepstrum vocoder as a postprocessor to existing techniques to spoof the speaker verification system as well as the phase-based SSDs. Once synthetic speech is generated with a speech synthesis or a voice conversion technique, for each synthetic speech frame, a natural frame is selected from a training database using a spectral distance measure. Then, complex cepstrum parameters of the natural frame are used for resynthesizing the synthetic frame. In the proposed method, complex cepstrum-based resynthesis is used as a postprocessor. Hence, it can be used in tandem with any synthetic speech generator. Experimental results showed that the approach is successful at spoofing four phase-based SSDs across nine parametric attack algorithms. Moreover, performance at spoofing the speaker verification system did not substantially degrade compared to the case when no postprocessor is employed.
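The frame selection step in this abstract (choosing, for each synthetic frame, a natural frame from a training database by spectral distance) can be sketched as a nearest-neighbor search. The abstract does not name the distance measure, so the Euclidean distance below is an assumption, as are the function names and the list-of-floats frame representation. The complex cepstrum resynthesis itself is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_natural_frames(synthetic_frames, natural_frames, distance=euclidean):
    """Return, for each synthetic frame, the index of the closest natural frame.

    Both arguments are lists of spectral feature vectors (lists of floats).
    The distance function is a placeholder for the paper's spectral measure.
    """
    selected = []
    for frame in synthetic_frames:
        best = min(
            range(len(natural_frames)),
            key=lambda i: distance(frame, natural_frames[i]),
        )
        selected.append(best)
    return selected


synthetic = [[0.0, 0.0], [1.0, 1.0]]
natural = [[0.9, 1.1], [0.1, -0.1]]
print(select_natural_frames(synthetic, natural))  # [1, 0]
```

The exhaustive search costs one distance evaluation per pair of frames; a real system with a large database would likely need an indexed search, which is outside this sketch.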
Metadata only
SAS: A speaker verification spoofing database containing diverse attacks
(IEEE, 2015) Wu, Z.; Khodabakhsh, Ali; Demiroğlu, Cenk; Yamagishi, J.; Saito, D.; Toda, T.; King, S.; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Khodabakhsh, Ali
This paper presents the first version of a speaker verification spoofing and anti-spoofing database, named SAS corpus. The corpus includes nine spoofing techniques, two of which are speech synthesis, and seven are voice conversion. We design two protocols, one for standard speaker verification evaluation, and the other for producing spoofing materials. Hence, they allow the speech synthesis community to produce spoofing materials incrementally without knowledge of speaker verification spoofing and anti-spoofing. To provide a set of preliminary results, we conducted speaker verification experiments using two state-of-the-art systems. Without any anti-spoofing techniques, the two systems are extremely vulnerable to the spoofing attacks implemented in our SAS corpus.
Metadata only
Spoofing and anti-spoofing techniques for text-independent speaker verification systems
(2015-10) Khodabakhsh, Ali; Demiroğlu, Cenk; Demiroğlu, Cenk; Özgür, A.; Şensoy, Murat; Department of Electrical and Electronics Engineering; Khodabakhsh, Ali
There has been substantial progress in the speaker verification field in recent years. The i-vector based approach in particular has received significant attention due to its high performance. Improvements in the verification technology have also led to concerns about spoofing attacks, to which the i-vector based methods are vulnerable. Here, we first investigated the vulnerability of an i-vector based verification system to attacks using statistical speech synthesis (SSS), with a particular focus on the case where the attacker has only a very limited amount of data from the target speaker. However, it is well known that speech generated with SSS is easy to detect using features extracted from the magnitude or the phase spectrum. Therefore, for more effective attacks, we propose a hybrid statistical/concatenative synthesis approach and show that hybrid synthesis significantly increases the false alarm rate in the verification system compared to the baseline statistical synthesis method. Moreover, the proposed hybrid synthesis makes detecting synthetic speech more difficult even when only a very limited amount of original speech recordings is available to the attacker. To further increase the effectiveness of the attacks, we propose a linear regression method that transforms synthetic features into more natural features. An interpolation approach is proposed to combine the regression and hybrid synthesis methods, which is shown to provide the best spoofing performance. Furthermore, we investigated the effectiveness of spoofing attacks with statistical speech synthesis systems when there is additive noise. Experimental results show that the attacks get substantially more effective when noise is added to synthetic speech. We also propose a synthetic speech detector that uses session differences in i-vectors to discriminate between synthetic and natural speech. We experimentally show that the detector has less than 0.5% total error rate in most cases for the matched noise conditions.
As a third contribution, we present our participation in the generation of the first version of a speaker verification spoofing and anti-spoofing database, named the SAS corpus. The corpus includes nine spoofing techniques, two of which are speech synthesis, and seven are voice conversion. Two protocols were designed, one for standard speaker verification evaluation, and the other for producing spoofing materials. Hence, they allow the speech synthesis community to produce spoofing materials incrementally without knowledge of speaker verification spoofing and anti-spoofing. To provide a set of preliminary results, we conducted speaker verification experiments using two state-of-the-art systems. Without any anti-spoofing techniques, these two systems are extremely vulnerable to the spoofing attacks implemented in our SAS corpus. This work later gave birth to the first automatic speaker verification spoofing and countermeasures challenge. In our participation in this challenge, we investigated three algorithms that weigh likelihood-ratio scores of individual frames in Gaussian mixture model based detectors, phonemes, and sound-classes depending on how much information they carry. The proposed methods learn to detect both short-time and long-time artifacts, which makes them more reliable compared to a baseline system that treats all frames and phonemes with equal weight. Significant improvement over the baseline system has been obtained for known attack methods that were used in training the detectors. However, improvement with unknown attack types was not substantial.
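The idea of weighting frame-level likelihood-ratio scores, as opposed to treating all frames equally, can be sketched as a weighted average. The abstract does not say how the weights are learned or combined, so the normalized weighted mean below, the function name, and the example weights are assumptions made only for illustration.

```python
def weighted_llr_score(frame_llrs, weights):
    """Combine per-frame log-likelihood-ratio scores with per-frame weights.

    frame_llrs: per-frame scores, e.g. from GMM-based natural/synthetic models.
    weights:    non-negative importance of each frame (how they are obtained
                is not specified in the abstract; treated as given here).
    Equal weights reproduce the plain average used by an unweighted baseline.
    """
    if len(frame_llrs) != len(weights):
        raise ValueError("frame_llrs and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, frame_llrs)) / total


scores = [2.0, -2.0]
print(weighted_llr_score(scores, [1.0, 1.0]))  # 0.0 (equal weighting)
print(weighted_llr_score(scores, [3.0, 1.0]))  # 1.0 (first frame emphasized)
```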
Metadata only
Spoofing attacks to i-vector based voice verification systems using statistical speech synthesis with additive noise and countermeasure
(IEEE, 2016) Özbay, Mustafa Caner; Khodabakhsh, Ali; Mohammadi, Amir; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Özbay, Mustafa Caner; Khodabakhsh, Ali; Mohammadi, Amir
Even though improvements in speaker verification (SV) technology with i-vectors have increased its real-life deployment, its vulnerability to spoofing attacks is a major concern. Here, we investigated the effectiveness of spoofing attacks with statistical speech synthesis systems using a limited amount of adaptation data and additive noise. Experimental results show that effective spoofing is possible using limited adaptation data. Moreover, the attacks get substantially more effective when noise is intentionally added to synthetic speech. Training the SV system with matched noise conditions does not alleviate the problem. We propose a synthetic speech detector (SSD) that uses session differences in i-vectors for counterspoofing. The proposed SSD had less than 0.5% total error rate in most cases for the matched noise conditions. For the mismatched noise conditions, the missed detection rate decreased further but the total error increased, which indicates that some calibration is needed for mismatched noise conditions.
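The error measures mentioned in this abstract can be made concrete with a short sketch. It assumes a detector that outputs one score per utterance, flags an utterance as synthetic when the score reaches a threshold, and defines total error as the fraction of all trials that are misclassified. These definitions, the function name, and the example scores are illustrative assumptions and may differ from the paper's evaluation protocol.

```python
def detection_error_rates(natural_scores, synthetic_scores, threshold):
    """Return (false_alarm_rate, miss_rate, total_error_rate) for a detector.

    A score >= threshold means "flagged as synthetic" (assumed convention).
    False alarm: natural speech flagged as synthetic.
    Miss: synthetic speech accepted as natural.
    """
    if not natural_scores or not synthetic_scores:
        raise ValueError("both score lists must be non-empty")
    false_alarms = sum(1 for s in natural_scores if s >= threshold)
    misses = sum(1 for s in synthetic_scores if s < threshold)
    n_trials = len(natural_scores) + len(synthetic_scores)
    return (
        false_alarms / len(natural_scores),
        misses / len(synthetic_scores),
        (false_alarms + misses) / n_trials,
    )


natural = [0.1, 0.2, 0.6, 0.3]    # made-up scores for natural utterances
synthetic = [0.9, 0.8, 0.4, 0.7]  # made-up scores for synthetic utterances
print(detection_error_rates(natural, synthetic, threshold=0.5))
# (0.25, 0.25, 0.25)
```

Moving the threshold trades misses against false alarms, which is one way the miss rate can fall while the total error rises when the threshold is no longer calibrated to the noise condition.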
Metadata only
Spoofing voice verification systems with statistical speech synthesis using limited adaptation data
(Elsevier, 2017-03) Khodabakhsh, Ali; Mohammadi, Amir; Demiroğlu, Cenk; Electrical & Electronics Engineering; DEMİROĞLU, Cenk; Khodabakhsh, Ali; Mohammadi, Amir
State-of-the-art speaker verification systems are vulnerable to spoofing attacks using speech synthesis. To solve the issue, high-performance synthetic speech detectors (SSDs) for attack methods have been proposed recently. Here, as opposed to developing new detectors, we investigate new attack strategies. Investigating attack techniques that are specifically tailored to spoof the voice verification system while remaining difficult to detect is expected to increase the security of voice verification systems by enabling the development of better detectors. First, we investigated the vulnerability of an i-vector based verification system to attacks using statistical speech synthesis (SSS), with a particular focus on the case where the attacker has only a very limited amount of data from the target speaker. Even with a single adaptation utterance, the false alarm rate was found to be 23%. Still, SSS-generated speech is easy to detect (Wu et al., 2015a, 2015b), which dramatically reduces its effectiveness. For more effective attacks with limited data, we propose a hybrid statistical/concatenative synthesis approach and show that hybrid synthesis significantly increases the false alarm rate in the verification system compared to the baseline SSS method. Moreover, the proposed hybrid synthesis makes detecting synthetic speech more difficult compared to SSS even when only a very limited amount of original speech recordings is available to the attacker. To further increase the effectiveness of the attacks, we propose a linear regression method that transforms synthetic features into more natural features. Even though the regression approach is more effective at spoofing the detectors, it is not as effective as the hybrid synthesis approach in spoofing the verification system. An interpolation approach is proposed to combine the linear regression and hybrid synthesis methods, which is shown to provide the best spoofing performance in most cases.