Postprocessing synthetic speech with a complex cepstrum vocoder for spoofing phase-based synthetic speech detectors

Demiroğlu, Cenk; Buyuk, O.; Khodabakhsh, Ali; Maia, R.

dc.contributor.author	Demiroğlu, Cenk
dc.contributor.author	Buyuk, O.
dc.contributor.author	Khodabakhsh, Ali
dc.contributor.author	Maia, R.
dc.date.accessioned	2017-06-17T13:54:23Z
dc.date.available	2017-06-17T13:54:23Z
dc.date.issued	2017-06
dc.identifier.issn	1932-4553
dc.identifier.uri	http://ieeexplore.ieee.org/document/7862791/
dc.identifier.uri	http://hdl.handle.net/10679/5372
dc.description	Due to copyright restrictions, access to the full text of this article is available only via subscription.
dc.description.abstract	State-of-the-art speaker verification systems are vulnerable to spoofing attacks. To address the issue, high-performance synthetic speech detectors (SSDs) for existing spoofing methods have been proposed. Phase-based SSDs that exploit the fact that most of the parametric speech coders use minimum-phase filters are particularly successful when synthetic speech is generated with a parametric vocoder. Here, we propose a new attack strategy to spoof phase-based SSDs with the objective of increasing the security of voice verification systems by enabling the development of more generalized SSDs. As opposed to other parametric vocoders, the complex cepstrum approach uses mixed-phase filters, which makes it an ideal candidate for spoofing the phase-based SSDs. We propose using a complex cepstrum vocoder as a postprocessor to existing techniques to spoof the speaker verification system as well as the phase-based SSDs. Once synthetic speech is generated with a speech synthesis or a voice conversion technique, for each synthetic speech frame, a natural frame is selected from a training database using a spectral distance measure. Then, complex cepstrum parameters of the natural frame are used for resynthesizing the synthetic frame. In the proposed method, complex cepstrum-based resynthesis is used as a postprocessor. Hence, it can be used in tandem with any synthetic speech generator. Experimental results showed that the approach is successful at spoofing four phase-based SSDs across nine parametric attack algorithms. Moreover, performance at spoofing the speaker verification system did not substantially degrade compared to the case when no postprocessor is employed.
dc.description.sponsorship	TÜBİTAK
dc.language.iso	eng	en_US
dc.publisher	IEEE	en_US
dc.relation	info:turkey/grantAgreement/TUBITAK/112E160
dc.relation	info:turkey/grantAgreement/TUBITAK/115E803
dc.relation.ispartof	IEEE Journal of Selected Topics in Signal Processing
dc.rights	restrictedAccess
dc.title	Postprocessing synthetic speech with a complex cepstrum vocoder for spoofing phase-based synthetic speech detectors	en_US
dc.type	Article	en_US
dc.peerreviewed	yes
dc.publicationstatus	published	en_US
dc.contributor.department	Özyeğin University
dc.contributor.authorID	(ORCID 0000-0002-6160-3169 & YÖK ID 144947) Demiroğlu, Cenk
dc.contributor.ozuauthor	Demiroğlu, Cenk
dc.identifier.volume	11
dc.identifier.issue	4
dc.identifier.startpage	671
dc.identifier.endpage	683
dc.identifier.wos	WOS:000401343600008
dc.identifier.doi	10.1109/JSTSP.2017.2673807
dc.subject.keywords	Spoofing
dc.subject.keywords	Speaker verification
dc.subject.keywords	Synthetic speech detection
dc.subject.keywords	Complex cepstrum
dc.subject.keywords	Speech synthesis
dc.subject.keywords	Voice conversion
dc.identifier.scopus	SCOPUS:2-s2.0-85021700270
dc.contributor.ozugradstudent	Khodabakhsh, Ali
dc.contributor.authorMale	2
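
The frame-selection step described in the abstract (for each synthetic speech frame, a natural frame is selected from a training database using a spectral distance measure) can be illustrated with a minimal sketch. The abstract does not say which features or distance measure the authors use, so the Euclidean distance over generic spectral feature vectors below is an assumption, and the function names `spectral_distance` and `select_natural_frames` are hypothetical. The later step, resynthesizing the synthetic frame with the complex cepstrum parameters of the selected natural frame, is not shown.

```python
import math


def spectral_distance(a, b):
    """Euclidean distance between two equal-length spectral feature vectors.

    Assumption: the record does not name the distance measure used in the
    article; Euclidean distance is only a stand-in for illustration.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_natural_frames(synthetic_frames, natural_frames):
    """Return, for each synthetic frame, the index of the closest natural frame.

    Both arguments are lists of feature vectors (lists of floats). This is an
    exhaustive nearest-neighbour search; ties go to the lowest index.
    """
    if not natural_frames:
        raise ValueError("the natural-frame database must not be empty")
    selected = []
    for frame in synthetic_frames:
        best_index = min(
            range(len(natural_frames)),
            key=lambda i: spectral_distance(frame, natural_frames[i]),
        )
        selected.append(best_index)
    return selected


# Toy data: three natural frames and three synthetic frames, two features each.
natural = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
synthetic = [[0.9, 1.2], [4.0, 4.5], [0.1, -0.1]]
matches = select_natural_frames(synthetic, natural)
```

In this toy run each synthetic frame is paired with the index of its nearest natural frame. In a full system, those indices would identify the natural frames whose complex cepstrum parameters are passed on to the resynthesis stage.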