Publication:
Postprocessing synthetic speech with a complex cepstrum vocoder for spoofing phase-based synthetic speech detectors

dc.contributor.author: Demiroğlu, Cenk
dc.contributor.author: Buyuk, O.
dc.contributor.author: Khodabakhsh, Ali
dc.contributor.author: Maia, R.
dc.contributor.department: Electrical & Electronics Engineering
dc.contributor.ozuauthor: DEMİROĞLU, Cenk
dc.contributor.ozugradstudent: Khodabakhsh, Ali
dc.date.accessioned: 2017-06-17T13:54:23Z
dc.date.available: 2017-06-17T13:54:23Z
dc.date.issued: 2017-06
dc.description: Due to copyright restrictions, the access to the full text of this article is only available via subscription.
dc.description.abstract: State-of-the-art speaker verification systems are vulnerable to spoofing attacks. To address the issue, high-performance synthetic speech detectors (SSDs) for existing spoofing methods have been proposed. Phase-based SSDs that exploit the fact that most of the parametric speech coders use minimum-phase filters are particularly successful when synthetic speech is generated with a parametric vocoder. Here, we propose a new attack strategy to spoof phase-based SSDs with the objective of increasing the security of voice verification systems by enabling the development of more generalized SSDs. As opposed to other parametric vocoders, the complex cepstrum approach uses mixed-phase filters, which makes it an ideal candidate for spoofing the phase-based SSDs. We propose using a complex cepstrum vocoder as a postprocessor to existing techniques to spoof the speaker verification system as well as the phase-based SSDs. Once synthetic speech is generated with a speech synthesis or a voice conversion technique, for each synthetic speech frame, a natural frame is selected from a training database using a spectral distance measure. Then, complex cepstrum parameters of the natural frame are used for resynthesizing the synthetic frame. In the proposed method, complex cepstrum-based resynthesis is used as a postprocessor. Hence, it can be used in tandem with any synthetic speech generator. Experimental results showed that the approach is successful at spoofing four phase-based SSDs across nine parametric attack algorithms. Moreover, performance at spoofing the speaker verification system did not substantially degrade compared to the case when no postprocessor is employed.
dc.description.sponsorship: TÜBİTAK
dc.identifier.doi: 10.1109/JSTSP.2017.2673807
dc.identifier.endpage: 683
dc.identifier.issn: 1932-4553
dc.identifier.issue: 4
dc.identifier.scopus: 2-s2.0-85021700270
dc.identifier.startpage: 671
dc.identifier.uri: http://hdl.handle.net/10679/5372
dc.identifier.volume: 11
dc.identifier.wos: 000401343600008
dc.language.iso: eng [en_US]
dc.peerreviewed: yes
dc.publicationstatus: published [en_US]
dc.publisher: IEEE [en_US]
dc.relation: info:turkey/grantAgreement/TUBITAK/112E160
dc.relation: info:turkey/grantAgreement/TUBITAK/115E803
dc.relation.ispartof: IEEE Journal of Selected Topics in Signal Processing
dc.rights: info:eu-repo/semantics/restrictedAccess
dc.subject.keywords: Spoofing
dc.subject.keywords: Speaker verification
dc.subject.keywords: Synthetic speech detection
dc.subject.keywords: Complex cepstrum
dc.subject.keywords: Speech synthesis
dc.subject.keywords: Voice conversion
dc.title: Postprocessing synthetic speech with a complex cepstrum vocoder for spoofing phase-based synthetic speech detectors [en_US]
dc.type: Article [en_US]
dspace.entity.type: Publication
relation.isOrgUnitOfPublication: 7b58c5c4-dccc-40a3-aaf2-9b209113b763
relation.isOrgUnitOfPublication.latestForDiscovery: 7b58c5c4-dccc-40a3-aaf2-9b209113b763
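
Illustrative sketch (not part of the record): the abstract describes selecting, for each synthetic frame, a natural frame by a spectral distance measure and then using that frame's complex cepstrum. The Python below is a minimal sketch of those two steps only, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, the FFT size, the choice of Euclidean distance between log-magnitude spectra as the distance measure, and the FFT-based complex cepstrum with linear-phase removal. The resynthesis vocoder itself is not shown.

```python
import numpy as np


def log_magnitude(frame, n_fft=1024):
    """Log-magnitude spectrum of one frame (small floor avoids log(0))."""
    return np.log(np.abs(np.fft.rfft(frame, n_fft)) + 1e-12)


def select_natural_frame(synthetic_frame, natural_frames, n_fft=1024):
    """Index of the natural frame closest to the synthetic frame.

    Assumption: the distance is Euclidean distance between log-magnitude
    spectra; the abstract only says "a spectral distance measure".
    """
    target = log_magnitude(synthetic_frame, n_fft)
    candidates = np.array([log_magnitude(f, n_fft) for f in natural_frames])
    distances = np.linalg.norm(candidates - target, axis=1)
    return int(np.argmin(distances))


def complex_cepstrum(frame, n_fft=1024):
    """Complex cepstrum: inverse FFT of log-magnitude plus unwrapped phase."""
    spectrum = np.fft.fft(frame, n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    phase = np.unwrap(np.angle(spectrum))
    # Remove the linear-phase (pure delay) term so the log spectrum is periodic.
    center = n_fft // 2
    delay = np.round(phase[center] / np.pi)
    phase = phase - np.pi * delay * np.arange(n_fft) / center
    return np.real(np.fft.ifft(log_mag + 1j * phase))


def anticausal_energy(cepstrum):
    """Energy in the negative-quefrency half (stored in the upper indices)."""
    half = len(cepstrum) // 2
    return float(np.sum(cepstrum[half:] ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    natural = rng.standard_normal((5, 256))  # stand-in "training database"
    synthetic = natural[3].copy()            # stand-in synthetic frame
    best = select_natural_frame(synthetic, natural)
    print("selected natural frame:", best)
    print("anticausal energy:", anticausal_energy(complex_cepstrum(natural[best])))
```

The `anticausal_energy` helper is there to make the abstract's minimum-phase versus mixed-phase distinction concrete: a minimum-phase signal has a complex cepstrum that is zero at negative quefrencies, whereas a signal with zeros outside the unit circle does not.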
