Comparison of text-independent speaker verification systems in a multi-class, semi-automatic detection scenario
Type : Master's thesis
Publication Status : unpublished
Access : restrictedAccess
The performance of speaker verification systems is typically measured by the accuracy of their binary decisions. Soft outputs of these systems are used mostly for calibration or for combining multiple systems. However, in speaker verification applications where close to 100% accuracy is required, such as the systems used in the call centers of finance companies, it is not possible to rely on the binary decisions of existing verification systems. Even in such cases, multi-class verification outputs (for example, a high, medium or low verification score) returned by a speaker verification system can be used by a human agent to reduce the verification time, increase the verification accuracy, or both, compared with a human-only scenario. In this thesis, an overview of a speaker verification system is given, explaining in detail the algorithms that are implemented. In particular, details are given of a classifier, GDA, which we were the first to use for verification. It performs relatively better than state-of-the-art algorithms on non-linear data such as ours. In the experiments section, some of the most popular speaker verification systems are compared in terms of the classical performance metric used in the literature. Then their multi-class output performance is compared when a human agent is assumed to be in the verification loop. Performance is measured by the reduction in the number of questions the human agent asks to verify the caller's identity without compromising security. Experiments are performed on the NIST 2006 and 2008 databases, using enrollment conditions of eight conversation sides and one conversation side (5 minutes each) and verification conditions of one conversation side and 10 seconds.
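The multi-class scenario described above can be illustrated with a minimal sketch. It assumes that two thresholds split a soft verification score into high, medium and low bands, and that the agent asks fewer security questions when the band is higher. The thresholds, the per-band question counts, the baseline of three questions, the example scores and all function names are hypothetical. This abstract does not give the thesis's actual banding or question protocol.

```python
# Hypothetical sketch: turn soft verification scores into a three-level output
# and estimate how many security questions a human agent could skip.
# Thresholds, question counts and scores are illustrative assumptions only.
from typing import Sequence

QUESTIONS_PER_BAND = {"high": 1, "medium": 2, "low": 3}  # assumed agent policy
BASELINE_QUESTIONS = 3  # assumed number of questions in a human-only scenario


def score_band(score: float, low_thr: float, high_thr: float) -> str:
    """Map a soft verification score to a high/medium/low output."""
    if score >= high_thr:
        return "high"
    if score >= low_thr:
        return "medium"
    return "low"


def average_questions(scores: Sequence[float], low_thr: float, high_thr: float) -> float:
    """Mean number of questions asked, given the band assigned to each trial."""
    if not scores:
        raise ValueError("need at least one score")
    total = sum(QUESTIONS_PER_BAND[score_band(s, low_thr, high_thr)] for s in scores)
    return total / len(scores)


def question_reduction(target_scores: Sequence[float], low_thr: float,
                       high_thr: float, baseline: int = BASELINE_QUESTIONS) -> float:
    """Relative saving for genuine callers compared with the human-only baseline."""
    return 1.0 - average_questions(target_scores, low_thr, high_thr) / baseline


def impostors_in_high_band(impostor_scores: Sequence[float], low_thr: float,
                           high_thr: float) -> int:
    """Count impostor trials that would receive the shortest check.

    Keeping this count within a security budget (assumed here to be zero)
    limits how low high_thr can be set without compromising security.
    """
    return sum(score_band(s, low_thr, high_thr) == "high" for s in impostor_scores)


if __name__ == "__main__":
    targets = [2.1, 1.4, 0.3, 2.8]   # made-up genuine-caller scores
    impostors = [-1.0, 0.2, 0.9]     # made-up impostor scores
    print(question_reduction(targets, 0.0, 1.0))        # about 0.583
    print(impostors_in_high_band(impostors, 0.0, 1.0))  # 0
```

In this toy setting, three of the four genuine callers land in the high band, so the agent asks 1.25 questions on average instead of 3. No impostor reaches the high band. Raising `high_thr` trades question savings for a stricter security margin.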
Date : 2013-06