Speaker recognition is the task of automatically determining who is speaking by matching an unknown speaker against several known speakers, using speaker-specific information contained in the speech signal. A speaker recognition system is useful whenever speakers are unknown and their identities matter. It makes machine identification of participants in meetings, conferences, or conversations possible.

Speaker recognition can be text-independent or text-dependent. Text-independent recognition must work for any text in both training and testing. This is a different problem from text-dependent recognition, where the text in training and testing is the same or is known in advance. Speaker recognition can also be divided into closed-set and open-set problems. The closed-set problem is to identify a speaker from a set of N known speakers, while the open-set problem is to decide whether the speaker of an unknown test utterance belongs to the set of N known speakers at all. There are two basic tasks in speaker recognition: speaker identification and speaker verification. In speaker identification, the system decides which of the N known speakers the unknown speaker is. In speaker verification, the system decides whether the unknown speaker is who he or she claims to be, which is a binary decision (accept or reject). Speaker verification can also be thought of as a special case of the open-set problem.
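The difference between these tasks can be sketched in a few lines of Python. This is only an illustration, assuming that some back end has already produced one score per known speaker (higher meaning a better match). The function names, speaker names, scores, and thresholds are all hypothetical, and the open-set function combines the membership decision with identification for convenience.

```python
from typing import Dict, Optional


def identify_closed_set(scores: Dict[str, float]) -> str:
    """Closed-set identification: always return the best-scoring known speaker."""
    return max(scores, key=scores.get)


def identify_open_set(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Open-set decision: return the best match, or None if even the best
    score is too low for the speaker to belong to the known set."""
    best = identify_closed_set(scores)
    return best if scores[best] >= threshold else None


def verify(scores: Dict[str, float], claimed: str, threshold: float) -> bool:
    """Verification: binary accept/reject decision for a claimed identity."""
    return scores[claimed] >= threshold


# Hypothetical per-speaker scores (for example, log-likelihoods) for one utterance.
scores = {"alice": -41.2, "bob": -37.5, "carol": -52.0}

closed = identify_closed_set(scores)          # "bob"
unknown = identify_open_set(scores, -30.0)    # None: no score reaches the threshold
accepted = verify(scores, "bob", -40.0)       # True
rejected = verify(scores, "carol", -40.0)     # False
```

In practice the threshold would be tuned on held-out data to trade off false acceptances against false rejections.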

Feature extraction is the front end of a speaker recognition system. The goal is to obtain descriptions or models of a speaker's patterns in feature space that can be used to identify the speaker of a test utterance. Features often used in the speaker recognition literature include Mel-Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficient Cepstrum (LPCCEP), and delta cepstrum.
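As an illustration of one of these features, the delta cepstrum can be sketched as follows. The code assumes the cepstral frames (for example, MFCC vectors) have already been computed, and it uses the common regression formula over a window of neighboring frames with edge frames repeated at the boundaries. The function name and the default window width are assumptions made for this example, not a description of any particular system.

```python
from typing import List


def delta_cepstrum(frames: List[List[float]], width: int = 2) -> List[List[float]]:
    """Estimate the time derivative of each cepstral coefficient.

    Uses d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2),
    repeating the first and last frame where the window runs past the edges.
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2.0 * sum(n * n for n in range(1, width + 1))

    def frame_at(t: int) -> List[float]:
        # Clamp the index so out-of-range positions reuse the edge frame.
        return frames[min(max(t, 0), num_frames - 1)]

    deltas = []
    for t in range(num_frames):
        row = []
        for k in range(dim):
            acc = sum(
                n * (frame_at(t + n)[k] - frame_at(t - n)[k])
                for n in range(1, width + 1)
            )
            row.append(acc / denom)
        deltas.append(row)
    return deltas


# A coefficient that rises by 1.0 per frame has a delta of 1.0 away from the edges.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
ramp_deltas = delta_cepstrum(ramp, width=2)
```

The delta vectors are typically appended to the static cepstral vectors so that each frame also carries information about how the spectrum is changing.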



Speaker recognition falls under the general problem of pattern classification, and Maximum A Posteriori (MAP) probability techniques are applied to it. Both nonparametric and parametric models are used. Nearest-neighbor and vector quantization models are the most common nonparametric models, while Gaussian Mixture Models (GMM) are the representative parametric models and are widely used.
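A minimal sketch of MAP classification with GMM speaker models is shown below. It assumes diagonal covariances, frames that are treated as independent, and model parameters that have already been trained elsewhere. The toy models, the speaker names, and the function names are hypothetical and serve only to make the example runnable.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# One mixture component: (weight, mean vector, per-dimension variances).
Component = Tuple[float, Sequence[float], Sequence[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x: Sequence[float], gmm: List[Component]) -> float:
    """Log-likelihood of one frame under a GMM (log-sum-exp for stability)."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def map_identify(frames: List[Sequence[float]],
                 models: Dict[str, List[Component]],
                 priors: Optional[Dict[str, float]] = None) -> str:
    """Return the speaker maximizing log p(frames | speaker) + log P(speaker)."""
    best_name, best_score = "", -math.inf
    for name, gmm in models.items():
        prior = priors[name] if priors else 1.0 / len(models)
        score = sum(gmm_frame_loglik(x, gmm) for x in frames) + math.log(prior)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Toy two-dimensional models (hypothetical parameters).
models = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(1.0, [3.0, 3.0], [1.0, 1.0])],
}
test_frames = [[0.1, -0.2], [0.8, 1.1]]
decision = map_identify(test_frames, models)
```

With equal priors the MAP rule reduces to picking the model with the highest likelihood. A nonparametric alternative such as vector quantization would replace the log-likelihood with the negative distance to the nearest codebook entry.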

In the area of speaker identification, we are interested in identifying speakers without requiring headset microphones. Many current speaker ID systems require the user to wear a close-talking microphone, which is not realistic for many applications, such as a meeting with many participants. We have developed a speaker ID system for the meeting room application (reference: MULTIMODAL MEETING TRACKER PDF). The system has been combined with other modalities, such as face recognition and color appearance identification, to identify meeting participants (reference: MULTIMODAL PEOPLE ID FOR A MULTIMEDIA MEETING BROWSER PDF). Our current research focuses on the challenges caused by the open environment, the need to work at a distance from the microphone, and the non-cooperativeness of the speaker. We are approaching these problems from both the signal processing and the speaker identification viewpoints. Initial experimental results are very encouraging. The work will be published soon.

We have a reading group on speech recognition and multimodal people identification.


Site maintained by: Céline Morel