Product Information



The system consists of three main modules: Signal Processing, Training and Testing. For both training and testing, the digitized speech first passes through Acoustic Feature Extraction and Voicing Detection. Each feature vector consists of the 12 lowest-order Mel-Frequency Cepstral Coefficients (MFCCs). A pitch extraction procedure then identifies the voiced regions, and only the MFCC frames from those regions are kept for speaker training and testing. Alongside this, a pitch frequency distribution is prepared for each speaker within the normal human voicing range (generally 80 to 420 Hz). During testing, this distribution is used for Pitch Based Dynamic Pruning (PBDP), which removes unlikely speakers before matching.
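The description above does not say how the pitch distribution is built or how PBDP scores speakers. The following is a minimal sketch under stated assumptions: per-frame pitch estimates (0.0 for unvoiced frames) are assumed to come from some pitch extractor, each speaker's distribution is a histogram with a hypothetical 10 Hz bin width over the 80 to 420 Hz range, and speakers are ranked by the mean probability their histogram gives to the test utterance's voiced pitch values. The names `voiced_frames`, `pitch_distribution` and `prune_speakers` are illustrative only.

```python
from collections import Counter

PITCH_MIN_HZ = 80.0   # lower bound of the voicing range quoted above
PITCH_MAX_HZ = 420.0  # upper bound of the voicing range quoted above
BIN_WIDTH_HZ = 10.0   # hypothetical histogram bin width (assumption)


def in_range(p):
    """True if a pitch value (Hz) lies inside the assumed voicing range."""
    return PITCH_MIN_HZ <= p <= PITCH_MAX_HZ


def bin_of(p):
    """Histogram bin index of a pitch value."""
    return int((p - PITCH_MIN_HZ) // BIN_WIDTH_HZ)


def voiced_frames(mfcc_frames, pitch_track):
    """Keep only MFCC frames whose pitch estimate falls in the voicing range.

    pitch_track[i] is the pitch of frame i in Hz, or 0.0 if unvoiced.
    """
    return [f for f, p in zip(mfcc_frames, pitch_track) if in_range(p)]


def pitch_distribution(pitch_track):
    """Normalised histogram {bin index: probability} of voiced pitch values."""
    counts = Counter(bin_of(p) for p in pitch_track if in_range(p))
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()} if total else {}


def prune_speakers(test_pitch, speaker_dists, keep):
    """Hypothetical PBDP rule: rank speakers by the mean probability their
    pitch distribution gives to the test pitch values; keep the top `keep`."""
    voiced = [p for p in test_pitch if in_range(p)]

    def score(dist):
        if not voiced:
            return 0.0
        return sum(dist.get(bin_of(p), 0.0) for p in voiced) / len(voiced)

    ranked = sorted(speaker_dists,
                    key=lambda s: score(speaker_dists[s]), reverse=True)
    return ranked[:keep]


if __name__ == "__main__":
    dists = {
        "low_voice": pitch_distribution([100.0, 105.0, 110.0, 0.0]),
        "mid_voice": pitch_distribution([180.0, 185.0, 190.0]),
        "high_voice": pitch_distribution([300.0, 310.0, 305.0]),
    }
    print(prune_speakers([182.0, 188.0, 0.0], dists, keep=1))  # ['mid_voice']
```

In this sketch the pitch comparison is a cheap one-dimensional lookup, so running it first shrinks the list of speakers whose codebooks must be searched with the full MFCC frames.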

During training, a model or codebook is created for each speaker by clustering the training feature vectors into a small number of clusters, whose representative vectors are known as code-vectors, using a well-known unsupervised Vector Quantization (VQ) clustering algorithm. Weights are then assigned to all code-vectors using a Speaker Discriminative Weighting scheme, so that code-vectors with higher discriminating power receive larger weights and vice versa.

In the testing module, PBDP first produces a list of the most likely speakers. Final matching scores are then calculated between the weighted codebooks of those speakers and the voiced MFCC frames of the test speech signal. The codebook that maximizes the similarity measure (the highest matching score) is the best-matching codebook, and its speaker is the identified speaker.
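The text names the building blocks (VQ codebooks, Speaker Discriminative Weighting, weighted matching) but not their formulas. The sketch below is one way to wire them together under explicit assumptions: plain k-means stands in for the unnamed VQ algorithm; each code-vector's weight is assumed to be 1 / Σ_k (1 / d_k), where d_k is its distance to the nearest code-vector of another speaker k; and the matching score is assumed to be the mean of weight / distance over the test frames. All function names are hypothetical, and the `candidates` argument is where a PBDP shortlist would be passed.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x, codebook):
    """Index of the code-vector closest to x."""
    return min(range(len(codebook)), key=lambda k: sq_dist(x, codebook[k]))


def train_codebook(vectors, size, iters=20, seed=0):
    """Plain k-means VQ, an assumed stand-in for the unnamed VQ algorithm."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        clusters = [[] for _ in range(size)]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                codebook[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return codebook


def discriminative_weights(codebooks):
    """Assumed weighting: w = 1 / sum_k (1 / d_k), where d_k is the distance
    from the code-vector to the nearest code-vector of another speaker k.
    Code-vectors far from all other speakers therefore get larger weights."""
    weights = {}
    for s, cb in codebooks.items():
        ws = []
        for c in cb:
            inv = sum(1.0 / max(math.sqrt(min(sq_dist(c, o) for o in other)), 1e-12)
                      for t, other in codebooks.items() if t != s)
            ws.append(1.0 / inv if inv > 0 else 1.0)  # single speaker: neutral weight
        weights[s] = ws
    return weights


def match_score(test_vectors, codebook, weights):
    """Similarity (higher is better): mean of weight / distance to nearest code-vector."""
    total = 0.0
    for x in test_vectors:
        j = nearest(x, codebook)
        total += weights[j] / max(math.sqrt(sq_dist(x, codebook[j])), 1e-12)
    return total / len(test_vectors)


def identify(test_vectors, candidates, codebooks, weights):
    """Return the candidate (e.g. a PBDP shortlist) with the highest score."""
    return max(candidates,
               key=lambda s: match_score(test_vectors, codebooks[s], weights[s]))


if __name__ == "__main__":
    # Toy 2-D "features"; real input would be 12-D voiced MFCC frames.
    rng = random.Random(1)
    data = {
        "spk_a": [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(50)],
        "spk_b": [[rng.gauss(3, 0.3), rng.gauss(3, 0.3)] for _ in range(50)],
    }
    books = {s: train_codebook(v, size=4) for s, v in data.items()}
    w = discriminative_weights(books)
    test = [[rng.gauss(3, 0.3), rng.gauss(3, 0.3)] for _ in range(20)]
    print(identify(test, list(books), books, w))  # spk_b
```

Dividing the weight by the distance means a test frame contributes most when it lies close to a highly discriminative code-vector, while frames that land near code-vectors shared by several speakers contribute little.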

Use Cases
  • Voice-biometric office attendance
  • Remote vote casting via telephone calls
  • E-commerce (purchase of goods)
  • Secure access to mobile phones and handheld devices
  • Door access control in smart homes

Salient Features
  • Easy to use: speech is a behavioral biometric that is readily available, user-friendly and less intrusive
  • High acceptability: low cost, small storage footprint, and compact enough for small electronic devices and handhelds
  • Text and language independent: no specific text is required; any valid utterance of varying length in any language is accepted
  • Short interaction time: performs well with only 2 minutes of enrolment speech and 5 s of test speech

Technical Specifications

  • The application uses voice biometrics, so there is no need to carry keys, badges or access cards, or to remember passwords or PINs.
  • Speech can be captured remotely, so the same technology can be used for remote authentication over the telephone.
  • The method scales to recognizing multiple speakers and to verifying the same speaker across audio recorded in different languages.



Platform Required (if any)

Software


  • Microsoft Windows XP Professional or later

For remote access

  • Linux Operating System
  • Asterisk Gateway Interface (AGI)
  • PHP
  • MySQL


Hardware

  • Standard desktop with one good-quality noise-cancelling microphone

For remote access

  • ISDN-PRI / E1 Channel
  • Asterisk Server


Contact Details for Techno Commercial Information

Advance Signal Processing Group, Speech Processing Section

Mr. Joyanta Basu

Email: joyanta.basu[at]cdac[dot]in

Phone No.: 033-2357-9846, Ext: 226 (O)