Course coordinated by IIT Kanpur
Coordinators
 
Dr. Rajesh M. Hegde
IIT Kanpur

 


  1. The Speech Production mechanism.

    1. Physiological and Mathematical Model.

    2. Relating the physiological and mathematical model.

    3. Categorization of Speech Sounds based on the source-system and the articulatory model.

  2. Basic Speech Signal Processing Concepts.

    1. Discrete-time speech signals; relevant properties of the fast Fourier transform and Z-transform for speech recognition; convolution; linear and nonlinear filter banks.

    2. Spectral estimation of speech using the Discrete Fourier transform.

    3. Pole-zero modeling of speech and linear prediction (LP) analysis of speech.

    4. Homomorphic speech signal deconvolution, real and complex cepstrum, application of cepstral analysis to speech signals.

  3. The Speech Recognition Front End.

    1. Feature extraction for speech recognition, static and dynamic features for speech recognition, robustness issues, discrimination in the feature space, feature selection.

    2. Mel frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), perceptual LPCC.

  4. Distance measures for comparing speech patterns.

    1. Log spectral distance, cepstral distances, weighted cepstral distances, distances for linear and warped scales.

    2. Dynamic Time Warping for Isolated Word Recognition.

  5. Statistical models for speech recognition.

    1. Vector quantization models and applications in speaker recognition.

    2. Gaussian mixture modeling for speaker and speech recognition.

    3. Discrete and Continuous Hidden Markov modeling for isolated word and continuous speech recognition.

  6. Using the HTK toolkit for building a simple speech recognition system.

 

| Sl. No. | Lecture Topic | No. of Hours |
|---|---|---|
| 1 | The Speech Production mechanism. (1) Physiological and Mathematical Model. (2) Relating the physiological and mathematical model. (3) Categorization of Speech Sounds based on the source-system and the articulatory model. | 05 |
| 2 | Speech Signal Processing Concepts. (1) Discrete-time speech signals; relevant properties of the fast Fourier transform and Z-transform for speech recognition; convolution; linear and nonlinear filter banks. (2) Spectral estimation of speech using the Discrete Fourier transform. (3) Pole-zero modeling of speech and linear prediction (LP) analysis of speech. (4) Homomorphic speech signal deconvolution, real and complex cepstrum, application of cepstral analysis to speech signals. | 10 |
| 3 | The Speech Recognition Front End. (1) Feature extraction for speech recognition, static and dynamic features for speech recognition, robustness issues, discrimination in the feature space, feature selection. (2) Mel frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), perceptual LPCC. | 06 |
| 4 | Distance measures for comparing speech patterns: log spectral distance, cepstral distances, weighted cepstral distances, distances for linear and warped scales. | 03 |
| 5 | Dynamic Time Warping for Isolated Word Recognition. | 03 |
| 6 | Statistical models for speech recognition. (1) Vector quantization models and applications in speaker recognition. (2) Gaussian mixture modeling for speaker and speech recognition. (3) Discrete and Continuous Hidden Markov modeling for isolated word and continuous speech recognition. | 10 |
| 7 | Using the HTK toolkit for building a simple speech recognition system. | 03 |
| | Total | 40 |

Pre-requisites

  1. There are no mandatory pre-requisites for the course.

  2. It is designed to be self-contained. However, prior courses on Signals and Systems and on Digital Signal Processing will help in appreciating the course.


References

  1. Discrete-Time Speech Signal Processing: Principles and Practice, Thomas F. Quatieri, cloth, 816 pp., published October 29, 2001, ISBN: 013242942X.

  2. Fundamentals of Speech Recognition, L. Rabiner and B. Juang, Prentice-Hall Signal Processing Series, 507 pp., 1993, ISBN: 0-13-015157-2.

  3. Speech and Audio Signal Processing: Processing and Perception of Speech and Music, B. Gold and N. Morgan, Wiley, 2000, ISBN: 0-471-35154-7.

  4. Corpus-Based Methods in Language and Speech Processing, Steve Young et al. (eds.), Kluwer, 234 pp., ISBN: 0-7923-4463-4.

  5. Discrete Time Processing of Speech Signals, J. R. Deller, J. G. Proakis, and J. H. Hansen, 1993, ISBN: 0023283017.

  6. Hidden Markov Models for Speech Recognition, X. D. Huang, Y. Ariki, and M. A. Jack, Edinburgh University Press.

  7. Digital Processing of Speech Signals, L. R. Rabiner and R. W. Schafer, Pearson Education.


Toolkits

  1. The HTK toolkit for speech recognition: http://htk.eng.cam.ac.uk/

  2. The Sphinx toolkit for speech recognition: http://cmusphinx.sourceforge.net/html/cmusphinx.php



