11-751 Fall 2008

Speech Recognition and Understanding

  • Instructors:
    • Alex Waibel (waibel@cs.cmu.edu) Office: Room 203, 407 S. Craig Street
    • Ian Lane (ianlane@cs.cmu.edu) Office: Room 221, 407 S. Craig Street
    • Teaching assistant: Wilson Tam (yct@cs.cmu.edu) Office: Room 209, 407 S. Craig Street
  • Times: Monday and Wednesday, 4:30pm - 5:50pm
  • Place: Doherty Hall 1209
  • First Class: Monday August 25th
  • Grading: Homework 30%, Exam 40%, Term Project 30%
  • Final: TBA

Course Description

The technologies that allow humans to communicate with machines by speech, and that enable machines to understand humans communicating with each other, are rapidly maturing. This course introduces the theoretical background and the experimental practice that have made the field what it is today. We will cover theoretical foundations, essential algorithms, major approaches, experimental strategies, and current state-of-the-art systems, and will introduce participants to ongoing work in representation, algorithms, and interface design. The course concludes with a brief overview of multilingual speech recognition.

This course is primarily for graduate students in LTI, CS, Robotics, ECE, HCI, Psychology, or Computational Linguistics. Others may enroll with prior permission of the instructor. No prior experience with speech recognition is necessary. The course is suitable for graduate students with some background in computer science and electrical engineering, as well as for advanced undergraduates.

The course involves written and programming assignments. Some reading of papers may also be required.


Course Schedule

Date      Topic                                               Required Reading
M Aug 25  Introduction (slides)
W Aug 27  ASR: The Big Picture (slides)                       [1]
M Sep 1   No Class (Labor Day)
W Sep 3   Template Based Recognition, DTW (slides)            [2 (optional), 3]
M Sep 8   Signal Processing I (slides)                        [4]
W Sep 10  Signal Processing II (slides)
M Sep 15  Pattern Recognition and Classification (slides)
W Sep 17  Introduction to Term Projects (slides, project intro.)
M Sep 22  No Class (Interspeech 2008)
W Sep 24  No Class (Interspeech 2008)
M Sep 29  Speaker Identification and Classification (slides)  [9 (optional), 10]
W Oct 1   Hidden Markov Models I (slides)                     [5, 6]
M Oct 6   Hidden Markov Models II (slides)                    [convergence proof]
W Oct 8   Acoustic Modeling I (slides)
M Oct 13  Acoustic Modeling II
W Oct 15  No Class
M Oct 20  Language Modeling I                                 [7, 8]
W Oct 22  Language Modeling II
M Oct 27  Search I
W Oct 29  Search II
M Nov 3   Special Issues in Speech Recognition: Adaptation
W Nov 5   Special Issues in Speech Recognition: Discriminative Training
M Nov 10  Weighted Finite State Transducers for ASR
W Nov 12  No Class
M Nov 17  Natural Language Processing and Spoken Dialog Systems
W Nov 19  Media Archiving
M Nov 24  Spoken Language Translation
W Nov 26  No Class (Thanksgiving Holiday)
M Dec 1   Recap and Q&A
W Dec 3   Student Project Presentations I
M Dec 8   Student Project Presentations II
W Dec 10
F Dec 12  Final Exam
M Dec 15  Final Report

Required Reading

  • [1] Steve Young, "Large vocabulary continuous speech recognition: A review," Tech. Rep., Engineering Dept., Cambridge Univ. Cambridge, U.K., 1996.
  • [2] Hiroaki Sakoe and Seibi Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26 (1), pp. 43-49, February, 1978.
  • [3] Hermann Ney, "The Use of a One-Stage Dynamic Programming Algorithm for Connected Word Recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32 (2), pp. 263-271, April, 1984.
  • [4] Ronald Schafer and Lawrence Rabiner, "Digital Representations of Speech Signals," Proc. IEEE, vol. 63 (4), pp. 662-677, April, 1975.
  • [5] Lawrence Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," In Proc. IEEE, vol. 77, no. 2, pp. 257-286, 1989.
  • [6] Biing-Hwang Juang and Lawrence Rabiner, "Hidden Markov models for speech recognition," Technometrics, vol. 33, pp. 251-272, Aug. 1991.
  • [7] R. Rosenfeld, "Two decades of statistical language modeling: Where do we go from here?" presented at the Workshop-2000 Spoken Lang. Reco. Understanding, Summit, NJ, Feb. 2000.
  • [8] Jerome Bellegarda, "Statistical language model adaptation: Review and perspectives," Speech Communication, vol. 42, pp. 93-108, 2004.
  • [9] G. R. Doddington, "Speaker recognition - Identifying people by their voices," In Proc. IEEE, vol. 73, pp. 1651-1664, Nov. 1985.
  • [10] D. Reynolds, "An overview of automatic speaker recognition technology," In Proc. IEEE ICASSP, 2002.

Textbooks and References

  • Xuedong Huang, Alex Acero and Hsiao-wuen Hon, Spoken Language Processing, Prentice Hall PTR, NJ, 2001
  • Jelinek, Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA, 1997
  • Rabiner and Juang, Fundamentals of Speech Recognition, Prentice Hall Signal Processing Series, Englewood Cliffs, NJ, 1993
  • Rabiner and Schafer, Digital Processing of Speech Signals, Prentice-Hall PTR, NJ, 1978
  • Waibel and Lee, Readings in Speech Recognition, Morgan Kaufmann Publishers, San Mateo, CA, 1990

Homework Assignment
