11-751 Fall 2008: Speech Recognition and Understanding

  • Instructors:
    • Alex Waibel (waibel@cs.cmu.edu) Office: Room 203, 407 S. Craig Street
    • Ian Lane (ianlane@cs.cmu.edu) Office: Room 221, 407 S. Craig Street
    • Wilson Tam (yct@cs.cmu.edu) Office: Room 209, 407 S. Craig Street [Teaching assistant]
  • Times: Monday and Wednesday, 4:30pm - 5:50pm
  • Place: Doherty Hall 1209
  • First Class: Monday, August 25
  • Grading: Homework 30%, Exam 40%, Term Project 30%
  • Final: Friday, December 12, Newell-Simon Hall 3002, 10:00-13:00

11-751 Term-Projects


Course Description

Technologies that allow humans to communicate with machines by speech, and that enable machines to understand humans communicating with each other, are rapidly maturing. This course introduces both the theoretical background and the experimental practice that have made the field what it is today. We will cover theoretical foundations, essential algorithms, major approaches, experimental strategies and current state-of-the-art systems, and will introduce participants to ongoing work in representation, algorithms and interface design. The course concludes with a brief overview of multilingual speech recognition.

This course is primarily for graduate students in LTI, CS, Robotics, ECE, HCI, Psychology, or Computational Linguistics. Other students may enroll with the instructor's prior permission. No prior experience with speech recognition is necessary. The course is suitable for graduate students with some background in computer science or electrical engineering, as well as for advanced undergraduates.

The course involves written and programming assignments. Some papers may also be assigned as reading.


Course Schedule

Date      Topic                                                           Materials
M Aug 25  Introduction                                                    slides
W Aug 27  ASR: The Big Picture                                            slides; reading [1]
M Sep 1   No Class (Labor Day)
W Sep 3   Template-Based Recognition, DTW                                 slides; reading [2] (optional), [3]
M Sep 8   Signal Processing I                                             slides; reading [4]
W Sep 10  Signal Processing II                                            slides
M Sep 15  Pattern Recognition and Classification                          slides
W Sep 17  Pattern Recognition and Classification II                       slides; Introduction to Term Projects
M Sep 22  No Class (Interspeech 2008)
W Sep 24  No Class (Interspeech 2008)
M Sep 29  Speaker Identification and Classification                       slides; reading [9] (optional), [10]
W Oct 1   Hidden Markov Models I                                          slides; reading [5], [6]
M Oct 6   Hidden Markov Models II                                         slides; convergence proof
W Oct 8   Acoustic Modeling I                                             slides
M Oct 13  Acoustic Modeling II                                            slides
W Oct 15  No Class
M Oct 20  Language Modeling I                                             slides; reading [7], [8]
W Oct 22  Language Modeling II                                            slides
M Oct 27  Search I                                                        slides
W Oct 29  Search II                                                       slides
M Nov 3   No Class
W Nov 5   Special Issues in Speech Recognition: Discriminative Training   slides
M Nov 10  Special Issues in Speech Recognition: Adaptation                slides
W Nov 12  Spoken Language Understanding                                   slides
M Nov 17  Weighted Finite-State Transducers for ASR                       slides
W Nov 19  No Class
M Nov 24  Spoken Language Translation                                     slides
W Nov 26  Thanksgiving Holiday
M Dec 1   Applications of Speaker Recognition                             slides
W Dec 3   Recap and Q&A                                                   slides
M Dec 8   No Class (term projects and exam preparation)
W Dec 10  No Class (term projects and exam preparation)
F Dec 12  Final Exam: Newell-Simon Hall 3002 (10:00-13:00)
M Dec 15  Student Project Presentations
T Dec 16  Term Project Report Due

Required Reading

  • [1] Steve Young, "Large vocabulary continuous speech recognition: A review," Tech. Rep., Engineering Dept., Cambridge Univ. Cambridge, U.K., 1996.
  • [2] Hiroaki Sakoe and Seibi Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 1, pp. 43-49, February 1978.
  • [3] Hermann Ney, "The Use of a One-Stage Dynamic Programming Algorithm for Connected Word Recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 2, pp. 263-271, April 1984.
  • [4] Ronald Schafer and Lawrence Rabiner, "Digital Representations of Speech Signals," Proc. IEEE, vol. 63, no. 4, pp. 662-667, April 1975.
  • [5] Lawrence Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-285, 1989.
  • [6] Biing-Hwang Juang and Lawrence Rabiner, "Hidden Markov models for speech recognition," Technometrics, vol. 33, pp. 251-272, August 1991.
  • [7] R. Rosenfeld, "Two decades of statistical language modeling: Where do we go from here?" presented at the Workshop-2000 on Spoken Language Recognition and Understanding, Summit, NJ, February 2000.
  • [8] Jerome Bellegarda, "Statistical language model adaptation: Review and perspectives," Speech Communication, vol. 42, pp. 93-108, 2004.
  • [9] G. R. Doddington, "Speaker recognition: Identifying people by their voices," Proc. IEEE, vol. 73, pp. 1651-1664, November 1985.
  • [10] D. Reynolds, "An overview of automatic speaker recognition technology," In Proc. IEEE ICASSP, 2002.

Textbooks and References

  • Xuedong Huang, Alex Acero and Hsiao-wuen Hon, Spoken Language Processing, Prentice Hall PTR, NJ, 2001
  • Jelinek, Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA, 1997
  • Rabiner and Juang, Fundamentals of Speech Recognition, Prentice Hall Signal Processing Series, Englewood Cliffs, NJ, 1993
  • Rabiner and Schafer, Digital Processing of Speech Signals, Prentice-Hall PTR, NJ, 1978
  • Waibel and Lee, Readings in Speech Recognition, Morgan Kaufmann Publishers, San Mateo, CA, 1990

Homework Assignment
