11-751 Fall 2008: Speech Recognition and Understanding
- Instructors:
- Alex Waibel (waibel@cs.cmu.edu) Office: Room 203, 407 S. Craig Street
- Ian Lane (ianlane@cs.cmu.edu) Office: Room 221, 407 S. Craig Street
- Wilson Tam (yct@cs.cmu.edu) Office: Room 209, 407 S. Craig Street [Teaching assistant]
- Times: Monday and Wednesday, 4:30pm - 5:50pm
- Place: Doherty Hall 1209
- First Class: Monday August 25th
- Grading: Homework 30%, Exam 40%, Term Project 30%
- Final: Friday, December 12, Newell-Simon Hall 3002, 10:00-13:00
11-751 Term-Projects
Course Description
The technologies that allow humans to communicate with machines by speech, and that enable machines to understand humans communicating with each other, are rapidly maturing. This course introduces the theoretical background and the experimental practice that have made the field what it is today. We will cover theoretical foundations, essential algorithms, major approaches, experimental strategies, and current state-of-the-art systems, and will introduce participants to ongoing work in representation, algorithms, and interface design. The course concludes with a brief overview of multilingual speech recognition.
This course is primarily for graduate students in LTI, CS, Robotics, ECE, HCI, Psychology, or Computational Linguistics. Others may enroll with prior permission of the instructor. No prior experience with speech recognition is necessary. The course is suitable for graduate students with some background in computer science and electrical engineering, as well as for advanced undergraduates.
The course involves written and programming assignments. Some reading of papers may also be required.
Course Schedule
| Date | Topic | Slides | Required Reading |
| M Aug 25 | Introduction | slides | |
| W Aug 27 | ASR: The Big Picture | slides | [1] |
| M Sep 1 | | | |
| W Sep 3 | Template Based Recognition, DTW | slides | [2 (optional), 3] |
| M Sep 8 | Signal Processing I | slides | [4] |
| W Sep 10 | Signal Processing II | slides | |
| M Sep 15 | Pattern Recognition and Classification | slides | |
| W Sep 17 | Pattern Recognition and Classification II | slides (Introduction to Term Projects) | |
| M Sep 22 | | | |
| W Sep 24 | | | |
| M Sep 29 | Speaker Identification and Classification | slides | [9 (optional), 10] |
| W Oct 1 | Hidden Markov Models I | slides | [5, 6] |
| M Oct 6 | Hidden Markov Models II | slides [convergence proof] | |
| W Oct 8 | Acoustic Modeling I | slides | |
| M Oct 13 | Acoustic Modeling II | slides | |
| W Oct 15 | | | |
| M Oct 20 | Language Modeling I | slides | [7, 8] |
| W Oct 22 | Language Modeling II | slides | |
| M Oct 27 | Search I | slides | |
| W Oct 29 | Search II | slides | |
| M Nov 3 | | | |
| W Nov 5 | Special Issues in Speech Recognition: Discriminative Training | slides | |
| M Nov 10 | Special Issues in Speech Recognition: Adaptation | slides | |
| W Nov 12 | Spoken Language Understanding | slides | |
| M Nov 17 | Weighted Finite State Transducers for ASR | slides | |
| W Nov 19 | | | |
| M Nov 24 | Spoken Language Translation | slides | |
| W Nov 26 | | | |
| M Dec 1 | Applications of Speaker Recognition | slides | |
| W Dec 3 | Recap and Q&A | slides | |
| M Dec 8 | | | |
| W Dec 10 | | | |
| F Dec 12 | Final Exam: Newell-Simon Hall 3002 (10:00-13:00) | | |
| M Dec 15 | Student Project Presentations | | |
| T Dec 16 | Term Project Report Due | | |
Required Reading
- [1] Steve Young, "Large vocabulary continuous speech recognition: A review," Tech. Rep., Engineering Dept., Cambridge Univ. Cambridge, U.K., 1996.
- [2] Hiroaki Sakoe and Seibi Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26 (1), pp. 43-49, February, 1978.
- [3] Hermann Ney, "The Use of a One-Stage Dynamic Programming Algorithm for Connected Word Recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32 (2), pp. 263-271, April, 1984.
- [4] Ronald Schafer and Lawrence Rabiner, "Digital Representations of Speech Signals," Proc. IEEE, vol. 63 (4), pp. 662-667, April, 1975.
- [5] Lawrence Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. IEEE, vol. 77 (2), pp. 257-285, 1989.
- [6] Biing-Hwang Juang and Lawrence Rabiner, “Hidden Markov models for speech recognition,” Technometrics, vol. 33, pp. 251–272, Aug. 1991.
- [7] R. Rosenfeld, “Two decades of statistical language modeling: Where do we go from here?” presented at the Workshop-2000 Spoken Lang. Reco. Understanding, Summit, NJ, Feb. 2000.
- [8] Jerome Bellegarda, "Statistical language model adaptation: Review and perspectives," Speech Communication, vol. 42, pp. 93-108, 2004.
- [9] G. R. Doddington, "Speaker recognition - Identifying people by their voices," In Proc. IEEE, vol. 73, pp. 1651–1664, Nov. 1985.
- [10] D. Reynolds, "An overview of automatic speaker recognition technology," In Proc. IEEE ICASSP, 2002.
Textbooks and References
- Xuedong Huang, Alex Acero and Hsiao-wuen Hon, Spoken Language Processing, Prentice Hall PTR, NJ, 2001
- Jelinek, Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA, 1997
- Rabiner and Juang, Fundamentals of Speech Recognition, Prentice Hall Signal Processing Series, Englewood Cliffs, NJ, 1993
- Rabiner and Schafer, Digital Processing of Speech Signals, Prentice-Hall PTR, NJ, 1978
- Waibel and Lee, Readings in Speech Recognition, Morgan Kaufmann Publishers, San Mateo, CA, 1990
Homework Assignment
- [Assignment one] Due on September 17th before class
- [Assignment two] [solution] Due on October 15th before class
- [Assignment three] [LM datasets] [SRILM download] [SRILM manual] Due on November 5th before class
- Use ngram-count for LM estimation
- Use ngram for perplexity evaluation
- [solution]
- [Assignment four] [word lattice for problem 4] Due on December 1st before class
- [solution]
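For orientation on Assignment three, the sketch below illustrates the quantity that a perplexity evaluation reports. It is not SRILM's implementation and does not replace `ngram-count` or `ngram`: it assumes a bigram model with add-one smoothing and a single unknown-word class, and the function names `train_bigram` and `perplexity` are hypothetical, chosen only for this example.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect bigram and context counts; each sentence is padded with <s> and </s>."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens[1:])  # predicted symbols: words and </s>
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab


def perplexity(sentences, model):
    """Perplexity = exp(-average log probability per predicted token)."""
    context_counts, bigram_counts, vocab = model
    v = len(vocab) + 1  # assumption: one extra class for all unseen words
    log_prob = 0.0
    n_tokens = 0
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            # Add-one (Laplace) smoothed bigram probability.
            p = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + v)
            log_prob += math.log(p)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)


if __name__ == "__main__":
    model = train_bigram(["a b", "a b"])
    print(f"{perplexity(['a b'], model):.3f}")  # prints 2.000
    print(f"{perplexity(['b a'], model):.3f}")  # prints 6.000
```

Text that matches the training data gets a low perplexity, while unseen word orders score higher. Real toolkits use stronger smoothing methods and may treat out-of-vocabulary words differently, so numbers from this sketch are not comparable to SRILM output.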
Attachments
- 08-25-2008.pdf (2.0 MB) - Slides from lecture 1, added by ianlane on 08/26/08 10:45:16.
- Young_1996.pdf (265.1 kB) - Large vocabulary continuous speech recognition: A review, added by ianlane on 08/27/08 12:59:47.
- Sakoe_1978.pdf (5.3 MB) - Dynamic Programming Algorithm Optimization for Spoken Word Recognition, added by ianlane on 08/27/08 13:00:39.
- Rabiner_1989.pdf (2.2 MB) - Tutorial on Hidden Markov Model and Selected Applications in Speech Recognition, added by ianlane on 08/27/08 13:17:49.
- Juang_1991.pdf (2.5 MB) - Hidden Markov models for speech recognition, added by ianlane on 08/27/08 13:18:14.
- Bellegarda_2004.pdf (344.8 kB) - Statistical language model adaptation: Review and perspectives, added by ianlane on 08/27/08 13:19:04.
- Rosenfeld_2000.pdf (93.9 kB) - Two decades of statistical language modeling: Where do we go from here?, added by ianlane on 08/27/08 13:19:51.
- 08-27-2008.pdf (421.5 kB) - Slides from lecture 2, added by ianlane on 08/31/08 16:21:49.
- Ney_1984.pdf (7.0 MB) - The Use of a One-Stage Dynamic Programming Algorithm for Connected Word Recognition, added by ianlane on 09/03/08 10:24:19.
- 09-03-2008.pdf (121.9 kB) - added by yct on 09/03/08 19:19:45.
- Schafer_1975.pdf (12.4 MB) - Digital Representations of Speech Signals, added by ianlane on 09/07/08 20:35:24.
- assignment1.pdf (92.7 kB) - assignment one, added by yct on 09/08/08 11:22:46.
- 09-08-2008.pdf (0.9 MB) - Lecture 4: Signal processing for speech, added by yct on 09/08/08 19:21:49.
- 09-10-2008.pdf (2.6 MB) - robust speech recognition, added by yct on 09/13/08 12:43:48.
- 09-15-2008.pdf (0.7 MB) - Lecture 5: Pattern Recognition and Classification, added by ianlane on 09/16/08 22:32:00.
- 09-17-2008.pdf (331.7 kB) - Lecture 6: Pattern Recognition and Classification, added by ianlane on 09/17/08 23:44:02.
- Doddington_1985.pdf (1.7 MB) - Speaker Recognition - Identifying People by their Voices, added by ianlane on 09/17/08 23:55:48.
- Reynolds_2002.pdf (87.4 kB) - An Overview of Automatic Speaker Recognition Technology, added by ianlane on 09/17/08 23:56:07.
- 09-29-2008.pdf (1.5 MB) - speaker recognition and verification, added by yct on 09/30/08 13:16:39.
- hwk2.pdf (47.6 kB) - homework 2, added by yct on 10/06/08 14:24:02.
- Project_Introduction.pdf (308.6 kB) - Term Project Introduction (09-17-2008), added by ianlane on 10/06/08 15:24:30.
- hmm2.pdf (96.0 kB) - hidden markov model (II), added by yct on 10/06/08 21:13:30.
- hmm1.pdf (107.5 kB) - hidden Markov model (I), added by yct on 10/06/08 21:13:53.
- em.pdf (26.0 kB) - convergence proof for baum-welch training, added by yct on 10/07/08 14:11:38.
- 10-08-2008.pdf (409.8 kB) - Lecture 11: Acoustic Modeling I, added by ianlane on 10/08/08 19:44:38.
- swb_data.tgz (399.8 kB) - data sets for assignment 3 on LM, added by yct on 10/16/08 18:53:17.
- 10-13-2008.pdf (0.7 MB) - Lecture 12: Acoustic Modeling II, added by ianlane on 10/19/08 06:47:57.
- 10-20-2008.pdf (161.7 kB) - LM (part 1), added by yct on 10/20/08 20:48:59.
- 10-22-2008.pdf (145.3 kB) - LM (part 2), added by yct on 10/22/08 19:03:50.
- 10-27-2008.pdf (214.6 kB) - search (part 1), added by yct on 10/28/08 11:11:40.
- 10-29-2008.pdf (139.8 kB) - Search II, added by yct on 10/29/08 23:53:13.
- hwk3.1.pdf (45.3 kB) - added by yct on 11/03/08 19:05:54.
- 11-05-2008.pdf (156.0 kB) - discriminative training, added by yct on 11/05/08 22:28:06.
- hwk2_sol.pdf (51.9 kB) - added by yct on 11/09/08 23:10:12.
- hwk2_sol.2.pdf (52.6 kB) - added by yct on 11/10/08 00:41:23.
- june08open_1_e6.lat (40.0 kB) - word lattice for homework 4, added by yct on 11/10/08 15:10:34.
- 11-10-2008.pdf (0.7 MB) - Special Problems in Speech Recognition: Adaptation and Multi-System Combination, added by ianlane on 11/11/08 08:55:32.
- hwk4.3.pdf (54.9 kB) - added by yct on 11/12/08 23:48:29.
- 11-12-2008.pdf (1.1 MB) - Spoken Language Understanding, added by ianlane on 11/13/08 10:57:50.
- 11-17-2008.pdf (0.7 MB) - Weighted Finite State Transducers for ASR, added by ianlane on 11/19/08 21:38:25.
- hwk3_sol.pdf (67.4 kB) - added by yct on 12/02/08 18:34:45.
- 12-01-2008.pdf (1.1 MB) - Applications of Speaker Recognition, added by ianlane on 12/03/08 13:01:45.
- 11-24-2008.pdf (6.5 MB) - Spoken Language Translation, added by ianlane on 12/03/08 15:04:24.
- 12-03-2008.pdf (94.8 kB) - Recap and Q&A, added by ianlane on 12/03/08 18:00:04.
- hwk4_sol.pdf (71.9 kB) - added by yct on 12/07/08 18:48:15.
