= 11-751 Fall 2008: Speech Recognition and Understanding =

 * '''Instructors:'''
   * [http://www.cs.cmu.edu/~ahw/ Alex Waibel] (waibel@cs.cmu.edu) Office: Room 203, 407 S. Craig Street
   * [http://www.cs.cmu.edu/~ianlane Ian Lane] (ianlane@cs.cmu.edu) Office: Room 221, 407 S. Craig Street
   * [http://www.cs.cmu.edu/~yct/ Wilson Tam] (yct@cs.cmu.edu) Office: Room 209, 407 S. Craig Street '''[Teaching assistant]'''
 * '''Times:''' Monday and Wednesday, 4:30pm - 5:50pm
 * '''Place:''' Doherty Hall 1209
 * '''First Class:''' Monday, August 25
 * '''Grading:''' Homework 30%, Exam 40%, Term Project 30%
 * '''Final:''' Friday, December 12, Newell-Simon Hall 3002, 10:00-13:00

----
== [wiki:Projects 11-751 Term-Projects] ==

----
== [wiki:CourseDescription Course Description] ==
The technologies that let humans communicate with machines by speech, and that let machines understand humans communicating with each other, are rapidly maturing. This course introduces both the theoretical background and the experimental practice that have shaped the field. We will cover theoretical foundations, essential algorithms, major approaches, experimental strategies and current state-of-the-art systems, and will introduce participants to ongoing work in representation, algorithms and interface design. The course concludes with a brief overview of multilingual speech recognition.

This course is primarily for graduate students in LTI, CS, Robotics, ECE, HCI, Psychology, or Computational Linguistics; other students may enroll with the instructor's prior permission. No prior experience with speech recognition is necessary. The course is suitable for graduate students with some background in computer science and electrical engineering, as well as for advanced undergraduates.

The course involves written and programming assignments. Some reading of papers may also be required.
----
== Course Schedule ==
||'''Date'''||'''Topic'''||'''Slides'''||'''Required Reading'''||
||M Aug 25||Introduction ||[attachment:08-25-2008.pdf?format=raw slides]|| ||
||W Aug 27||ASR: The Big Picture ||[attachment:08-27-2008.pdf?format=raw slides]||[[attachment:Young_1996.pdf?format=raw 1]]||
||M Sep 1 ||~~No Class~~ (Labor Day) || || ||
||W Sep 3 ||Template-Based Recognition, DTW ||[attachment:09-03-2008.pdf?format=raw slides]||[[attachment:Sakoe_1978.pdf?format=raw 2] (optional), [attachment:Ney_1984.pdf?format=raw 3]]||
||M Sep 8 ||Signal Processing I || [attachment:09-08-2008.pdf?format=raw slides] ||[[attachment:Schafer_1975.pdf?format=raw 4]]||
||W Sep 10||Signal Processing II || [attachment:09-10-2008.pdf?format=raw slides] || ||
||M Sep 15||Pattern Recognition and Classification || [attachment:09-15-2008.pdf?format=raw slides] || ||
||W Sep 17||Pattern Recognition and Classification II ||[attachment:09-17-2008.pdf?format=raw slides] ([attachment:Project_Introduction.pdf?format=raw Introduction to Term Projects]) || ||
||M Sep 22||~~No Class~~ ([http://www.interspeech2008.org Interspeech 2008]) || || ||
||W Sep 24||~~No Class~~ ([http://www.interspeech2008.org Interspeech 2008]) || || ||
||M Sep 29||Speaker Identification and Classification || [attachment:09-29-2008.pdf?format=raw slides]|| [[attachment:Doddington_1985.pdf?format=raw 9] (optional), [attachment:Doddington_1985.pdf?format=raw 10]]||
||W Oct 1 ||Hidden Markov Models I || [attachment:hmm1.pdf?format=raw slides] ||[[attachment:Rabiner_1989.pdf?format=raw 5],[attachment:Juang_1991.pdf?format=raw 6]]||
||M Oct 6 ||Hidden Markov Models II || [attachment:hmm2.pdf?format=raw slides] [[attachment:em.pdf?format=raw convergence proof]]|| ||
||W Oct 8 ||Acoustic Modeling I || [attachment:10-08-2008.pdf?format=raw slides] || ||
||M Oct 13||Acoustic Modeling II || [attachment:10-13-2008.pdf?format=raw slides] || ||
||W Oct 15|| ~~No Class~~ || || ||
||M Oct 20||Language Modeling I || [attachment:10-20-2008.pdf?format=raw slides] ||[[attachment:Rosenfeld_2000.pdf?format=raw 7],[attachment:Bellegarda_2004.pdf?format=raw 8]]||
||W Oct 22||Language Modeling II || [attachment:10-22-2008.pdf?format=raw slides] || ||
||M Oct 27||Search I || [attachment:10-27-2008.pdf?format=raw slides] || ||
||W Oct 29||Search II || [attachment:10-29-2008.pdf?format=raw slides] || ||
||M Nov 3 ||~~No Class~~|| || ||
||W Nov 5 ||Special Issues in Speech Recognition: Discriminative Training|| [attachment:11-05-2008.pdf?format=raw slides] || ||
||M Nov 10||Special Issues in Speech Recognition: Adaptation|| [attachment:11-10-2008.pdf?format=raw slides] || ||
||W Nov 12||Spoken Language Understanding|| [attachment:11-12-2008.pdf?format=raw slides] || ||
||M Nov 17||Weighted Finite-State Transducers for ASR|| [attachment:11-17-2008.pdf?format=raw slides] || ||
||W Nov 19||~~No Class~~|| || ||
||M Nov 24||Spoken Language Translation|| [attachment:11-24-2008.pdf?format=raw slides] || ||
||W Nov 26||~~Thanksgiving Holiday~~ || || ||
||M Dec 1 ||Applications of Speaker Recognition|| [attachment:12-01-2008.pdf?format=raw slides] || ||
||W Dec 3 ||Recap and Q&A|| [attachment:12-03-2008.pdf?format=raw slides] || ||
||M Dec 8 ||~~No Class~~ (term projects and exam preparation)|| || ||
||W Dec 10||~~No Class~~ (term projects and exam preparation)|| || ||
||F Dec 12||'''Final Exam:''' Newell-Simon Hall 3002 (10:00-13:00) || || ||
||M Dec 15||Student Project Presentations|| || ||
||T Dec 16||Term Project Report Due|| || ||

----
== Required Reading ==
 * [[attachment:Young_1996.pdf?format=raw 1]] Steve Young, "Large vocabulary continuous speech recognition: A review," Tech. Rep., Engineering Dept., Cambridge Univ., Cambridge, U.K., 1996.
 * [[attachment:Sakoe_1978.pdf?format=raw 2]] Hiroaki Sakoe and Seibi Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 1, pp. 43-49, Feb. 1978.
 * [[attachment:Ney_1984.pdf?format=raw 3]] Hermann Ney, "The Use of a One-Stage Dynamic Programming Algorithm for Connected Word Recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 2, pp. 263-271, Apr. 1984.
 * [[attachment:Schafer_1975.pdf?format=raw 4]] Ronald Schafer and Lawrence Rabiner, "Digital Representations of Speech Signals," Proc. IEEE, vol. 63, no. 4, pp. 662-667, Apr. 1975.
 * [[attachment:Rabiner_1989.pdf?format=raw 5]] Lawrence Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-285, 1989.
 * [[attachment:Juang_1991.pdf?format=raw 6]] Biing-Hwang Juang and Lawrence Rabiner, "Hidden Markov models for speech recognition," Technometrics, vol. 33, pp. 251-272, Aug. 1991.
 * [[attachment:Rosenfeld_2000.pdf?format=raw 7]] Ronald Rosenfeld, "Two decades of statistical language modeling: Where do we go from here?" presented at the Workshop-2000 Spoken Language Recognition and Understanding, Summit, NJ, Feb. 2000.
 * [[attachment:Bellegarda_2004.pdf?format=raw 8]] Jerome Bellegarda, "Statistical language model adaptation: Review and perspectives," Speech Communication, vol. 42, pp. 93-108, 2004.
 * [[attachment:Doddington_1985.pdf?format=raw 9]] George R. Doddington, "Speaker recognition: Identifying people by their voices," Proc. IEEE, vol. 73, pp. 1651-1664, Nov. 1985.
 * [[attachment:Doddington_1985.pdf?format=raw 10]] Douglas Reynolds, "An overview of automatic speaker recognition technology," in Proc. IEEE ICASSP, 2002.
----
=== [wiki:References Textbooks and References] ===
 * Xuedong Huang, Alex Acero and Hsiao-Wuen Hon, Spoken Language Processing, Prentice Hall PTR, NJ, 2001
 * Frederick Jelinek, Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA, 1997
 * Lawrence Rabiner and Biing-Hwang Juang, Fundamentals of Speech Recognition, Prentice Hall Signal Processing Series, Englewood Cliffs, NJ, 1993
 * Lawrence Rabiner and Ronald Schafer, Digital Processing of Speech Signals, Prentice Hall PTR, NJ, 1978
 * Alex Waibel and Kai-Fu Lee, Readings in Speech Recognition, Morgan Kaufmann Publishers, San Mateo, CA, 1990

----
== Homework Assignments ==
 * [[attachment:assignment1.pdf?format=raw Assignment one]] Due on September 17th before class
 * [[attachment:hwk2.pdf?format=raw Assignment two]] [[attachment:hwk2_sol.pdf?format=raw solution]] Due on October 15th before class
 * [[attachment:hwk3.1.pdf?format=raw Assignment three]] [[attachment:swb_data.tgz?format=raw LM datasets]] [[http://www.speech.sri.com/projects/srilm/download.html SRILM download]] [[http://www.speech.sri.com/projects/srilm/manpages SRILM manual]] Due on November 5th before class
   * Use `ngram-count` for LM estimation
   * Use `ngram` for perplexity evaluation
   * [[attachment:hwk3_sol.pdf?format=raw solution]]
 * [[attachment:hwk4.3.pdf?format=raw Assignment four]] [[attachment:june08open_1_e6.lat?format=raw word lattice for problem 4]] Due on December 1st before class
   * [[attachment:hwk4_sol.pdf?format=raw solution]]