Automatic speech processing
EE-554
The course will take place in Room INF 019.
Students attending online can join through the following Zoom link:
https://idiap-ch.zoom.us/j/2732524500
No password is needed.
Lab exercises:
- Python versions are recommended, where available, and are most actively maintained
- Octave exercises have been updated and confirmed to work with Octave version 6.4.0 in October 2023
- MATLAB exercises are provided as is and have not been updated in a long time
Suggested Textbooks
L. R. Rabiner and B-H Juang. Fundamentals of Speech Recognition. Prentice Hall, 1993.
B. Gold, N. Morgan and D. Ellis. Speech and Audio Signal Processing: Processing and Perception of Speech and Music. Wiley, 2011.
X. Huang, A. Acero and H-W Hon. Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice Hall, 2001.
L. R. Rabiner and R. Schafer. Theory and Applications of Digital Speech Processing. Pearson, 2010.
L. R. Rabiner and R. Schafer. Digital Processing of Speech Signals. Prentice Hall, 1978.
P. Taylor. Text-to-Speech Synthesis. Cambridge University Press, 2011.
B. Schuller and A. Batliner. Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing.
- News forum (Forum)
- Discussion forum (Forum)
- Additional videos and lectures can be found on the ISCA SCOOT platform (URL)
- Questions sets (Folder)
- ASP Exam 2022-2023 (Folder)
- ASP Exam 2023-2024 (File)
- A Course in Phonetics by Peter Ladefoged and Keith Johnson (URL)
Week 1 (Sep 12, 2024)
- Introduction lecture slides (File)
- Lecture 1 recap questions (File)
- International Phonetic Alphabet (URL)
- Tones listening test (Folder)
- Critical band listening test (Folder)
- Decibel scale listening test (Folder)
- Equal loudness listening test (Folder)
- Introduction lecture recording - Sep 21, 2023 (URL)
- Introduction lecture Part 2 recording - Sep 28, 2023 (URL)
- INTRODUCTION LECTURE 2024-2025 RECORDING (Sep 12, 2024) (URL)
Week 2 (Sep 19, 2024)
Speech Signal Processing part
- Speech signal analysis lecture slides (File)
- Lecture 2 - recap questions (File)
- Audacity Instructions (File)
- Speech signal processing exercise in Python (Jupyter notebook) (File)
- Speech signal processing exercise in MATLAB (File)
- Speech signal processing exercise in OCTAVE (updated) (File)
- Sampling and Quantization Lecture Recording (Oct 5, 2023) (URL)
- Time Domain Analysis Lecture Recording (Oct 5, 2023) (URL)
- SPEECH ACQUISITION AND TIME DOMAIN ANALYSIS - WEEK 2 LECTURE - PART 1 (Sep 19, 2024) (URL)
- TIME DOMAIN ANALYSIS AND FREQUENCY DOMAIN ANALYSIS - WEEK 2 LECTURE - PART 2 (Sep 19, 2024) (URL)
Week 3 (Sep 26, 2024)
Source-system decomposition, speech coding and feature extraction.
- Source-system decomposition, Speech coding and Feature extraction lecture slides (File)
- From frequency to quefrency: a history of the cepstrum by Oppenheim and Schafer (URL)
- Linear Prediction - A tutorial review by John Makhoul (URL)
- Lecture 3 recap questions (File)
- Speech Signal Acquisition and Analysis part Take Away Questions (File)
- Frequency domain analysis and Source-system decomposition Lecture recording - Oct 12, 2023 (URL)
- SPEECH ANALYSIS EXERCISE and SOURCE-SYSTEM DECOMPOSITION - WEEK 3 LECTURE (Sep 26, 2024) (URL)
Week 4 (Oct 3, 2024)
- Speech signal acquisition and processing Take Away Q&A recording (Oct 19, 2023) (URL)
- LINEAR PREDICTION BASED SOURCE-SYSTEM DECOMPOSITION, SPEECH CODING AND FEATURE VECTOR REPRESENTATION - LECTURE 4 - OCT 3, 2024 (URL)
- SUMMARY SHEET FOR SPEECH SIGNAL ANALYSIS PART (File)
Week 5 (Oct 10, 2024)
Statistical pattern recognition basics.
- Machine learning for speech processing (File)
- Statistical pattern recognition exercise in MATLAB (File)
- Statistical pattern recognition exercise in OCTAVE (updated) (File)
- SIGNAL ANALYSIS SUMMARY AND MACHINE LEARNING FOR SPEECH PROCESSING LECTURE RECORDING (Oct 10, 2024) (URL)
Week 6 (Oct 17, 2024)
This lecture presents an overview of feature/representation learning.
- Machine learning for speech processing (copy) (File)
- HMM exercise in Python (Jupyter notebook) (File)
- HMM exercise in OCTAVE (updated) (File)
- HMM exercise in MATLAB (File)
- A tutorial on hidden Markov models and selected applications in speech recognition by Rabiner (URL)
- A Gentle Tutorial of the EM algorithm and its application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models by Jeff Bilmes (URL)
- Combining Probability Distributions: A Critique and an Annotated Bibliography by Genest and Zidek (URL)
- Combining multiple classifiers by averaging or by multiplying? by Tax et al. (URL)
- Feature/Representation Learning Overview (slides) (File)
- FEATURE/REPRESENTATION LEARNING OVERVIEW LECTURE RECORDING (Oct 17, 2024) (URL)
Week 7-9 (Oct 31 - Nov 14, 2024) Automatic Speech Recognition
- Automatic Speech Recognition - Part 1 (File)
- Hidden Markov model based speech recognition (File)
- Additional slides illustrating HMM-based Automatic Speech Recognition (File)
- Dynamic programming hands-on (File)
- Minimum edit distance (slides from Dan Jurafsky) (File)
- Automatic speech recognition - take away questions (File)
- LECTURE RECORDING AUTOMATIC SPEECH RECOGNITION - Part 1 (Oct 31, 2024) (URL)
- LECTURE RECORDING AUTOMATIC SPEECH RECOGNITION - PART 2 (Nov 7, 2024) (URL)
- LECTURE RECORDING AUTOMATIC SPEECH RECOGNITION - Part 3 (Nov 14, 2024) (URL)
- Whisper ASR Exercise (URL)
Week 10-11 (Nov 21 - Nov 28, 2024) Text-to-Speech Synthesis
- Text-to-speech synthesis Part 1 (File)
- Text-to-speech synthesis Part 2 (File)
- Neural TTS overview slides (File)
- XTTS TTS Exercise (URL)
- LECTURE RECORDING TEXT-TO-SPEECH SYNTHESIS PART 1 (Nov 21, 2024) (URL)
- LECTURE RECORDING TEXT-TO-SPEECH SYNTHESIS PART 2 (Nov 28, 2024) (URL)
Week 12 (Dec 5, 2024) Automatic Speaker Recognition
- Overview of Automatic Speaker Recognition (slides) (File)
- Automatic speaker recognition - Key take away questions (File)
- Low-dimensional speech representation based on Factor Analysis and its applications by Dehak and Shum (URL)
- AUTOMATIC SPEAKER RECOGNITION LECTURE RECORDING (Dec 5, 2024) (URL)
Week 13 (Dec 12, 2024) Paralinguistic Speech Processing
- ML for Speech Processing slides updated with Paralinguistic Speech Processing (File)
- Deep learning based Speech Emotion Recognition (SER) lecture slides (File)
- Emotion Recognition exercise in Python (Jupyter notebook) (File)
- INTRODUCTION TO PARALINGUISTIC SPEECH PROCESSING LECTURE RECORDING (Dec 12, 2024) (URL)
Week 14 (Dec 19, 2024) Question-Answering
- AUTOMATIC SPEAKER RECOGNITION and QUESTION-ANSWERING LECTURE RECORDING (Dec 19, 2024) (URL)
- AUTOMATIC SPEECH PROCESSING EXAM SAMPLE QUESTION SET (Dec 19, 2024) (File)
Week 8 (Nov 9, 2023)
This lecture dealt with feature vector representation, statistical pattern recognition (Q&A) and sequence matching, starting with string matching using dynamic programming.
- Feature parametrization and statistical pattern recognition (Lecture recording Nov 9, 2023) (URL)
- Pattern recognition (Q&A) and introduction to sequence matching (lecture recording Nov 9, 2023) (URL)
- End-to-End Acoustic Modeling using Convolutional Neural Networks for HMM-based Automatic Speech Recognition by Palaz, Magimai-Doss and Collobert (File)
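As a rough illustration of the string matching by dynamic programming mentioned for this lecture, the following is a minimal Python sketch of minimum edit distance (Levenshtein distance). It is not taken from the course material: the function name edit_distance and the sub_cost parameter are assumptions made for this example, with insertions and deletions fixed at a cost of 1.

```python
def edit_distance(source, target, sub_cost=1):
    """Minimum edit distance between two sequences (strings or lists).

    Hypothetical helper for illustration only. Insertion and deletion
    cost 1 each; substitution costs `sub_cost` (an assumed parameter).
    """
    n, m = len(source), len(target)

    # dist[i][j] holds the cheapest way to turn source[:i] into target[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]

    # Turning a prefix into the empty sequence needs only deletions,
    # and building a prefix from the empty sequence needs only insertions.
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j

    # Fill the table row by row; each cell depends on its three neighbours.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                            # deletion
                dist[i][j - 1] + 1,                            # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[n][m]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
    # The same recursion works on word sequences, as used when scoring ASR output.
    print(edit_distance("the cat sat".split(), "the cat sat down".split()))  # 1
```

The table is filled in O(n*m) time. Setting sub_cost=2 makes a substitution as expensive as a deletion followed by an insertion, which is another common convention.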
Week 9 (Nov 16, 2023)
Hidden Markov model based speech recognition
- Introduction to Automatic Speech Recognition (File)
- Sequence matching - Nov 16, 2023 first part lecture recording (URL)
- ASR formulation, Discrete Markov model, and Language Modeling - Nov 16, 2023 second part lecture recording (URL)
Week 10-11 (Nov 23-Nov 30, 2023)
Continuation of lecture on automatic speech recognition
- Introduction to Automatic Speech Recognition (copy) (File)
- Hidden Markov model based speech recognition (copy) (File)
- Additional slides illustrating HMM-based Automatic Speech Recognition (copy) (File)
- Language Modeling, Knowledge-based ASR and Instance-based ASR lecture (Nov 23, 2023) - Part 1 (URL)
- HMM-based ASR Introduction lecture (Nov 23, 2023) - Part 2 (URL)
- HMM ASR fundamentals and HMM ASR Training (Nov 30, 2023) - Part 3 (URL)
- Practical aspects of HMM-based ASR (Nov 30, 2023) - Part 4 (URL)
Week 12-13 (Dec 7-Dec 14, 2023)
- ASR Question-Answering and Speech Synthesis Intro - lecture recording (Dec 7, 2023) (URL)
- Text-to-speech synthesis - Part 2 - Lecture recording (Dec 14, 2023) (URL)
- Text-to-speech synthesis Recap Questions (File)
Week 14 (Dec 21, 2023)
- Automatic Speaker Recognition Lecture Recording (Dec 21, 2023) (URL)
- Automatic Speech Processing Exam Sample Questions (Exam on Jan 20, 2024) (File)
- Wu et al., "Spoofing and countermeasures for speaker verification: A survey", Speech Communication, 2015 (URL)
Week 13 (Dec 14, 2023)
(a) Automatic speaker recognition continued
(b) An overview of paralinguistic speech processing
Week 14 (Dec 21, 2023)
Question-Answering