Automatic speech processing

EE-554

Course summary

The course will take place in Room INF 019.

Students attending remotely can join through the following Zoom link:

https://idiap-ch.zoom.us/j/2732524500

No password is needed.

Lab exercises:

- Python versions are recommended where available; they are the most actively maintained

- Octave exercises were updated and confirmed to work with Octave version 6.4.0 in October 2023

- MATLAB exercises are provided as is and have not been updated in a long time

Suggested Textbooks

L. R. Rabiner and B.-H. Juang. Fundamentals of Speech Recognition. Prentice Hall, 1993.

B. Gold, N. Morgan and D. Ellis. Speech and Audio Signal Processing: Processing and Perception of Speech and Music. Wiley, 2011.

X. Huang, A. Acero and H.-W. Hon. Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice Hall, 2001.

L. R. Rabiner and R. Schafer. Theory and Applications of Digital Speech Processing. Pearson, 2010.

L. R. Rabiner and R. Schafer. Digital Processing of Speech Signals. Prentice Hall, 1978.

P. Taylor. Text-to-Speech Synthesis. Cambridge University Press, 2011.

B. Schuller and A. Batliner. Computational Paralinguistics: Emotion, Affect and Personality in Speech and Language Processing.

Week 1 (Sep 12, 2024)

Week 2 (Sep 19, 2024)

Speech Signal Processing part

Week 3 (Sep 26, 2024)

Source-system decoding, speech coding and feature extraction.

Week 4 (Oct 3, 2024)

Discussion of answers to take-away questions.

Week 5 (Oct 10, 2024)

Statistical pattern recognition basics.

Week 6 (Oct 17, 2024)

This lecture presents an overview of feature/representation learning.

Week 7-9 (Oct 31 - Nov 14, 2024)

Automatic speech recognition

Week 10-11 (Nov 21 - Nov 28, 2024)

Text-to-speech synthesis

Week 12 (Dec 5, 2024)

Automatic speaker recognition

Week 13 (Dec 12, 2024)

Paralinguistic speech processing

Week 14 (Dec 19, 2024)

Question-answering

Week 8 (Nov 9, 2023)

This lecture covered feature vector representation, statistical pattern recognition (Q&A), and sequence matching, starting with string matching using dynamic programming.
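As an illustration of string matching with dynamic programming, the following is a minimal sketch of the Levenshtein (edit) distance. It is not taken from the course material: the function name `edit_distance` and the unit costs for insertion, deletion and substitution are assumptions chosen for this example.

```python
def edit_distance(ref, hyp):
    """Edit distance between two sequences (strings or lists of words).

    Assumption for this sketch: insertion, deletion and substitution
    each cost 1, and a match costs 0.
    """
    n, m = len(ref), len(hyp)
    # d[i][j] = minimum cost of turning ref[:i] into hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete all i symbols of ref
    for j in range(1, m + 1):
        d[0][j] = j  # insert all j symbols of hyp
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,                 # deletion
                d[i][j - 1] + 1,                 # insertion
                d[i - 1][j - 1] + substitution,  # match or substitution
            )
    return d[n][m]


print(edit_distance("kitten", "sitting"))  # 3
```

Because the function only compares symbols for equality, the same code works on lists of words, for example `edit_distance("the cat sat".split(), "the cat sat down".split())`.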

Week 9 (Nov 16, 2023)

Hidden Markov model based speech recognition

Week 9 (Nov 16, 2023)

This week covered sequence matching (string matching and matching two speech utterances through dynamic programming), the formulation of the automatic speech recognition problem, the discrete Markov model (DMM), and modeling of the set of hypotheses using a DMM.
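Matching two speech utterances through dynamic programming is commonly done with dynamic time warping (DTW). The sketch below is one possible formulation, not necessarily the one used in the lecture: the function name `dtw_distance`, the Euclidean local distance, and the three allowed predecessor steps (vertical, horizontal, diagonal) are assumptions made for this example.

```python
import math


def dtw_distance(x, y):
    """Accumulated DTW cost between two sequences of feature vectors.

    x and y are lists of equal-dimension vectors (lists of floats),
    one vector per frame. Assumptions for this sketch: Euclidean local
    distance and unweighted vertical, horizontal and diagonal steps.
    """
    def local_cost(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] = best accumulated cost of aligning x[:i] with y[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local_cost(x[i - 1], y[j - 1])
            D[i][j] = cost + min(
                D[i - 1][j],      # x advances, y frame is repeated
                D[i][j - 1],      # y advances, x frame is repeated
                D[i - 1][j - 1],  # both advance
            )
    return D[n][m]


# Toy one-dimensional "utterances": the second repeats the middle frame.
a = [[0.0], [1.0], [2.0]]
b = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(a, b))  # 0.0
```

The recursion has the same shape as the edit distance used for string matching: a table of partial alignments is filled in order, and each cell reuses the best of its already computed neighbours.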

Week 10-11 (Nov 23-Nov 30, 2023)

Continuation of lecture on automatic speech recognition

Week 12-13 (Dec 7-Dec 14, 2023)

Text-to-speech synthesis

Week 14 (Dec 21, 2023)

(a) Automatic speaker recognition

(b) Question-answering

(c) Release of exam question set

Week 13 (Dec 14, 2023)

(a) Automatic speaker recognition continued

(b) An overview of paralinguistic speech processing

Week 14 (Dec 21, 2023)

Question-Answering