Course Materials
Labs
Lab1
The Jupyter notebook of the Lab1 assignment is available on the JupyterHub platform at this address: jhub.istart.pt. Each group has an account to log in, and the password will be provided during the first lab session. The Lab1 assignment due date is Sunday, May 5th, 23:59. The deliverables must be submitted inside the JupyterHub platform.
Lab2
The Lab2 assignment is available on the RNL GitLab at this address: https://gitlab.rnl.tecnico.ulisboa.pt/ist-slp/slp24/speech-processing-lab-2. Students may be able to log in through Fênix OAuth. Alternatively, it can be downloaded from here. The Lab2 assignment due date is Sunday, May 19th, 23:59. The deliverables must be submitted via Fênix.
Lab3
The Jupyter notebook of the Lab3 assignment is available at Google Colab. The assignment due date is Sunday, June 2nd, 23:59. The deliverables must be submitted via Fênix.
Lectures
Lecture01 (April 16th) - Introduction, Speech Production and Perception
- LCO notes on SLP Course Logistics
- LCO slides on SLP Course Logistics
- LCO notes on Introduction to the SLP Course
- LCO slides on Introduction to the SLP Course
- LCO notes on Speech Production, Perception and Phonetics
- LCO slides on Speech Production, Perception and Phonetics
- Jurafsky 2022, Chapter 28: Phonetics, pdf
  - 28.1 Speech Sounds and Phonetic Transcription
  - 28.2 Articulatory Phonetics
  - 28.3 Prosody
Lecture02 (April 18th) - Speech Signal Representations
- LCO notes on Speech Signal Representations
- LCO slides on Speech Signal Representations
- Backstrom 2022, Chapter 3: Basic Representations
  - 3.3. Waveform
  - 3.7. Autocorrelation and autocovariance
  - 3.8. The cepstrum, mel-cepstrum and mel-frequency cepstral coefficients (MFCCs)
  - 3.10. Fundamental frequency (F0)
  - 3.11. Zero-crossing rate
  - 3.12. Deltas and Delta-deltas
- Jurafsky 2022, Chapter 16: Automatic Speech Recognition and Text-to-Speech, pdf
  - 16.1 The Automatic Speech Recognition Task
  - 16.2 Feature Extraction for ASR: Log Mel Spectrum
Lecture03 (April 23rd) - Model of Speech Production
- LCO notes on Model of Speech Production
- LCO slides on Model of Speech Production
- Optional reading: Huang 2001, Chapter 5: Digital Signal Processing, pdf
Lecture04 (April 30th) - Speech Classification 1
- Alberto Abad slides on Speech Classification 1
- Backstrom 2022, Chapter 3: Basic Representations
  - 3.5. Signal energy, loudness and decibel
  - 3.6. Spectrogram and the STFT
  - 3.8. The cepstrum, mel-cepstrum and mel-frequency cepstral coefficients (MFCCs)
  - 3.10. Fundamental frequency (F0)
  - 3.11. Zero-crossing rate
  - 3.12. Deltas and Delta-deltas
  - 3.14. Jitter and shimmer
Lecture05 (May 2nd) - Speech Classification 2
- Alberto Abad slides on Speech Classification 2
- Backstrom 2022, Chapter 5: Modelling tools in speech processing
  - 5.5. Gaussian mixture model (GMM)
  - 5.6. Neural networks
- Backstrom 2022, Chapter 8: Recognition tasks in speech processing
  - 8.1. Voice Activity Detection (VAD)
  - 8.4. Speaker Recognition and Verification
  - 8.5. Speaker Diarization
  - 8.6. Paralinguistic speech processing
Lecture06 (May 7th) - Automatic Speech Recognition
- Alberto Abad slides on Classical methods for Automatic Speech Recognition
- Jurafsky & Martin [version 2007], Chapter 9 Automatic Speech Recognition
Lecture07 (May 9th) - Deep Learning for Sequence Processing
- BM slides on Deep Learning for Sequence Processing Fundamentals
- The illustrated Transformer
- Understanding and coding the self-attention mechanism of large language models from scratch
- Jurafsky and Martin (3rd ed. Draft, 2024), Speech and Language Processing - Chapter on Transformers and Large Language Models
- Jurafsky and Martin (3rd ed. Draft, 2024), Speech and Language Processing - Chapter on Fine-tuning and Masked Language Models
Lecture08 (May 14th) - Deep Speech Models
- BM slides on Deep Speech Models
- Sequence Modeling With CTC
- Jurafsky and Martin (3rd ed. Draft, 2024), Speech and Language Processing - Chapter on Fine-tuning and Masked Language Models
- Jurafsky and Martin (3rd ed. Draft, 2024), Speech and Language Processing - Chapter on Automatic Speech Recognition and TTS
Lecture09 (May 16th) - Speech Synthesis
- LCO notes on Speech Synthesis
- LCO slides on Speech Synthesis
- Backstrom 2022, Chapter 9: Speech Synthesis
- Backstrom 2022, Chapter 5: Modelling tools in speech processing
- Backstrom 2022, Chapter 6: Evaluation of speech processing methods
Lecture10 (May 21st) - Dialogue Systems I
- BM slides covering modular (task-oriented) dialogue systems, and evaluation of dialogue systems
- Jurafsky and Martin (3rd ed. Draft, 2024), Speech and Language Processing - Chapter on Chatbots & Dialogue Systems
Lecture11 (May 28th) - Dialogue Systems II
- BM slides on end-to-end dialogue systems
- Jurafsky and Martin (3rd ed. Draft, 2024), Speech and Language Processing - Chapter on Chatbots & Dialogue Systems
Previous Year Exams
- 2022-23 Model Exam: Text Solution
- 2022-23 First Exam: Text Solution
- 2022-23 Second Exam: Text Solution