Course Materials

Labs

Lab1

The Jupyter notebook for the Lab1 assignment is available on the JupyterHub platform at this address: jhub.istart.pt. Each group has an account to log in with, and the password will be provided during the first lab session. The Lab1 assignment is due on Sunday, May 5th, at 23:59. The deliverables must be submitted within the JupyterHub platform.

Lab2

The Lab2 assignment is available on the RNL GitLab at this address: https://gitlab.rnl.tecnico.ulisboa.pt/ist-slp/slp24/speech-processing-lab-2. Students may be able to log in through Fênix OAuth. Alternatively, it can be downloaded from here. The Lab2 assignment is due on Sunday, May 19th, at 23:59. The deliverables must be submitted via Fênix.

Lab3

The Jupyter notebook for the Lab3 assignment is available on Google Colab. The Lab3 assignment is due on Sunday, June 2nd, at 23:59. The deliverables must be submitted via Fênix.

Lectures

Lecture01 (April 16th) - Introduction, Speech Production and Perception

Lecture02 (April 18th) - Speech Signal Representations

  • LCO notes on Speech Signal Representations
  • LCO slides on Speech Signal Representations
  • Backstrom 2022, Chapter 3: Basic Representations
    • 3.3. Waveform
    • 3.7. Autocorrelation and autocovariance
    • 3.8. The cepstrum, mel-cepstrum and mel-frequency cepstral coefficients (MFCCs)
    • 3.10. Fundamental frequency (F0)
    • 3.11. Zero-crossing rate
    • 3.12. Deltas and Delta-deltas
  • Jurafsky 2022, Chapter 16: Automatic Speech Recognition and Text-to-Speech, pdf
    • 16.1 The Automatic Speech Recognition Task
    • 16.2 Feature Extraction for ASR: Log Mel Spectrum

Lecture03 (April 23rd) - Model of Speech Production

Lecture04 (April 30th) - Speech Classification 1

  • Alberto Abad slides on Speech Classification 1
  • Backstrom 2022, Chapter 3: Basic Representations
    • 3.5. Signal energy, loudness and decibel
    • 3.6. Spectrogram and the STFT
    • 3.8. The cepstrum, mel-cepstrum and mel-frequency cepstral coefficients (MFCCs)
    • 3.10. Fundamental frequency (F0)
    • 3.11. Zero-crossing rate
    • 3.12. Deltas and Delta-deltas
    • 3.14. Jitter and shimmer

Lecture05 (May 2nd) - Speech Classification 2

  • Alberto Abad slides on Speech Classification 2
  • Backstrom 2022, Chapter 5: Modelling tools in speech processing
    • 5.5 Gaussian mixture model (GMM)
    • 5.6 Neural networks
  • Backstrom 2022, Chapter 8: Recognition tasks in speech processing
    • 8.1. Voice Activity Detection (VAD)
    • 8.4. Speaker Recognition and Verification
    • 8.5. Speaker Diarization
    • 8.6. Paralinguistic speech processing

Lecture06 (May 7th) - Automatic Speech Recognition

Lecture07 (May 9th) - Deep Learning for Sequence Processing

Lecture08 (May 14th) - Deep Speech Models

Lecture09 (May 16th) - Speech Synthesis

Lecture10 (May 21st) - Dialogue Systems I

Lecture11 (May 28th) - Dialogue Systems II

Previous Years' Exams


This Year's Exams