Summaries

Session L5

22 April 2022, 15:30 João Godinho Ribeiro

Reinforcement learning.


13. Exploration vs. exploitation

22 April 2022, 09:30 Francisco Melo

Exploration vs. exploitation (Chap. 9):

  • The prediction problem (9.1)
  • Prediction with complete information: weighted majority and exponentially weighted averager (9.2)
  • Adversarial bandits and EXP3 (9.4)
  • Stochastic multi-armed bandits and UCB (9.3)
  • Applications:
    • Monte Carlo tree search
    • TD-Gammon
    • AlphaGo
Wrap-up.


13. Exploration vs. exploitation

21 April 2022, 15:00 Francisco Melo

Exploration vs. exploitation (Chap. 9):

  • The prediction problem (9.1)
  • Prediction with complete information: weighted majority and exponentially weighted averager (9.2)
  • Adversarial bandits and EXP3 (9.4)
  • Stochastic multi-armed bandits and UCB (9.3)
  • Applications:
    • Monte Carlo tree search
    • TD-Gammon
    • AlphaGo
Wrap-up.


Session L5

21 April 2022, 08:30 Diogo Filipe de Sousa Carvalho

Reinforcement learning.


Session L5

20 April 2022, 14:00 José Alberto Rodrigues Pereira Sardinha

Reinforcement learning.