Summaries

2. Finite Markov decision problems

20 October 2021, 09:30 Francisco Melo

  • Finite Markov decision processes (Chap. 3)
    • The agent-environment interface
    • Goals and rewards
    • Policies and value functions
    • Optimality and optimal policies
  • Dynamic programming (Chap. 4)
    • Policy evaluation (prediction)


1. Introduction. Multi-armed bandits

13 October 2021, 09:30 Francisco Melo

  • Introduction (Chap. 1)
  • Multi-armed bandits (Chap. 2)
    • Action-value methods
    • Incremental implementation
    • Optimistic initial values
    • Upper-confidence-bound heuristic
    • Gradient bandit algorithms


Not Taught.

6 October 2021, 09:30 Francisco Melo

There was no class because no room was available.


Not Taught.

29 September 2021, 09:30 Francisco Melo

There was no class because no room was available.