Summaries
2. Finite Markov decision problems
20 October 2021, 09:30 • Francisco Melo
- Finite Markov decision processes (Chap. 3)
  - The agent-environment interface
  - Goals and rewards
  - Policies and value functions
  - Optimality and optimal policies
- Dynamic programming (Chap. 4)
  - Policy evaluation (prediction)
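
As an illustration of the policy evaluation (prediction) topic, the following is a minimal sketch of iterative policy evaluation with in-place sweeps. The two-state chain, the state names, the discount `gamma=0.9`, and the function name `policy_evaluation` are hypothetical choices made for this example and are not taken from the lecture.

```python
def policy_evaluation(transitions, gamma=0.9, tol=1e-10):
    """Iteratively evaluate a fixed policy.

    `transitions` maps each non-terminal state to a list of
    (probability, next_state, reward) tuples under the policy being evaluated.
    States that do not appear as keys are treated as terminal (value 0).
    """
    v = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, outcomes in transitions.items():
            # Bellman expectation backup for state s (in-place update).
            new_v = sum(p * (r + gamma * v.get(s2, 0.0)) for p, s2, r in outcomes)
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


# Hypothetical toy problem: A -> B (reward 0), B -> terminal "end" (reward 1).
toy = {
    "A": [(1.0, "B", 0.0)],
    "B": [(1.0, "end", 1.0)],
}
values = policy_evaluation(toy)
print(values)
```

In this toy chain the value of `B` is the immediate reward and the value of `A` is that reward discounted once, so the sweep settles after a few iterations.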
1. Introduction. Multi-armed bandits
13 October 2021, 09:30 • Francisco Melo
- Introduction (Chap. 1)
- Multi-armed bandits (Chap. 2)
  - Action-value methods
  - Incremental implementation
  - Optimistic initial values
  - Upper-confidence-bound heuristic
  - Gradient bandit algorithms
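
As an illustration of the incremental implementation topic, the sketch below updates a sample-average estimate one reward at a time using Q <- Q + (R - Q) / n, and compares it with the batch mean. The function name `incremental_mean`, the Gaussian reward distribution, and the seed are assumptions made for this example only.

```python
import random


def incremental_mean(rewards):
    """Return the estimate obtained by applying Q <- Q + (R - Q) / n per reward."""
    q = 0.0
    for n, r in enumerate(rewards, start=1):
        # Move the estimate toward the new reward with step size 1/n.
        q += (r - q) / n
    return q


random.seed(0)
# Hypothetical reward stream for a single arm: mean 1.0, standard deviation 1.0.
rewards = [random.gauss(1.0, 1.0) for _ in range(1000)]
estimate = incremental_mean(rewards)
batch = sum(rewards) / len(rewards)
print(estimate, batch)
```

The incremental form needs only the current estimate and the count, so memory and per-step cost stay constant no matter how many rewards have been observed.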