Summaries · Reinforcement Learning


Summaries

2. Finite Markov decision problems

20 October 2021, 09:30 • Francisco Melo

Finite Markov decision processes (Chap. 3)

The agent-environment interface
Goals and rewards
Policies and value functions
Optimality and optimal policies

Dynamic programming (Chap. 4)

Policy evaluation (prediction)

1. Introduction. Multi-armed bandits

13 October 2021, 09:30 • Francisco Melo

Introduction (Chap. 1)
Multi-armed bandits (Chap. 2):
- Action-value methods
- Incremental implementation
- Optimistic initial values
- Upper-confidence-bound heuristic
- Gradient bandit algorithms

Not Taught.

6 October 2021, 09:30 • Francisco Melo

There was no class due to lack of a room.

Not Taught.

29 September 2021, 09:30 • Francisco Melo

There was no class due to lack of a room.