Summaries
13. Exploration vs. exploitation
22 April 2022, 09:30 • Francisco Melo
Exploration vs. exploitation (Chap. 9):
- The prediction problem (9.1)
- Prediction with complete information: weighted majority and exponentially weighted averager (9.2)
- Adversarial bandits and EXP3 (9.4)
- Stochastic multi-armed bandits and UCB (9.3)
Applications:
- Monte Carlo tree search
- TD-Gammon
- AlphaGo
13. Exploration vs. exploitation
21 April 2022, 15:00 • Francisco Melo
Exploration vs. exploitation (Chap. 9):
- The prediction problem (9.1)
- Prediction with complete information: weighted majority and exponentially weighted averager (9.2)
- Adversarial bandits and EXP3 (9.4)
- Stochastic multi-armed bandits and UCB (9.3)
Applications:
- Monte Carlo tree search
- TD-Gammon
- AlphaGo