Project 3 - MCTS
Dealing with a flat reward landscape.
In the overwhelming majority of playouts, the reward function will simply return 0 because Sir Uthgard lost the game. It does not matter whether he died immediately or was about to grab the final chest when he ran out of time: the reward will be 0 either way. Given the stochastic nature of MCTS and the gigantic number of possible states (approximated by 25!), MCTS will have a lot of trouble finding the best actions to win the game.
Now that the problem is identified, how do we deal with it?
a) One of the solutions, as pointed out in the slides, is to use Biased Playouts with good heuristics that help guide the playouts towards the few victory states. However, this might still be problematic if your heuristic is not good enough, or simply because the Biased Playout retains a stochastic component.
b) Change the reward function so that the game stops having a flat reward landscape: grade losing playouts by how close they got to victory, instead of giving every loss the same reward of 0. A sketch of this idea follows below.
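For concreteness, here is a minimal sketch of option b) in C#. All the names (WorldModel, IsWin, ChestsCollected, TotalChests) and the 0.5 scale are placeholders, not the project API; the only point is that losses are graded by progress while still scoring below any win:

    // Hypothetical world-model fields, just for illustration.
    public class WorldModel
    {
        public bool IsWin;
        public int ChestsCollected;
        public int TotalChests;
    }

    public static float Reward(WorldModel state)
    {
        if (state.IsWin)
            return 1.0f;

        // Losses are graded by progress instead of a flat 0, so a playout
        // that almost won scores higher than one where Sir Uthgard died
        // immediately, while any loss still scores below a win.
        return 0.5f * state.ChestsCollected / state.TotalChests;
    }

With a reward shaped like this, even losing playouts give MCTS a gradient to climb towards the victory states.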
Dealing with stochastic action effects
Project 2 - Goal Bounding
The description of the Goal Bounding technique is not in the Book. You can find a generic description of Goal Bounding in the following paper: http://www.cs.du.edu/~sturtevant/papers/goalpruning.pdf
- The initial part of the algorithm, which corresponds to the initialization (lines 2-10), is already done by me in the IAJMenuItem.cs file (you will find this class under the Editor folder).
- Line 11 (the initial for loop) is also already done. You just need to implement the Dijkstra Floodfill and the calculation of the bounding boxes (lines 13-22).
- The calculation/update of the bounding boxes can be performed using two distinct approaches:
- After Dijkstra Floodfill finishes (as seen in Algorithm 1)
- While Dijkstra Floodfill is running, as soon as a node is placed in the closed set.
- Due to Dijkstra's properties, both approaches are equivalent: once a node enters the closed set it is never reopened, so its color assignment is final and its contribution to the bounding box never has to be revised. I think the second is simpler to implement (see the sketch after this list), but it is up to you.
- Lines 16-20 might seem confusing, but they are just updating the bounding box along every defined axis. In our case we are using two axes, x and z. In any case, this update is already implemented in the UpdateBounds method of the Bounds class. Feel free to use it.
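Here is a small sketch of the second approach in C#. The identifiers NodeRecord, Color, Position, and boundsPerColor (an assumed Dictionary from color to the framework's Bounds class) are assumptions about the project's code; UpdateBounds is the helper mentioned above:

    // Called by the Dijkstra Floodfill whenever it moves a node to the
    // closed set. Because a closed node is never reopened, its color is
    // final and we can safely commit its position to that color's bounds.
    private void OnNodeClosed(NodeRecord node)
    {
        this.boundsPerColor[node.Color].UpdateBounds(node.Position);
    }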
The implementation of this is rather straightforward: whenever a new node record is generated or updated, its color is determined by the color of its parent. The only exceptions are the initial red/blue/green node records, whose parent has no color (the initial node is colourless).
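A sketch of that propagation, again with assumed framework-style names (NodeRecord, Color, and the NextUnusedColor helper are illustrative):

    // Called when a child node record is generated or updated during the
    // floodfill.
    private void AssignColor(NodeRecord parent, NodeRecord child, NodeRecord startNode)
    {
        if (parent == startNode)
            child.Color = this.NextUnusedColor(); // seed one color per edge leaving the start node
        else
            child.Color = parent.Color; // every other node record inherits its parent's color
    }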
Project 1 Errata
In the slides regarding the RVO Movement Algorithm, the lines:

    if (tc > 0) // future collision
        timePenalty = w / time

should be:

    if (tc > 0) // future collision
        timePenalty = w / tc

and the line:

    penalty = distancePenalty + timePenalty

should be:

    penalty = distancePenalty + maximumTimePenalty
The slides have already been updated with these changes.
Also, you can find an implementation of the TimeToCollision method in the MathHelper class.
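To see how the corrected formulas fit together, here is a hedged sketch of scoring one candidate velocity sample. Apart from the two formulas from the errata, everything here is an illustrative assumption: EvaluateSample is a made-up name, the delegate stands in for a call to MathHelper.TimeToCollision (whose real signature is in the framework), the sign convention for tc is assumed, and Vector3 is UnityEngine's:

    public static float EvaluateSample(
        Vector3 sample, Vector3 desiredVelocity,
        List<Func<Vector3, float>> timeToCollisionPerObstacle, float w)
    {
        // Penalize samples that stray from the velocity we actually want.
        float distancePenalty = (desiredVelocity - sample).magnitude;

        // Keep only the worst (largest) time penalty over all obstacles.
        float maximumTimePenalty = 0f;
        foreach (var timeToCollision in timeToCollisionPerObstacle)
        {
            float tc = timeToCollision(sample);
            float timePenalty;
            if (tc > 0f)        // future collision
                timePenalty = w / tc;
            else if (tc == 0f)  // already colliding
                timePenalty = float.MaxValue;
            else                // assumed convention: tc < 0 means no collision
                timePenalty = 0f;

            if (timePenalty > maximumTimePenalty)
                maximumTimePenalty = timePenalty;
        }

        return distancePenalty + maximumTimePenalty;
    }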