Poster and Code for the project in Reinforcement Learning course of the MSc in Artificial Intelligence at the University of Amsterdam. Joint project of Gabriele Bani, Andrii Skliar, Gabriele Cesa and Davide Belli
Using a single human demonstration has been shown to outperform humans and beat state-of-the-art models on hard exploration problems [Learning Montezuma's Revenge from a Single Demonstration].
However, it takes an experienced professional to provide a good demonstration to the model, which might be impossible in real-world problems, and even then obtaining optimal demonstrations can be difficult. Can we still learn optimal policies from sub-optimal demonstrations?
Basic idea: divide the demonstration trajectory into n splits. Train starting from the last split until convergence, then move to the previous split. Repeat until the first split is reached, so as to learn from increasingly difficult exploration problems. A sketch of this loop is shown below.
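A minimal sketch of this backward curriculum, under the assumption that the environment can be reset to arbitrary demonstration states via a hypothetical `set_state` method (standard Gym environments do not expose this), and with `train_until_convergence` and `evaluate_return` standing in for whatever RL algorithm and evaluation routine are actually used:

```python
def backward_curriculum(env, agent, demo_states, n_splits, target_return,
                        train_until_convergence, evaluate_return):
    """Train on increasingly long suffixes of the demonstration trajectory."""
    split_size = max(1, len(demo_states) // n_splits)
    # Start indices of each split, ordered from the last split back to the first.
    start_indices = list(range(0, len(demo_states), split_size))[::-1]

    for start in start_indices:
        start_state = demo_states[start]

        def reset_to_split():
            # Reset the environment to the beginning of the current split.
            env.reset()
            env.set_state(start_state)   # hypothetical API, see note above
            return start_state

        # Train from this starting point until the agent solves the remaining
        # (shorter, easier) part of the task, then move the starting point one
        # split earlier in the demonstration.
        train_until_convergence(agent, env, reset_to_split, target_return)
        print(f"split starting at step {start}: "
              f"return {evaluate_return(agent, env, reset_to_split):.2f}")
    return agent
```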
Figure: Returns over episodes in Maze (left), MountainCar (middle) and LunarLander (right).
- Non-optimal demonstrations can lead to optimal results, but better demonstrations lead to better learning and give more reliable results.
- In Maze, using bad demonstrations rather than suboptimal ones results in a better final policy because of a higher degree of exploration.
- With more complex environments, we expect demonstrations to allow for much faster training than training from scratch.
- The current implementation is very sensitive to hyperparameter choices; there is a need for a more automatic and reliable version of the backward algorithm to overcome this issue.
Copyright © 2018 Gabriele Bani.
This project is distributed under the MIT license. It was developed as part of the Reinforcement Learning course taught by Herke van Hoof at the University of Amsterdam. If you are a student, please follow the UvA regulations governing Fraud and Plagiarism.