Environment PongNoFrameskip-v4
- Demystify RL algorithms by providing minimal, object-oriented PyTorch implementations with their accompanying pseudocode and explanations
- I also provide quick explanations of typical tricky manipulations, like .squeeze() (at the end of the repo)
- Support my theory notes
- Practice implementing algorithms
- Accompanying theory notes 📕
- Minimal, object-oriented code for both simple (Semi Gradient Sarsa) and state-of-the-art (PPO-Clip) algorithms
- Understandable and intuitive logging via experience tracking
- Easy reward and loss plotting
- Hyperparameter Tuning (in < 40 lines of code for all algorithms)*
- Intuitive terminal interface (in < 50 lines)
- These implementations aren't meant to be used in research, but for fully transparent learning. As such, no testing capabilities or pre-trained models are provided.
| Algorithm | Lines of Code | Verified Environments |
|---|---|---|
| Semi Gradient Sarsa | < 100 | CartPole-v1 |
| Reinforce with Baseline | < 100 | CartPole-v1 |
| Deep Deterministic Policy Gradient (DDPG) | ~ 150 | HalfCheetah-v2, Pendulum-v1 |
| N Step Actor Critic | < 150 | CartPole-v1, LunarLander-v2 |
| Double Deep Q Network (DDQN) | < 200 | CartPole-v1, LunarLander-v2, PongNoFrameskip-v4 |
| Proximal Policy Optimization (PPO) | < 200 | CartPole-v1, LunarLander-v2, Pendulum-v1 |
For a complete description run
python run.py -h
python run.py --algo <algo> --env <env>
├─config.txt >> Contains agent configuration
├─log.txt >> Stdout output (useful for customization)
└─results.csv >> CSV of Rewards and Loss
For example running
python run.py --algo ddqn --env CartPole-v1
will yield
├─config.txt >> Contains agent configuration
├─log.txt >> Stdout output (useful for customization)
└─results.csv >> CSV of Rewards and Loss
- Running again does not overwrite previous results, but appends a new experiment folder.
- The files begin to be written as soon as the experiment starts. Hence interrupting a run midway will still yield plottable results.
- You can also delete the last experiment by running the same command with the corresponding option (see python run.py -h).
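The append-only behaviour described above could be sketched roughly as follows. This is not the repo's actual experiment manager: the function names (next_experiment_dir, log_row) and the CSV column names are assumptions made for illustration. Only the experiment_<n> folder naming and the results.csv file name come from this README.

```python
import csv
import re
from pathlib import Path


def next_experiment_dir(root: Path) -> Path:
    """Create root/experiment_<n>, where n is one past the highest existing index."""
    root.mkdir(parents=True, exist_ok=True)
    indices = [
        int(m.group(1))
        for p in root.iterdir()
        if p.is_dir() and (m := re.fullmatch(r"experiment_(\d+)", p.name))
    ]
    new_dir = root / f"experiment_{max(indices, default=-1) + 1}"
    new_dir.mkdir()
    return new_dir


def log_row(csv_path: Path, row: dict) -> None:
    """Append one row to the CSV, writing the header first if the file is new.

    The file is opened and closed on every call, so each row is on disk
    before the next episode starts.
    """
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

Because every row is written as it is produced, a run interrupted halfway still leaves a valid CSV behind.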
This is mostly helpful if you plan on adding different environments. It uses optuna to run several hyperparameter combinations and picks the best.
python run.py --algo ddpg --env Pendulum-v1 --optimize --n-trials 100
├─config.txt >> Contains agent configuration
├─log.txt >> Stdout output (useful for customization)
└─results.csv >> CSV of Rewards and Loss
- You can cancel it with CTRL+C
- For simplicity, all hyperparameter suggestions are done in a single method. I'll leave the tweaking to you.
Used for plotting losses and rewards
python run.py --algo <algo> --env <env> --plot <experiment>
Ex: python run.py --algo ppo --env CartPole-v1 --plot experiment_6
Would open a plot of the rewards and losses for that experiment.
The <experiment> argument can be omitted, in which case the latest experiment for the specified algorithm and environment is used.
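One way the "use the latest experiment when none is given" fallback might work is sketched here. The function name resolve_experiment is hypothetical, and treating "latest" as "highest experiment_<n> index" is an assumption; the repo may decide this differently.

```python
import re
from pathlib import Path
from typing import Optional


def resolve_experiment(root: Path, name: Optional[str] = None) -> Path:
    """Return root/<name>, or the highest-numbered experiment_<n> folder if name is None."""
    if name is not None:
        path = root / name
        if not path.is_dir():
            raise FileNotFoundError(f"No such experiment: {path}")
        return path

    # Compare indices numerically so that experiment_10 ranks above experiment_9.
    candidates = {}
    for p in root.iterdir():
        m = re.fullmatch(r"experiment_(\d+)", p.name)
        if p.is_dir() and m:
            candidates[int(m.group(1))] = p
    if not candidates:
        raise FileNotFoundError(f"No experiments found under {root}")
    return candidates[max(candidates)]
```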
- put pseudocode image into every folder
- run every algorithm and put some graphs
- Add at least 2 environments per main algorithm
- unify all
- consistent signature across all algorithms
- Add Atari onto DQN
- add logging
- To csv
- explain folder structure
- experiment manager
- add no test disclaimer
- separate plotting from main execution (use intermediary csv)
- vectorized environments
- unit tests to verify all algos work
- Allow for video saving
- Allow for model saving
- Create class with run_episode
- In case you run into ROM license troubles when running Atari environments, run
pip install "gym[atari,accept-rom-license]"
Be aware that this accepts the ROM license for you.
- .squeeze(): Removes the dimensions of size 1 from a tensor. Useful when you're working with slightly different tensor shapes. Ex: tensor of dims [32, 1].squeeze() -> [32]
- .unsqueeze(): Kind of the reverse of .squeeze(), as it adds in one dimension. Ex: tensor of dims [32].unsqueeze(1) -> [32, 1]
- .view(): Similar, but you can do weirder manipulations (change several dimensions at once). Ex: tensor of shape [2, 2].view(4, 1) -> [4, 1]
- .detach(): Detaches a tensor from gradient calculations. Useful when you want to make predictions without backpropagating (Ex: DDQN target network).
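The four manipulations above can be tried in a few lines. The shapes mirror the examples in the list; the two small linear layers standing in for the online and target networks are assumptions for illustration, and the target ignores terminal states for brevity.

```python
import torch

x = torch.zeros(32, 1)

squeezed = x.squeeze()              # drops every dimension of size 1
print(squeezed.shape)               # torch.Size([32])

unsqueezed = squeezed.unsqueeze(1)  # inserts a size-1 dimension at index 1
print(unsqueezed.shape)             # torch.Size([32, 1])

m = torch.arange(4).reshape(2, 2)
viewed = m.view(4, 1)               # same 4 elements, new shape
print(viewed.shape)                 # torch.Size([4, 1])

# detach() in a DDQN-style target (toy networks, no "done" mask).
online = torch.nn.Linear(4, 2)
target = torch.nn.Linear(4, 2)
next_states = torch.randn(32, 4)
rewards = torch.ones(32)
gamma = 0.99

# The online network picks the action, the target network evaluates it.
best_actions = online(next_states).argmax(dim=1, keepdim=True)   # [32, 1]
next_q = target(next_states).gather(1, best_actions).squeeze(1)  # [32]

# Detach so the loss does not backpropagate into the target network.
td_target = (rewards + gamma * next_q).detach()
print(td_target.requires_grad)      # False
```

Note that .squeeze() without an argument removes all size-1 dimensions, so with a batch of size 1 it would also drop the batch dimension; passing the dimension explicitly, as in .squeeze(1), avoids that.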