tensorpack/examples/A3C-Gym at master · PeisenZhao/tensorpack


A3C code and models for Atari games in gym

Multi-GPU version of the A3C algorithm in Asynchronous Methods for Deep Reinforcement Learning, with <500 lines of code.

Results of the same code trained on 47 different Atari games were uploaded to OpenAI Gym. You can see them on my gym page. Most of them are the best reproducible results on gym.

To train on an Atari game:

./train-atari.py --env Breakout-v0 --gpu 0

Each iteration trains on a batch of 128 new states. Training runs at about 6 to 10 iterations/s on 1 GPU with 12 or more CPU cores. With 2 TitanX GPUs and 20 or more CPU cores, setting SIMULATOR_PROC=240, PREDICT_BATCH_SIZE=30, and PREDICTOR_THREAD_PER_GPU=6 raises this to 16 iterations/s (about 2K images/s). Note that the network architecture is larger than the one used in the original paper.
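The throughput figures above follow from multiplying the iteration rate by the batch size. A minimal sketch of that arithmetic, where the function name `states_per_second` is only illustrative and not part of the repository:

```python
BATCH_SIZE = 128  # new states consumed per training iteration, as stated above


def states_per_second(iterations_per_second, batch_size=BATCH_SIZE):
    """Convert an iteration rate into the number of states processed per second."""
    return iterations_per_second * batch_size


# Rates quoted in the text: 6 to 10 it/s on one GPU, 16 it/s on the tuned 2-GPU setup.
for label, rate in [("1 GPU (low)", 6), ("1 GPU (high)", 10), ("2 TitanX (tuned)", 16)]:
    print(f"{label}: {states_per_second(rate)} states/s")
```

At 16 iterations/s this gives 2048 states/s, which matches the "2K images/s" figure.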

The uploaded models were all trained with 4 GPUs for about 2 days. On simple games like Breakout, however, you can get good performance within several hours. Also note that using multiple GPUs gives no obvious speedup here, because the bottleneck in this implementation is data rather than computation.

Some practical notes:

Prefer Python 3.
Occasionally, processes may not get terminated completely. It is suggested to use systemd-run to run any multiprocess Python program to get a cgroup dedicated for the task.
Training at a significantly slower speed (e.g. on CPU) will result in very poor scores, probably because of the slightly off-policy implementation.
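As a rough illustration of the systemd-run suggestion above, the sketch below only builds the command line and prints it; it does not launch anything. The helper name `systemd_run_command` and the unit name `a3c-breakout` are hypothetical, and the choice of `--user --scope` is an assumption about a typical desktop or single-user setup rather than something this README prescribes.

```python
import shlex


def systemd_run_command(train_args, unit=None):
    """Build (but do not execute) a systemd-run invocation that wraps train-atari.py.

    Assumption: a user-level transient scope is wanted, so that all worker
    processes share one cgroup and can be stopped together.
    """
    cmd = ["systemd-run", "--user", "--scope"]
    if unit:
        cmd.append(f"--unit={unit}")  # hypothetical unit name chosen by the caller
    cmd += ["./train-atari.py"] + list(train_args)
    return cmd


cmd = systemd_run_command(["--env", "Breakout-v0", "--gpu", "0"], unit="a3c-breakout")
print(shlex.join(cmd))
```

With a named unit, leftover simulator processes can usually be cleaned up by stopping that unit through systemctl instead of hunting for individual PIDs.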

To test a model:

Download models from model zoo.

Watch the agent play: ./train-atari.py --task play --env Breakout-v0 --load Breakout-v0.npz

Generate gym submissions: ./train-atari.py --task gen_submit --load Breakout-v0.npz --env Breakout-v0 --output output_dir
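Before loading a downloaded model, it can help to check what the file contains. The sketch below assumes the model zoo files are ordinary NumPy `.npz` archives, as the extension suggests; the helper `summarize_npz` and the demo array names `weights` and `bias` are hypothetical and do not reflect the real parameter names in the released models.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array_name: shape} for every array stored in an .npz archive."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


# Demo on a small stand-in archive; replace demo_path with e.g. "Breakout-v0.npz"
# to inspect a downloaded model instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.npz")
    np.savez(demo_path, weights=np.zeros((3, 4)), bias=np.zeros(4))
    demo_summary = summarize_npz(demo_path)

for name, shape in sorted(demo_summary.items()):
    print(name, shape)
```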

Models are available for the following Atari environments (click to watch videos of my agent):


AirRaid	Alien	Amidar	Assault
Asterix	Asteroids	Atlantis	BankHeist
BattleZone	BeamRider	Berzerk	Breakout
Carnival	Centipede	ChopperCommand	CrazyClimber
DemonAttack	DoubleDunk	ElevatorAction	FishingDerby
Frostbite	Gopher	Gravitar	IceHockey
Jamesbond	JourneyEscape	Kangaroo	Krull
KungFuMaster	MsPacman	NameThisGame	Phoenix
Pong	Pooyan	Qbert	Riverraid
RoadRunner	Robotank	Seaquest	SpaceInvaders
StarGunner	Tennis	Tutankham	UpNDown
VideoPinball	WizardOfWor	Zaxxon

Note that Atari game settings in gym (AtariGames-v0) are quite different from those in the DeepMind papers, so the scores are not comparable. The most notable differences are:

Each action is randomly repeated 2~4 times.
Inputs are RGB instead of greyscale.
An episode is limited to 10000 steps.
Loss of a life does not end the episode.
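Two of the differences listed above, the random action repeat and the episode step limit, can be sketched as a thin wrapper. This is a toy illustration and not gym's actual implementation: `CountingEnv` and `GymStyleAtari` are hypothetical names, the repeat count is assumed to be drawn uniformly from {2, 3, 4}, and the 10000-step limit is assumed to count agent steps rather than emulator frames.

```python
import random


class CountingEnv:
    """Hypothetical stand-in environment: reward 1 per frame, never ends by itself."""

    def reset(self):
        self.frames = 0
        return self.frames

    def step(self, action):
        self.frames += 1
        return self.frames, 1.0, False  # observation, reward, done


class GymStyleAtari:
    """Repeats each action for 2 to 4 frames and caps the episode length."""

    def __init__(self, env, max_steps=10000, rng=None):
        self.env = env
        self.max_steps = max_steps
        self.rng = rng or random.Random()

    def reset(self):
        self.steps = 0
        return self.env.reset()

    def step(self, action):
        repeat = self.rng.randint(2, 4)  # inclusive on both ends
        total_reward, done = 0.0, False
        for _ in range(repeat):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break
        self.steps += 1
        if self.steps >= self.max_steps:
            done = True  # step limit reached; losing a life is not a terminal event here
        return obs, total_reward, done


env = GymStyleAtari(CountingEnv(), rng=random.Random(0))
frames = env.reset()
steps, done = 0, False
while not done:
    frames, _, done = env.step(action=0)
    steps += 1
print(steps, frames)
```

Because the repeat count varies per step, the same sequence of actions covers a different number of emulator frames from run to run, which is one reason scores are not directly comparable with fixed-frame-skip setups.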

Also see the DQN implementation here
