Tuned DoubleDQN with prioritized experience replay #302

Merged: 9 commits into chainer:master on Oct 1, 2018

Conversation

muupan (Member) commented on Aug 31, 2018

Merge #301 first.

This PR improves train_dqn_ale.py:

  • use the tuned DoubleDQN hyperparameters (those used in the Double DQN paper) as the new default settings
  • add --prioritized to enable prioritized experience replay (a minimal sketch of the sampling idea follows this list)
  • remove --activation and --use-sdl
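For readers unfamiliar with prioritized experience replay, below is a minimal sketch of proportional prioritization, the scheme from the prioritized experience replay paper. It is illustrative only: the class name, parameters, and list-based storage are assumptions for the example and do not reflect ChainerRL's actual PrioritizedReplayBuffer, which uses more efficient data structures (e.g. a sum tree) for sampling.

```python
import numpy as np

class NaivePrioritizedBuffer:
    """Illustrative proportional prioritized replay (not ChainerRL's implementation).

    Priorities are p_i = (|TD error| + eps) ** alpha; transitions are sampled
    with probability p_i / sum_j p_j, and importance-sampling weights
    w_i = (N * P(i)) ** (-beta) correct the bias introduced by the sampling.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha
        self.beta = beta
        self.eps = eps
        self.data = []
        self.priorities = []

    def append(self, transition):
        # New transitions get the current maximum priority so that each
        # transition is sampled at least once before its priority decays.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(max_p)

    def sample(self, batch_size):
        p = np.asarray(self.priorities)
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        # Importance-sampling weights, normalized by their maximum.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # Called after a training step with the new TD errors of the batch.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```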

muupan changed the title from "Use prioritized replay" to "Tuned DoubleDQN with prioritized experience replay" on Aug 31, 2018
muupan (Member, Author) commented on Aug 31, 2018

Below are preliminary results with --max-episode-len 4500. Results with --max-episode-len 27000 are not ready yet. Prioritized experience replay helps, and lr/4 stabilizes training. The results for Seaquest look weird, though.

[Figures: evaluation scores for AsterixNoFrameskip-v4, BeamRiderNoFrameskip-v4, BreakoutNoFrameskip-v4, QbertNoFrameskip-v4, SeaquestNoFrameskip-v4, SpaceInvadersNoFrameskip-v4]

muupan added a commit to muupan/chainerrl that referenced this pull request on Sep 15, 2018
muupan (Member, Author) commented on Sep 18, 2018

Now the results with --max-episode-len 27000 are ready (except for DQN). BeamRider's scores are affected by the choice between 4500 and 27000.

Commands:

  • DQN (before this PR, only 10M steps): examples/ale/train_dqn_ale.py --env {env_id}
  • Tuned DoubleDQN + prioritized replay lr/4 (after this PR): examples/ale/train_dqn_ale.py --eval-interval 1000000 --prioritized --lr 6.25e-5 --env {env_id}
  • Tuned DoubleDQN + prioritized replay (after this PR): examples/ale/train_dqn_ale.py --eval-interval 1000000 --prioritized --env {env_id}
  • Tuned DoubleDQN (after this PR): examples/ale/train_dqn_ale.py --eval-interval 1000000 --env {env_id}
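For reference, --lr 6.25e-5 is the "lr/4" setting mentioned in the earlier comment: one quarter of the 2.5e-4 default implied by that label (6.25e-5 × 4 = 2.5e-4).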

Each configuration was run with three different random seeds. The figures below show the average evaluation scores with confidence bounds.

[Figures: evaluation scores for AsterixNoFrameskip-v4, BeamRiderNoFrameskip-v4, BreakoutNoFrameskip-v4, QbertNoFrameskip-v4, SeaquestNoFrameskip-v4, SpaceInvadersNoFrameskip-v4]

muupan (Member, Author) commented on Oct 1, 2018

I'll merge it because it has been reviewed and approved by @prabhatnagarajan.

muupan merged commit 24b1b1c into chainer:master on Oct 1, 2018
prabhatnagarajan (Contributor) left a comment:

LGTM.
