Tuned DoubleDQN with prioritized experience replay #302

Merged: 9 commits into chainer:master on Oct 1, 2018

Conversation

muupan (Member) commented on Aug 31, 2018

Merge #301 first.

This PR improves train_dqn_ale.py:

  • use the tuned DoubleDQN hyperparameters (those used in the Double DQN paper) as the new default settings
  • add --prioritized to enable prioritized experience replay (a minimal sketch of the sampling idea follows this list)
  • remove --activation and --use-sdl
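For readers unfamiliar with prioritized experience replay, below is a minimal sketch of proportional prioritization, the scheme from the prioritized experience replay paper. It is illustrative only: the class name, parameters, and list-based storage are assumptions for the example and do not reflect ChainerRL's actual PrioritizedReplayBuffer, which uses more efficient data structures (e.g. a sum tree) for sampling.

```python
import numpy as np

class NaivePrioritizedBuffer:
    """Illustrative proportional prioritized replay (not ChainerRL's implementation).

    Priorities are p_i = (|TD error| + eps) ** alpha; transitions are sampled
    with probability p_i / sum_j p_j, and importance-sampling weights
    w_i = (N * P(i)) ** (-beta) correct the bias introduced by the sampling.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha
        self.beta = beta
        self.eps = eps
        self.data = []
        self.priorities = []

    def append(self, transition):
        # New transitions get the current maximum priority so that each
        # transition is sampled at least once before its priority decays.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(max_p)

    def sample(self, batch_size):
        p = np.asarray(self.priorities)
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        # Importance-sampling weights, normalized by their maximum.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # Called after a training step with the new TD errors of the batch.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```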

muupan changed the title from "Use prioritized replay" to "Tuned DoubleDQN with prioritized experience replay" on Aug 31, 2018
muupan (Member, Author) commented on Aug 31, 2018

Below are preliminary results with --max-episode-len 4500. Results with --max-episode-len 27000 are not ready yet. Prioritized experience replay helps, and lr/4 stabilizes training. The results for Seaquest look weird, though.

[Figures: evaluation scores for AsterixNoFrameskip-v4, BeamRiderNoFrameskip-v4, BreakoutNoFrameskip-v4, QbertNoFrameskip-v4, SeaquestNoFrameskip-v4, SpaceInvadersNoFrameskip-v4]

muupan added a commit to muupan/chainerrl that referenced this pull request on Sep 15, 2018
muupan (Member, Author) commented on Sep 18, 2018

Now the results with --max-episode-len 27000 are ready (except for DQN). BeamRider's scores are affected by the choice between 4500 and 27000.

Commands:

  • DQN (before this PR, only 10M steps): examples/ale/train_dqn_ale.py --env {env_id}
  • Tuned DoubleDQN + prioritized replay lr/4 (after this PR): examples/ale/train_dqn_ale.py --eval-interval 1000000 --prioritized --lr 6.25e-5 --env {env_id}
  • Tuned DoubleDQN + prioritized replay (after this PR): examples/ale/train_dqn_ale.py --eval-interval 1000000 --prioritized --env {env_id}
  • Tuned DoubleDQN (after this PR): examples/ale/train_dqn_ale.py --eval-interval 1000000 --env {env_id}
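For reference, --lr 6.25e-5 is the "lr/4" setting mentioned in the earlier comment: one quarter of the 2.5e-4 default implied by that label (6.25e-5 × 4 = 2.5e-4).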

Each configuration was run with three different random seeds. The figures below show the average evaluation scores with confidence bounds.

[Figures: evaluation scores for AsterixNoFrameskip-v4, BeamRiderNoFrameskip-v4, BreakoutNoFrameskip-v4, QbertNoFrameskip-v4, SeaquestNoFrameskip-v4, SpaceInvadersNoFrameskip-v4]

muupan (Member, Author) commented on Oct 1, 2018

I'll merge it because it has been reviewed and approved by @prabhatnagarajan.

muupan merged commit 24b1b1c into chainer:master on Oct 1, 2018
prabhatnagarajan (Contributor) left a comment:

LGTM.
