Pass env_id to replay buffer methods to correctly support batch training #442

muupan · 2019-04-12T16:57:23Z

~~Merge #443 before this PR.~~

Replay buffers that use num_steps > 1 or that are episodic currently behave incorrectly in batch training, because they cannot tell which env a given transition came from.

This PR adds an env_id argument to two replay buffer methods: append and stop_current_episode. With env_id, a replay buffer can tell which env a given transition came from and which env's episode has stopped.
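To illustrate the idea, here is a minimal sketch of an n-step buffer that keeps one pending window per env_id. It is not the code from this PR: the class name TinyNStepBuffer and the attributes memory and pending are hypothetical, and only the method names append and stop_current_episode and the env_id argument come from the description above.

```python
import collections


class TinyNStepBuffer:
    """Toy n-step buffer for illustration; NOT ChainerRL's implementation."""

    def __init__(self, num_steps):
        self.num_steps = num_steps
        self.memory = []  # stored n-step transitions (each a list of steps)
        # One pending window per env, created lazily on first use.
        self.pending = collections.defaultdict(
            lambda: collections.deque(maxlen=num_steps))

    def append(self, state, action, reward, next_state,
               is_state_terminal=False, env_id=0):
        window = self.pending[env_id]
        window.append(dict(state=state, action=action, reward=reward,
                           next_state=next_state,
                           is_state_terminal=is_state_terminal))
        if is_state_terminal:
            # Store every suffix of the window; each ends at the terminal state.
            while window:
                self.memory.append(list(window))
                window.popleft()
        elif len(window) == self.num_steps:
            self.memory.append(list(window))

    def stop_current_episode(self, env_id=0):
        window = self.pending[env_id]
        # A full window was already stored by append(); skip the duplicate.
        if len(window) == self.num_steps:
            window.popleft()
        while window:
            self.memory.append(list(window))
            window.popleft()


# Interleave steps from two envs, as batch training does.
buf = TinyNStepBuffer(num_steps=3)
for t in range(3):
    for env_id in (0, 1):
        buf.append(state=(env_id, t), action=0, reward=0.0,
                   next_state=(env_id, t + 1), env_id=env_id)

# Each stored n-step transition should contain steps from a single env only.
envs_per_transition = [{step["state"][0] for step in tr} for tr in buf.memory]
stored_before_stop = len(buf.memory)

# Stopping env 0's episode flushes only env 0's pending window.
buf.stop_current_episode(env_id=0)
stored_after_stop = len(buf.memory)
```

Without the per-env windows, the interleaved calls above would put steps from env 0 and env 1 into the same n-step transition, which is the failure mode this PR describes.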

TODO:

check how it affects scores with n-step return and batch training


muupan · 2019-04-18T05:49:17Z

I checked the effect of the bug fixed by this PR. I added the --n-step-return option to examples/ale/train_dqn_batch_ale.py and ran it with --n-step-return 1 and --n-step-return 3, using 3 different random seeds:

python3 examples/ale/train_dqn_batch_ale.py --num-envs 8 --n-step-return 1 --steps 10000000 --env SpaceInvadersNoFrameskip-v4

python3 examples/ale/train_dqn_batch_ale.py --num-envs 8 --n-step-return 3 --steps 10000000 --env SpaceInvadersNoFrameskip-v4

Before this PR, --n-step-return 3 completely failed. After this PR, --n-step-return 3 learned faster than --n-step-return 1 as expected.

prabhatnagarajan · 2019-05-22T15:38:13Z

tests/test_replay_buffer.py

+        # It should have:
+        #   - 4 transitions from env_id=1
+        #   - 5 transitions from env_id=2
+        self.assertEqual(len(rbuf), 9)


Shouldn't we have 6 transitions? One from env_id=1 and five from env_id=2?

muupan · 2019-05-23

Four from env_id=1: (s_0, s_4), (s_1, s_4), (s_2, s_4), and (s_3, s_4), since the transition to s_4 is terminal.
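The count of four can be reproduced with a small standalone sketch. The helper stored_pairs is hypothetical and is not part of the test file. It assumes a single env, an episode of four transitions ending at a terminal s_4, and a num_steps of at least 4 (5 is used here only as an example value).

```python
import collections


def stored_pairs(num_transitions, num_steps):
    """Return (first_state, last_state) index pairs stored for one episode
    whose final transition is terminal."""
    window = collections.deque(maxlen=num_steps)
    pairs = []
    for t in range(num_transitions):
        window.append((t, t + 1))  # transition s_t -> s_{t+1}
        is_terminal = (t == num_transitions - 1)
        if is_terminal:
            # On a terminal transition, every suffix of the window is stored.
            while window:
                pairs.append((window[0][0], window[-1][1]))
                window.popleft()
        elif len(window) == num_steps:
            pairs.append((window[0][0], window[-1][1]))
    return pairs


pairs = stored_pairs(num_transitions=4, num_steps=5)
print(pairs)  # [(0, 4), (1, 4), (2, 4), (3, 4)]
```

Each pair reads as (s_start, s_end), so the four entries correspond to (s_0, s_4), (s_1, s_4), (s_2, s_4), and (s_3, s_4).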

prabhatnagarajan

Looks good to me. Merge at will.

prabhatnagarajan · 2019-05-23T05:04:55Z

Pass env_id to replay buffer methods

ff611cd

to correctly handles when episodes end in batch training

muupan changed the title ~~Pass env_id to replay buffer methods to correctly handles when episodes end in batch training~~ Pass env_id to replay buffer methods to correctly support batch training Apr 12, 2019

prabhatnagarajan self-requested a review April 12, 2019 16:58

muupan changed the title ~~Pass env_id to replay buffer methods to correctly support batch training~~ [WIP] Pass env_id to replay buffer methods to correctly support batch training Apr 12, 2019

muupan added 5 commits April 13, 2019 02:29

Fix style

0818519

Merge branch 'master' into add-env-id-to-replay-buffer

9fe594c

Pass env_id in DDPG as well

e1a0c42

Fix

53c3005

Merge branch 'vector-frame-stack' into add-env-id-to-replay-buffer

dcbc121

muupan added the bug label Apr 15, 2019

Add --n-step-return

9cb81d9

muupan changed the title ~~[WIP] Pass env_id to replay buffer methods to correctly support batch training~~ Pass env_id to replay buffer methods to correctly support batch training Apr 18, 2019

Merge branch 'master' into add-env-id-to-replay-buffer

703cee0

prabhatnagarajan suggested changes May 22, 2019


chainer deleted a comment from muupan May 23, 2019

prabhatnagarajan approved these changes May 23, 2019


prabhatnagarajan merged commit 350b257 into chainer:master May 23, 2019

muupan deleted the add-env-id-to-replay-buffer branch May 23, 2019 05:14

muupan added this to the v0.7 milestone Jun 28, 2019
