-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass env_id to replay buffer methods to correctly support batch training #442
Pass env_id to replay buffer methods to correctly support batch training #442
Conversation
to correctly handles when episodes end in batch training
I checked the effect of the bug fixed by this PR. I added the
Before this PR, |
# It should have: | ||
# - 4 transitions from env_id=1 | ||
# - 5 transitions from env_id=2 | ||
self.assertEqual(len(rbuf), 9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we have 3 transitions? One from env_id=1
and five from env_id=2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Four from env_id=1
: (s_0, s_4), (s_1, s_4), (s_2, s_4), and (s_3, s_4), since the transition to s_4 is terminal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Merge at will.
# It should have: | ||
# - 4 transitions from env_id=1 | ||
# - 5 transitions from env_id=2 | ||
self.assertEqual(len(rbuf), 9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
Merge #443 before this PR.Current replay buffers with num_steps > 1 or episodic are not correct in batch training because they cannot know which env a given transition came from.
This PR adds the
env_id
argument to two methods of replay buffers:append
andstop_current_episode
. Fromenv_id
replay buffers can know which env a given transition came from and which env's episode is stopped.TODO: