-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code to collect demonstrations from an agent. #468
Code to collect demonstrations from an agent. #468
Conversation
examples/ale/collect_demos_ale.py
Outdated
help='Random seed [0, 2 ** 31)') | ||
parser.add_argument('--gpu', type=int, default=0, | ||
help='GPU to use, set to -1 if no GPU.') | ||
parser.add_argument('--demo', action='store_true', default=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
demo
is not used. (always true
in this script)
examples/ale/collect_demos_ale.py
Outdated
|
||
opt.setup(q_func) | ||
|
||
rbuf = replay_buffer.ReplayBuffer(10 ** 6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on a situation, but I think we should clarify that opt/rbuf
are dummies. (but agent
requires them)
CI fails 😢 |
clip_rewards=False) | ||
env.seed(int(args.seed)) | ||
# Randomize actions like epsilon-greedy | ||
env = chainerrl.wrappers.RandomizeAction(env, 0.01) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this randomization intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We wouldn't want to collect demonstrations that are identical in a deterministic env.
gpu="$1" | ||
|
||
# Chainer 4 does not support open_pickle_dataset_writer, which is used by the demonstration collection example. | ||
pickle_writer_support=$(python -c "import chainer; from distutils.version import StrictVersion; print(1 if StrictVersion(chainer.__version__) >= StrictVersion('5.0.0') else 0)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On some environment this line may not work because users might use python3
instead of python
.
I think now this line can be removed because we no longer support Chainer V4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the Chainer v4 support has been dropped yet. That will be done separately in a different PR.
Regarding your other comment, I'm following this: https://github.com/chainer/chainerrl/pull/204/files#diff-875175ed9e296e37b03562cf27435d32R11.
And wouldn't your python2
vs. python3
comment apply to all example tests? E.g. https://github.com/chainer/chainerrl/blob/master/examples_tests/atari/test_a2c.sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah you're completely right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.