Cannot reproduce the benchmark results of DQN (vanilla and PDD) on Breakout #983
@asiddharth I'll run Breakout on PDD-DQN again once I find the time / resources.
Same thing here; the old version works.
@ZaneH1992 Are you able to reproduce the results?
How do your results compare to http://htmlpreview.github.io/?https://github.com/openai/baselines/blob/master/benchmarks_atari10M.htm?
@christopherhesse The agent reached a score of 22.4 after 8.3M frames with PDD-DQN, and a score of 28 with vanilla DQN after 10M frames. These scores are far below the ones listed in the link above.
@asiddharth Hi, have you solved the problem with vanilla DQN? I'm also encountering the same problem, even after following all of the steps you mentioned.
Hi @DanielTakeshi,
I am facing the same issue: the vanilla DQN and PDD-DQN agents are not learning as expected on BreakoutNoFrameskip-v4.
I copied over the hyperparameters and the exploration schedule mentioned above (in issue #672). I am running the experiments with this baselines commit.
Here is the list of hyperparameters being used for PDD-DQN (I modified defaults.py); a sketch of how they map onto a direct deepq.learn call follows the list.
```python
network='conv_only',
lr=1e-4,
buffer_size=int(1e6),
exploration_fraction=0.1,
exploration_final_eps=0.01,
train_freq=4,
learning_starts=80000,
target_network_update_freq=40000,
gamma=0.99,
prioritized_replay=True,
prioritized_replay_alpha=0.6,
checkpoint_freq=10000,
checkpoint_path=None,
dueling=True
```
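For reference, this is roughly how these settings map onto a direct deepq.learn call. It is only a sketch, assuming the standard DeepMind-style Atari wrappers from baselines.common.atari_wrappers; the actual runs go through baselines.run with the modified defaults.py shown above.

```python
# Sketch only: assumes the DeepMind-style wrappers and the deepq.learn API
# at this commit; not the exact invocation used for the runs reported below.
from baselines import deepq
from baselines.common.atari_wrappers import make_atari, wrap_deepmind

env = wrap_deepmind(make_atari('BreakoutNoFrameskip-v4'),
                    frame_stack=True, scale=False)

pdd_kwargs = dict(
    network='conv_only',
    lr=1e-4,
    buffer_size=int(1e6),
    exploration_fraction=0.1,
    exploration_final_eps=0.01,
    train_freq=4,
    learning_starts=80000,
    target_network_update_freq=40000,
    gamma=0.99,
    prioritized_replay=True,
    prioritized_replay_alpha=0.6,
    checkpoint_freq=10000,
    checkpoint_path=None,
    dueling=True,  # forwarded through **network_kwargs to the conv_only q-function
)

model = deepq.learn(env, total_timesteps=int(1e7), **pdd_kwargs)
```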
For the vanilla DQN agent I used the same hyperparameters but set dueling=False and prioritized_replay=False in defaults.py, and set double_q to False in build_graph.py.
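Equivalently, the vanilla run is the same call as the sketch above with the dueling and prioritized-replay options turned off (again only a sketch):

```python
# Sketch of the vanilla-DQN variant: same call, minus dueling and
# prioritized replay.
vanilla_kwargs = {**pdd_kwargs, 'dueling': False, 'prioritized_replay': False}
model = deepq.learn(env, total_timesteps=int(1e7), **vanilla_kwargs)

# double_q does not appear to be exposed as a learn() argument at this commit,
# which is why it was set to False directly in build_graph.py for this run.
```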
As mentioned in the README, I also tried to reproduce the results with commit (7bfbcf1), without changing the hyperparameters, but I was not able to reproduce them there either.
It would be really helpful if you could let me know whether I am doing anything wrong, and whether any other hyperparameter combination works better.
Thanks!
Some results with the changed hyperparameters and code commit:
Results for PDD-DQN:

```
| % time spent exploring  | 2        |
| episodes                | 5.1e+04  |
| mean 100 episode reward | 22.1     |
| steps                   | 8.34e+06 |
Saving model due to mean reward increase: 21.5 -> 22.4
```
Results for vanilla DQN:

```
| % time spent exploring  | 1        |
| episodes                | 5.17e+04 |
| mean 100 episode reward | 23.2     |
| steps                   | 1.05e+07 |
```

(The highest score for vanilla DQN is 28 at this point.)
Originally posted by @asiddharth in #672 (comment)