
Cannot reproduce the benchmark results of DQN (vanilla and PDD) on Breakout #983

Open
asiddharth opened this issue Aug 9, 2019 · 6 comments


asiddharth commented Aug 9, 2019

Hi @DanielTakeshi,
I am facing the same issue where the vanilla DQN and the PDD DQN agents are not learning as expected on BreakoutNoFrameskip-v4.

I copied over the hyperparameters and the exploration schedule mentioned above (in issue #672). I am running the experiments with this baselines commit.

Here is the list of hyperparameters used for PDD-DQN (I modified defaults.py):
network='conv_only',
lr=1e-4,
buffer_size=int(1e6),
exploration_fraction=0.1,
exploration_final_eps=0.01,
train_freq=4,
learning_starts=80000,
target_network_update_freq=40000,
gamma=0.99,
prioritized_replay=True,
prioritized_replay_alpha=0.6,
checkpoint_freq=10000,
checkpoint_path=None,
dueling=True
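
For reference, a minimal sketch of how a run with this preset could be launched through the standard baselines entry point (the 1e7 timestep budget here is my assumption, not a value taken from the logs):

```
python -m baselines.run --alg=deepq --env=BreakoutNoFrameskip-v4 --num_timesteps=1e7
```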

For the vanilla DQN agent I used the same hyperparameters, but set dueling=False and prioritized_replay=False in defaults.py, and set double_q to False in build_graph.py.
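
For concreteness, a minimal sketch of the vanilla run through the Python API under the same hyperparameters (the script name and total_timesteps are my assumptions; double_q still has to be flipped to False in build_graph.py by hand, since deepq.learn does not expose it as a keyword argument):

```python
# vanilla_dqn_breakout.py -- minimal sketch of a vanilla DQN run, assuming
# the deepq API at this commit; script name and timestep budget are mine.
from baselines import deepq
from baselines.common.atari_wrappers import make_atari, wrap_deepmind


def main():
    # Standard Atari preprocessing used by baselines
    env = make_atari('BreakoutNoFrameskip-v4')
    env = wrap_deepmind(env, frame_stack=True)

    model = deepq.learn(
        env,
        network='conv_only',
        lr=1e-4,
        total_timesteps=int(1e7),       # assumed budget
        buffer_size=int(1e6),
        exploration_fraction=0.1,
        exploration_final_eps=0.01,
        train_freq=4,
        learning_starts=80000,
        target_network_update_freq=40000,
        gamma=0.99,
        prioritized_replay=False,       # vanilla: no prioritized replay
        dueling=False,                  # vanilla: no dueling head
                                        # (network kwarg forwarded to build_q_func)
    )
    model.save('breakout_vanilla_dqn.pkl')
    env.close()


if __name__ == '__main__':
    main()
```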

As mentioned in the README, I also tried to reproduce the results with commit 7bfbcf1, without changing the hyperparameters, but was not able to reproduce them either.
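
For anyone trying the same thing, reverting to that commit is just a checkout plus reinstall (assuming an editable pip install of the baselines repo):

```
cd baselines
git checkout 7bfbcf1
pip install -e .
```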

It would be really helpful if you could let me know whether I am doing anything wrong, and whether any other hyperparameter combination works better.

Thanks!

Some results with the hyperparameters and commit above.
Results for PDD-DQN :

| % time spent exploring  | 2        |
| episodes                | 5.1e+04  |
| mean 100 episode reward | 22.1     |
| steps                   | 8.34e+06 |

Saving model due to mean reward increase: 21.5 -> 22.4

Results for vanilla DQN :

| % time spent exploring  | 1        |
| episodes                | 5.17e+04 |
| mean 100 episode reward | 23.2     |
| steps                   | 1.05e+07 |

(The highest mean 100 episode reward for vanilla DQN is 28 at this point.)

Originally posted by @asiddharth in #672 (comment)

DanielTakeshi commented:

@asiddharth I'll run Breakout on PDD-DQN again once I find the time / resources

ZaneH1992 commented Sep 4, 2019

Same thing here; the old version works.

asiddharth (Author) commented:

@ZaneH1992 Are you able to reproduce the results?

christopherhesse (Contributor) commented:

asiddharth (Author) commented Nov 2, 2019

@christopherhesse The agent reached a score of 22.4 after 8.3M steps with PDD-DQN, and a score of 28 with vanilla DQN after 10M steps. These scores are far below the ones listed in the link above.

Bowen-He commented:

@asiddharth Hi, have you solved the problem with vanilla DQN? I'm encountering the same problem after following all of the steps you mentioned.
