Cannot reproduce the benchmark results of DQN (vanilla and PDD) on Breakout #983
@asiddharth I'll run Breakout on PDD-DQN again once I find the time / resources.
Same thing here; the old version works.
@ZaneH1992 Are you able to reproduce the results?
How do your results compare to http://htmlpreview.github.io/?https://github.com/openai/baselines/blob/master/benchmarks_atari10M.htm?
@christopherhesse The agent reached a score of 22.4 after 8.3M frames with PDD-DQN, and a score of 28 with vanilla DQN after 10M frames. These scores are far below the ones listed in the link above.
@asiddharth Hi, have you solved the problem with vanilla DQN? I'm also encountering the same problem, even after following all of the steps you mentioned.
Hi @DanielTakeshi,
I am facing the same issue: the vanilla DQN and PDD-DQN agents are not learning as expected on BreakoutNoFrameskip-v4.
I copied over the hyperparameters and the exploration schedule mentioned above (in issue #672). I am running the experiments with this baselines commit.
Here is the list of hyperparameters being used for PDD-DQN (I modified defaults.py); a sketch of how they map onto a direct deepq.learn call follows the list.
```python
network='conv_only',
lr=1e-4,
buffer_size=int(1e6),
exploration_fraction=0.1,
exploration_final_eps=0.01,
train_freq=4,
learning_starts=80000,
target_network_update_freq=40000,
gamma=0.99,
prioritized_replay=True,
prioritized_replay_alpha=0.6,
checkpoint_freq=10000,
checkpoint_path=None,
dueling=True
```
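For reference, this is roughly how these settings map onto a direct deepq.learn call. It is only a sketch, assuming the standard DeepMind-style Atari wrappers from baselines.common.atari_wrappers; the actual runs go through baselines.run with the modified defaults.py shown above.

```python
# Sketch only: assumes the DeepMind-style wrappers and the deepq.learn API
# at this commit; not the exact invocation used for the runs reported below.
from baselines import deepq
from baselines.common.atari_wrappers import make_atari, wrap_deepmind

env = wrap_deepmind(make_atari('BreakoutNoFrameskip-v4'),
                    frame_stack=True, scale=False)

pdd_kwargs = dict(
    network='conv_only',
    lr=1e-4,
    buffer_size=int(1e6),
    exploration_fraction=0.1,
    exploration_final_eps=0.01,
    train_freq=4,
    learning_starts=80000,
    target_network_update_freq=40000,
    gamma=0.99,
    prioritized_replay=True,
    prioritized_replay_alpha=0.6,
    checkpoint_freq=10000,
    checkpoint_path=None,
    dueling=True,  # forwarded through **network_kwargs to the conv_only q-function
)

model = deepq.learn(env, total_timesteps=int(1e7), **pdd_kwargs)
```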
For the vanilla DQN agent I used the same hyperparameters but set dueling=False and prioritized_replay=False in defaults.py, and set double_q to False in build_graph.py.
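Equivalently, the vanilla run is the same call as the sketch above with the dueling and prioritized-replay options turned off (again only a sketch):

```python
# Sketch of the vanilla-DQN variant: same call, minus dueling and
# prioritized replay.
vanilla_kwargs = {**pdd_kwargs, 'dueling': False, 'prioritized_replay': False}
model = deepq.learn(env, total_timesteps=int(1e7), **vanilla_kwargs)

# double_q does not appear to be exposed as a learn() argument at this commit,
# which is why it was set to False directly in build_graph.py for this run.
```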
As mentioned in the README, I also tried to reproduce the results with commit (7bfbcf1), without changing the hyperparameters, but I was not able to reproduce them there either.
It would be really helpful if you could let me know whether I am doing anything wrong, and whether any other hyperparameter combination works better.
Thanks!
Some results with the changed hyperparameters and code commit:
Results for PDD-DQN:

```
| % time spent exploring  | 2        |
| episodes                | 5.1e+04  |
| mean 100 episode reward | 22.1     |
| steps                   | 8.34e+06 |
Saving model due to mean reward increase: 21.5 -> 22.4
```
Results for vanilla DQN:

```
| % time spent exploring  | 1        |
| episodes                | 5.17e+04 |
| mean 100 episode reward | 23.2     |
| steps                   | 1.05e+07 |
```

(The highest score for vanilla DQN is 28 at this point.)
Originally posted by @asiddharth in #672 (comment)