ABR_Sim Results Replication #8
Thanks for sharing! :)
I noticed there are several baseline agents implemented for the task.
I am wondering if it is possible to share the code of these baseline agents, so I can learn how to implement the classical algorithms and use them for comparison in Park. Thanks a lot for your help!
I will just copy and paste the agents' code here for the classical control, since it is not too long.
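The original snippets are not reproduced above. Purely as an illustration of what a classical controller for ABRSimEnv can look like, here is a minimal buffer-based (BBA-style) sketch; the reservoir/cushion thresholds, the `park.make('abr_sim')` driver, and the assumed position of the buffer size in the observation vector are guesses, not the code from the attachment.

```python
import park  # Park RL environments (https://github.com/park-project/park)

# Illustrative sketch only: a minimal buffer-based (BBA-style) controller.
# RESERVOIR/CUSHION and the observation layout are assumptions, not values
# taken from the original attachment.
RESERVOIR = 5.0   # seconds of buffer below which the lowest bitrate is chosen
CUSHION = 10.0    # seconds over which the bitrate ramps linearly to the maximum

def buffer_based_action(buffer_sec, num_bitrates):
    """Map the current playback buffer occupancy (seconds) to a bitrate index."""
    if buffer_sec < RESERVOIR:
        return 0
    if buffer_sec >= RESERVOIR + CUSHION:
        return num_bitrates - 1
    # Linear ramp between the lowest and highest bitrate inside the cushion.
    frac = (buffer_sec - RESERVOIR) / CUSHION
    return int(frac * (num_bitrates - 1))

if __name__ == '__main__':
    env = park.make('abr_sim')
    obs, done, total = env.reset(), False, 0.0
    while not done:
        # obs[-3] as the buffer size is a guess about the observation layout.
        act = buffer_based_action(obs[-3], env.action_space.n)
        obs, reward, done, info = env.step(act)
        total += reward
    print('episode return:', total)
```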
Thanks a lot for the code!
I discussed with @hongzimao some issues with replicating results on ABRSimEnv.
This post doesn't need a response; I'm just posting it here so others can learn from it.
I initially had issues replicating results on the ABRSimEnv.
The A2C agent in the Park paper reports scores of around 420 ± 210.
I was able to replicate the scores on ABR using code from @hongzimao here: abr_agents.zip
| Entropy Ratio | Average Episode Score ± Standard Deviation (100,000 actions) |
| --- | --- |
| 10.0 | 517.3681106430971 ± 405.73426203813045 |
| 5.0 | 524.5324282999072 ± 400.950983685324 |
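The numbers above are the mean ± standard deviation of episode returns collected over roughly 100,000 environment steps. A minimal sketch of how such an evaluation loop might be run (the `evaluate` helper and the `select_action` placeholder are illustrative, not code from the attachment):

```python
import numpy as np
import park

def evaluate(select_action, total_steps=100_000):
    """Run a policy on ABRSimEnv for ~total_steps actions and report
    the mean and standard deviation of the per-episode returns."""
    env = park.make('abr_sim')
    episode_returns, ep_ret, steps = [], 0.0, 0
    obs = env.reset()
    while steps < total_steps:
        obs, reward, done, info = env.step(select_action(obs))
        ep_ret += reward
        steps += 1
        if done:
            episode_returns.append(ep_ret)
            ep_ret = 0.0
            obs = env.reset()
    return np.mean(episode_returns), np.std(episode_returns)
```

Any agent can be plugged in as `select_action`, for example the buffer-based sketch shown earlier in this thread.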
I was able to reach similar results using the same parameters in an A2C agent from stable-baselines, modified with entropy decay and a vf_coef of 0.25: a2c_stable_baselines.zip (a rough sketch of such a setup follows the table below).
| Entropy Ratio | Average Episode Score ± Standard Deviation (100,000 actions) |
| --- | --- |
| 10.0 | 441.72765 ± 343.60534 |
| 5.0 | 420.04653 ± 178.98197 |
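The exact modification inside a2c_stable_baselines.zip is not shown here. As a rough sketch of the same idea in stable-baselines3 (the PyTorch successor, where the entropy coefficient is read at every update and can therefore be changed from a callback): the decay schedule, coefficients, and the assumption that the Park env is gym-compatible for SB3 are all illustrative.

```python
import park
from stable_baselines3 import A2C
from stable_baselines3.common.callbacks import BaseCallback

class EntropyDecay(BaseCallback):
    """Linearly anneal the entropy coefficient over training (illustrative schedule)."""
    def __init__(self, start=1.0, end=0.01, total_timesteps=100_000):
        super().__init__()
        self.start, self.end, self.total = start, end, total_timesteps

    def _on_step(self) -> bool:
        frac = min(1.0, self.num_timesteps / self.total)
        # SB3's A2C multiplies the entropy loss by self.ent_coef at every update,
        # so overwriting it here effectively decays the entropy bonus over time.
        self.model.ent_coef = self.start + frac * (self.end - self.start)
        return True

# Assumption: the Park env is (or is wrapped to be) gym-compatible for SB3.
env = park.make('abr_sim')
model = A2C('MlpPolicy', env, vf_coef=0.25, verbose=1)
model.learn(total_timesteps=100_000, callback=EntropyDecay())
```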
However, when I initially ran the same experiments with RMSProp (default parameters) as the optimizer, I was not able to beat the robustMPC and buffer-based heuristics.
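For reference, in stable-baselines3 the A2C default optimizer is RMSProp; setting `use_rms_prop=False` falls back to the policy's default Adam optimizer, which is one way to test whether the optimizer choice is what makes the difference (whether this matches the working setup above is not stated).

```python
# use_rms_prop=False makes SB3's A2C use the policy's default Adam optimizer
# instead of RMSProp; whether this matches the poster's working setup is unknown.
model = A2C('MlpPolicy', env, vf_coef=0.25, use_rms_prop=False, verbose=1)
```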
Thanks for the help!!