Success report and request for help #201

Open
bwanab opened this issue Aug 4, 2023 · 3 comments

Comments

@bwanab

bwanab commented Aug 4, 2023

As I mentioned in another issue, I've been working on training an AI agent to play Othello/Reversi. I wanted to report that I've had pretty decent success using AlphaZero.jl, much more than I was able to achieve with PyTorch, TensorFlow, or Flux.jl. That's the good news. The not-so-good news is that while I've gotten a relatively good player, it's still not that great. It easily beats really bad players (like me) and plays about 50/50 against a basic MinMax heuristic (translated from https://github.com/sadeqsheikhi/reversi_python_ai).

In my training, I've run around 25 iterations (the repository is here: https://git.sr.ht/~bwanab/AZ_Reversi.jl). The loss seems to flatline at around iteration 10 and slopes very gradually upward after that.

Are there any particular hyperparameters I should look at? One thing I tried that didn't seem to make much difference was making the network a little bigger by increasing the number of blocks from 5 to 8.

@bwanab
Author

bwanab commented Aug 7, 2023

Replying to myself: I've found that increasing the timeout when creating the AlphaZeroPlayer makes the level of play much better. For example, in the case above where it played 50/50 against the MinMax heuristic, using a 5-second timeout instead of the default 2 seconds raises the result to roughly 80/20. At 10 seconds, MinMax can't beat it.

If anybody has insight into this I'd love to hear it.
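One possible reading of the timeout result, offered as an assumption rather than a diagnosis: if the player runs MCTS simulations in a loop until its wall-clock timeout expires, the number of simulations per move grows roughly linearly with the timeout, so 5 or 10 seconds buys about 2.5x or 5x the search of the default 2 seconds. The minimal Python sketch below illustrates only that budget arithmetic. `run_until_timeout`, `FakeClock`, `simulations_for`, and the fixed 0.25 s cost per simulation are all hypothetical and are not AlphaZero.jl's API.

```python
import time
from typing import Callable


def run_until_timeout(simulate: Callable[[], None], timeout: float,
                      clock: Callable[[], float] = time.monotonic) -> int:
    """Call `simulate` repeatedly until `timeout` seconds have elapsed.

    Hypothetical helper for illustration only; not part of AlphaZero.jl.
    Returns the number of simulations that were run.
    """
    start = clock()
    n_sims = 0
    # Keep searching while the wall-clock budget has not been used up.
    while clock() - start < timeout:
        simulate()
        n_sims += 1
    return n_sims


class FakeClock:
    """Deterministic stand-in for time.monotonic so the example is repeatable."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


def simulations_for(timeout: float, cost_per_sim: float = 0.25) -> int:
    # Assumption: every simulation costs a fixed amount of (fake) time.
    clock = FakeClock()

    def simulate() -> None:
        clock.t += cost_per_sim

    return run_until_timeout(simulate, timeout, clock=clock)


if __name__ == "__main__":
    for timeout in (2.0, 5.0, 10.0):
        print(timeout, simulations_for(timeout))
    # 2.0 8
    # 5.0 20
    # 10.0 40
```

Because only the search budget changes between runs, a longer timeout can strengthen play without any change to the trained network.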

@jonathan-laurent
Owner

Thanks for reporting on your experience! Tuning AlphaZero can be pretty hard indeed. Can I see some of the automatically generated metrics and graphs in your experiment?

@bwanab
Author

bwanab commented Aug 12, 2023

[Attached plots: benchmark_reward, loss]

These are the ones that seem to me to carry the most information, but that might be my ignorance.
