Success report and request for help #201

bwanab · 2023-08-04T19:04:07Z

As I mentioned in another issue, I've been working on training an AI agent to play Othello/Reversi. I wanted to report that I've had some pretty decent success using AlphaZero.jl, much more than I was able to achieve with PyTorch, TensorFlow or Flux.jl. That's the good news. The not-so-good news is that while I've gotten a relatively good player, it's still not that great. It easily beats really bad players (like me) and can play 50/50 against a basic MinMax heuristic (translated from https://github.com/sadeqsheikhi/reversi_python_ai).

In my training, I've done around 25 iterations (the repository is here: https://git.sr.ht/~bwanab/AZ_Reversi.jl). The loss seems to have flatlined at around iteration 10 and slopes very gradually upward after that.

Are there any particular hyper-parameters that I should look at? One thing I tried that didn't seem to make much difference was making the net a little bigger by changing the number of blocks from 5 to 8.

bwanab · 2023-08-07T17:46:33Z

Replying to myself, but I've found that increasing the timeout when creating the AlphaZeroPlayer makes the level of play much better. For example, in the case I gave above of playing 50/50 against the MinMax heuristic, using a 5 second timeout instead of the default 2 seconds raises the level to more like 80/20. At 10 seconds, MinMax can't beat it.

If anybody has insight into this I'd love to hear it.
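One plausible reading of the timeout effect is that a time-budgeted tree search simply fits more simulations into each move when it is given more time, so its move choices rest on better statistics. The sketch below illustrates only that budgeting loop. It is not AlphaZero.jl code, and the names `search_with_timeout`, `simulate` and `clock` are hypothetical, made up for this illustration. The clock is injectable so the behaviour can be checked deterministically.

```python
import time
from typing import Callable


def search_with_timeout(
    simulate: Callable[[], None],
    timeout: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call `simulate` repeatedly until `timeout` time units have elapsed.

    Returns the number of simulations that were run. `simulate` stands in
    for one search simulation (hypothetical; a real engine does far more).
    """
    deadline = clock() + timeout
    count = 0
    while clock() < deadline:
        simulate()
        count += 1
    return count


class FakeClock:
    """Deterministic clock: every simulated rollout costs exactly 1 ms."""

    def __init__(self) -> None:
        self.now_ms = 0

    def __call__(self) -> int:
        return self.now_ms

    def tick(self) -> None:
        self.now_ms += 1


if __name__ == "__main__":
    # Compare a 2 s and a 5 s budget, both expressed in milliseconds.
    for budget_ms in (2000, 5000):
        fake = FakeClock()
        n = search_with_timeout(fake.tick, budget_ms, clock=fake)
        print(f"budget={budget_ms} ms -> {n} simulations")
```

With a fixed cost per simulation, the number of simulations grows linearly with the budget. Whether that fully explains the jump from 50/50 to 80/20 here is an assumption, not something the numbers in this thread establish.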

jonathan-laurent · 2023-08-10T18:29:33Z

Thanks for reporting on your experience! Tuning AlphaZero can be pretty hard indeed. Can I see some of the automatically generated metrics and graphs in your experiment?

bwanab · 2023-08-12T13:33:40Z

These are the ones that seem to have the most information in them to me, but that might be my ignorance.
