# AntiAZ

In this project we design AntiAZ, a game that is hard for AlphaZero. Our high-level idea is to define the goal of the game as the parity of a specially designed hash code of the final position, so that neural networks are unable to even learn which final positions are wins and which are losses.


The code is written in Python 3.6.
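As a minimal sketch of the parity-of-hash idea (the repository's actual hash function is different and specially designed; `sha256` and the `winner` helper below are illustrative assumptions only):

```python
import hashlib

def winner(final_board: bytes) -> str:
    """Toy version of a parity-of-hash win condition: the outcome is the
    low bit of a cryptographic hash of the final position, which looks
    like pure noise to a value network trying to generalize over positions."""
    digest = hashlib.sha256(final_board).digest()
    return "player 1" if digest[-1] & 1 else "player 2"
```

Because changing a single cell of the board flips the hash unpredictably, there is no smooth function of the position for a network to learn.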

## Minimax vs Random

- Set the depth for minimax by modifying `MINIMAX_DEPTH` in `agent.py`; the default depth is 9 (a toy sketch of depth-limited search follows this list).
- Run the following command:

  ```
  python agent.py
  ```

- The output consists of two numbers: the number of games won by Random, and the number of games won by Minimax.
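For reference, the sketch below shows what depth-limited minimax looks like on a toy Nim-like game; it only illustrates the role of the `MINIMAX_DEPTH` cutoff and does not reflect `agent.py`'s actual game or API:

```python
def minimax(pile, take_limit, depth, maximizing):
    """Depth-limited minimax on a toy Nim-like game: players alternately
    remove 1..take_limit stones, and taking the last stone wins.
    Returns +1 / -1 / 0 from the maximizing player's point of view,
    with 0 meaning the search horizon was reached before a result."""
    if pile == 0:
        # The previous player took the last stone and won.
        return -1 if maximizing else 1
    if depth == 0:
        return 0  # horizon reached: outcome unknown at this depth
    scores = [minimax(pile - t, take_limit, depth - 1, not maximizing)
              for t in range(1, min(take_limit, pile) + 1)]
    return max(scores) if maximizing else min(scores)

print(minimax(10, 3, 9, True))  # searches 9 plies deep, like the default MINIMAX_DEPTH
```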

## ResNet

- To generate the data for training, run the following in Python:

  ```python
  from data import DATA

  data = DATA(9)           # board size is 9x9
  data.gen("trn", 800000)  # 800,000 positions for training
  data.gen("vld", 100000)  # 100,000 positions for validation
  data.gen("tst", 100000)  # 100,000 positions for testing
  ```

- After generating the data, run the following command to train the ResNet:

  ```
  python resnet.py
  ```
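The framework used by `resnet.py` is not stated here; as a generic illustration of the residual blocks such a network stacks, here is a minimal PyTorch-style block (the class name and channel count are assumptions, not the repository's code):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: two 3x3 convolutions with batch norm,
    plus a skip connection from the block's input to its output."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # skip connection
```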

## AlphaZero General

- Go to the directory `azgeneral`.
- To train the model, run:

  ```
  python main.py
  ```

- To test the performance of the current best model against the random player, run:

  ```
  python pit.py
  ```

- For testing, there are two configurable parameters in `pit.py`:
  - the number of MCTS simulations per move, which can be set on line 23 (default: 24);
  - the number of games to play, which can be set on line 35 (default: 100).
- The output format is the same as in AlphaZero General: the last line of the output is a tuple (W, L, D), where W is the number of games won by AlphaZero General, L the number lost, and D the number drawn.
- If you want to use the model trained by us, create a directory named `temp` in `azgeneral` and copy `model/best.pth.tar` to `azgeneral/temp/`.
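On a Unix-like shell, the copy described in the last step is:

```
mkdir -p azgeneral/temp
cp model/best.pth.tar azgeneral/temp/
```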