TD-Gammon


Features

  • PyTorch implementation of TD-Gammon [1].
  • Test the trained agents against GNU Backgammon (gnubg), an open source implementation of the Backgammon game.
  • Play against a trained agent via a web GUI.

Installation

I used Anaconda3 with Python 3.6.8; this is the only configuration I tested.

Create the conda environment:

$ conda create --name tdgammon python=3.6
$ source activate tdgammon
(tdgammon) $ git clone https://github.com/dellalibera/td-gammon.git

Install the environment gym-backgammon:

(tdgammon) $ git clone https://github.com/dellalibera/gym-backgammon.git
(tdgammon) $ cd gym-backgammon
(tdgammon) $ pip install -e .

Install the dependencies (PyTorch v1.2):

(tdgammon) $ pip install torch torchvision
(tdgammon) $ pip install tb-nightly

or

(tdgammon) $ cd ../td-gammon/
(tdgammon) $ pip install -r requirements.txt

Without Anaconda Environment

If you don't use an Anaconda environment, run the following commands:

git clone https://github.com/dellalibera/td-gammon.git
pip3 install -r td-gammon/requirements.txt
git clone https://github.com/dellalibera/gym-backgammon.git
cd gym-backgammon/
pip3 install -e .

Without an Anaconda environment, replace python with python3 in the commands below.

GNU Backgammon

To play against GNU Backgammon (gnubg), you first have to install it.
NOTE: I installed gnubg on Ubuntu 18.04 (running in a virtual machine), with Python 2.7 (see the next section for how to interact with GNU Backgammon).

On Ubuntu:

sudo apt-get install gnubg

How to interact with GNU Backgammon using a Python script

I used an HTTP server that runs on the guest machine (Ubuntu) to receive commands and pass them to the gnubg program.
This makes it possible to send commands from the host machine (macOS in my case).

The file bridge.py should be executed on the guest machine (the machine where gnubg is installed).

On Ubuntu:

gnubg -t -p /path/to/bridge.py

This runs gnubg with the command-line interface instead of the graphical interface (-t), evaluates the given Python file, and then exits (-p).
For a list of parameters of gnubg, run gnubg --help.

The Python script bridge.py creates an HTTP server running on localhost:8001.
If you want to modify the host and the port, change the following lines in bridge.py:

if __name__ == "__main__":
    HOST = 'localhost' # <-- YOUR HOST HERE
    PORT = 8001  # <-- YOUR PORT HERE
    run(host=HOST, port=PORT)

The file td_gammon/gnubg/gnubg_backgammon.py sends messages/commands to gnubg and parses the response.
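
To illustrate the bridge pattern described above (an HTTP server next to gnubg, a client on the other machine), here is a minimal, self-contained sketch. It is not the code of bridge.py or gnubg_backgammon.py: the JSON message format and the names handle_command, BridgeHandler, start_server and send_command are assumptions made for this example, and handle_command only echoes the command instead of talking to gnubg. It is written for Python 3, whereas the real bridge runs inside gnubg's own Python interpreter.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_command(command):
    # Hypothetical stand-in: a real bridge would forward the command to gnubg
    # and return its output. This sketch only echoes the command back.
    return "received: " + command


class BridgeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and extract the command (assumed format).
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length).decode("utf-8"))
        body = json.dumps({"output": handle_command(payload["command"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this example


def start_server(host="localhost", port=8001):
    # Serve requests in a background thread; port=0 picks a free port.
    server = HTTPServer((host, port), BridgeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_command(host, port, command):
    # Client side: POST one command and return the "output" field of the reply.
    data = json.dumps({"command": command}).encode("utf-8")
    request = urllib.request.Request(
        "http://{}:{}/".format(host, port),
        data=data,
        headers={"Content-Type": "application/json"},
    )
    # Connect directly to the bridge, ignoring any system-wide proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))["output"]
```

In this sketch the server side would call start_server(HOST, PORT) and the client side send_command(HOST, PORT, "some command"); the host and port play the same role as GNUBG_HOST and GNUBG_PORT in the commands further down.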


Usage

Run python /path/to/main.py --help for a list of parameters.

Train TD-Network

To train a neural network with a single hidden layer of 40 units for 100000 games/episodes, saving the model every 10000 episodes, run the following command:

(tdgammon) $ python /path/to/main.py train --save_path ./saved_models/exp1 --save_step 10000 --episodes 100000 --name exp1 --type nn --lr 0.1 --hidden_units 40

Run python /path/to/main.py train --help for a list of parameters available for training.


Evaluate Agent(s)

To evaluate already trained models, you have two options: let two models play against each other, or let one model play against gnubg.
Run python /path/to/main.py evaluate --help for a list of parameters available for evaluation.

Agent vs Agent

To let two models play against each other, specify the path where each model is saved, together with its number of hidden units.

(tdgammon) $ python /path/to/main.py evaluate --episodes 50 --hidden_units_agent0 40 --hidden_units_agent1 40 --type nn --model_agent0 path/to/saved_models/agent0.tar --model_agent1 path/to/saved_models/agent1.tar

Agent vs gnubg

To let one model play against gnubg, first run gnubg with the bridge script as input.
On Ubuntu (or wherever gnubg is installed):

gnubg -t -p /path/to/bridge.py

Then run (to play against gnubg at beginner level for 50 games):

(tdgammon) $ python /path/to/main.py evaluate --episodes 50 --hidden_units_agent0 40 --type nn --model_agent0 path/to/saved_models/agent0.tar vs_gnubg --difficulty beginner --host GNUBG_HOST --port GNUBG_PORT

The number of hidden units (--hidden_units_agent0) must match that of the loaded model (--model_agent0).


Web Interface

You can play against a trained agent via a web gui:

(tdgammon) $ python /path/to/main.py gui --host localhost --port 8002 --model path/to/saved_models/agent0.tar --hidden_units 40 --type nn

Then navigate to http://localhost:8002 in your browser.

Run python /path/to/main.py gui --help for a list of parameters available for the web GUI.


Plot Wins

Evaluating the agent during training can take a long time, especially against gnubg at the world_class difficulty. Instead, you can load all the models saved in a folder and evaluate each one (saved at a different point during training) against one or more opponents.
The models in the directory should be of the same type (i.e. the structure of the network should be the same for all the models in the same folder).
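
As a rough illustration of walking a folder of saved models in training order, the sketch below collects the .tar files and sorts them by the number at the end of the file name. It assumes that this trailing number is the episode count, as the checkpoint names under Results (for example exp_20221230_1048_41_259706_100000.tar) suggest; the helper list_checkpoints is hypothetical and is not part of main.py.

```python
import re
from pathlib import Path

# Assumption: checkpoint names end in "_<episodes>.tar".
CHECKPOINT_PATTERN = re.compile(r"_(\d+)\.tar$")


def list_checkpoints(folder):
    """Return (episodes, path) pairs for the .tar files in folder, sorted by episodes."""
    found = []
    for path in Path(folder).glob("*.tar"):
        match = CHECKPOINT_PATTERN.search(path.name)
        if match:  # skip files that do not follow the assumed naming scheme
            found.append((int(match.group(1)), path))
    return sorted(found, key=lambda item: item[0])
```

Each returned path could then be evaluated in turn, which gives one data point per checkpoint for a wins-over-training plot.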

To plot the wins against gnubg, first start the bridge on Ubuntu (or wherever gnubg is installed):

gnubg -t -p /path/to/bridge.py

In the example below, the saved models are evaluated against the random and gnubg opponents, with gnubg at two difficulty levels, beginner and advanced:

(tdgammon) $ python /path/to/main.py plot --save_path /path/to/saved_models/myexp --hidden_units 40 --episodes 10 --opponent random,gnubg --dst /path/to/experiments --type nn --difficulty beginner,advanced --host GNUBG_HOST --port GNUBG_PORT

To visualize the plots:

(tdgammon) $ tensorboard --logdir=runs/path/to/experiment/ --host localhost --port 8001

Run python /path/to/main.py plot --help for a list of parameters available for plotting.

Backgammon OpenAI Gym Environment

For a detailed description of the environment, see gym-backgammon.


Bibliography, sources of inspiration, related works


License

MIT

Results

exp_20221230_1048_41_259706_100000.tar is a 40-neuron NN trained for 100k steps. It wins about 73% of games against beginner and 50% against intermediate, and fewer against advanced and world class. It took about 10 minutes to train with 16 processes. In hindsight, it actually wins about 55% of games, which is probably close enough.

exp_20221230_1713_28_521264_1000000.tar is a 40-neuron NN trained for 1 million steps. It took about 2.2 hours to train with 12 processes. Win rates: .77 against beginner, .57 against intermediate, .43 against advanced, .33 against world-class.

eval_net is a 40-neuron NN. Its win rate is about .6 against intermediate and .4 against advanced.

exp_20221230_1028_54_571857_10000.tar: .33 against intermediate.

exp_20221230_1048_41_259706_100000.tar: 539 / 1000 against intermediate.

2023_01_03_20_10_49, exp_510000.tar: {'intermediate': 70, 'advanced': 54}. Over 1k episodes against gnubg it won 492/1000 (49.2%).
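
The win counts above come from a finite number of games, so it helps to know how much noise to expect. The sketch below computes a win rate with a 95% confidence interval using the normal approximation to the binomial distribution. The helper win_rate_interval is written for this example and is not part of the repository; the two inputs are the 539/1000 and 492/1000 counts reported above.

```python
import math


def win_rate_interval(wins, games, z=1.96):
    """Return (rate, low, high): win rate with a normal-approximation interval."""
    rate = wins / games
    margin = z * math.sqrt(rate * (1.0 - rate) / games)
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)


results = [
    ("100000-step checkpoint vs intermediate", 539, 1000),
    ("exp_510000.tar vs gnubg", 492, 1000),
]
for label, wins, games in results:
    rate, low, high = win_rate_interval(wins, games)
    print("{}: {:.1%} (95% CI {:.1%} to {:.1%})".format(label, rate, low, high))
```

With 1000 games the margin is roughly 3 percentage points either way, so 539/1000 is consistent with the "about 55%" estimate. With only 50 games the margin grows to roughly 14 points, which is worth keeping in mind when comparing short evaluation runs.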