APS360 Summer 2021 Team Repo (Group 3)
Team members: John Lee, Kevin Karam, Morgan Tran, Joanne Tan
Baseline Model Link: https://colab.research.google.com/drive/1hBLG7VLoNU6S6vvo8Ff9rpU0ix5F1Wwj?usp=sharing
Main Model Link: https://colab.research.google.com/drive/1gLG2Td7MzgWcH2JUzr1E3Rst3kRTovYm?usp=sharing
The goal of this project is to generate Forsyth-Edwards Notation (FEN) descriptions of a chess position from a picture of a 2D board. Machine learning is a reasonable approach because the task involves multi-class classification of the 12 chess piece types and converting the detections into valid FEN. The motivation behind the project was the team's shared interest in chess and in how to effectively store and use chessboard data (FEN) from games. With the recent rise in the popularity of chess, this project has useful applications for the chess community: people can take an image from a professional game and quickly analyze the position.
Link to the full presentation here. For the demo, please refer to timestamps 4:25 - 5:38.
Quick reference:
- Taking a screenshot from a video, we input the image into our model
- Our model outputs an image with bounding boxes and class predictions for the chess pieces it has detected
- Reading the text file generated by our model (running some code), we extract the bounding box predictions and convert them to FEN (see the sketch after this list)
- Check our work: copy the generated FEN notation and paste it into lichess to recreate the chess board from the screenshot
- Ta-da! The generated board on lichess is the same as the screenshot :)
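The actual conversion code lives in the Colab notebooks linked above; as a rough illustration, the sketch below parses YOLOv5-style label text files (one line per detection: class index plus normalized box centre and size) and assembles a FEN string. The class-index-to-letter mapping, file path, and placeholder side-to-move fields are assumptions, not the team's exact code.

```python
# Minimal sketch: convert a YOLOv5 --save-txt label file into a FEN string.
# The class-index order below is an assumption; it must match the class names
# listed in the dataset's .yaml file used for training.
CLASS_TO_FEN = {
    0: "P", 1: "N", 2: "B", 3: "R", 4: "Q", 5: "K",    # white pieces (assumed order)
    6: "p", 7: "n", 8: "b", 9: "r", 10: "q", 11: "k",  # black pieces (assumed order)
}

def labels_to_fen(label_path):
    # 8x8 board: row 0 = rank 8 (top of the image), column 0 = file 'a'
    board = [[None] * 8 for _ in range(8)]
    with open(label_path) as f:
        for line in f:
            cls, x_c, y_c, w, h = line.split()[:5]   # normalized xywh, YOLO convention
            col = min(int(float(x_c) * 8), 7)         # file index from the box centre
            row = min(int(float(y_c) * 8), 7)         # rank index from the box centre
            board[row][col] = CLASS_TO_FEN[int(cls)]

    ranks = []
    for row in board:
        rank, empties = "", 0
        for piece in row:
            if piece is None:
                empties += 1
            else:
                rank += (str(empties) if empties else "") + piece
                empties = 0
        if empties:
            rank += str(empties)
        ranks.append(rank)
    # Side to move, castling rights, etc. cannot be read from a single image,
    # so placeholder fields are appended here.
    return "/".join(ranks) + " w - - 0 1"

print(labels_to_fen("runs/detect/exp/labels/screenshot.txt"))  # hypothetical path
```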
To evaluate our model, our team needed a method that would perform better than random guessing. The team decided to build a hand-coded heuristic model that simply identifies the piece and colour when given a square image of a chess piece. We did this by first setting aside a base set of “default” pieces to compare against. This base set was converted to NumPy arrays so we could easily compare it against the input. When the baseline model receives an input image, it converts it to a NumPy array and computes the HSP value of the middle pixel; the HSP value gives the input's perceived brightness. We set a threshold of 0.3: if the brightness was above it, we guessed the piece was white; below it, black. The input array was then compared against each base image pixel by pixel, the number of matching pixels was recorded, and the model output the piece that matched best. Purely guessing among the 12 classes would theoretically give an accuracy of 8.33%. Our baseline model ended up with an accuracy of 14%.
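As an illustration of this heuristic, here is a minimal sketch. The file handling and the exact pixel-matching rule are assumptions (the base images are assumed to be preloaded NumPy arrays at the same resolution as the input); the HSP brightness test with the 0.3 threshold follows the description above.

```python
# Rough sketch of the hand-coded baseline described above.
import numpy as np
from PIL import Image

def hsp_brightness(pixel):
    # Perceived brightness (HSP model), scaled to [0, 1].
    r, g, b = (float(c) / 255.0 for c in pixel[:3])
    return np.sqrt(0.299 * r**2 + 0.587 * g**2 + 0.114 * b**2)

def baseline_predict(image_path, base_pieces):
    """base_pieces: dict mapping piece name -> NumPy array of a 'default' piece image."""
    img = np.array(Image.open(image_path).convert("RGB"))

    # Colour guess from the perceived brightness of the middle pixel.
    mid = img[img.shape[0] // 2, img.shape[1] // 2]
    colour = "white" if hsp_brightness(mid) > 0.3 else "black"

    # Pixel-by-pixel comparison against each default piece; keep the best match.
    # (All images are assumed to share the same resolution.)
    best_piece, best_score = None, -1
    for name, base in base_pieces.items():
        score = int(np.sum(base == img))  # number of matching pixel values
        if score > best_score:
            best_piece, best_score = name, score
    return colour, best_piece
```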
The final model used for our results is YOLOv5 (a variation of the ‘You Only Look Once’ object detection model, pre-trained on the COCO dataset). The model can be accessed through PyTorch and the GitHub repository made by Ultralytics [4]. YOLOv5 has 9 different variations, each differing in the number of parameters and expected image resolution. Due to the computation limit of Google Colab and the image resolution of our dataset, we decided to use YOLOv5x, which has ~87 million parameters [4]. The YOLOv5 model is made of a backbone (CSPDarknet) that does feature extraction, a neck (PANet) that does feature fusion, and a head (the YOLO layer) that does the detection (Figure 3). YOLOv5 uses various convolutional layers, max/average pooling, bottlenecks, SPP, activation functions (LeakyReLU, SiLU, Mish, Swish), concatenations, and upsampling [5][6][7].
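For reference, YOLOv5x can be loaded directly through PyTorch Hub from the Ultralytics repository [4]; a minimal sketch is below, where the input image filename is a placeholder.

```python
import torch

# Load YOLOv5x with COCO-pretrained weights via PyTorch Hub (downloads on first use).
model = torch.hub.load("ultralytics/yolov5", "yolov5x", pretrained=True)

results = model("board_screenshot.png")  # hypothetical input image
results.print()                          # summary of detected objects
```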
Using a custom dataset, we trained the YOLOv5 model to identify all instances of chess pieces in a given image of a chess board. Using an SGD optimizer, a batch size of 15, and 100 epochs (saving the best model across the 100 epochs), the model achieved 90% accuracy on the test set. The confidence and IoU (intersection over union) thresholds were set to 0.5. Refer to Appendix A for the Google Colab notebooks containing the model code.
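A hedged sketch of the corresponding Colab cells is shown below, using the training and detection scripts from the Ultralytics repository [4]. The dataset config name (chess.yaml) and image size (640) are assumptions; the batch size, epoch count, optimizer, and thresholds follow the values stated above (SGD is YOLOv5's default optimizer, so no extra flag is needed).

```
!git clone https://github.com/ultralytics/yolov5
%cd yolov5
!pip install -r requirements.txt

# Train YOLOv5x on the custom chess-piece dataset; best.pt keeps the best of the 100 epochs.
!python train.py --img 640 --batch 15 --epochs 100 --data chess.yaml --weights yolov5x.pt

# Run detection on held-out images with the reported thresholds, saving label text files.
!python detect.py --weights runs/train/exp/weights/best.pt --source test_images/ --conf-thres 0.5 --iou-thres 0.5 --save-txt
```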
An object detection model was used instead of a regular classification model because we needed a way to locate the exact position of each detected chess piece on the board. With a regular convolutional classifier, we would have to teach the model to identify the grid squares on the board and to handle the enormous number of possible chess positions. It was therefore much easier to use an object detection model and use the predicted bounding boxes to determine each piece's position on the board, under the assumption that the image given to the model shows only the 2D chess board with pieces in their respective squares (see the snippet below).
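To make that mapping concrete, the snippet below names the square that a detected box centre falls on, given normalized image coordinates as YOLOv5 reports them. The board orientation (rank 8 at the top, file 'a' on the left) is an assumption.

```python
def square_from_box(x_center, y_center):
    # Map a normalized box centre (0-1) to an algebraic square on an 8x8 board.
    file = "abcdefgh"[min(int(x_center * 8), 7)]
    rank = 8 - min(int(y_center * 8), 7)
    return f"{file}{rank}"

print(square_from_box(0.56, 0.55))  # a piece just right and below centre -> "e4"
```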
Please refer to the logged training results at Weights and Biases
Overall, the results of our model were surprisingly good. In our precision plot, only the black rook showed a dip in precision; across all classes, precision reached 1.00 at a confidence of 0.845.
[1] L. Spears, “Transcribe Live Chess with Machine Learning Part 1,” Medium, 25-Aug-2019. [Online]. Available: https://towardsdatascience.com/transcribe-live-chess-with-machine-learning-part-1-928f73306e1f. [Accessed: 03-Jun-2021].
[2] D. M. Quintana, A. A. del B. García, and M. P. Matías, “LiveChess2FEN: a Framework for Classifying Chess Pieces based on CNNs,” arXiv, 15-Dec-2020. [Online]. Available: https://arxiv.org/pdf/2012.06858.pdf. [Accessed: 02-Jun-2021].
[3] P. Koryakin, “Chess Positions: Recognise chess position of 5-15 pieces,” Version 1, Kaggle, 03-Feb-2019. [Online]. Available: https://www.kaggle.com/koryakinp/chess-positions. [Accessed: 01-Jun-2021].
[4] Ultralytics. (2021) YOLOv5 (Version 5) [Source code]. https://github.com/ultralytics/yolov5. [Accessed: 01-Jul-2021].
[5] Ultralytics. (2021) YOLOv5 (Version 5) [Source code]. https://github.com/ultralytics/yolov5/blob/master/models/yolov5x.yaml. [Accessed: 01-Jul-2021].
[6] Ultralytics. (2021) YOLOv5 (Version 5) [Source code]. https://github.com/ultralytics/yolov5/blob/master/models/common.py. [Accessed: 01-Jul-2021].
[7] R. Xu, H. Lin, K. Lu, L. Cao, and Y. Liu, “A forest fire detection system based on ensemble learning,” Forests, vol. 12, no. 2, p. 217, 2021. (Refer to this paper for a figure of the YOLOv5 network architecture.)