Semester-long group project for CS-519-M01, taught by Dr. Huiping Cao in Spring 2023.
- Long Tran
- Theoderic Platt
- OS: openSUSE Leap 15.4 -- all testing and training were run on this version of Linux.
- pip 23.0.1 -- needed only if dependencies are not yet installed.
- Python 3.10.10 -- all testing was conducted on this version of Python.
- requirements.txt -- this file contains all dependencies, packages, and libraries needed.
- To install the requirements: 'pip install -r requirements.txt'
- external_requirements.txt -- this file contains the names of the command-line programs that must be installed on the machine.
- To install these, follow the installation instructions for each program on your platform.
The codebase is run through two primary Python scripts, both of which require additional parameters to be properly utilized.
The main program runs our models against an input .png image containing a mathematical formula. The output of main is the text our models identified within the image file. To run main, do the following:
- 'python main.py -p img'
- img should be the path to the image file that you wish to test. Some default images we have supplied for testing are as follows:
- 'python main.py -p input.png'
- 'python main.py -p input2.png'
The tools script is used to generate the datasets, train the models, and evaluate the models on pre-made test cases.
Data can be generated by performing one of the following:
- 'python tools.py generate_symbols'
- Provides options for generating character, number, and operator data. Overwrites the local ./data/symbol_dataset.csv with the data generated in this call. For proper usage, ensure that all three data subsets are generated.
- 'python tools.py generate_piecewise'
- Generates piecewise-function data with heights 1, 2, 3, and 4. The number of instances for each height will be requested upon running (see the rendering sketch below this list).
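The generation logic itself lives in the train_codes scripts described later in this readme. Purely as an illustration of what one generated instance looks like (a grayscale 100x100 .png of a single symbol, per the dataset descriptions below), a minimal sketch using Pillow might be as follows; the font path, font size, and output name are placeholders, and the real generator additionally fits the glyph to fill the image:

```python
# Illustrative sketch only: render one symbol onto a 100x100 grayscale
# canvas. Font path and output name are placeholders; the repository's
# generator additionally scales the glyph to fill the image.
from PIL import Image, ImageDraw, ImageFont

def render_symbol(symbol: str, font_path: str, size: int = 100) -> Image.Image:
    img = Image.new("L", (size, size), color=255)     # white background
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, int(size * 0.9))
    # Measure the glyph's bounding box so it can be centered on the canvas.
    left, top, right, bottom = draw.textbbox((0, 0), symbol, font=font)
    w, h = right - left, bottom - top
    draw.text(((size - w) / 2 - left, (size - h) / 2 - top),
              symbol, fill=0, font=font)              # black glyph
    return img

if __name__ == "__main__":
    render_symbol("7", "/usr/share/fonts/truetype/DejaVuSans.ttf").save("example_7.png")
```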
For all model training, if a model already exists, you will be given the option to overwrite it with the newly trained model. If no such model exists, you will be given the option to save the new model. Models are saved as .bin files under the sub-directory ./trained_models (a sketch of this training-and-saving step follows the list below). Models can be trained by performing the following:
- 'python tools.py train'
- Training the single class models will train the characters, numbers, and operators models using Logistic Regression.
- Training the interClass model will train a model that sits on top of all single class models and decides which of them is needed. This model is a CNN; the architecture is described in the report.
- Training the piecewise class model will train a CNN on the piecewise dataset; the architecture is described in the report.
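For reference, the single class training step reduces to fitting a classifier on flattened pixel data and pickling it to ./trained_models. The following is a minimal sketch, not the repository's exact code -- the array shapes, file name, and solver settings here are assumptions:

```python
# Minimal sketch of training one single class model and saving it as a
# .bin file with pickle. Shapes and names are illustrative assumptions.
import os
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_and_save(X: np.ndarray, y: np.ndarray, path: str) -> None:
    # X: (n_samples, 100 * 100) flattened grayscale images; y: class labels.
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(model, f)

if __name__ == "__main__":
    X = np.random.rand(20, 100 * 100)      # stand-in for real image data
    y = np.random.randint(0, 10, size=20)  # stand-in for digit labels
    train_and_save(X, y, "./trained_models/numbers_example.bin")
```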
Evaluating the models runs them on a hardcoded, pre-made test case and reports metrics for how accurately the interClass model performs, how well the single class models perform, and how well these models perform together (a sketch of the metric computation follows the command below). To run model evaluation:
- 'python tools.py evaluate'
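The metrics themselves come from eval_codes/evaluate.py, described later in this readme. Conceptually, the per-symbol part of the evaluation reduces to comparing predictions against the known labels of the test case; a minimal sketch with stand-in data:

```python
# Minimal sketch of the kind of metric computation evaluation performs;
# the labels and predictions below are stand-ins, not real results.
from sklearn.metrics import accuracy_score, classification_report

y_true = ["1", "+", "2", "=", "3"]  # known contents of a test image
y_pred = ["1", "+", "2", "=", "8"]  # what the models predicted

print(f"accuracy: {accuracy_score(y_true, y_pred):.2f}")
print(classification_report(y_true, y_pred, zero_division=0))
```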
The datasets used are contained within ./data, distributed between characters, numbers, operators, and piecewise data. The descriptions below assume the data has not been overwritten after downloading this repository; if the data has been regenerated, the exact specifications may vary.
The classes for this dataset are [0,1,2,3,4,5,6,7,8,9]
This dataset is made up of 100 instances for each class. Each instance is a grayscale 100x100 pixel .png containing an image of the number corresponding to the class label. The instances have random fonts, and are all fit to fill the image.
The classes for this dataset are [a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z]
This dataset is made up of 100 instances for each class. Each instance is a grayscale 100x100 pixel .png containing an image of the character corresponding to the class label. The instances have random fonts, and are all fit to fill the image.
The classes for this dataset are [(,),+,-,=,',',divide,times,curly_bracket]
This dataset is made up of 100 instances for each class. Each instance is a grayscale 100x100 pixel .png containing an image of the operator corresponding to the class label. For the class divide, the standard division symbol is used; for the class times, the multiplication symbol is used; for the class curly_bracket, the curly brace is used. The instances have random fonts, and are all fit to fill the image.
The classes for this dataset are [1,2,3,4]
This dataset is made up of 2000 instances for each class. Each instance is a grayscale 100x100 pixel .png containing an image of a piecewise function with the class's number of cases. A class 1 instance is a curly brace followed by one line-height of output, a class 2 instance is a curly brace followed by two line-heights of output, and so on. The instances have random fonts, and are all fit to fill the image.
Note: it was decided very late in development that this dataset is no longer needed. The data is still present for legacy's sake.
The following is a description of the entire codebase.
The data directory contains the datasets outlined above in this readme. Folders containing the datasets are present, along with two .csv files which store key/data pair information for the datasets.
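A quick way to inspect those .csv files is to load them with pandas; the column layout is not documented here, so check the header row rather than assuming one:

```python
# Peek at the dataset key file; column names are whatever the generator
# wrote, so inspect df.columns rather than assuming a layout.
import pandas as pd

df = pd.read_csv("./data/symbol_dataset.csv")
print(df.columns.tolist())
print(df.head())
```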
The eval_codes directory contains various test images for evaluating the accuracy of the models. Additionally, eval_codes contains two Python scripts.
- evaluate.py -- evaluates the models on a test image and outputs evaluation metrics. This script is called as part of the tools.py script.
- predict.py -- performs the same action as evaluate.py, but does not print evaluation metrics. This script is deprecated.
The func_codes directory contains two Python scripts.
- converter.py -- predicts the contents of a supplied image and converts them to LaTeX. This script is called as part of the main.py script.
- split.py -- contains all code for segmenting an input image, splitting the individual characters apart for model prediction (see the sketch below).
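split.py's exact algorithm is not documented in this readme. One common approach to this kind of character segmentation is thresholding plus contour detection, sketched below with OpenCV; this is illustrative only and may differ from the actual implementation:

```python
# Illustrative character segmentation via Otsu thresholding and external
# contours; split.py's real method may differ.
import cv2

def split_characters(image_path: str):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Invert so dark ink becomes white foreground for contour detection.
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Sort bounding boxes left to right and crop each from the original.
    boxes = sorted((cv2.boundingRect(c) for c in contours), key=lambda b: b[0])
    return [img[y:y + h, x:x + w] for (x, y, w, h) in boxes]
```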
The settings directory contains a single CONFIG.py file which holds all paths, data references, evaluation cases, the list of fonts, and model paths.
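As a rough idea of the shape of such a settings module (the constant names below are illustrative, not the actual contents of CONFIG.py):

```python
# Illustrative only: the kinds of constants a CONFIG.py settings module
# holds. These names are examples, not the repository's actual values.
DATA_DIR = "./data"
MODEL_DIR = "./trained_models"
SYMBOL_CSV = DATA_DIR + "/symbol_dataset.csv"
FONTS = ["DejaVuSans", "FreeMono"]   # fonts used for data generation
IMAGE_SIZE = (100, 100)              # dataset image dimensions
```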
The train_codes directory contains a set of Python scripts used for either generating the datasets or training the models.
- cnn_models.py -- Code for training the CNN models for both interClass and piecewise class.
- piecewise_gen.py -- Code for generating piecewise dataset.
- save_tools.py -- helper module for saving data and models with pickle
- single_gen.py -- Code for generating the single datasets (numbers, characters, and operators)
- single_symbol_recognizer.py -- Code for training the models for all datasets
- transformer.py -- Code for transforming and preprocessing the image segments produced by split.py into the same format as the datasets the models were trained on (see the sketch below).
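The target format is the 100x100 grayscale layout described in the dataset section above. A minimal sketch of that kind of normalization, written here with OpenCV and NumPy; the real transformer.py may differ in padding, interpolation, or inversion:

```python
# Illustrative normalization of a cropped segment to a 100x100 grayscale
# array; transformer.py's actual preprocessing may differ in detail.
import cv2
import numpy as np

def to_model_input(segment: np.ndarray, size: int = 100) -> np.ndarray:
    # Scale so the longest side fits, preserving aspect ratio.
    h, w = segment.shape
    scale = size / max(h, w)
    resized = cv2.resize(segment, (max(1, int(w * scale)), max(1, int(h * scale))))
    # Center the glyph on a white square canvas.
    canvas = np.full((size, size), 255, dtype=np.uint8)
    y0 = (size - resized.shape[0]) // 2
    x0 = (size - resized.shape[1]) // 2
    canvas[y0:y0 + resized.shape[0], x0:x0 + resized.shape[1]] = resized
    return canvas
```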
The trained_models directory contains the .bin files, saved using pickle, for the models described above. Some models may not be present upon initially pulling the repository due to GitHub's file-size limitations. If models are not present (fail to load), please generate and save the missing models locally (see the loading sketch below).
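Loading follows the standard pickle pattern. A minimal sketch that treats a missing file as a model that still needs to be generated; the file name is an example, not necessarily an actual model name:

```python
# Minimal sketch of loading a pickled model and handling the missing-file
# case described above. The model file name is an example.
import pickle

def load_model(path: str):
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        print(f"{path} not found; run 'python tools.py train' to generate and save it.")
        return None

model = load_model("./trained_models/interclass_example.bin")
```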
The remaining files are those not contained within subdirectories. These files are either images for testing, the requirements documents, or a set of several Python scripts. These scripts are:
- main.py -- The main file for running the code. Details on how to run are outlined above in this readme.
- tools.py -- The tools file for running the code. Details on how to run are outlined above in this readme.
- misc.py -- Code containing a collection of useful miscellaneous functions used heavily throughout the codebase.
- piecewise_generator.py -- deprecated code which has now been moved into other files
- dataset_prediction.py -- deprecated code which has now been moved into other files