This repository uses the Graph4NLP library to evaluate the `Graph2Tree` class on the MAWPS dataset. The graph-to-tree interpretation of a math word equation is depicted as follows:
Depiction of the path from a sentence to its math word equation, with a graph as input and a tree as output:
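To make the "tree output" idea concrete, here is a minimal sketch of an equation represented as a binary expression tree. It is not taken from this repository or from Graph4NLP: the `Node` class, the `build_tree` and `evaluate` helpers, the prefix-order token format, and the sample tokens are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

OPERATORS = {"+", "-", "*", "/"}


@dataclass
class Node:
    """One node of a binary expression tree (hypothetical helper)."""
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(prefix_tokens: Iterable[str]) -> Node:
    """Build an expression tree from tokens given in prefix order."""
    tokens = iter(prefix_tokens)

    def parse() -> Node:
        token = next(tokens)
        if token in OPERATORS:
            # An operator owns the next two sub-expressions.
            left = parse()
            right = parse()
            return Node(token, left, right)
        return Node(token)  # a number is a leaf

    return parse()


def evaluate(node: Node) -> float:
    """Evaluate the tree bottom-up."""
    if node.value not in OPERATORS:
        return float(node.value)
    a, b = evaluate(node.left), evaluate(node.right)
    if node.value == "+":
        return a + b
    if node.value == "-":
        return a - b
    if node.value == "*":
        return a * b
    return a / b


if __name__ == "__main__":
    # Made-up example: (5 + 3) * 2 written in prefix order.
    tree = build_tree(["*", "+", "5", "3", "2"])
    print(evaluate(tree))
```

In this sketch the operators are internal nodes and the numbers are leaves, which is the general shape a tree decoder would produce for an equation.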
- Common functions, such as accuracy computation and out-of-vocabulary handling, are stored in `utils.py`.
- The model implementation is in `Mawps.py`.
- All hyperparameters are stored in `config.yaml`. They include graph construction arguments (the graph here is the graph embedding built from the natural language text data), graph initialization arguments, and decoder arguments.
- The file that loads `config.yaml` into the project is `load_config.py`.
- The file from which we run all model experiments is `run_experiments.py`.
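As an illustration of the kind of helpers described for `utils.py`, the sketch below shows an exact-match accuracy function and an out-of-vocabulary replacement function. These are not the repository's actual implementations: the function names, the `<unk>` token, and the whitespace tokenization are assumptions.

```python
from typing import Iterable, List, Sequence, Set


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predicted equations that match the reference token for token."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Compare whitespace-split tokens so extra spaces do not count as errors.
    correct = sum(p.split() == r.split() for p, r in zip(predictions, references))
    return correct / len(references)


def replace_oov(tokens: Iterable[str], vocab: Set[str], unk_token: str = "<unk>") -> List[str]:
    """Replace every token missing from the vocabulary with unk_token."""
    return [t if t in vocab else unk_token for t in tokens]


if __name__ == "__main__":
    preds = ["x = 5 + 3", "x = 2 * 4"]
    golds = ["x = 5 + 3", "x = 2 + 4"]
    print(exact_match_accuracy(preds, golds))
    print(replace_oov(["x", "=", "7"], {"x", "="}))
```

Exact match is a strict metric: an equation that evaluates to the right answer but is written differently counts as wrong, so a project may prefer to compare evaluated answers instead.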
- data: This directory contains two subdirectories:
  a) Processed: Contains the `NodeEmbGraph` directory, which holds the `data.pt` and `vocab.pt` files used for creating the graph embeddings.
  b) Raw: Holds the raw MAWPS data, which we split into 80% train, 10% validation, and 10% test sets.
- checkpoint_save: This directory holds the run history of some of our experiments. We uploaded files for a few sample runs only, because committing every run history file is not good practice.
- experiements: We first ran the model and experiments in Jupyter notebooks, and then moved to this modular structure to run them formally.
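The 80/10/10 split mentioned for the raw data can be sketched as follows. This is a generic illustration, not the script used to produce the files in this repository; the `split_dataset` name, the fixed seed, and the shuffle-then-slice strategy are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    examples: Sequence,
    train_frac: float = 0.8,
    val_frac: float = 0.1,
    seed: int = 0,
) -> Tuple[List, List, List]:
    """Shuffle the examples and slice them into train, validation, and test lists."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains goes to test
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_dataset(range(100))
    print(len(train), len(val), len(test))
```

Fixing the seed keeps the three sets identical across runs, which matters when comparing experiments against each other.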
Libraries to install:
- Graph4NLP: `pip install graph4nlp`
- Torchtext: `pip install torchtext`
- Torch: `pip install torch`
- NumPy: `pip install numpy`
Tools: CoreNLP
This tool is published by Stanford for NLP processing tasks, and in this project it is used in building the graph embeddings. It is available as a JAR file, on Hugging Face, and through Maven. For this project, we set it up from the JAR file: we went to the CoreNLP Software Link page and downloaded the JAR files (binaries) to our system. To start the server, run the following command from a command prompt in the directory that contains the JAR files:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer --port 9001 --timeout 15000
A reference figure is shown here:
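Before launching the experiments, it can help to confirm that something is listening on the CoreNLP port. The sketch below is a generic TCP check using only the Python standard library. It is not part of this repository, the `is_port_open` name is hypothetical, and the default port 9001 simply mirrors the command above.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 9001, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("A server is listening on port 9001.")
    else:
        print("Nothing is listening on port 9001; start the CoreNLP server first.")
```

This only verifies that the port accepts connections. It does not verify that the process behind it is actually CoreNLP.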
As stated above, there are two ways to run this project:
- Using the modular approach: Clone this repository to your system, start the CoreNLP server, and run the experiments with the following command in a terminal:
python run_experiments.py
- Using the Jupyter notebook: Open the notebook (`.ipynb`) file stored in the `experiment` directory and run each cell sequentially after connecting to the CoreNLP server.
The model architecture is depicted below. The full report for this project is uploaded on Gradescope for assessment: