This is a repository containing code and data for the paper:
K. Noorbakhsh, M. Sulaiman, M. Sharifi, K. Roy and P. Jamshidi. Pretrained Language Models are Symbolic Mathematics Solvers too!
This code depends on the following packages:

- Torch
- NumPy
- SymPy
- Transformers
- Apex
- `trainer.py` contains the code for fine-tuning the pre-trained language models. Please modify the following parameters before running (see the configuration sketch after this list):
  1. `language`: the pre-trained language.
  2. `Model_Type`: `mbart` or `Marian`.
  3. `path1` and `path2`: the paths of the training and validation data.
  4. `max_input_length` and `max_output_length`: 1024 for the mBART model and 512 for the Marian-MT model.
  5. `model_name`: the name of the model you wish to save.
- `evaluator.py` contains the code for evaluating the fine-tuned language model on the symbolic math data. Please set parameters 1-4 to the same values as in the `trainer.py` section, and also modify the following parameters (see the evaluation sketch after this list):
  - `path`: the path of the test dataset.
  - `saved_model`: the path of the saved fine-tuned model.
- `src/hf_utils.py` contains code for reading the datasets and some utilities for evaluation.
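
Below is a minimal configuration sketch for `trainer.py`. The parameter names follow the list above; the checkpoint name, file paths, and values are illustrative assumptions, not the repository's actual defaults.

```python
# Hypothetical configuration sketch for trainer.py -- the Helsinki-NLP
# checkpoint name, paths, and values are assumptions for illustration only.
from transformers import MarianMTModel, MarianTokenizer

language = "en-ro"                       # the pre-trained language (assumed to name a Marian language pair)
Model_Type = "Marian"                    # "mbart" or "Marian"
path1 = "language_data/train.txt"        # path of the training data (hypothetical)
path2 = "language_data/valid.txt"        # path of the validation data (hypothetical)
max_input_length = 512                   # 512 for Marian-MT, 1024 for mBART
max_output_length = 512
model_name = "marian-symbolic-math"      # name under which the fine-tuned model is saved

# Load the corresponding pre-trained checkpoint (assumed naming scheme).
checkpoint = f"Helsinki-NLP/opus-mt-{language}"
tokenizer = MarianTokenizer.from_pretrained(checkpoint)
model = MarianMTModel.from_pretrained(checkpoint)
```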
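
And a hedged sketch of the evaluation idea: load the saved fine-tuned model, generate a prediction, and use SymPy to decide whether it matches the reference. The actual procedure lives in `evaluator.py` and `src/hf_utils.py`; the helpers below are only an illustration.

```python
# Hypothetical evaluation sketch -- the real logic lives in evaluator.py and
# src/hf_utils.py; the equivalence check here is a simplified illustration.
import sympy as sp
from transformers import MarianMTModel, MarianTokenizer

saved_model = "marian-symbolic-math"     # path of the saved fine-tuned model (hypothetical)
tokenizer = MarianTokenizer.from_pretrained(saved_model)
model = MarianMTModel.from_pretrained(saved_model)

def predict(expression: str, max_output_length: int = 512) -> str:
    """Generate the model's predicted solution for one input expression."""
    inputs = tokenizer(expression, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_length=max_output_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

def is_correct(predicted: str, reference: str) -> bool:
    """Count a prediction as correct if it simplifies to the reference."""
    try:
        return sp.simplify(sp.sympify(predicted) - sp.sympify(reference)) == 0
    except (sp.SympifyError, TypeError):
        return False
```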
The rest of the code is adapted from *Deep Learning for Symbolic Mathematics* (Lample et al.).
The datasets are available here.

- `train`, `valid`, and `test` files contain the training, validation, and test datasets for the mBART model.
- `language_data` contains the training, validation, and test data for the Marian-MT model.
- `distribution_test` contains the test files for the distribution-shift section (polynomial, trigonometric, and logarithmic).
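
As a rough illustration of how a split might be read, the snippet below assumes each line pairs an input expression with its target solution, separated by a tab (as in the Lample et al. data releases); check the actual files and `src/hf_utils.py` before relying on this.

```python
# Hypothetical loader for one dataset split -- assumes tab-separated
# (input expression, target expression) pairs, one per line.
def read_split(path: str):
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            src, tgt = line.rstrip("\n").split("\t", maxsplit=1)
            pairs.append((src, tgt))
    return pairs

train_pairs = read_split("language_data/train.txt")  # hypothetical file name
```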
Please cite us if you use our work in your research.
```bibtex
@article{noorbakhsh2021pretrained,
  title={Pretrained Language Models are Symbolic Mathematics Solvers too!},
  author={Kimia Noorbakhsh and Modar Sulaiman and Mahdi Sharifi and Kallol Roy and Pooyan Jamshidi},
  journal={arXiv preprint arXiv:2110.03501},
  year={2021}
}
```