A modular framework for task-specific fine-tuning of pre-trained language models. It supports Natural Language Understanding (NLU) tasks such as question answering and sentence similarity. Configs define the task, the base model (from Hugging Face), the dataset class used for fine-tuning, and the other training and evaluation hyperparameters. Driver files launch training and evaluation jobs from these pre-defined configs. The framework is easily extensible to other NLU tasks.
# From the root directory, install all the dependencies
pip install -r requirements.txt
# Set the configs (configs/run_config.json) and launch the tool
./driver/driver.py
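As a rough illustration of the "set configs, then launch" step, the sketch below parses a run config and validates the device string before anything is started. The field names (`task`, `model_names`, `device`, `dataset_download`) and the `RunConfig` / `parse_run_config` helpers are assumptions made for this example; the actual keys are whatever configs/run_config.json defines.

```python
import json
from dataclasses import dataclass, field
from typing import List

ALLOWED_DEVICES = ("cpu", "cuda")  # the two device strings this README mentions


@dataclass
class RunConfig:
    # Hypothetical field names; the real keys live in configs/run_config.json.
    task: str = "sts"
    model_names: List[str] = field(default_factory=list)
    device: str = "cpu"
    dataset_download: bool = False


def parse_run_config(text: str) -> RunConfig:
    """Parse a JSON string into a RunConfig, rejecting unknown devices."""
    raw = json.loads(text)
    cfg = RunConfig(
        task=raw.get("task", "sts"),
        model_names=list(raw.get("model_names", [])),
        device=raw.get("device", "cpu"),
        dataset_download=bool(raw.get("dataset_download", False)),
    )
    if cfg.device not in ALLOWED_DEVICES:
        raise ValueError(
            f"device must be one of {ALLOWED_DEVICES}, got {cfg.device!r}"
        )
    return cfg


# Inline JSON stands in for the contents of a config file.
example = '{"task": "sts", "model_names": ["example-model"], "device": "cuda"}'
cfg = parse_run_config(example)
print(cfg.task, cfg.model_names, cfg.device, cfg.dataset_download)
```

Failing fast on a bad device string keeps a typo in the config from surfacing only after models have started loading.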
│───README.md
│───requirements.txt
│
├───configs
│   ├───run_config.json
│   └───run_config_appendix.json
│
├───dataset
│   ├───dataset.py
│   │
│   └───STSRawData
│       └───sts2016-english-with-gs-v1.0
│               correlation-noconfidence.pl
│               LICENSE.txt
│               README.txt
│               STS2016.gs.answer-answer.txt
│               .......
│               .......
├───driver
│   ├───driver.py
│   └───driver_ensemble.py
│
├───models
│   ├───model.py
│   └───model_maps.py
├───scripts
│   └───correlation-noconfidence.pl
│
└───utils
    ├───evaluation.py
    └───utility.py
driver/driver.py
: Entry point from which the tool is launched.
models/model.py
: Model definitions (Sentence Transformer, Word2vec, Doc2Vec, GloVe, and others).
configs/run_config_sts.json
: Specifies the model names and the device on which to run inference (CPU ("cpu") or GPU ("cuda")). If dataset download is set to false, the existing dataset folder under dataset/STSRawData is used; otherwise the dataset is fetched from the given download link.
models/model_maps.py
: Supported models, organized by the libraries that offer their pre-trained weights. To extend the package, augment this file.
driver/driver_ensemble.py
: Entry point for training an ensemble on the SQuAD dataset.
train/*
: Training files and utilities for SQuAD and STS.
evaluation/*
: Evaluation files and utilities for SQuAD and STS.
models/*
: QA and STS model variations and their definitions.
configs/run_config_squad.json
: Specifies the model names and the device on which to run inference (CPU ("cpu") or GPU ("cuda")). If dataset download is set to false, the existing dataset is used.
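The entry for models/model_maps.py says the package is extended by augmenting that file. One plausible shape for such a map is a name-to-factory registry, sketched below. This is an assumption about how a model map could be organized, not the repository's actual code: `MODEL_MAP`, `register_model`, `build_model`, and the `dummy-encoder` entry are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict

# Hypothetical registry: model name -> factory that builds the model.
MODEL_MAP: Dict[str, Callable[[str], object]] = {}


def register_model(name: str):
    """Decorator that adds a factory to MODEL_MAP under the given name."""
    def decorator(factory):
        if name in MODEL_MAP:
            raise KeyError(f"model {name!r} is already registered")
        MODEL_MAP[name] = factory
        return factory
    return decorator


@register_model("dummy-encoder")
def build_dummy_encoder(device: str):
    # Placeholder standing in for a real pre-trained model loader.
    return {"name": "dummy-encoder", "device": device}


def build_model(name: str, device: str = "cpu"):
    """Look up a factory by name and build the model on the given device."""
    try:
        factory = MODEL_MAP[name]
    except KeyError:
        raise ValueError(
            f"unsupported model {name!r}; known models: {sorted(MODEL_MAP)}"
        ) from None
    return factory(device)


model = build_model("dummy-encoder", device="cuda")
print(model)
```

With this layout, supporting a new model means adding one decorated factory; the driver code that calls `build_model` with a name from the config would not need to change.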