This repository provides a command-line script, written in Python, that automatically generates multiple variations of an input question, differing syntactically, lexically, or both. The generated questions can be used to augment the training data of an automated question answering system, which is especially useful when the training dataset is small. This repository also contains code for forming permutations of questions for training.
git clone --recursive https://github.com/rajat1433/qna_permute
cd qna_permute
python -m virtualenv env
source env/bin/activate
pip install -r requirements.txt
python setup.py install
- Download fastText English vectors [direct link]
- Decompress and put cc.en.300.bin under the model/pretrained/fastText directory
- Download spaCy pretrained GloVe model [direct link]
- Decompress and put en_vectors_web_lg-2.1.0 (the most nested folder) under the model/pretrained/spacy_glove directory
- Download the transformer variant of Universal Sentence Encoder [direct link]
- Decompress and put assets, variables, saved_model.pb and tfhub_module.pb under the model/pretrained/universal_sentence_encoder directory
Checkpoint without reinforcement learning:
- Download the pretrained model from this link
- Decompress and put translate.ckpt-1460356.data-00000-of-00001, translate.ckpt-1460356.index and translate.ckpt-1460356.meta under the model/pretrained/active-qa/translate.ckpt-1460356 directory
Checkpoint with reinforcement learning:
- Download the pretrained model from this link
- Decompress and put translate.ckpt-6156696.data-00000-of-00001, translate.ckpt-6156696.index and translate.ckpt-6156696.meta under the model/pretrained/active-qa/translate.ckpt-6156696 directory
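Before running the scripts, it can help to confirm that the downloaded files ended up where the steps above say they should. The following is a minimal sketch, not part of this repository: the function name missing_pretrained is hypothetical, and it assumes the script is run from the repository root and that both checkpoints are wanted (you may only need one of them).

```python
from pathlib import Path

# Paths taken from the download steps above, relative to the repository root.
EXPECTED = [
    "model/pretrained/fastText/cc.en.300.bin",
    "model/pretrained/spacy_glove/en_vectors_web_lg-2.1.0",
    "model/pretrained/universal_sentence_encoder/assets",
    "model/pretrained/universal_sentence_encoder/variables",
    "model/pretrained/universal_sentence_encoder/saved_model.pb",
    "model/pretrained/universal_sentence_encoder/tfhub_module.pb",
]

# Each checkpoint directory holds three files sharing the checkpoint name.
for ckpt in ("translate.ckpt-1460356", "translate.ckpt-6156696"):
    for suffix in (".data-00000-of-00001", ".index", ".meta"):
        EXPECTED.append(f"model/pretrained/active-qa/{ckpt}/{ckpt}{suffix}")


def missing_pretrained(root="."):
    """Return the expected paths that do not exist under root (hypothetical helper)."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_pretrained()
    if missing:
        print("Missing pretrained files:")
        for path in missing:
            print("  " + path)
    else:
        print("All pretrained files are in place.")
```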
python script/quest_gen.py
It generates an output.json file in which each input question is a key and its generated questions are the corresponding value.
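To use the generated questions for augmentation, the output.json file can be flattened into (original, generated) pairs. This is a rough sketch under assumptions: the helper name flatten_generated is hypothetical, and the value for each key is assumed to be either a list of strings or a single string, which the description above does not specify.

```python
import json
from pathlib import Path


def flatten_generated(mapping):
    """Turn {question: generated questions} into (original, generated) pairs.

    Assumption: each value is a list of strings or a single string.
    """
    pairs = []
    for original, generated in mapping.items():
        variants = [generated] if isinstance(generated, str) else list(generated)
        for variant in variants:
            pairs.append((original, variant))
    return pairs


if __name__ == "__main__":
    path = Path("output.json")  # written by script/quest_gen.py
    if path.exists():
        with path.open(encoding="utf-8") as fh:
            pairs = flatten_generated(json.load(fh))
        print(f"Loaded {len(pairs)} generated questions")
    else:
        print("output.json not found; run script/quest_gen.py first")
```

Each pair can then be added to the training set with the answer of the original question.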
===================================================================================================
python algo.py
It generates the 3_permutations_original.csv file, which contains the required 3-permutations of the questions.
Edit the parameters in algo.py to generate the required number of questions.
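For readers unfamiliar with the term, a 3-permutation is an ordered selection of three distinct items from a list. The sketch below illustrates that idea with the standard library only; it is not the code in algo.py, whose exact logic and CSV layout are not described here. The function names and the sample questions are made up for illustration.

```python
import csv
from itertools import permutations


def k_permutations(questions, k=3):
    """Return every ordered selection of k distinct questions (hypothetical helper)."""
    return list(permutations(questions, k))


def write_permutations_csv(questions, path, k=3):
    """Write one permutation per CSV row and return the number of rows written."""
    rows = k_permutations(questions, k)
    with open(path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Made-up sample questions, only to show the shape of the result.
    sample = ["Who wrote it?", "When was it written?", "Where was it published?"]
    for row in k_permutations(sample):
        print(row)
```

The number of rows grows as n * (n - 1) * (n - 2) for n questions, so it is worth limiting the input size before writing large files.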