Early Results from Automating Voice-based Question-Answering Services Among Low-income Populations in India

This is the repository containing the code for our work here (add link).

Instructions for running:

Our code identifies the query in an FAQ database that is most similar to an input test query. It can also concatenate multiple datasets, augment data by generating similar sentences, and perform automatic theme classification. We hope this contribution gives future researchers a head start on their experiments. Please add your data to the data directory and modify the data-reading code accordingly.
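
As a rough illustration of the retrieval task described above, the sketch below scores each FAQ query against a test query using token-level Jaccard similarity and returns the best match. It is a minimal sketch under stated assumptions, not the repository's implementation: the function names (`tokenize`, `jaccard`, `most_similar`) and the sample queries are hypothetical, and the whitespace tokenizer is deliberately simplistic.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(test_query, faq_queries):
    """Return (best_query, score) for the FAQ query closest to test_query."""
    test_tokens = tokenize(test_query)
    scored = [(q, jaccard(test_tokens, tokenize(q))) for q in faq_queries]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical FAQ entries, for illustration only.
faq = [
    "how do i apply for a ration card",
    "what is the minimum wage in my state",
    "how do i open a bank account",
]
best, score = most_similar("how to apply for ration card", faq)
print(best, round(score, 3))
```

Jaccard similarity is only one possible scoring function; any sentence-similarity model that returns a score per FAQ query could be substituted inside `most_similar`.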

bring_in_stt.py : generates the final CSV file used for data pre-processing. It takes the speech-to-text transcripts (STTs) from the respective Excel file and combines them with the dataset containing all other information.
preprocess_data.py : generates train and test splits. Supports multiple train sets for a single test set (make the appropriate changes in config.py; the instructions are self-explanatory).
The notebooks correspond to the different models and libraries used and describe their workflows in detail. Each can be run directly in Google Colab using the link included in the notebook. The notebooks download this repository and then use the data in the data folder for training, testing and evaluation.
For data augmentation and data concatenation, refer to data_concatenation_augmentation.ipynb : it can both concatenate and augment datasets, creating similar sentences from manually crafted synonyms and from the iNLTK library's API call for similar-sentence generation.
Theme Classification: theme_classification.ipynb contains code to experiment with BERT and TF-IDF weighted n-gram models that predict the theme of the input test query (q2) and then generate a filtered test set in which q1's theme is among the top 3 predicted themes for q2.
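
The top-3 theme filter described for theme_classification.ipynb can be sketched as follows. This is an illustrative sketch only, assuming the classifier returns a score per theme for each test query q2. The helper names (`top_k_themes`, `filter_by_theme`), the dictionary keys, the theme labels and the scores are all hypothetical and are not taken from the notebook.

```python
def top_k_themes(theme_scores, k=3):
    """Return the k theme names with the highest predicted scores."""
    ranked = sorted(theme_scores, key=theme_scores.get, reverse=True)
    return ranked[:k]


def filter_by_theme(pairs, k=3):
    """Keep only pairs whose q1 theme is among the top-k themes predicted for q2.

    Each pair is a dict with hypothetical keys:
      "q1_theme":  the known theme of the FAQ query q1
      "q2_scores": mapping of theme name -> classifier score for test query q2
    """
    return [p for p in pairs if p["q1_theme"] in top_k_themes(p["q2_scores"], k)]


# Made-up classifier scores, for illustration only.
scores = {"health": 0.5, "wages": 0.25, "ration": 0.15, "banking": 0.1}
pairs = [
    {"q1_theme": "health", "q2_scores": scores},
    {"q1_theme": "banking", "q2_scores": scores},
]
filtered = filter_by_theme(pairs)
print([p["q1_theme"] for p in filtered])
```

Here the second pair is dropped because "banking" ranks fourth for q2, so only candidates whose theme plausibly matches the test query remain in the filtered set.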

ICTD-IITD/Voice_App_Automated_QnA
