- This step is optional; it is needed only if you want to create a virtual environment.
- Note: the modules have been tested on Python 3.6.
virtualenv -p /usr/bin/python3.6 <env_name>
source path_to_<env_name>/bin/activate
pip install -U sentence-transformers
pip install dateparser
pip install textdistance
pip install stanza
pip install -r requirements.txt
This semantic search is based on BERT sentence embeddings: the user's input query is compared with the phrases present in the dataset. Because it is BERT-based, loading the model takes some time.
Following this link
Using the multilingual model:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('distiluse-base-multilingual-cased')
Running instructions:
python get_yes_no.py --query="<query text>"
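The comparison between the query and the dataset phrases can be sketched as a cosine-similarity search over sentence embeddings. This is a minimal sketch and not necessarily what `get_yes_no.py` does: the function names `cosine_similarity` and `best_match` are hypothetical, and the choice of cosine similarity as the comparison metric is an assumption. The `encode` argument is any callable that maps a list of strings to a list of vectors, so the `model.encode` method of the model loaded above should fit.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def best_match(
    query: str,
    phrases: List[str],
    encode: Callable[[List[str]], Sequence[Sequence[float]]],
) -> Tuple[str, float]:
    """Return the phrase whose embedding is closest to the query's embedding.

    Hypothetical helper: cosine similarity is an assumed metric.
    """
    vectors = encode([query] + phrases)
    query_vec, phrase_vecs = vectors[0], vectors[1:]
    scores = [cosine_similarity(query_vec, vec) for vec in phrase_vecs]
    best_index = max(range(len(phrases)), key=scores.__getitem__)
    return phrases[best_index], scores[best_index]
```

With the multilingual model, a call would look like `best_match(query, phrases, model.encode)`; the returned score can then be compared against a threshold of your choosing.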
This module is built on top of the dateparser library
This module parses a date from the user's response, subtracts it, and takes the absolute value of the result to get the user's age.
python get_age.py --query="<query_text>"
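The subtraction step can be sketched with the standard library alone. This is an illustration under assumptions, not the code in `get_age.py`: the function name `age_in_years` is hypothetical, and it assumes the parsed date is compared against a reference date such as today. In practice the parsed date would come from `dateparser.parse(...)`, which returns a `datetime` (or `None` when nothing can be parsed), so calling `.date()` on the result would be needed before passing it in.

```python
from datetime import date


def age_in_years(parsed: date, reference: date) -> int:
    """Whole years between two dates, regardless of which one comes first.

    Hypothetical helper: assumes the age is the absolute difference
    between the parsed date and a reference date (for example, today).
    """
    earlier, later = sorted((parsed, reference))
    years = later.year - earlier.year
    # Subtract one if the anniversary has not yet been reached in the later year.
    if (later.month, later.day) < (earlier.month, earlier.day):
        years -= 1
    return years
```

Sorting the two dates first plays the role of the absolute value, so a date that the parser places in the future gives the same age as one placed in the past.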
- This module is built on top of the Stanza NLP library
- It uses the PoS tags that Stanza produces for Hindi text
python get_number.py --query="<query_text>"
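One way PoS tags can be used to pick out numbers is to keep only the words tagged as numerals. The sketch below assumes that the relevant tag is the universal PoS tag `NUM`; whether `get_number.py` filters on this tag is an assumption, and `extract_numbers` is a hypothetical name. The input is a list of `(text, upos)` pairs, which could be built from the `text` and `upos` attributes of the words in a Stanza document.

```python
from typing import Iterable, List, Tuple


def extract_numbers(tagged_words: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep the words whose universal PoS tag is NUM (numeral).

    Hypothetical helper: filtering on NUM is an assumed heuristic.
    """
    return [text for text, upos in tagged_words if upos == "NUM"]


# Hand-tagged example standing in for tagger output.
sample = [("मेरे", "PRON"), ("तीन", "NUM"), ("बच्चे", "NOUN"), ("हैं", "AUX")]
print(extract_numbers(sample))
```

Numerals spoken as words (such as "तीन") would still need to be converted to digits in a separate step.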
The name module is based on a 5-gram approach: for each window of five words, an SVM model predicts the probability that the center word (i.e. the 3rd word of the window) is the name of a person.
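The windowing part of the 5-gram approach can be sketched as follows. This is only an illustration: the padding token `PAD`, the decision to pad the sentence edges at all, and the function name `five_gram_windows` are assumptions that are not taken from the repository, and the SVM scoring step is left out.

```python
from typing import Iterator, List, Tuple

PAD = "<PAD>"  # hypothetical padding token, not taken from the repository


def five_gram_windows(tokens: List[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (center_word, window) for every token, padding the edges."""
    padded = [PAD, PAD] + tokens + [PAD, PAD]
    for i, center in enumerate(tokens):
        window = padded[i : i + 5]  # center sits at index 2, the 3rd word
        yield center, window
```

Each window would then be turned into features and passed to the classifier, which scores how likely the center word is to be a person's name.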
Environment setup for Name module:
source path_to_<env_name>/bin/activate
sudo apt install libpq-dev python3-dev
sudo apt-get install python-numpy libicu-dev
pip install pyicu
pip install polyglot
pip install pycld2
pip install Morfessor
polyglot download embeddings2.hi
polyglot download ner2.hi
cd Name/libsvm-3.23/
rm svm-scale svm-train svm-predict svm.o
cd python/
cd ../../../  # go back to home dir
from Name import main
pred_name = main.get_name("<query_text>")
- The README file for the location module is inside the location folder
Environment setup for location module:
source path_to_<env_name>/bin/activate
sudo apt install libpq-dev python3-dev
sudo apt-get install python-numpy libicu-dev
pip install pyicu
pip install polyglot
pip install pycld2
pip install Morfessor
polyglot download embeddings2.hi
polyglot download ner2.hi
- This module uses a heuristic-based approach to DoB extraction. The heuristics were developed after a manual analysis of how users speak their date of birth.
python get_dob.py --query="<query_text>"
- The code and documentation for the voice survey app is inside the voice_survey_android_app folder
The code for these modules is also available in a single Jupyter Notebook, finalEvalForPaper.ipynb
Please request the data at contact@oniondev.com