- This step is optional; it is needed only if you want to create a virtual environment.
- Note: the modules have been tested on Python 3.6.
virtualenv -p /usr/bin/python3.6 <env_name>
source path_to_<env_name>/bin/activate
pip install -U sentence-transformers
pip install dateparser
pip install textdistance
pip install stanza
OR
pip install -r requirements.txt
- This semantic search is based on BERT sentence embeddings: the user's input query is compared with the phrases present in the dataset. Because the model is BERT-based, it takes some time to load.
- Following this link
- Using the multilingual model: distiluse-base-multilingual-cased
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('distiluse-base-multilingual-cased')
Running instructions:
python get_yes_no.py --query="<query text>"
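As a rough illustration of the embedding comparison described above, the sketch below ranks candidate phrases by cosine similarity to the query. The helper names `cosine_similarity` and `best_match`, the toy `char_count_encode` encoder, and the sample phrases are assumptions made for illustration and are not taken from get_yes_no.py. In practice, `model.encode` from the snippet above would be passed as the encoder instead of the toy one.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_u * norm_v)


def best_match(query, phrases, encode):
    """Return (phrase, score) for the phrase most similar to the query."""
    query_vec = encode(query)
    scored = [(p, cosine_similarity(query_vec, encode(p))) for p in phrases]
    return max(scored, key=lambda item: item[1])


def char_count_encode(text):
    """Toy stand-in encoder (hypothetical): counts of the letters a-z."""
    text = text.lower()
    return [text.count(chr(ord("a") + i)) for i in range(26)]


if __name__ == "__main__":
    # Sample phrases are made up; the real dataset phrases are not shown here.
    print(best_match("yes please", ["yes", "no"], char_count_encode))
```

Passing the encoder as an argument keeps the comparison logic independent of the model, so the slow model load happens only once, outside these helpers.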
- This module is built on top of the dateparser library.
- It derives a date from the user's response and takes the absolute difference between that date and the current date to obtain the user's age.
python get_age.py --query="<query_text>"
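The subtraction step can be sketched with the standard library alone. This is an illustration, not the code in get_age.py: the function name `age_in_years` is hypothetical, and the sketch assumes the response has already been parsed into a date (for example by `dateparser.parse`, which returns a `datetime` or `None`) and that it is compared against the current date.

```python
from datetime import date


def age_in_years(parsed, today):
    """Whole years between two dates, regardless of which one is earlier."""
    # Sorting the pair plays the role of the absolute value.
    earlier, later = sorted([parsed, today])
    years = later.year - earlier.year
    # Subtract one year if the anniversary has not been reached yet.
    if (later.month, later.day) < (earlier.month, earlier.day):
        years -= 1
    return years


if __name__ == "__main__":
    # Example date is made up for illustration.
    print(age_in_years(date(1990, 5, 17), date.today()))
```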
- This module is built on top of the Stanza NLP library.
- It uses the part-of-speech (PoS) tags that Stanza produces for Hindi.
python get_number.py --query="<query_text>"
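A minimal sketch of how PoS tags can isolate a number in a response: it keeps the tokens whose universal PoS tag is `NUM`. The function name `number_tokens` and the hand-written (word, tag) pairs are assumptions for illustration. In the module the tags would come from a Stanza Hindi pipeline, and the actual logic in get_number.py may differ.

```python
def number_tokens(tagged_words):
    """Return the words whose universal PoS tag marks them as numerals."""
    return [word for word, upos in tagged_words if upos == "NUM"]


if __name__ == "__main__":
    # Hand-tagged example (assumed tags): "मेरे तीन बच्चे हैं" = "I have three children".
    tagged = [("मेरे", "PRON"), ("तीन", "NUM"), ("बच्चे", "NOUN"), ("हैं", "AUX")]
    print(number_tokens(tagged))
```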
The name module is based on a 5-gram approach in which an SVM model predicts the probability that the center word (i.e. the 3rd word of the 5-gram) is the name of a person.
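To make the 5-gram idea concrete, the sketch below builds one window per token with that token in the center position. The padding token, the function name `five_gram_windows`, and the choice to pad at sentence boundaries are assumptions; the repository's own feature extraction and the SVM scoring step are not reproduced here.

```python
PAD = "<pad>"  # hypothetical padding token for sentence boundaries


def five_gram_windows(tokens):
    """Yield one 5-token window per token, with that token at index 2."""
    tokens = list(tokens)
    padded = [PAD, PAD] + tokens + [PAD, PAD]
    for i in range(len(tokens)):
        yield padded[i:i + 5]


if __name__ == "__main__":
    # Made-up romanized example sentence.
    for window in five_gram_windows(["mera", "naam", "rahul", "hai"]):
        print(window[2], window)
```

Each window would then be turned into features and scored by the classifier, with the score read as the probability that the center word is a person's name.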
Environment setup for Name module:
source path_to_<env_name>/bin/activate
sudo apt install libpq-dev python3-dev
sudo apt-get install python-numpy libicu-dev
pip install pyicu
pip install polyglot
pip install pycld2
pip install Morfessor
polyglot download embeddings2.hi
polyglot download ner2.hi
cd Name/libsvm-3.23/
rm svm-scale svm-train svm-predict svm.o
make
cd python/
make
cd ../../../  # go back to the home dir
from Name import main
pred_name = main.get_name("<query_text>")
- The README file for the location module is inside the location folder
Environment setup for location module:
source path_to_<env_name>/bin/activate
sudo apt install libpq-dev python3-dev
sudo apt-get install python-numpy libicu-dev
pip install pyicu
pip install polyglot
pip install pycld2
pip install Morfessor
polyglot download embeddings2.hi
polyglot download ner2.hi
- This module is a heuristic-based DoB (date of birth) extraction approach. The heuristics were developed after manual analysis of how users state their date of birth.
python get_dob.py --query="<query_text>"
- The code and documentation for the voice survey app is inside the voice_survey_android_app folder
The code for these modules is also available in a single Jupyter Notebook, finalEvalForPaper.ipynb
To request the data, please write to contact@oniondev.com