Custom Entity Modules

Create virtual environment

This step is optional; follow it only if you need to create a new virtual environment.
Note: the modules have been tested on Python 3.6.

virtualenv -p /usr/bin/python3.6 <env_name> 
source path_to_<env_name>/bin/activate

pip install -U sentence-transformers
pip install dateparser
pip install textdistance
pip install stanza

OR
pip install -r requirements.txt

Yes/No Entity module

This module performs semantic search with BERT sentence embeddings: the input user query is compared with the phrases present in the dataset. Because the model is BERT-based, it takes some time to load.
The module uses the multilingual Sentence-Transformers model distiluse-base-multilingual-cased:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('distiluse-base-multilingual-cased')

Running instructions:

python get_yes_no.py --query="<query text>"
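The comparison step can be sketched as a nearest-phrase lookup by cosine similarity. This is a minimal sketch under stated assumptions, not the logic of get_yes_no.py: it assumes each dataset phrase carries a "yes" or "no" label and that the query takes the label of its most similar phrase. In the real module the vectors would come from model.encode(...); here the function names, labels, and toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify_yes_no(query_vec: Sequence[float],
                    labelled_phrases: List[Tuple[str, Sequence[float]]]) -> str:
    """Hypothetical rule: return the label of the dataset phrase most similar to the query."""
    best_label, _ = max(
        ((label, cosine_similarity(query_vec, vec)) for label, vec in labelled_phrases),
        key=lambda pair: pair[1],
    )
    return best_label


# Toy 2-D "embeddings" stand in for real sentence embeddings.
phrases = [("yes", [1.0, 0.0]), ("no", [0.0, 1.0])]
print(classify_yes_no([0.9, 0.1], phrases))
```

The actual module may apply a similarity threshold or aggregate over several phrases per label; the nearest-phrase rule is only the simplest choice.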

Age entity module

This module is built on top of the dateparser library.
It parses the user's response into a date, subtracts that date from the current date, and takes the absolute value of the difference to get the user's age.

python get_age.py --query="<query_text>"
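The subtraction step can be sketched as follows, assuming the response has already been turned into a datetime (for example with dateparser.parse(response), which returns None when nothing can be parsed). The helper name age_in_years is hypothetical, and counting whole calendar years is an assumption about how get_age.py rounds.

```python
from datetime import datetime


def age_in_years(parsed: datetime, now: datetime) -> int:
    """Hypothetical helper: whole years between the parsed date and now.

    Ordering the two dates first gives the absolute difference, so a phrase
    that dateparser resolves to a future date still yields a positive age.
    """
    earlier, later = sorted((parsed, now))
    years = later.year - earlier.year
    # Subtract one if the anniversary has not yet been reached in the later year.
    if (later.month, later.day) < (earlier.month, earlier.day):
        years -= 1
    return years


# e.g. parsed = dateparser.parse("20 years ago")
print(age_in_years(datetime(2004, 1, 1), datetime(2024, 1, 1)))
```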

Number entity module

This module is built on top of the Stanza NLP library.
It uses the part-of-speech (PoS) tags that Stanza assigns to Hindi text to identify numbers in the user's response.

python get_number.py --query="<query_text>"
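A PoS-based filter can be sketched as keeping the tokens tagged NUM (the Universal POS tag for numerals) and converting them to integers. This is a sketch, not the code in get_number.py. With Stanza, the (text, tag) pairs could be built as [(w.text, w.upos) for s in doc.sentences for w in s.words], where doc = stanza.Pipeline(lang="hi", processors="tokenize,pos")(query). The small word table and the function name below are hypothetical.

```python
from typing import List, Tuple

# Hypothetical lookup for a few Hindi number words; a real module would need a much larger table.
HINDI_NUMBER_WORDS = {"एक": 1, "दो": 2, "तीन": 3, "चार": 4, "पाँच": 5}


def extract_numbers(tagged: List[Tuple[str, str]]) -> List[int]:
    """Return the integer value of every token whose Universal POS tag is NUM."""
    numbers = []
    for text, upos in tagged:
        if upos != "NUM":
            continue
        if text.isdecimal():
            # int() accepts any Unicode decimal digits, including Devanagari ones such as "३".
            numbers.append(int(text))
        elif text in HINDI_NUMBER_WORDS:
            numbers.append(HINDI_NUMBER_WORDS[text])
    return numbers


print(extract_numbers([("मेरे", "PRON"), ("३", "NUM"), ("बच्चे", "NOUN"), ("हैं", "AUX")]))
```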

Name module

The Name module uses a 5-gram approach: for each 5-gram, an SVM model predicts the probability that the center word (i.e. the 3rd word from the beginning) is a person's name.
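The windowing that gives every word a 5-gram centered on it can be sketched as below. Padding the sentence with two placeholder tokens on each side is an assumption so that the first and last words also get a full window; the token name and function name are hypothetical, and the feature extraction and SVM scoring in the Name folder are not shown.

```python
from typing import List, Tuple

PAD = "<PAD>"  # hypothetical padding token for positions outside the sentence


def five_grams(tokens: List[str]) -> List[Tuple[str, ...]]:
    """Return one 5-gram per token, with that token at index 2 (the center word)."""
    padded = [PAD, PAD] + tokens + [PAD, PAD]
    return [tuple(padded[i:i + 5]) for i in range(len(tokens))]


for gram in five_grams(["मेरा", "नाम", "राहुल", "है"]):
    print(gram)
```

Each window would then be featurized and passed to the SVM, which scores whether its center word is a name.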

Environment setup for Name module:

source path_to_<env_name>/bin/activate
sudo apt install libpq-dev python3-dev
sudo apt-get install python-numpy libicu-dev
pip install pyicu
pip install polyglot
pip install pycld2
pip install Morfessor
polyglot download embeddings2.hi
polyglot download ner2.hi
cd Name/libsvm-3.23/
rm svm-scale svm-train svm-predict svm.o
make
cd python/
make
cd ../../../  # return to the repository root

from Name import main
pred_name = main.get_name("<query_text>")

Location module

See the README inside the location folder for details on the location module.

Environment setup for location module:

source path_to_<env_name>/bin/activate
sudo apt install libpq-dev python3-dev
sudo apt-get install python-numpy libicu-dev
pip install pyicu
pip install polyglot
pip install pycld2
pip install Morfessor
polyglot download embeddings2.hi
polyglot download ner2.hi

DoB module

This module extracts the date of birth (DoB) with a set of heuristics, which were developed by manually analyzing how users spoke their date of birth.

python get_dob.py --query="<query_text>"

Voice survey App

The code and documentation for the voice survey app are in the voice_survey_android_app folder.

The code for all of the modules above is also available in a single Jupyter notebook, finalEvalForPaper.ipynb.

Data

To request the data, please email contact@oniondev.com.

Repository: ICTD-IITD/Voice_App_Custom_Entity_Extraction

Folders and files

Latest commit

History

Repository files navigation

Custom Entity Modules

Create virtual environment

Yes/No Entity module

Age entity module

Number entity module

Name module

Location module

DoB module

Voice survey App

Data

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages