A probabilistic language identification system that identifies the language of a sentence
- Python 2.7.15
- Python 3.7.0
- pip
pip install -r requirements.txt
ali is a probabilistic language identification system that identifies the language of a sentence.
usage: ali [-v] (-c TRAIN-CORPUS)* [-t TEST-FILE]
-v Prints debugging messages.
-c Specifies a training corpus for one language; repeat the flag once per language.
-t Specifies the test set for the model.
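The usage line above can be mirrored with Python's standard `argparse` module. This is a minimal sketch, not the actual contents of `ali.py`: the function name `build_parser` and the destination names `corpora` and `test_file` are assumptions. The detail worth noting is `action="append"`, which collects every repeated `-c` into a list and so matches `(-c TRAIN-CORPUS)*`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of: ali [-v] (-c TRAIN-CORPUS)* [-t TEST-FILE]
    parser = argparse.ArgumentParser(prog="ali")
    parser.add_argument("-v", action="store_true",
                        help="Prints debugging messages.")
    # action="append" gathers each -c occurrence into one list (one corpus per language).
    parser.add_argument("-c", dest="corpora", metavar="TRAIN-CORPUS",
                        action="append", default=[],
                        help="Training text for one language; repeat per language.")
    parser.add_argument("-t", dest="test_file", metavar="TEST-FILE",
                        help="Test set for the model.")
    return parser


# Example: the arguments from the Examples section, without "python ali.py".
args = build_parser().parse_args(
    ["-c", "data/en.txt", "-c", "data/sp.txt", "-c", "data/fr.txt",
     "-t", "data/first10TestSentences.txt"])
print(args.corpora)  # ['data/en.txt', 'data/sp.txt', 'data/fr.txt']
```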
Examples
Generate a unigram and a bigram model for each corpus, then predict the language of each test sentence using the latter.
python ali.py -c "data/en.txt" -c "data/sp.txt" -c "data/fr.txt" -t "data/first10TestSentences.txt"
Output: see output/output.md.
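The example above builds a unigram and a bigram model per corpus and classifies with the bigram model. The README does not say whether the n-grams are over characters or words, or how unseen bigrams are handled, so the following is a minimal sketch assuming character bigrams, add-one (Laplace) smoothing and log probabilities. The names `normalize`, `BigramModel` and `identify` are hypothetical, not taken from `ali.py`. A sentence is assigned to the language whose model gives it the highest log probability.

```python
import math
from collections import Counter
from typing import Dict, List


def normalize(text: str) -> List[str]:
    # Assumed preprocessing: lowercase, keep only letters and spaces.
    return [c for c in text.lower() if c.isalpha() or c == " "]


class BigramModel:
    """Character bigram model with add-one smoothing (an assumed design)."""

    def __init__(self, text: str) -> None:
        chars = normalize(text)
        self.unigrams = Counter(chars)                 # c(x)
        self.bigrams = Counter(zip(chars, chars[1:]))  # c(x, y)
        self.contexts = Counter(chars[:-1])            # c(x) as a bigram's first symbol
        self.vocab_size = len(self.unigrams)           # V

    def log_prob(self, sentence: str) -> float:
        chars = normalize(sentence)
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            # P(cur | prev) = (c(prev, cur) + 1) / (c(prev) + V)
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + self.vocab_size
            total += math.log(num / den)
        return total


def identify(sentence: str, models: Dict[str, BigramModel]) -> str:
    # The predicted language is the one whose model scores the sentence highest.
    return max(models, key=lambda lang: models[lang].log_prob(sentence))


# Toy corpora standing in for data/en.txt and data/fr.txt.
models = {
    "en": BigramModel("the cat is on the mat and the dog is in the house"),
    "fr": BigramModel("le chat est sur le tapis et le chien est dans la maison"),
}
print(identify("the dog and the cat", models))  # en
```

Summing log probabilities rather than multiplying raw probabilities avoids floating-point underflow on long sentences, and smoothing keeps a single unseen bigram from driving a language's score to zero.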
Train the models
python train.py
Run the server
python server.py
# or
gunicorn server:app
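In `gunicorn server:app`, the argument names a module (`server`) and a WSGI callable inside it (`app`), so `server.py` has to expose an object called `app`. The README does not show which framework or endpoints the server uses, so the following is only a standard-library sketch of that shape: the `q` query parameter and the `identify_language` stub are hypothetical, and a real server would call the trained models instead.

```python
from typing import Callable, Iterable
from urllib.parse import parse_qs


def identify_language(sentence: str) -> str:
    # Hypothetical placeholder: a real server.py would query the trained models here.
    return "unknown"


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI application of the kind gunicorn loads via `server:app`."""
    # Read the sentence from a hypothetical `q` query-string parameter.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    sentence = params.get("q", [""])[0]
    body = identify_language(sentence).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```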