All the project work done for the Natural Language Processing course during the Master's program at USC.
- A Naive Bayes classifier that labels hotel reviews as either truthful or deceptive, and as either positive or negative, written without any external Python libraries.
- Run nbclassify3.py with the input data as a command-line argument. It creates the nbmodel file, which stores words as features along with their corresponding probabilities.
- Run nbclassify with nbmodel and the test data as input.
- Achieved 90% accuracy on the provided data.
- A Perceptron classifier (both a vanilla model and an averaged model) that labels hotel reviews as either truthful or deceptive, and as either positive or negative, written without any external Python libraries.
- Run perceplearn3.py with the input data as a command-line argument. It creates a vanilla model file and an averaged model file, both with words as features.
- Run percepclassify3.py with the vanilla or averaged model file path as the first argument and the test data path as the second argument.
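
The difference between the vanilla and averaged perceptron can be sketched as follows. This is a minimal illustration, not the code in perceplearn3.py: the function names `train_perceptron` and `classify`, the `(tokens, label)` input format, the +1/-1 label encoding, and the epoch count are all assumptions made for the example. It uses only the standard library and handles a single binary decision (for instance positive vs. negative); a second model of the same shape would cover truthful vs. deceptive.

```python
from collections import Counter, defaultdict


def train_perceptron(examples, epochs=10):
    """Train on (tokens, label) pairs, label in {+1, -1} (assumed format).

    Returns two (weights, bias) models: vanilla and averaged.
    """
    w = defaultdict(float)   # current weights, one per word
    b = 0.0                  # current bias
    u = defaultdict(float)   # cached weights, scaled by the step counter
    beta = 0.0               # cached bias
    c = 1                    # step counter, incremented per example

    for _ in range(epochs):
        for tokens, y in examples:
            counts = Counter(tokens)  # bag-of-words feature counts
            activation = b + sum(w.get(t, 0.0) * n for t, n in counts.items())
            if y * activation <= 0:   # misclassified (or on the boundary)
                for t, n in counts.items():
                    w[t] += y * n
                    u[t] += y * c * n
                b += y
                beta += y * c
            c += 1

    # Averaged weights recovered from the cached sums.
    avg_w = {t: w[t] - u[t] / c for t in w}
    avg_b = b - beta / c
    return (dict(w), b), (avg_w, avg_b)


def classify(model, tokens):
    """Return +1 or -1 for a token list; unseen words contribute nothing."""
    weights, bias = model
    score = bias + sum(weights.get(t, 0.0) for t in tokens)
    return 1 if score > 0 else -1


# Toy data, invented purely for illustration.
toy = [
    ("great clean room".split(), 1),
    ("dirty rude staff".split(), -1),
    ("great staff".split(), 1),
    ("dirty room".split(), -1),
]
vanilla, averaged = train_perceptron(toy, epochs=10)
print(classify(vanilla, ["great", "room"]), classify(averaged, ["dirty", "staff"]))
```

The averaged model keeps a running correction (`u`, `beta`) instead of summing a full weight vector after every example, which gives the same average at a fraction of the cost.
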
Hidden Markov Model Part-of-Speech Tagger (HMM-POS Tagger)
- A Hidden Markov Model part-of-speech tagger for Italian, Japanese, and a surprise language. The training data are provided tokenized and tagged.
- Run hmmlearn3.py with the input data as a command-line argument. It creates hmmmodel.txt, which holds the transition and emission values.
- Run hmmdecode3.py, which takes the path of the model file as input, uses Viterbi decoding to find the most likely tag sequence, and writes the tagged test data to output.txt.
- It achieved an accuracy of 93% for Japanese, 91% for Italian, and 92% for the surprise language.
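
The Viterbi decoding step can be sketched as below. This is a minimal illustration under stated assumptions, not the code in hmmdecode3.py: the function name `viterbi`, the nested-dictionary layout of `start_p`, `trans_p`, and `emit_p`, the probability floor, and the rule for unseen words (every tag gets the same emission score, so transitions decide) are all hypothetical choices made for the example. The format of hmmmodel.txt is not assumed here.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence for words (log-space Viterbi)."""
    if not words:
        return []

    def lp(p):
        # Log probability, with a small floor standing in for zero.
        return math.log(p if p > 0 else floor)

    def emit(tag, word):
        # Assumption: a word never seen in training is equally likely
        # under every tag, so only transitions influence its tag.
        if not any(word in emit_p.get(t, {}) for t in tags):
            return 0.0
        return lp(emit_p.get(tag, {}).get(word, 0.0))

    # best[i][t]: score of the best path ending in tag t at position i.
    best = [{t: lp(start_p.get(t, 0.0)) + emit(t, words[0]) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p.get(p, {}).get(t, 0.0)))
                 for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + emit(t, words[i])
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


# Toy model, invented purely for illustration.
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.2}
trans_p = {
    "DET": {"NOUN": 1.0},
    "NOUN": {"VERB": 0.8, "NOUN": 0.2},
    "VERB": {"DET": 0.6, "NOUN": 0.4},
}
emit_p = {
    "DET": {"the": 1.0},
    "NOUN": {"dog": 0.6, "barks": 0.1, "cat": 0.3},
    "VERB": {"barks": 0.7, "runs": 0.3},
}
print(viterbi("the dog barks".split(), tags, start_p, trans_p, emit_p))
```

Working in log space avoids numeric underflow on long sentences, since the scores are sums of logs rather than products of small probabilities.
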