This notebook demonstrates how to implement several classifiers that label the sentiment of a review as negative or positive.
- Python
- Sklearn
- NLTK
- Classification Algorithms Used
  - Naive Bayes Algorithm
  - Multinomial Naive Bayes
  - Logistic Regression
  - Bernoulli Naive Bayes Classifier
  - Linear SVC classifier
  - Custom-made Vote classifier
- The two given txt files have to be handled manually: the reviews in each file are labelled positive or negative according to the file name.
- While reading the files, only words that are adjectives are appended to our `all_words` list. This is done with the `pos_tag` function of NLTK. Example:

  ```python
  import nltk
  from nltk.tokenize import PunktSentenceTokenizer, word_tokenize

  # Requires the NLTK tokenizer and tagger data to be downloaded beforehand.
  example_text = "I love my country , I am grateful for all the things it has given me"
  sample_text = "I love biscuits. I am grateful to shops."

  # Train a Punkt sentence tokenizer on example_text, then split sample_text into sentences.
  custom_token = PunktSentenceTokenizer(example_text)
  tokkend = custom_token.tokenize(sample_text)

  for i in tokkend:
      words = word_tokenize(i)
      tagged = nltk.pos_tag(words)  # list of (word, part-of-speech tag) pairs
      print(tagged)
  ```
- Featuresets are created as tuples. The most common words are selected and cross-referenced against the words occurring in each document to create Boolean features, like below.
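
The three steps above (labelling by source file, keeping only adjectives, building Boolean featuresets) can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the sample review lines, the helper names `build_documents` and `find_features`, the vocabulary size of 3, and `toy_pos_tag` (a hypothetical stand-in for `nltk.pos_tag`, so the sketch runs without NLTK data) are all made up for the example.

```python
from collections import Counter

# Hypothetical stand-ins for the contents of the two review files; in the
# notebook these lines would be read from the positive and negative txt files.
positive_lines = ["a great and moving film", "great cast , good story"]
negative_lines = ["a dull and bad film", "bad script , dull pacing"]


def toy_pos_tag(words):
    """Stand-in for nltk.pos_tag: tags a few known adjectives as 'JJ'."""
    adjectives = {"great", "moving", "good", "dull", "bad"}
    return [(w, "JJ" if w in adjectives else "NN") for w in words]


def build_documents(lines_by_label, tagger):
    """Return (documents, all_words).

    documents is a list of (review_text, label) tuples, labelled by source.
    all_words collects only the adjectives (tags starting with 'J').
    """
    documents, all_words = [], []
    for label, lines in lines_by_label.items():
        for line in lines:
            documents.append((line, label))
            for word, tag in tagger(line.split()):
                if tag.startswith("J"):
                    all_words.append(word.lower())
    return documents, all_words


def find_features(text, word_features):
    """Map each selected word to True/False: does it occur in this review?"""
    words = set(text.lower().split())
    return {w: (w in words) for w in word_features}


documents, all_words = build_documents(
    {"pos": positive_lines, "neg": negative_lines}, toy_pos_tag
)

# Keep the most common adjectives as the feature vocabulary (3 is arbitrary here).
word_features = [w for w, _ in Counter(all_words).most_common(3)]

# Each featureset is a (feature_dict, label) tuple.
featuresets = [(find_features(text, word_features), label) for text, label in documents]

print(word_features)
print(featuresets[0])
```

With real data, `toy_pos_tag` would be replaced by `nltk.pos_tag` applied to the output of `word_tokenize`, and the vocabulary size would be far larger than 3. The `(feature_dict, label)` tuples are the shape that NLTK's classifier `train` methods accept.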