This notebook demonstrates how to implement various classifiers that label the sentiment of a review as negative or positive.
- Python
- Sklearn
- Classification Algorithms Used
  - Naive Bayes Algorithm
  - Multinomial Naive Bayes
  - Logistic Regression
  - Bernoulli Naive Bayes Classifier
  - Linear SVC classifier
  - Custom made Vote classifier
The two given txt files have to be handled manually: their contents are labeled as positive or negative according to the label in the file name.
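A minimal sketch of this labeling step, assuming each file holds one review per line. The function name `load_labelled_reviews` and the label strings are hypothetical, and the file paths are supplied by the caller because the actual file names are not stated here.

```python
from pathlib import Path


def load_labelled_reviews(path_to_label):
    """Read each file and pair every non-empty line with that file's label.

    path_to_label maps a file path to the label implied by its name.
    Assumes one review per line (an assumption, not confirmed by the data).
    """
    documents = []
    for path, label in path_to_label.items():
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            line = line.strip()
            if line:  # skip blank lines
                documents.append((line, label))
    return documents
```

It could be called as `load_labelled_reviews({"positive.txt": "pos", "negative.txt": "neg"})`, where both file names are placeholders for the two files that ship with the notebook.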
While reading the files, only words that are adjectives are appended to our list. This is done with NLTK's `pos_tag` function. Example:

    import nltk
    from nltk.tokenize import PunktSentenceTokenizer, word_tokenize

    example_text = "I love my country , I am grateful for all the things it has given me"
    sample_text = "I love biscuits. I am grateful to shops."

    # Train a Punkt sentence tokenizer on example_text,
    # then use it to split sample_text into sentences
    custom_token = PunktSentenceTokenizer(example_text)
    tokkend = custom_token.tokenize(sample_text)

    for i in tokkend:
        words = word_tokenize(i)
        tagged = nltk.pos_tag(words)  # list of (word, tag) pairs
        print(tagged)
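Once the words are tagged, keeping only the adjectives is a simple filter on the tag. The sketch below assumes the Penn Treebank tag set that `nltk.pos_tag` uses by default, where adjective tags are `JJ`, `JJR` and `JJS`. The helper name `keep_adjectives` is hypothetical, and the tagged list is written by hand for illustration rather than produced by the tagger.

```python
def keep_adjectives(tagged_tokens):
    """Return the lower-cased words whose tag marks an adjective (JJ, JJR, JJS)."""
    return [word.lower() for word, tag in tagged_tokens if tag.startswith("J")]


# Hand-written (word, tag) pairs in the shape nltk.pos_tag returns
tagged = [
    ("I", "PRP"), ("am", "VBP"), ("grateful", "JJ"),
    ("to", "TO"), ("shops", "NNS"), (".", "."),
]
print(keep_adjectives(tagged))  # ['grateful']
```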
The featuresets are created as tuples. The most common words are selected and cross-referenced against the words occurring in each document to create Boolean features like below.
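The idea can be sketched in plain Python, under the assumption that each document is stored as a `(words, label)` pair. The helper names `top_words` and `find_features` and the toy documents are illustrative, not taken from the notebook.

```python
from collections import Counter


def top_words(documents, n):
    """Return the n most frequent words across all documents."""
    counts = Counter(word for words, _ in documents for word in words)
    return [word for word, _ in counts.most_common(n)]


def find_features(document_words, word_features):
    """Map each selected word to True if it occurs in the document, else False."""
    present = set(document_words)
    return {word: (word in present) for word in word_features}


# Toy data: each document is (list of words, label)
documents = [
    (["great", "fun"], "pos"),
    (["boring", "bad"], "neg"),
    (["great", "bad"], "neg"),
]
word_features = top_words(documents, 2)

# Each featureset is a tuple: (dict of Boolean features, label)
featuresets = [
    (find_features(words, word_features), label) for words, label in documents
]
```

A list of `(feature dict, label)` tuples like this is the shape that NLTK's classifier `train` methods accept, which is presumably why the featuresets are built as tuples.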