In this classification project, we use Natural Language Processing techniques and several Machine Learning models to classify Amazon book reviews as having `POSITIVE` or `NEGATIVE` sentiment.
- The Bag of Words technique was used to vectorize the reviews, using both CountVectorizer (binary only) and TfidfVectorizer (Term Frequency - Inverse Document Frequency).
- Since the smaller dataset was imbalanced, we used a bigger dataset and balanced the number of `POSITIVE` and `NEGATIVE` reviews.
- The models that were compared are as follows:
  - Linear SVM
  - Decision Tree
  - Naive Bayes
  - Logistic Regression
- Further, GridSearchCV was used to find the best parameters for the Linear SVM model.
- The Linear SVM model with the best parameters found by GridSearchCV is stored using the Pickle library for future use.
- In the future, the best parameters will also be searched for each of the other models, so that an overall best-performing model can be identified.
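The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the toy reviews, the `balance` helper, the choice of `SVC(kernel="linear")` and `MultinomialNB` as the Linear SVM and Naive Bayes variants, the `C` grid, and the `linear_svm.pkl` file name are all assumptions made for the example.

```python
import os
import pickle
import random
import tempfile

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the Amazon book reviews (imbalanced: 6 vs 4).
reviews = [
    ("I loved this book, a wonderful story", "POSITIVE"),
    ("Great characters and a great ending", "POSITIVE"),
    ("A wonderful and moving read", "POSITIVE"),
    ("Loved every chapter, great writing", "POSITIVE"),
    ("Wonderful book, I loved the ending", "POSITIVE"),
    ("Great story, highly recommended", "POSITIVE"),
    ("Boring plot and terrible writing", "NEGATIVE"),
    ("I hated the ending, a terrible book", "NEGATIVE"),
    ("A boring and disappointing story", "NEGATIVE"),
    ("Terrible characters, I hated this read", "NEGATIVE"),
]


def balance(samples, seed=42):
    """Downsample so both sentiments have the same number of reviews."""
    rng = random.Random(seed)
    pos = [s for s in samples if s[1] == "POSITIVE"]
    neg = [s for s in samples if s[1] == "NEGATIVE"]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    return balanced


balanced = balance(reviews)
texts = [text for text, _ in balanced]
labels = [label for _, label in balanced]

# Bag of Words features: binary counts and TF-IDF weights.
X_binary = CountVectorizer(binary=True).fit_transform(texts)
X_tfidf = TfidfVectorizer().fit_transform(texts)

# Compare the four model families on both feature sets.
models = {
    "Linear SVM": SVC(kernel="linear"),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(),
}
for feature_name, X in [("binary counts", X_binary), ("tf-idf", X_tfidf)]:
    for model_name, model in models.items():
        scores = cross_val_score(model, X, labels, cv=2)
        print(f"{feature_name:14s} {model_name:20s} {scores.mean():.2f}")

# Search for the best regularization strength of the Linear SVM.
grid = GridSearchCV(SVC(kernel="linear"), {"C": [0.1, 1, 10]}, cv=2)
grid.fit(X_tfidf, labels)
best_svm = grid.best_estimator_

# Store the tuned model with pickle and load it back for later use.
model_path = os.path.join(tempfile.gettempdir(), "linear_svm.pkl")
with open(model_path, "wb") as f:
    pickle.dump(best_svm, f)
with open(model_path, "rb") as f:
    restored = pickle.load(f)
```

In a real run, the fitted vectorizer would need to be saved alongside the model (or both wrapped in a scikit-learn `Pipeline`), since new reviews must be transformed with the same vocabulary before `restored.predict` can be called on them.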
- Jupyter Notebook
- Scikit-learn
- Pickle
- Pandas
- json
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project (click on `Fork` in the top-right corner)
- Create your Feature Branch (`git checkout -b feature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature`)
- Open a Pull Request
Sinjoy Saha