This repository contains coursework for the Udacity Machine Learning Nanodegree program.
- I am using Visual Studio 2017 Enterprise with Python Tools. I find it extremely useful for development, especially IntelliSense and the other editor features.
This is a practice project for learning the Naive Bayes classifier, applied to a dataset of SMS messages labeled as spam or not spam.
- Running the project
python Naive_bayes_spam_filter.py
- Notes: Naive Bayes classification algorithm
One of the major advantages Naive Bayes has over other classification algorithms is its ability to handle an extremely large number of features. In our case, each word is treated as a feature, and there are thousands of different words. It also performs well in the presence of irrelevant features and is relatively unaffected by them. Another major advantage is its relative simplicity: Naive Bayes works well right out of the box, and tuning its parameters is rarely necessary, except usually in cases where the distribution of the data is known. It rarely overfits the data. Finally, its training and prediction times are very fast for the amount of data it can handle. All in all, Naive Bayes really is a gem of an algorithm!
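To make the "each word is a feature" idea concrete, here is a minimal from-scratch sketch of a multinomial Naive Bayes spam filter using only the Python standard library. It is not the code in Naive_bayes_spam_filter.py; the class `MultinomialNaiveBayes`, the `tokenize` helper, the smoothing parameter `alpha`, and the four toy messages are all hypothetical, chosen only for illustration. Training is a single counting pass over the messages, and prediction sums one log-probability per word, which is why the approach scales to thousands of word features.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and split it into word tokens (each word is a feature)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class MultinomialNaiveBayes:
    """Hypothetical sketch: multinomial Naive Bayes over word counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing so unseen (word, class) pairs never get probability 0

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)  # number of messages per class
        self.word_counts = defaultdict(Counter)  # class -> word occurrence counts
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def _log_score(self, tokens, label):
        # log P(class) + sum of log P(word | class); the "naive" independence
        # assumption is what turns the joint likelihood into this simple sum.
        n_messages = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_messages)
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for word in tokens:
            if word not in self.vocab:
                continue  # skip words never seen during training
            score += math.log((self.word_counts[label][word] + self.alpha) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda c: self._log_score(tokens, c))


# Toy usage with made-up messages (not the project's dataset).
messages = [
    "Win a FREE prize now",
    "Claim your free cash prize",
    "Are we still meeting for lunch",
    "Can you send me the report",
]
labels = ["spam", "spam", "ham", "ham"]
model = MultinomialNaiveBayes().fit(messages, labels)
print(model.predict("free prize waiting"))  # spam
print(model.predict("lunch tomorrow?"))  # ham
```

Working in log space avoids numeric underflow when a message contains many words, and the smoothing term keeps a single unseen word from zeroing out a class. Irrelevant words that appear about equally often in both classes contribute similar terms to each score, which loosely illustrates why they have little effect on the prediction.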