Email spam detection program using TF-IDF for text feature extraction and Multinomial Naive Bayes for classification
Tools used:
• Python
• Apache SpamAssassin dataset
• scikit-learn (sklearn) machine learning library
• Anaconda environment
Steps:
• Load the CSV file into a data frame
• Preprocess the data: remove headers, convert text to lowercase, and remove special characters
• Convert the text to TF-IDF (term frequency-inverse document frequency) features and train a Multinomial Naive Bayes classifier with scikit-learn
• Split the dataset into training (80%) and testing (20%) sets
• Evaluate performance, then test on actual spam emails to check accuracy
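
The workflow listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The column names `text` and `label`, the helpers `clean_email`, `build_toy_frame`, and `train_spam_filter`, and the tiny in-memory dataset are all assumptions made for the example. A real run would load the SpamAssassin CSV with `pd.read_csv` instead of the toy frame. The header removal also assumes each message separates headers from body with a blank line.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def clean_email(raw: str) -> str:
    """Drop the header block, lowercase, and strip special characters."""
    # Assumption: headers end at the first blank line. If there is no
    # blank line, the whole message is treated as the body.
    parts = re.split(r"\r?\n\r?\n", raw, maxsplit=1)
    body = parts[1] if len(parts) == 2 else raw
    body = body.lower()
    body = re.sub(r"[^a-z0-9\s]", " ", body)   # remove special characters
    return re.sub(r"\s+", " ", body).strip()   # collapse whitespace


def build_toy_frame() -> pd.DataFrame:
    """Hypothetical stand-in for the CSV: 10 spam and 10 ham messages."""
    spam = [
        "Win a FREE prize now!!!", "Cheap pills, click here",
        "You have won $1000 cash", "Claim your free gift card today",
        "Lowest price guaranteed, buy now", "Free money, act now!",
        "Click here to win a free phone", "Exclusive offer: cheap loans",
        "Winner! Claim your cash prize", "Buy cheap watches online now",
    ]
    ham = [
        "Meeting moved to 3pm tomorrow", "Please review the attached report",
        "Lunch on Friday with the team?", "Here are the notes from the call",
        "Can you send the project schedule?", "Thanks for the code review",
        "The build is failing on the main branch", "See you at the conference",
        "Reminder: submit your timesheet", "Patch for the parser bug attached",
    ]
    bodies = spam + ham
    texts = [f"Subject: message {i}\n\n{b}" for i, b in enumerate(bodies)]
    labels = [1] * len(spam) + [0] * len(ham)  # 1 = spam, 0 = ham
    return pd.DataFrame({"text": texts, "label": labels})


def train_spam_filter(df: pd.DataFrame):
    """Split 80/20, fit TF-IDF + Multinomial Naive Bayes, return (model, accuracy)."""
    x_train, x_test, y_train, y_test = train_test_split(
        df["text"], df["label"],
        test_size=0.2, stratify=df["label"], random_state=42,
    )
    # The vectorizer applies clean_email to every message, so raw emails
    # can be passed to fit() and predict() directly.
    model = make_pipeline(TfidfVectorizer(preprocessor=clean_email), MultinomialNB())
    model.fit(x_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(x_test))
    return model, accuracy


if __name__ == "__main__":
    model, accuracy = train_spam_filter(build_toy_frame())
    print(f"Test accuracy: {accuracy:.2f}")
    sample = "Subject: offer\n\nWin a FREE prize, click here now!"
    print("Predicted label:", model.predict([sample])[0])
```

Wrapping the vectorizer and classifier in one pipeline keeps the preprocessing identical at training and prediction time. Accuracy on a dataset this small is not meaningful and only shows that the steps connect end to end.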