This Google Chrome extension evaluates websites and classifies them as phishing or legitimate based on their URLs. It is built with privacy in mind, so the user's browsing patterns are not collected.
Dataset: UCI repository
Machine learning technique used: Random Forest classifier
The dataset is taken from the UCI repository.
First, download the dataset and save it as dataset.arff. The ARFF file is then loaded by preprocess.py, which converts it to an array. The dataset is also split into training and test sets, with 30% held out for testing.
Once this is completed, change to the /backend/dataset directory and run preprocess.py, which saves the training and test data in *.npy format in the main working directory.
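The preprocessing step described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of preprocess.py: the function names, the output file names, and the assumptions that every value is an integer code and that the last column holds the label are all hypothetical. It uses `arff.load` from liac-arff and `train_test_split` from sklearn.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def load_arff_rows(path):
    """Read an ARFF file with liac-arff and return its data rows."""
    import arff  # provided by the liac-arff package

    with open(path) as handle:
        return arff.load(handle)["data"]


def split_dataset(rows, test_size=0.3, seed=0):
    """Convert rows to an integer array and hold out a test set.

    Assumption: every value is an integer code and the last column is the label.
    """
    data = np.array([[int(value) for value in row] for row in rows])
    features, labels = data[:, :-1], data[:, -1]
    # Returns [X_train, X_test, y_train, y_test]; 30% goes to the test set.
    return train_test_split(features, labels, test_size=test_size, random_state=seed)


def save_splits(splits, prefix=""):
    """Write each split to its own .npy file (file names are hypothetical)."""
    names = ("X_train", "X_test", "y_train", "y_test")
    for name, array in zip(names, splits):
        np.save(f"{prefix}{name}.npy", array)
```

With the dataset in place, the three helpers could be chained as `save_splits(split_dataset(load_arff_rows("dataset.arff")))`.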
A random forest (an ensemble learner) is then fit to the training set, and the accuracy and cross-validation scores are printed to evaluate the model's performance.
Change the working directory to /backend/classifier and run training.py. classifier.py is then created in the /static directory.
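The training and evaluation step can be illustrated with a short sketch. It is not the project's training.py: the synthetic data from `make_classification` stands in for the arrays that would be loaded from the *.npy files, and the hyperparameters (`n_estimators=50`, `cv=5`) are assumptions chosen only to keep the example quick.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (assumption); the project would instead load the
# training and test arrays produced by the preprocessing step.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit the ensemble on the training set only.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Accuracy on the held-out 30%, plus 5-fold cross-validation on the training set.
accuracy = model.score(X_test, y_test)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

print(f"Test accuracy: {accuracy:.3f}")
print(f"Cross-validation accuracy: {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")
```

Reporting both numbers is useful because the held-out accuracy reflects a single split, while the cross-validation scores show how much the result varies across folds.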
Loading the extension requires developer mode to be turned on in Chrome. Navigate to chrome://extensions/ in Google Chrome and turn on developer mode.
- Select "Load unpacked" and choose the frontend directory of this repository.
Libraries:
Python 3.x
sklearn
numpy
liac-arff