The purpose of this project is to classify whether a connection attempt is secure or not using supervised learning. We begin by analysing the data so that we understand it well. We then clean it as part of the preprocessing stage, handling missing data, duplicated rows, and string values.
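The preprocessing steps mentioned above (missing data, duplicated rows, strings) could look roughly like the following. This is a minimal sketch on a made-up toy frame, not the project's actual code: the `preprocess` helper, the median fill, the `"unknown"` placeholder, and the integer encoding are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop duplicates, fill gaps, encode strings."""
    # Remove exact duplicate rows and renumber the index.
    out = df.drop_duplicates().reset_index(drop=True)

    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Assumption: fill missing numeric values with the column median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Assumption: fill missing strings with a placeholder,
            # then map each distinct string to an integer code.
            out[col] = out[col].fillna("unknown")
            out[col] = pd.factorize(out[col])[0]
    return out


# Toy rows for illustration only, not real dataset records.
toy = pd.DataFrame({
    "duration": [0, 0, 5, np.nan],
    "protocol_type": ["tcp", "tcp", "udp", None],
    "src_bytes": [181, 181, 239, 100],
})
clean = preprocess(toy)
print(clean)
```

Integer codes keep the example short; one-hot encoding (for example with `pd.get_dummies`) is a common alternative when the categories have no natural order.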
- Anaconda
- Required packages: pandas, numpy, scikit-learn (imported as sklearn)
The dataset used in this project is the KDD Cup 1999 dataset (Information and Computer Science - University of California, Irvine - Irvine, CA 92697-3425). This is the dataset used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, The Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector: a predictive model capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections.
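To make the "good" versus "bad" framing concrete, here is a small sketch of turning connection labels into a binary target and fitting a classifier. The rows are invented toy data, the column names and the trailing period on labels are assumptions about the file layout, and the decision tree is only one possible supervised model, not necessarily the one used in this project.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy rows for illustration only; values and column names are assumptions.
toy = pd.DataFrame({
    "src_bytes": [181, 239, 0, 0, 235, 0],
    "count": [8, 8, 511, 511, 6, 510],
    "label": ["normal.", "normal.", "smurf.", "neptune.", "normal.", "smurf."],
})

# 0 = normal ("good") connection, 1 = any attack ("bad") connection.
# rstrip handles labels written with a trailing period (assumed format).
y = (toy["label"].str.rstrip(".") != "normal").astype(int)
X = toy[["src_bytes", "count"]]

# Fit on the toy rows and predict on the same rows, just to show the API.
model = DecisionTreeClassifier(random_state=0).fit(X, y)
predictions = model.predict(X)
print(predictions)
```

In a real run the model would be evaluated on held-out data (for example via `sklearn.model_selection.train_test_split`) rather than on the rows it was trained on.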