This project analyzes and classifies a real network traffic dataset to detect malicious and benign traffic records. It compares and tunes the performance of several Machine Learning algorithms to achieve the highest accuracy and the lowest false positive/negative rates.
The dataset used in this demo is: CTU-IoT-Malware-Capture-34-1.
- It is part of the Aposemat IoT-23 dataset.
- It is a labeled dataset of malicious and benign IoT network traffic.
- The dataset was created in the Avast AIC laboratory with funding from Avast Software.
The project is implemented in four distinct steps simulating the essential data processing and analysis phases.
- Each step is implemented in a corresponding notebook inside the notebooks directory.
- Intermediary data files are stored inside the data directory.
- Trained models are stored inside the models directory.
Corresponding notebook: initial-data-cleaning.ipynb
Implemented data exploration and cleaning tasks:
- Loading the raw dataset file into a pandas DataFrame.
- Exploring dataset summary and statistics.
- Fixing combined columns.
- Dropping irrelevant columns.
- Fixing unset values and validating data types.
- Checking the cleaned version of the dataset.
- Storing the cleaned dataset to a CSV file.
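The cleaning steps above can be sketched in pandas. A minimal sketch: the column names, the '-' unset-value marker, and the output path are illustrative assumptions here, not the notebook's actual schema.

```python
import numpy as np
import pandas as pd
from io import StringIO

# Tiny inline sample standing in for the raw log; in the notebook the data
# comes from the CTU-IoT-Malware-Capture-34-1 capture file instead.
raw = StringIO(
    "duration\torig_bytes\tlabel\n"
    "1.5\t100\tBenign\n"
    "-\t-\tMalicious\n"
)
df = pd.read_csv(raw, sep="\t")

# Replace the '-' placeholder used for unset values with NaN,
# then validate and fix the numeric dtypes.
df = df.replace("-", np.nan)
df["duration"] = pd.to_numeric(df["duration"])
df["orig_bytes"] = pd.to_numeric(df["orig_bytes"])

# Inspect the cleaned frame, then store it to a CSV file
# (the output path is an assumption).
print(df.dtypes)
# df.to_csv("data/cleaned.csv", index=False)
```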
Corresponding notebook: data-preprocessing.ipynb
Implemented data processing and transformation tasks:
- Loading the dataset file into a pandas DataFrame.
- Exploring dataset summary and statistics.
- Analyzing the target attribute.
- Encoding the target attribute using LabelEncoder.
- Handling outliers using the IQR (interquartile range) method.
- Handling missing values:
- Impute missing categorical features using KNeighborsClassifier.
- Impute missing numerical features using KNNImputer.
- Scaling numerical attributes using MinMaxScaler.
- Encoding categorical features: handling rare values and applying One-Hot Encoding.
- Checking the processed dataset and storing it to a CSV file.
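A compact sketch of the preprocessing pipeline on a toy frame. The column names and values are assumptions, and the IQR step shown clips outliers to the 1.5×IQR fences, which is one common variant of that technique.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Toy frame standing in for the cleaned dataset (column names are assumptions).
df = pd.DataFrame({
    "duration": [0.1, 0.2, np.nan, 0.3, 9.9],
    "proto": ["tcp", "udp", "tcp", "tcp", "udp"],
    "label": ["Benign", "Malicious", "Benign", "Malicious", "Benign"],
})

# Encode the target attribute.
df["label"] = LabelEncoder().fit_transform(df["label"])

# Handle outliers: clip numeric values to the 1.5*IQR fences.
q1, q3 = df["duration"].quantile([0.25, 0.75])
iqr = q3 - q1
df["duration"] = df["duration"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Impute missing numerical values with KNNImputer.
df[["duration"]] = KNNImputer(n_neighbors=2).fit_transform(df[["duration"]])

# Scale numerical attributes to [0, 1].
df[["duration"]] = MinMaxScaler().fit_transform(df[["duration"]])

# One-hot encode categorical features.
df = pd.get_dummies(df, columns=["proto"])
```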
Corresponding notebook: model-training.ipynb
Trained and analyzed classification models:
- Naive Bayes: ComplementNB
- Decision Tree: DecisionTreeClassifier
- Logistic Regression: LogisticRegression
- Random Forest: RandomForestClassifier
- Support Vector Classifier: SVC
- K-Nearest Neighbors: KNeighborsClassifier
- XGBoost: XGBClassifier
Evaluation method:
- Cross-Validation Technique: Stratified K-Folds Cross-Validator
- Number of folds: 5
- Shuffling: enabled
Results were analyzed and compared for each considered model.
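The evaluation setup above can be reproduced with scikit-learn's StratifiedKFold. In this sketch the synthetic data and the choice of DecisionTreeClassifier are stand-ins for the processed dataset and the seven models listed above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the processed dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Stratified 5-fold cross-validation with shuffling enabled.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=cv)

# One accuracy score per fold; the mean summarizes the model's performance.
print(scores.mean())
```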
Corresponding notebook: model-tuning.ipynb
Model tuning details:
- Tuned model: Support Vector Classifier - SVC
- Tuning method: GridSearchCV
- Results were analyzed and compared before and after tuning.
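A minimal GridSearchCV sketch for tuning the SVC. The parameter grid, scoring metric, and synthetic data are assumptions for illustration, not the notebook's actual search space.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic data standing in for the processed dataset.
X, y = make_classification(n_samples=150, n_features=8, random_state=0)

# Hypothetical search grid; the grid used in the notebook may differ.
param_grid = {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}

# Exhaustive grid search, scored with the same stratified 5-fold CV setup.
search = GridSearchCV(
    SVC(),
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X, y)

# Best hyperparameters and the corresponding cross-validated accuracy.
print(search.best_params_, search.best_score_)
```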