This project focuses on classifying websites visited over the Tor network by analyzing network traffic patterns. The goal is to evaluate the performance of website fingerprinting models in both Closed-world and Open-world scenarios.
Website fingerprinting identifies websites a user visits by analyzing encrypted network traffic patterns. Despite encryption, unique traffic patterns arise from web content like embedded resources and request-response behaviors.
In the Tor network, which ensures anonymity by encrypting and routing traffic through multiple nodes, attackers can analyze traffic at entry points to infer visited websites. Using Tor traffic data with timestamped packet sizes, we aim to classify websites by building and evaluating machine learning models.
The project focuses on two scenarios:
- Closed-world: Users visit only websites in the training set.
- Open-world: Users visit both monitored and unmonitored websites.
This project explores the full pipeline, including data preprocessing, feature extraction, model training, and evaluation, to analyze classification performance and traffic patterns.
The dataset consists of two files: `mon_standard.pkl` and `unmon_standard10_3000.pkl`. The `mon_standard.pkl` file contains data from monitored websites, with 95 classes representing 95 different websites. It includes 19,000 instances: each website has 10 subpages, and each subpage was observed 20 times. The `unmon_standard10_3000.pkl` file contains 3,000 instances of unmonitored websites.
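As a minimal sketch of loading these files (the internal structure of each pickle, e.g. a dict or list of traces of timestamped, signed packet sizes, is an assumption and may differ from the actual layout):

```python
import pickle

# Load the monitored and unmonitored traces.
# Assumption: each pickle holds a collection of traces, where a trace is a
# sequence of timestamped, signed packet sizes; the exact layout may differ.
with open("mon_standard.pkl", "rb") as f:
    mon_data = pickle.load(f)

with open("unmon_standard10_3000.pkl", "rb") as f:
    unmon_data = pickle.load(f)

print(type(mon_data), type(unmon_data))
```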
In the closed-world scenario, it is assumed that users visit only monitored websites, and the objective is to classify traffic into the 95 monitored website classes. In the open-world scenario, users can visit both monitored and unmonitored websites, which involves two tasks: binary classification and multi-class classification. For binary classification, monitored instances are labeled `1` (positive) and unmonitored instances are labeled `-1` (negative). For multi-class classification, monitored instances are assigned labels `{0, 1, ..., 94}` and unmonitored instances are labeled `-1`.
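A minimal sketch of constructing these label vectors, assuming the monitored traces are ordered by class (95 sites with 200 instances each); the notebooks' actual loading order may differ:

```python
import numpy as np

# Assumption: monitored traces are grouped by website, 200 instances per site.
mon_labels = np.repeat(np.arange(95), 200)   # 95 sites x 200 = 19,000 labels
n_unmon = 3000

# Closed-world: monitored classes only.
y_closed = mon_labels

# Open-world binary: monitored = 1, unmonitored = -1.
y_binary = np.concatenate([np.ones(len(mon_labels)), -np.ones(n_unmon)])

# Open-world multi-class: monitored keep their class label, unmonitored = -1.
y_multi = np.concatenate([mon_labels, -np.ones(n_unmon, dtype=int)])
```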
- Resources
- Total RAM: 8.00 GB
- Available RAM: 1.41 GB
- Total CPU Cores: 8 cores
Follow these steps to execute the project in a Jupyter Notebook environment.
- Install Jupyter Notebook:

  ```bash
  pip install notebook
  ```

- Download the Notebook File: Go to the project GitHub repository and download the file containing `BEST` in its name.

- Open the Notebook: Place the downloaded notebook file in your working directory, then launch Jupyter Notebook:

  ```bash
  jupyter notebook
  ```

- Access the Dataset: Use the shared drive link: https://drive.google.com/drive/folders/13sDplxKUNmntbYr6WhpqQARiBvH41Oum

- Download Files:
  - Download `mon_standard.pkl` for monitored websites.
  - Download `unmon_standard10_3000.pkl` for unmonitored websites.

- Install the required Python libraries:

  ```bash
  pip install numpy scikit-learn matplotlib tensorflow
  ```
This experiment evaluates the performance of website fingerprinting models on defended traffic in the Tor network. The defended traffic employs a multiple-path communication defense mechanism to reduce the amount of observable traffic for an attacker. The goal is to determine whether the implemented model can outperform the accuracy of TrafficSilver-Net, a baseline model for defended traffic scenarios.
In a traditional Tor network, attackers can infer user activity by monitoring traffic between the Entry Node and the user. A proposed defense method involves using multiple communication paths between the Tor user and middle proxies. This approach reduces the amount of traffic an attacker can observe, potentially mitigating the risk of accurate fingerprinting.
The provided defended traffic dataset simulates this defense mechanism. The task is to classify website traffic using the defended dataset and compare the model’s performance to TrafficSilver-Net.
- Install Jupyter Notebook:

  ```bash
  pip install notebook
  ```

- Download the Notebook File: Go to the project GitHub repository and download the file containing `BEST` in its name.

- Open the Notebook: Place the downloaded notebook file in your working directory, then launch Jupyter Notebook:

  ```bash
  jupyter notebook
  ```

- Access the Dataset: Use the shared drive link: https://drive.google.com/drive/folders/13sDplxKUNmntbYr6WhpqQARiBvH41Oum

- Download Files:
  - `ts/mon.zip`
  - `ts/unmon.zip`

- Install the required Python libraries:

  ```bash
  pip install numpy scikit-learn matplotlib tensorflow
  ```
This repository contains experimental code and results for closed-world multi-class classification scenarios. The project explores various techniques to improve classification performance, including feature selection, model selection, and hyperparameter optimization. Use `closed-multi-BEST.ipynb` for the final and best-performing model configuration, along with visual insights into data and model performance. The repository is organized into the following files:
- Purpose: Compares model performance using four feature sets: 8, 24, 32, and 26 features.
- Details:
  - Includes results for the 8-, 24-, 32-, and 26-feature models.
  - Experiments with the 24-feature set are integrated into the `closed-multi-BEST.ipynb` file for final performance.
- Outcome: Demonstrates the significance of feature selection in boosting model accuracy and F1-score (see the feature-extraction sketch below).
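The exact feature definitions live in the notebooks; the sketch below only illustrates the kind of direction- and timing-based features such sets typically contain. The eight features shown are assumptions, not the notebooks' exact definitions:

```python
import numpy as np

def extract_features(timestamps, sizes):
    """Compute a small direction/timing feature vector for one trace.

    Assumption: `sizes` are signed packet sizes (positive = outgoing,
    negative = incoming); the notebooks' real feature sets may differ.
    """
    sizes = np.asarray(sizes, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    outgoing = sizes > 0
    incoming = ~outgoing
    duration = timestamps[-1] - timestamps[0] if len(timestamps) > 1 else 0.0
    return np.array([
        len(sizes),                      # total packet count
        outgoing.sum(),                  # outgoing packet count
        incoming.sum(),                  # incoming packet count
        outgoing.sum() / len(sizes),     # fraction of outgoing packets
        np.abs(sizes).sum(),             # total bytes transferred
        np.abs(sizes[outgoing]).sum(),   # outgoing bytes
        np.abs(sizes[incoming]).sum(),   # incoming bytes
        duration,                        # trace duration in seconds
    ])

# Example: one toy trace with three packets.
print(extract_features([0.0, 0.1, 0.3], [512, -1500, -1500]))
```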
- Purpose: Uses `GridSearchCV` to find the best hyperparameters for the model.
- Details:
  - Performs hyperparameter tuning experiments.
  - Finalized hyperparameters are implemented in `closed-multi-BEST.ipynb`.
- Outcome: Identifies optimal hyperparameters for enhanced performance (see the usage sketch below).
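A minimal `GridSearchCV` sketch; the grid, the model, and the synthetic stand-in data are illustrative assumptions, not the notebook's actual search space:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; replace with the real feature matrix and labels.
X_train, y_train = make_classification(n_samples=500, n_features=24,
                                       n_informative=12, n_classes=5,
                                       random_state=42)

# Hypothetical grid; the notebook defines its own search space.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 20],
    "min_samples_split": [2, 5],
}

search = GridSearchCV(
    ExtraTreesClassifier(random_state=42),
    param_grid,
    cv=5,                 # 5-fold cross-validation
    scoring="f1_macro",   # macro F1 suits the many-class task
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```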
- Purpose: Uses `RandomizedSearchCV` to find the best hyperparameters for the model.
- Details:
  - Performs hyperparameter tuning experiments.
- Outcome: Identifies optimal hyperparameters for enhanced performance (see the sketch below).
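The analogous `RandomizedSearchCV` usage, again with a hypothetical search space and synthetic stand-in data:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data; replace with the real feature matrix and labels.
X, y = make_classification(n_samples=500, n_features=24, n_informative=12,
                           n_classes=5, random_state=42)

# Hypothetical distributions; the notebook defines its own search space.
param_distributions = {
    "n_estimators": randint(100, 600),
    "max_depth": randint(10, 60),
    "min_samples_leaf": randint(1, 10),
}

search = RandomizedSearchCV(
    ExtraTreesClassifier(random_state=42),
    param_distributions,
    n_iter=20,            # sample 20 configurations instead of a full grid
    cv=5,
    scoring="f1_macro",
    n_jobs=-1,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```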
- Purpose: Identifies the best-performing model.
- Details:
  - Based on a fixed set of 24 features, three models were compared under identical conditions: Extra Trees, Random Forest, and XGBoost (see the comparison sketch below).
  - Extra Trees, which performed best, is applied in the `closed-multi-BEST.ipynb` file.
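A sketch of such a comparison under identical cross-validation conditions; the synthetic stand-in data and hyperparameters are assumptions, and the `xgboost` package is an extra dependency beyond the listed libraries:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier  # pip install xgboost

# Synthetic stand-in data; replace with the 24-feature matrix and labels.
X, y = make_classification(n_samples=1000, n_features=24, n_informative=12,
                           n_classes=5, random_state=42)

models = {
    "Extra Trees": ExtraTreesClassifier(n_estimators=300, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "XGBoost": XGBClassifier(n_estimators=300, random_state=42),
}

# Evaluate all three candidates under the same cross-validation protocol.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1_macro", n_jobs=-1)
    print(f"{name}: macro F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```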
- Purpose: Consolidates all the best experimental results and final configurations.
- Details:
  - Combines the 24-feature set, the optimized hyperparameters, and the selected model.
  - Contains the highest-performing model implementation.
  - Includes detailed result visualizations (see the PR-curve sketch below):
    - Macro-average PR Curve: Shows the overall Precision-Recall (PR) curve averaged across all classes.
    - Precision-Recall Curve per Class: Displays individual PR curves and PR AUC scores for each of the 95 classes.
    - Top 10 Precision-Recall Curves: Highlights the PR curves of the top 10 performing classes.
    - Performance at Different Thresholds: Visualizes the model's precision, recall, and F1-score at various classification thresholds to identify the optimal threshold.
    - ROC Curve: Presents ROC curves for each of the 95 classes along with the macro-average ROC curve.
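A hedged sketch of computing per-class average precision, its macro-average, and one per-class PR curve with scikit-learn; a small synthetic problem stands in for the real 95-class data:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic stand-in for the 95-class problem (5 classes keeps it fast).
X, y = make_classification(n_samples=1000, n_features=24, n_informative=12,
                           n_classes=5, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

clf = ExtraTreesClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)
y_score = clf.predict_proba(X_te)

classes = np.unique(y)
y_bin = label_binarize(y_te, classes=classes)

# Per-class average precision, then the macro-average across classes.
ap = [average_precision_score(y_bin[:, i], y_score[:, i])
      for i in range(len(classes))]
print(f"macro-average AP: {np.mean(ap):.3f}")

# Example per-class PR curve (class 0).
prec, rec, _ = precision_recall_curve(y_bin[:, 0], y_score[:, 0])
plt.plot(rec, prec, label=f"class 0 (AP={ap[0]:.2f})")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()
```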
This repository contains experimental code and results for open-world binary classification scenarios. The project explores various techniques to improve classification performance, including feature selection, hyperparameter optimization, and upsampling methods. Use open-binary-BEST.ipynb for the final and best-performing model configuration, along with visual insights into data and model performance. The repository is organized into the following files:
- Purpose: Compares model performance using two feature sets: 8 features versus 24 features.
- Details:
  - Includes results for the 8-feature model.
  - Experiments with the expanded 24-feature set are integrated into the `open-binary-BEST.ipynb` file for final performance.
- Outcome: Demonstrates the significance of feature selection in boosting model accuracy and F1-score.
- Purpose: Uses `RandomizedSearchCV` to find the best hyperparameters for the model.
- Details:
  - Performs hyperparameter tuning experiments.
  - Finalized hyperparameters are implemented in `open-binary-BEST.ipynb`.
- Outcome: Identifies optimal hyperparameters for enhanced performance.
- Purpose: Addresses dataset imbalance using SMOTE upsampling.
- Details:
  - Balances the monitored and unmonitored datasets (see the SMOTE sketch below).
  - Final improvements, including reduced false-positive rates, are reflected in the `open-binary-BEST.ipynb` file.
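A minimal SMOTE sketch using `imbalanced-learn` (an extra dependency beyond the listed libraries); the synthetic imbalance roughly mirrors the 19,000/3,000 monitored-to-unmonitored split:

```python
from collections import Counter
from imblearn.over_sampling import SMOTE  # pip install imbalanced-learn
from sklearn.datasets import make_classification

# Synthetic imbalanced stand-in: ~86% majority vs ~14% minority class,
# mirroring the monitored/unmonitored ratio at small scale.
X, y = make_classification(n_samples=1100, n_features=24, weights=[0.86],
                           random_state=42)

print("before:", Counter(y))
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after: ", Counter(y_res))
```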
- Purpose: Consolidates all the best experimental results and final configurations.
- Details:
- Combines the expanded 24-feature set, optimized hyperparameters, and balanced dataset.
- Contains the highest-performing model implementation.
- Includes detailed result visualizations:
- Data Distribution Before and After SMOTE: Shows class balance improvements through upsampling.
- Confusion Matrix: Highlights model performance in correctly classifying monitored and unmonitored data.
- Precision-Recall Curve: Evaluates precision and recall trade-offs across different thresholds.
- ROC Curve: Demonstrates the model’s ability to distinguish between classes with a high AUC score.
This repository contains experimental code and results for open-world multi-class classification scenarios. The project explores various techniques to improve classification performance, including feature selection, hyperparameter optimization, and upsampling methods. Use open_multi_up_rf_feature26_BEST.ipynb for the final and best-performing model configuration, along with visual insights into data and model performance. The repository is organized into the following files:
- Purpose: Experiments with downsampled data and 8 features.
- Details: Includes results for the 8-feature model across three classifiers (Random Forest, Gradient Boosting, SVM).
- Outcome: Demonstrates that downsampling performs poorly with every model.
- Purpose: Experiments with downsampled data and 24 features.
- Details: Includes results for the 24-feature Random Forest model.
- Outcome: Demonstrates that downsampling performs poorly regardless of model or feature count (see the undersampling sketch below).
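For reference, downsampling of the kind these experiments evaluate can be reproduced with random undersampling; this sketch uses `imbalanced-learn` and synthetic stand-in data, not the notebooks' exact code:

```python
from collections import Counter
from imblearn.under_sampling import RandomUnderSampler  # pip install imbalanced-learn
from sklearn.datasets import make_classification

# Synthetic imbalanced stand-in for the open-world binary labels.
X, y = make_classification(n_samples=1100, n_features=24, weights=[0.86],
                           random_state=42)

# Shrink the majority (monitored) class to match the minority class.
rus = RandomUnderSampler(random_state=42)
X_down, y_down = rus.fit_resample(X, y)
print("after downsampling:", Counter(y_down))
```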
- Purpose: Addresses dataset imbalance using SMOTE upsampling.
- Details: Balances the monitored and unmonitored datasets.
- Purpose: Consolidates all the best experimental results and final configurations.
- Details:
- Combines the expanded 26-feature set, optimized hyperparameters, and balanced dataset.
- Contains the highest-performing model implementation.
- Includes detailed result visualizations:
- Data Distribution Before and After SMOTE: Shows class balance improvements through upsampling.
- Precision-Recall Curve: Evaluates precision and recall trade-offs across different thresholds.
- ROC Curve: Demonstrates the model’s ability to distinguish between classes with a high AUC score.
- Purpose: Classify monitored website traffic using the Deep Fingerprinting (DF) model.
- Details: Parse network data, extract features, and train a neural network for multi-class classification (see the architecture sketch below).
- Outcome: Achieved high accuracy in identifying website traffic patterns.
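A simplified Keras sketch of a DF-style 1D CNN. The real Deep Fingerprinting architecture is deeper (stacked convolutional blocks with batch normalization), so treat this as an illustrative skeleton rather than the notebook's exact model; the sequence length is an assumption:

```python
import tensorflow as tf

# Assumption: inputs are fixed-length packet-direction sequences
# (+1/-1 values, zero-padded or truncated to SEQ_LEN).
SEQ_LEN = 5000
NUM_CLASSES = 95

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, 1)),
    tf.keras.layers.Conv1D(32, 8, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling1D(8),
    tf.keras.layers.Conv1D(64, 8, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling1D(8),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # regularize the dense head
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```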
Daye Jang, Hayeon Doh, Sejin Park, Sojeong Lee, Sunho Kwak