Text Classification of Legitimate and Rogue Online Privacy Policies

Welcome to the repository for my Master's thesis project, titled "Text Classification of Legitimate and Rogue Online Privacy Policies: Manual Analysis and a Machine Learning Experimental Approach."

This project investigates the use of supervised binary classification to distinguish between legitimate privacy policies from Fortune Global 500 companies and rogue policies from less trustworthy websites. The repository contains the code, data, and documents used in the thesis work.

Project Overview

In this project, we evaluate 15 classification algorithms to determine how well they can differentiate between legitimate and rogue privacy policies. The dataset consists of 100 privacy policies from legitimate websites (top Fortune Global 500 companies) and 67 privacy policies from rogue websites. A manual analysis measured adherence to seven general privacy principles and found statistically significant differences between legitimate and rogue policies.
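One common way to quantify how well a binary classifier separates two classes is the area under the ROC curve (AUC), which is also the metric reported in the findings. The following is a minimal sketch of the rank-based (Mann-Whitney) way to compute it, not the code used in the thesis. The class name `AucSketch` and the toy scores are invented for illustration.

```java
import java.util.Arrays;

public class AucSketch {
    /**
     * AUC via the rank-sum (Mann-Whitney U) identity.
     * scores[i] is the classifier's score for the positive class,
     * positive[i] is the true label. Tied scores share the average rank.
     */
    public static double auc(double[] scores, boolean[] positive) {
        int n = scores.length;
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) order[k] = k;
        // Sort indices by ascending score.
        Arrays.sort(order, (a, b) -> Double.compare(scores[a], scores[b]));

        double[] rank = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && scores[order[j + 1]] == scores[order[i]]) j++;
            double averageRank = (i + j) / 2.0 + 1.0; // ranks are 1-based
            for (int k = i; k <= j; k++) rank[order[k]] = averageRank;
            i = j + 1;
        }

        double rankSumPositive = 0.0;
        long nPos = 0;
        for (int k = 0; k < n; k++) {
            if (positive[k]) {
                rankSumPositive += rank[k];
                nPos++;
            }
        }
        long nNeg = n - nPos;
        if (nPos == 0 || nNeg == 0) {
            throw new IllegalArgumentException("AUC needs at least one example of each class");
        }
        double u = rankSumPositive - nPos * (nPos + 1) / 2.0;
        return u / ((double) nPos * nNeg);
    }

    public static void main(String[] args) {
        // Hypothetical scores for five policies; true = rogue (positive class).
        double[] scores = {0.9, 0.8, 0.35, 0.6, 0.2};
        boolean[] rogue = {true, true, true, false, false};
        System.out.println(auc(scores, rogue));
    }
}
```

In the toy data, 5 of the 6 rogue/legitimate pairs are ordered correctly, so the result is 5/6. An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation.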

Our findings show that privacy policies from legitimate companies have 98% adherence to the seven privacy principles, compared to only 45% adherence in rogue companies' policies. After evaluating the classification models, the Naïve Bayes Multinomial algorithm exhibited the best performance with an AUC score of 0.90 (0.08), outperforming other candidates in statistical tests.
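For readers unfamiliar with the Naïve Bayes Multinomial classifier, the sketch below shows the core idea on word counts. It is an illustration under stated assumptions, not the implementation evaluated in the thesis: the class name `MultinomialNbSketch`, the letters-only tokenizer, add-one (Laplace) smoothing, and the toy sentences are all choices made here, and the thesis's preprocessing may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MultinomialNbSketch {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();              // label -> token total
    private final Map<String, Integer> docCounts = new HashMap<>();               // label -> document total
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Assumed tokenizer: lowercase, split on anything that is not a-z.
    static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z]+");
    }

    public void train(String label, String text) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : tokenize(text)) {
            if (w.isEmpty()) continue;
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    /** log P(label) + sum of log P(word | label), with add-one smoothing. */
    public double logScore(String label, String text) {
        double score = Math.log(docCounts.get(label) / (double) totalDocs);
        Map<String, Integer> counts = wordCounts.get(label);
        double denominator = totalWords.getOrDefault(label, 0) + vocabulary.size();
        for (String w : tokenize(text)) {
            // Words never seen in training carry no evidence and are skipped.
            if (w.isEmpty() || !vocabulary.contains(w)) continue;
            score += Math.log((counts.getOrDefault(w, 0) + 1) / denominator);
        }
        return score;
    }

    public String predict(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double s = logScore(label, text);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        MultinomialNbSketch nb = new MultinomialNbSketch();
        // Toy sentences invented for illustration; not taken from the thesis dataset.
        nb.train("legitimate", "we collect personal data with your consent and you may opt out");
        nb.train("legitimate", "you may access and correct your personal data at any time");
        nb.train("rogue", "we may share your information with any third parties without notice");
        nb.train("rogue", "information may be sold to third parties");
        System.out.println(nb.predict("you may opt out and correct your data"));
        System.out.println(nb.predict("information sold to third parties without notice"));
    }
}
```

Working in log space avoids numeric underflow on long documents such as full privacy policies, where multiplying thousands of small probabilities would round to zero.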

A more detailed abstract and the full analysis are available in the thesis document, Master_Thesis_submission.pdf, in this repository.

Repository Contents

The repository is organized into the following key components:

Java Code for the Classification Model
The complete Java code used to implement and evaluate the 15 classification algorithms.
- File: TextDirectoryToArff.java
Privacy Policies Dataset
A zip file containing the privacy policies collected for the project, including both legitimate and rogue policies.
- File: policies.zip
Manual Analysis Dataset
A zip file containing the manual analysis of the privacy policies, including the evaluation of the seven privacy principles.
- File: manual analysis dataset.zip
Thesis Document (PDF)
The full thesis document detailing the project, analysis, and results.
- File: Master_Thesis_submission.pdf
- You can also access it online through the DiVA portal: https://www.diva-portal.org/smash/get/diva2:1045553/FULLTEXT02
Project Proposal (PDF)
The proposal that was submitted for the approval of the Master Thesis Project.
- File: proposal for thesis.pdf

How to Cite

Journal Article

This project was later published as a journal article. If you would like to cite the article, please use the following reference:

Boldt, M. and Rekanar, K., 2019. Analysis and text classification of privacy policies from rogue and top-100 fortune global companies. International Journal of Information Security and Privacy (IJISP), 13(2), pp.47-66.

Master Thesis

If you would like to cite the thesis itself, please use the following reference:

Rekanar, Kaavya. "Text Classification of Legitimate and Rogue online Privacy Policies: Manual Analysis and a Machine Learning Experimental Approach." (2016). 
Available at: https://www.diva-portal.org/smash/get/diva2:1045553/FULLTEXT02

How to Use This Repository

Classification Model
The Java code can be run to replicate the experiments. It implements the binary classification models and evaluation metrics used in the project.
Privacy Policies Dataset
The zip file contains the privacy policies dataset that can be used for similar text classification problems.
Manual Analysis
The manual analysis zip file provides insights into how privacy policies were manually evaluated against the seven general privacy principles.
Thesis Document
Read the thesis PDF for a comprehensive explanation of the methodology, experiments, and results.
Project Proposal
The project proposal PDF outlines the original objectives and structure of the Master's thesis.
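To reuse the privacy policies dataset in another text classification setup, the raw text files usually have to be combined into a single labeled file first. The sketch below writes an ARFF-style file with one string attribute and one class attribute. It is not the repository's own code, and it rests on assumptions: that policies.zip has been extracted into a folder with one subfolder per class (for example `policies/legitimate` and `policies/rogue`), that the files end in `.txt`, and that the single-quote escaping shown is accepted by the ARFF reader you use. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PoliciesToArffSketch {
    /** Puts a document on one line and wraps it in single quotes. */
    static String quote(String text) {
        String oneLine = text.replace("\\", "\\\\")   // escape backslashes first
                             .replace("'", "\\'")      // then single quotes
                             .replaceAll("\\s+", " ")  // collapse newlines and runs of spaces
                             .trim();
        return "'" + oneLine + "'";
    }

    /** rows holds {documentText, label} pairs. */
    static String toArff(String relation, List<String> labels, List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        sb.append("@attribute text string\n");
        sb.append("@attribute class {").append(String.join(",", labels)).append("}\n\n");
        sb.append("@data\n");
        for (String[] row : rows) {
            sb.append(quote(row[0])).append(',').append(row[1]).append('\n');
        }
        return sb.toString();
    }

    /** Reads root/<label>/*.txt; each subfolder name becomes a class label. */
    static String directoryToArff(Path root) throws IOException {
        List<Path> classDirs = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(root)) {
            for (Path p : entries) {
                if (Files.isDirectory(p)) classDirs.add(p);
            }
        }
        Collections.sort(classDirs); // stable label order across runs

        List<String> labels = new ArrayList<>();
        List<String[]> rows = new ArrayList<>();
        for (Path dir : classDirs) {
            String label = dir.getFileName().toString();
            labels.add(label);
            List<Path> files = new ArrayList<>();
            try (DirectoryStream<Path> docs = Files.newDirectoryStream(dir, "*.txt")) {
                for (Path f : docs) files.add(f);
            }
            Collections.sort(files);
            for (Path f : files) {
                String text = new String(Files.readAllBytes(f), StandardCharsets.UTF_8);
                rows.add(new String[] {text, label});
            }
        }
        return toArff("policies", labels, rows);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location of the extracted dataset.
        Path root = Paths.get(args.length > 0 ? args[0] : "policies");
        System.out.print(directoryToArff(root));
    }
}
```

Running `java PoliciesToArffSketch policies > my_policies.arff` would write the result to a new file; the output name here is arbitrary and chosen so it does not overwrite any file shipped with the repository.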

Contact

For any questions or further details, please feel free to reach out!

Thank you for your interest in this project! We hope the resources provided here will be useful for your research or practical applications in text classification or privacy policy analysis.
