DataScienceFinalProject

In this project, we analyzed the Book-Crossing dataset, which consists of three CSV files: ratings, users, and books. We explored and preprocessed the data, then applied a dimensionality reduction technique (PCA), three classification algorithms (Logistic Regression, Decision Tree, and K-Nearest Neighbors), and two clustering algorithms (K-Means and Hierarchical) to build models from the dataset. The project contains the following parts:

  • Dataset

  • Exploratory data analysis

  • Visualization techniques

  • Imbalanced data set

  • Missing data imputation

  • Multicollinearity

  • Logistic Regression

  • PCA

    • PCA with Logistic Regression
  • Clustering

    • K-Means Clustering
    • Hierarchical Clustering
      • Missing Data with Hierarchical Clustering
  • Classification

    • Decision Tree
      • Decision tree with imbalanced data
    • K-Nearest Neighbors (K-NN)
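
As a rough illustration of how several of the parts above (missing data imputation, imbalanced data, PCA with Logistic Regression) can fit together, here is a minimal sketch using scikit-learn. It is not the code from this project: the data is synthetic rather than the Book-Crossing CSV files, and the median imputation, the 5 principal components, the `class_weight="balanced"` setting, and the 75/25 split are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (assumption: NOT the Book-Crossing data).
# Roughly 90% of the samples belong to class 0 and 10% to class 1.
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=0,
)

# Blank out about 5% of the values to mimic missing entries.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # fill missing values
    ("scale", StandardScaler()),                    # PCA is scale-sensitive
    ("pca", PCA(n_components=5)),                   # reduce 10 features to 5
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
model.fit(X_train, y_train)

# Balanced accuracy is less misleading than plain accuracy on imbalanced data.
score = balanced_accuracy_score(y_test, model.predict(X_test))
print(f"balanced accuracy: {score:.3f}")
```

Wrapping the steps in a `Pipeline` means the imputer, scaler, and PCA are fitted on the training split only, which avoids leaking information from the test split. The same structure would accept a Decision Tree or K-NN classifier in place of the final step.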

Project Members:

Ayşe Ceren Çiçek & Gizem Kurnaz