crncck/DataScienceFinalProject

DataScienceFinalProject

In this project, we analyzed the Book-Crossing dataset, which consists of three CSV files: ratings, users, and books. We explored and preprocessed the data, then applied a dimensionality reduction technique (PCA), three classification algorithms (Logistic Regression, Decision Tree, and K-Nearest Neighbors), and two clustering algorithms (K-Means and Hierarchical) to build models from the dataset. Our project contains the following parts:

  • Dataset
  • Exploratory data analysis
  • Visualization techniques
  • Imbalanced data set
  • Missing data imputation
  • Multicollinearity
  • Logistic Regression
  • PCA
    • PCA with Logistic Regression
  • Clustering
    • K-Means Clustering
    • Hierarchical Clustering
      • Missing Data with Hierarchical Clustering
  • Classification
    • Decision Tree
      • Decision Tree with Imbalanced Data
    • K-Nearest Neighbors (K-NN)
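
As a rough illustration of how a few of the parts above fit together, here is a minimal sketch in base R. It is not the project's actual code: it runs on synthetic data, and the column names (`age`, `n_ratings`, `mean_rating`) and the `high_rater` target are hypothetical stand-ins rather than the Book-Crossing schema. It covers PCA, logistic regression on the principal components, and the two clustering methods, using only functions from the built-in `stats` package.

```r
# Minimal sketch on synthetic data. All column names and the target
# variable are hypothetical; they are NOT taken from the Book-Crossing files.
set.seed(42)
n <- 200
features <- data.frame(
  age         = rnorm(n, mean = 35, sd = 10),
  n_ratings   = rpois(n, lambda = 20),
  mean_rating = runif(n, min = 1, max = 10)
)

# Hypothetical binary target: does the user tend to give high ratings?
high_rater <- as.integer(features$mean_rating + rnorm(n) > 5.5)

# PCA on centered and scaled features; keep the first two components.
pca    <- prcomp(features, center = TRUE, scale. = TRUE)
scores <- as.data.frame(pca$x[, 1:2])

# Logistic regression fitted on the two principal components.
train     <- cbind(scores, high_rater = high_rater)
logit_fit <- glm(high_rater ~ PC1 + PC2, data = train, family = binomial)
predicted <- as.integer(predict(logit_fit, type = "response") > 0.5)
accuracy  <- mean(predicted == high_rater)  # in-sample accuracy only

# K-Means clustering on the scaled features (3 clusters is an arbitrary choice).
scaled <- scale(features)
km     <- kmeans(scaled, centers = 3, nstart = 10)

# Hierarchical clustering with Ward linkage, cut into the same number of groups.
hc          <- hclust(dist(scaled), method = "ward.D2")
hc_clusters <- cutree(hc, k = 3)
```

The accuracy here is computed on the training data, so it only shows that the pieces connect; a real evaluation would hold out a test set. Decision trees and K-NN are left out because they typically rely on add-on packages such as `rpart` and `class`.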

Project Members:

Ayşe Ceren Çiçek & Gizem Kurnaz

About

Data analysis of the book crossing dataset with R
