DDAM

Team project for the "Distributed Data Analysis and Mining" course, MSc in Data Science and Business Informatics @ University of Pisa



eCommerce behavior Dataset

End of Course Project A.Y. 2023/24
Read the report »

Data source · View code

Team members

About the Dataset

eCommerce behavior data from multi-category store

The dataset covers a single month (March 2020) of behavior data from a large multi-category online store.

Each row in the file represents an event. Every event relates a product to a user, so the events as a whole form a many-to-many relation between products and users.

There are different types of events. An event reads as follows:
User user_id, during session user_session, added to the shopping cart (property event_type equals cart) the product product_id of brand brand and category category_code, with price price, at event_time.
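As a rough illustration of these semantics, the sketch below turns one event row into the sentence form described above. The field names come from the dataset description. The sample values, the `Event` class, and the `describe` helper are hypothetical, and only the `cart` event type is taken from this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One row of the dataset, limited to the fields named in this README."""
    event_time: str
    event_type: str
    product_id: int
    category_code: str
    brand: str
    price: float
    user_id: int
    user_session: str


# Assumption: only "cart" is documented here; other types get a generic phrase.
ACTIONS = {"cart": "added to shopping cart"}


def describe(event: Event) -> str:
    """Render an event as a human-readable sentence."""
    action = ACTIONS.get(event.event_type, f"triggered '{event.event_type}' on")
    return (
        f"User {event.user_id} during session {event.user_session} "
        f"{action} product {event.product_id} of brand {event.brand} "
        f"of category {event.category_code} with price {event.price} "
        f"at {event.event_time}."
    )


# Hypothetical sample row; the values are made up for illustration.
sample = Event(
    event_time="2020-03-01 00:00:00 UTC",
    event_type="cart",
    product_id=1001,
    category_code="electronics.smartphone",
    brand="acme",
    price=199.99,
    user_id=42,
    user_session="abc-123",
)
print(describe(sample))
```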

About the project

This project analyzes a large volume of eCommerce data to predict user behavior, using data mining techniques and Hadoop (Spark) tools.

The project is divided into four parts as follows:

  • Data Reduction
  • Understanding & Preparation
  • Feature Extraction
  • Classification
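
The four parts can be pictured as a chain of small functions. The sketch below is only a toy illustration in plain Python, not the project's Spark code. The sample events, the function names, the `view` event type, and the rule-based "classifier" are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical (user_session, event_type) pairs; only "cart" is named in the README.
EVENTS = [
    ("s1", "view"), ("s1", "cart"),
    ("s2", "view"), ("s2", "view"),
    ("s3", "view"), ("s3", "cart"),
    ("s4", None),  # row with a missing event type
]


def reduce_data(events, keep_sessions):
    """Data Reduction: keep only events from a chosen subset of sessions."""
    return [e for e in events if e[0] in keep_sessions]


def prepare(events):
    """Understanding & Preparation: drop rows with a missing event type."""
    return [e for e in events if e[1] is not None]


def extract_features(events):
    """Feature Extraction: count events of each type per session."""
    features = defaultdict(Counter)
    for session, event_type in events:
        features[session][event_type] += 1
    return dict(features)


def classify(features):
    """Classification: placeholder rule (any cart event), not a trained model."""
    return {s: counts["cart"] > 0 for s, counts in features.items()}


reduced = reduce_data(EVENTS, keep_sessions={"s1", "s2", "s4"})
features = extract_features(prepare(reduced))
predictions = classify(features)
print(predictions)
```

In a real run each stage would operate on Spark DataFrames and the last step would fit an actual model. The point here is only the order of the stages and what each one hands to the next.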