PROBLEM SYNOPSIS: As online sales grow in popularity, tech companies are looking for ways to increase sales by analysing customer behaviour and gaining insight into product trends. They also want their websites to help customers find the products they need without much searching. Both goals are served by tracking customers' clicks on the website and looking for patterns in them. This clickstream data logs how each customer navigated through the website, along with other details such as the time spent on each page. Companies typically ingest this data with frameworks such as Apache Kafka or AWS Kinesis and store it in systems such as Hadoop. From there, machine learning engineers and business analysts use the data to derive valuable insights.
PROJECT AIM: To extract data and gather insights from a real-life data set of an e-commerce company, using AWS EMR and S3 together with Hadoop and Hive.
SKILLS REQUIRED: Hive Querying (HQL), working with AWS EMR & S3