Skip to content
This repository has been archived by the owner on Oct 16, 2023. It is now read-only.

New York πŸ—½ school bus 🚌 data stream analysis. Uniroma2, SABD 2019/2020, Marco Balletti and Francesco Marino's second project.

Notifications You must be signed in to change notification settings

fmarino-412/SABD_project2_BallettiMarino

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

SABD 2019/2020 second project

Authors: Marco Balletti, Francesco Marino

Project structure descritption

Folder containing the input dataset as a CSV file (dataset.csv).

Folder containing scripts and file for a container based execution of the project architecture:

  1. start-dockers.sh creates the Kafka Cluster and necessary Kafka topics,
  2. stop-dockers.sh stops and deletes the Kafka Cluster after the created topics deletion and
  3. docker-compose.yml is the Docker Compose file used to create the container infrastructure.

Folder containing benchmark results (under Benchmark directory), project report and presentation slides.

Folder containing Flink computation results as CSV files:

  1. query1_daily.csv containing the output of the first query evaluated by daily windows,
  2. query1_weekly.csv containing the output of the first query evaluated by weekly windows,
  3. query1_monthly.csv containing the output of the first query evaluated by monthly windows,
  4. query2_daily.csv containing the output of the second query evaluated by daily windows,
  5. query2_weekly.csv containing the output of the second query evaluated by weekly windows,
  6. query3_daily.csv containing the output of the third query evaluated by daily windows and
  7. query3_weekly.csv containing the output of the third query evaluated by weekly windows.

Results are evaluated from the entire dataset content

This directory contains in its subdirectories Java code for:

  1. creation of Kafka Topic producer for input data,
  2. creation of a Flink topology to run a DSP analysis of the three queries,
  3. creation of a Kafka Streams topology to run an alternative DSP analysis of the same three queries and
  4. creation of several Kafka topic consumers for DSP output saving.

Java Project structure description

It is recommended to open the entire directory with an IDE for better code navigation. Java project part was developed using JetBrains' IntelliJ IDEA.

In the main folder there are processing architecture launchers:

This package contains classes for queries' topologies building and execution using Flink as DSP framework.

This package contains configurations for the Kafka publish-subscribe service and classes for Consumers and Producers instantiation:

This package contains classes for queries' topologies building and execution using Kafka Streams as DSP library and the KafkaStreamsConfig.java used to get properties for the stream processing library execution.

This package contains classes for queries' topologies creation:

This package contains custom Kafka Streams windows:

  • CustomTimeWindows.java that is an abstract class representing a generic custom duration time window,
  • DailyTimeWindows.java that implements a daily time window aligned to a given time zone,
  • MonthlyTimeWindows.java that implements a monthly time window (aligned to the first day of a month in a given time zone) and
  • WeeklyTimeWindows.java implementing a weekly time window (starts on Monday and ends on Sunday aligned to a given time zone).

This package contains classes needed for queries' execution support, in particular:

This package contains classes used as accumulators for both Flink and Kafka Streams processing:

This package contains utilities for latency and throughput evaluation:

This package contains utilities for delay string parsing and delay type ranking:

This package contains data serialization and deserialization utilities:

About

New York πŸ—½ school bus 🚌 data stream analysis. Uniroma2, SABD 2019/2020, Marco Balletti and Francesco Marino's second project.

Topics

Resources

Stars

Watchers

Forks