In this project, we designed and developed a stream processing application that processes the airline data from the 2009 Data Expo (Airline On-Time Performance) as a stream and runs database queries over the stream to answer the questions below. The dataset contains 120 million records (about 10 GB) and comes with four supporting CSV files describing airports, carriers, planes, and the dataset's metadata. The tools and technologies used in this project are Spark Structured Streaming, Flask, HTML, CSS, and AmCharts.js.
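Spark Structured Streaming processes an unbounded input incrementally, by default in micro-batches. As a rough stand-in that runs without a Spark cluster, the sketch below splits a CSV source into fixed-size micro-batches of parsed rows using only the Python standard library. This is not the project's Spark code: the function name `micro_batches` and the batch size are hypothetical, and the column names (`Month`, `UniqueCarrier`, `Origin`, `Dest`, `ArrDelay`) are assumed to follow the Data Expo CSV header. In Spark itself, the file source's `maxFilesPerTrigger` option is one way to bound how much input each micro-batch reads.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def micro_batches(
    lines: Iterable[str], batch_size: int
) -> Iterator[List[Dict[str, str]]]:
    """Yield successive micro-batches of at most batch_size parsed CSV rows.

    Hypothetical helper. The first line is treated as the header, so each
    row is a dict keyed by column name (e.g. "UniqueCarrier", "ArrDelay").
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    reader = csv.DictReader(lines)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return  # input exhausted
        yield batch


if __name__ == "__main__":
    # A tiny in-memory sample standing in for one of the dataset's CSV files.
    sample = io.StringIO(
        "Month,UniqueCarrier,Origin,Dest,ArrDelay\n"
        "1,AA,ORD,LAX,5\n"
        "1,AA,ORD,LAX,40\n"
        "12,UA,ATL,ORD,-3\n"
        "12,UA,ORD,ATL,NA\n"
        "7,DL,ATL,JFK,12\n"
    )
    print([len(b) for b in micro_batches(sample, 2)])  # [2, 2, 1]
```

With a real yearly file, the same generator can be fed an open file handle (opened with `newline=""`, as the `csv` module recommends) and a much larger batch size.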
Our main objective is to process the streaming flight data with Spark Structured Streaming and to answer the following questions using database queries and visualizations over the data streams:
- Which airline carrier is the most reliable in terms of punctuality?
- What were historically the worst months to fly?
- What are the busiest airports and routes in the United States?
We aim to organize and present our findings through a simple processing pipeline and a web application.
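Each of these questions reduces to a grouped aggregation whose state can be updated as micro-batches arrive, much as a streaming aggregation in Spark maintains running state between triggers. The hypothetical `FlightStats` class below keeps that state in plain Python counters; it is an illustration, not the project's queries. Its metric definitions are assumptions of this sketch: a flight counts as on time when it arrives less than 15 minutes late (the usual US DOT convention, exposed as a parameter), months are ranked by average arrival delay, rows whose `ArrDelay` is `NA` are left out of the delay metrics, and airport traffic counts both departures and arrivals.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Tuple


class FlightStats:
    """Hypothetical running aggregates, updated once per micro-batch."""

    def __init__(self, on_time_threshold: float = 15.0) -> None:
        # Assumed definition: on time means arriving < threshold minutes late.
        self.threshold = on_time_threshold
        self.carrier_total: Counter = Counter()
        self.carrier_on_time: Counter = Counter()
        self.month_delay_sum: DefaultDict[int, float] = defaultdict(float)
        self.month_delay_count: Counter = Counter()
        self.airport_flights: Counter = Counter()
        self.route_flights: Counter = Counter()

    def update(self, batch: List[Dict[str, str]]) -> None:
        """Fold one micro-batch of parsed CSV rows into the running state."""
        for row in batch:
            origin, dest = row["Origin"], row["Dest"]
            # Traffic counts include every row: departures plus arrivals.
            self.airport_flights[origin] += 1
            self.airport_flights[dest] += 1
            self.route_flights[(origin, dest)] += 1

            delay = row.get("ArrDelay", "NA")
            if delay in ("", "NA"):
                continue  # no arrival delay recorded (e.g. cancelled or diverted)
            minutes = float(delay)
            carrier = row["UniqueCarrier"]
            self.carrier_total[carrier] += 1
            if minutes < self.threshold:
                self.carrier_on_time[carrier] += 1
            month = int(row["Month"])
            self.month_delay_sum[month] += minutes
            self.month_delay_count[month] += 1

    def most_reliable_carrier(self) -> str:
        """Carrier with the highest on-time fraction seen so far."""
        return max(
            self.carrier_total,
            key=lambda c: self.carrier_on_time[c] / self.carrier_total[c],
        )

    def worst_months(self, n: int = 3) -> List[int]:
        """Months ordered from highest to lowest average arrival delay."""
        avg = {
            m: self.month_delay_sum[m] / self.month_delay_count[m]
            for m in self.month_delay_count
        }
        return sorted(avg, key=lambda m: avg[m], reverse=True)[:n]

    def busiest_airports(self, n: int = 3) -> List[str]:
        """Airports with the most departures plus arrivals."""
        return [a for a, _ in self.airport_flights.most_common(n)]

    def busiest_routes(self, n: int = 3) -> List[Tuple[str, str]]:
        """Directed (origin, destination) pairs with the most flights."""
        return [r for r, _ in self.route_flights.most_common(n)]
```

Because the state is only counts and sums, feeding the same rows as one batch or as several gives the same answers. That property is what makes these queries a good fit for incremental, micro-batch execution.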
The expected users of this dashboard are airline staff, route planners, pilots, and US domestic travellers.
Steps to run:
- git clone https://github.com/Jaini8/Airline_On_Time_Streaming_Data_Processing_And_Analysis.git
- cd Airline_On_Time_Streaming_Data_Processing_And_Analysis/flask_app
- python flask_app.py
Once the app is running, the dashboard is served at: http://ilab1.cs.rutgers.edu:9996/
Video demonstration: Airline On-Time Streaming Data Processing And Analysis