This project analyses the airline datasets with Hadoop to answer the problem statements listed below.
There are several ways to approach this problem, such as Pig scripts or MapReduce programs. I'm using Hive, which lets the analysis be written as SQL-like HiveQL queries that Hive compiles into jobs running on the Hadoop cluster. It is one of the most widely used tools for analysing datasets within Hadoop.
Problem Statements:
1. Find the list of airports operating in India.
2. Find the list of airlines having zero stops.
3. List the airlines operating with code share.
4. Find which country (or territory) has the highest number of airports.
5. Find the list of active airlines in the United States.
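The problem statements above reduce to a few filters, a join between routes and airlines on the airline ID, and a group-by count. As a hedged sketch of that logic (not the project's HiveQL queries), the Python below works on in-memory records. The field names (`name`, `country`, `airline_id`, `active`, `codeshare`, `stops`) and the flag value `"Y"` for active and code-share rows are assumptions modeled on the public OpenFlights files, and the sample records are hypothetical.

```python
from collections import Counter

# ASSUMPTION: hypothetical record layouts, loosely modeled on OpenFlights:
#   airports: {"name", "country"}
#   airlines: {"airline_id", "name", "country", "active"}
#   routes:   {"airline_id", "codeshare", "stops"}


def airports_in_country(airports, country="India"):
    """Problem 1: names of airports located in the given country."""
    return [a["name"] for a in airports if a["country"] == country]


def airlines_with_zero_stops(airlines, routes):
    """Problem 2: airlines operating at least one route with zero stops."""
    # int() accepts both 0 and "0", so raw string fields from a file also work.
    ids = {r["airline_id"] for r in routes if int(r["stops"]) == 0}
    return sorted(a["name"] for a in airlines if a["airline_id"] in ids)


def airlines_with_codeshare(airlines, routes):
    """Problem 3: airlines operating at least one code-share route."""
    ids = {r["airline_id"] for r in routes if r["codeshare"] == "Y"}
    return sorted(a["name"] for a in airlines if a["airline_id"] in ids)


def country_with_most_airports(airports):
    """Problem 4: (country, airport count) for the country with the most airports."""
    return Counter(a["country"] for a in airports).most_common(1)[0]


def active_airlines_in_country(airlines, country="United States"):
    """Problem 5: airlines flagged active ("Y") in the given country."""
    return sorted(
        a["name"] for a in airlines if a["country"] == country and a["active"] == "Y"
    )


if __name__ == "__main__":
    airports = [
        {"name": "Airport A", "country": "India"},
        {"name": "Airport B", "country": "India"},
        {"name": "Airport C", "country": "United States"},
    ]
    airlines = [
        {"airline_id": 1, "name": "Airline One", "country": "United States", "active": "Y"},
        {"airline_id": 2, "name": "Airline Two", "country": "India", "active": "N"},
    ]
    routes = [
        {"airline_id": 1, "codeshare": "", "stops": 0},
        {"airline_id": 2, "codeshare": "Y", "stops": 1},
    ]
    print(airports_in_country(airports))               # ['Airport A', 'Airport B']
    print(airlines_with_zero_stops(airlines, routes))  # ['Airline One']
    print(airlines_with_codeshare(airlines, routes))   # ['Airline Two']
    print(country_with_most_airports(airports))        # ('India', 2)
    print(active_airlines_in_country(airlines))        # ['Airline One']
```

In Hive, the same shapes map onto a `WHERE` filter (problems 1 and 5), a `JOIN` on the airline ID (problems 2 and 3), and a `GROUP BY` country ordered by count with `LIMIT 1` (problem 4).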
In this use case, there are 3 data sets:
Final_airlines, routes.dat, airports_mod.dat
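Before any querying, each file has to be split into columns. A minimal loader sketch follows, assuming the files are comma-separated with optional double quotes and a backslash-N token for missing values, as in the public OpenFlights releases. The names in `ROUTE_COLUMNS` are hypothetical labels following the OpenFlights route layout; `Final_airlines` and `airports_mod.dat` are modified files whose layout may differ, so their column lists would need to be checked against the actual data.

```python
import csv
import io

# ASSUMPTION: hypothetical column labels in OpenFlights routes.dat order.
ROUTE_COLUMNS = [
    "airline", "airline_id", "source_airport", "source_airport_id",
    "dest_airport", "dest_airport_id", "codeshare", "stops", "equipment",
]


def parse_dat(text, columns):
    """Parse comma-separated .dat content into one dict per row.

    Quoted fields (which may contain commas) are handled by the csv module.
    A field consisting of the backslash-N token is mapped to None.
    All other values stay as strings.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        values = [None if f == r"\N" else f for f in fields]
        # zip() stops at the shorter sequence, so extra fields are dropped.
        rows.append(dict(zip(columns, values)))
    return rows


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the project's datasets.
    sample = "XX,1,AAA,10,BBB,20,,0,320\n" + r"YY,\N,CCC,30,DDD,40,Y,1,737" + "\n"
    routes = parse_dat(sample, ROUTE_COLUMNS)
    print(routes[0]["stops"])       # prints 0
    print(routes[1]["airline_id"])  # prints None
```

To load a real file, read it with `open("routes.dat", encoding="utf-8", newline="")` and pass its contents to `parse_dat`. Numeric fields such as `stops` remain strings, so convert them with `int()` before numeric comparisons.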