
End-to-end implementation of machine learning airline flight fare prediction using Python

sonu275981/Flight-Fare-Prediction

Airline Flight Fare Prediction

Problem Statement

Flight ticket prices can be hard to guess: the price we see today may well be different when we check the same flight tomorrow. Travelers often say that flight ticket prices are unpredictable. As data scientists, we set out to show that, given the right data, they can be predicted. The dataset contains flight ticket prices for various airlines and city pairs between March and June 2019. Size of training set: 10683 records

About the Project

The Airline Flight Fare Prediction project predicts airline fares between Indian cities. The dataset is taken from Kaggle and is time-stamped, so building the model required extensive pre-processing, especially of the date-time columns, to arrive at an ML model that can effectively predict fares. Many features had to be pre-processed and transformed into new parameters to give the web application a cleaner, simpler layout. The dataset has the following columns (Price is the target; the rest are the independent features):

  • Airline: The name of the airline.

  • Date_of_Journey: The date of the journey

  • Source: The source from which the service begins.

  • Destination: The destination where the service ends.

  • Route: The route taken by the flight to reach the destination.

  • Dep_Time: The time when the journey starts from the source.

  • Arrival_Time: Time of arrival at the destination.

  • Duration: Total duration of the flight.

  • Total_Stops: Total stops between the source and destination.

  • Additional_Info: Additional information about the flight

  • Price: The price of the ticket
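The date-time pre-processing mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows, the string formats (`DD/MM/YYYY` for dates, `HH:MM` for times), and the derived column names such as `Journey_day` are all assumptions.

```python
import pandas as pd

# Toy rows; the values and string formats are assumed for illustration.
df = pd.DataFrame({
    "Date_of_Journey": ["24/03/2019", "01/05/2019"],
    "Dep_Time": ["22:20", "05:50"],
})

# Parse the journey date with an explicit day/month/year format.
journey = pd.to_datetime(df["Date_of_Journey"], format="%d/%m/%Y")
df["Journey_day"] = journey.dt.day        # hypothetical column name
df["Journey_month"] = journey.dt.month    # hypothetical column name

# Split the departure time into numeric hour and minute columns.
dep = pd.to_datetime(df["Dep_Time"], format="%H:%M")
df["Dep_hour"] = dep.dt.hour
df["Dep_min"] = dep.dt.minute

# Drop the raw string columns once the numeric parts are extracted.
df = df.drop(columns=["Date_of_Journey", "Dep_Time"])
print(df)
```

Splitting the timestamps into plain integer columns keeps every feature numeric, which is what most scikit-learn regressors expect.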

The code is written in Python 3.6.10. If you don't have Python installed, you can download it from the official Python website (python.org). If you are using an older version of Python, upgrade it first, and make sure you also have the latest version of pip. To install the required packages and libraries, check the project files in the project directory after cloning the repository.

Cleaning the Data

The data needed cleaning before it was usable for the model. I made the following changes and created the following variables:

  • Calculated the total flight duration
  • Removed the null values
  • Removed the outliers
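The three cleaning steps above might look something like the sketch below. The README does not say how durations were parsed or how outliers were defined, so the `'2h 50m'` duration format, the helper `duration_to_minutes`, the `Duration_mins` column, and the 1.5 × IQR rule on `Price` are assumptions chosen for illustration.

```python
import pandas as pd

def duration_to_minutes(text):
    """Convert a string such as '2h 50m', '19h' or '45m' to total minutes."""
    total = 0
    for part in text.split():
        if part.endswith("h"):
            total += int(part[:-1]) * 60
        elif part.endswith("m"):
            total += int(part[:-1])
    return total

# Toy data with one missing duration and one extreme price.
df = pd.DataFrame({
    "Duration": ["2h 50m", "19h", None, "45m", "1h 5m", "3h", "2h 10m"],
    "Price": [3900, 7700, 5200, 3100, 4200, 4500, 79000],
})

# Step 1: remove rows with null values.
df = df.dropna()

# Step 2: calculate the total flight duration in minutes.
df["Duration_mins"] = df["Duration"].map(duration_to_minutes)

# Step 3: remove price outliers with the 1.5 * IQR rule (an assumed rule).
q1, q3 = df["Price"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["Price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
print(df)
```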

Model Building

First, I transformed the categorical variables into dummy variables. I also split the data into training and test sets with a test size of 30%.
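As a rough sketch of this step, `pandas.get_dummies` can one-hot encode a categorical column and scikit-learn's `train_test_split` can hold out 30% of the rows. The toy data, the placeholder airline labels, `drop_first=True`, and `random_state=42` are assumptions, not settings taken from the project.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data; the airline labels and prices are placeholders.
df = pd.DataFrame({
    "Airline": ["A", "B", "A", "C", "B", "A", "C", "B", "A", "C"],
    "Total_Stops": [0, 1, 0, 2, 1, 0, 1, 2, 0, 1],
    "Price": [3900, 7700, 4100, 9800, 7300, 3800, 6900, 9900, 4000, 7100],
})

# One-hot encode the categorical column; drop_first avoids a redundant column.
X = pd.get_dummies(df.drop(columns=["Price"]), columns=["Airline"], drop_first=True)
y = df["Price"]

# Hold out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X.columns.tolist(), len(X_train), len(X_test))
```

Fixing `random_state` makes the split reproducible, which helps when comparing several models on the same held-out rows.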

I tried six different models and evaluated them using r2_score.

Model Accuracy

GradientBoostingRegressor

  • 0.581642
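For readers who want to see how such a score is obtained, here is a minimal sketch of fitting a `GradientBoostingRegressor` and scoring it with `r2_score`. It runs on synthetic data with default hyperparameters, both of which are assumptions, so it will not reproduce the figure reported above. The manual calculation shows the quantity that `r2_score` computes: one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the prepared flight features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit with default hyperparameters (an assumption; the README lists none).
model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# R^2 on the held-out set, via scikit-learn and by hand.
score = r2_score(y_test, pred)
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
manual = 1 - ss_res / ss_tot
print(f"r2_score: {score:.4f}, manual: {manual:.4f}")
```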
