We will find the best machine learning pipeline for the Titanic dataset (from Kaggle) using genetic programming. The dataset is available on Kaggle.
The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.
One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.
To find the best pipeline for the Titanic dataset, we will use the TPOT library, which is built on top of scikit-learn (sklearn).
TPOT
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
More information about TPOT: TPOT GitHub
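To give a feel for what "optimizing pipelines with an evolutionary search" means, here is a toy sketch in plain Python. It is not TPOT's implementation: TPOT evolves tree-structured pipelines, while this simplified loop evolves fixed-length configurations in the style of a genetic algorithm. The search space, the step names, and `toy_fitness` are all hypothetical stand-ins; in a real run the fitness would be something like cross-validated accuracy.

```python
import random

# Hypothetical search space: one choice per pipeline step.
# These labels are illustrative, not TPOT's internal operators.
SEARCH_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "selector": ["none", "k_best", "variance"],
    "model": ["logistic", "tree", "forest"],
}


def random_pipeline(rng):
    """Pick one option at random for every step."""
    return {step: rng.choice(options) for step, options in SEARCH_SPACE.items()}


def mutate(pipeline, rng):
    """Copy the pipeline and re-draw the option of one randomly chosen step."""
    child = dict(pipeline)
    step = rng.choice(list(SEARCH_SPACE))
    child[step] = rng.choice(SEARCH_SPACE[step])
    return child


def crossover(a, b, rng):
    """Build a child by taking each step from one of the two parents."""
    return {step: rng.choice([a[step], b[step]]) for step in SEARCH_SPACE}


def evolve(fitness, generations=10, population_size=8, seed=0):
    """Keep the better half each generation and refill with mutated offspring.

    population_size should be at least 4 so that two parents can be sampled.
    """
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(pipeline):
    """Stand-in score: counts how many steps match an arbitrary target."""
    target = {"scaler": "standard", "selector": "k_best", "model": "forest"}
    return sum(pipeline[step] == target[step] for step in target)


best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Because the better half of each generation is carried over unchanged, the best score never gets worse from one generation to the next.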
Titanic_prediction.ipynb needs the libraries below (timeit is part of the Python standard library and needs no installation). Install them with the following command:
pip install matplotlib scikit-learn pandas numpy tpot xgboost
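Once the libraries are installed, a TPOT search might look roughly like the sketch below. It is not taken from Titanic_prediction.ipynb. It assumes the Kaggle training file has been saved as `train.csv`, uses the classic `TPOTClassifier` interface (`generations`, `population_size`, `verbosity`, `export`), which may differ in newer TPOT releases, and does only minimal cleaning. The function name, the chosen columns, and the output file name `titanic_pipeline.py` are illustrative choices.

```python
def find_titanic_pipeline(csv_path="train.csv", generations=5, population_size=20):
    """Run a TPOT search on the Titanic training data and return the fitted object."""
    # Imported inside the function so this file can be loaded
    # even where the heavy dependencies are not installed.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier

    df = pd.read_csv(csv_path)

    # Minimal cleaning (an assumption, not the notebook's exact steps):
    # keep a few columns, encode Sex as 0/1, fill missing ages with the median.
    features = df[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]].copy()
    features["Sex"] = features["Sex"].map({"male": 0, "female": 1})
    features["Age"] = features["Age"].fillna(features["Age"].median())
    target = df["Survived"]

    X_train, X_test, y_train, y_test = train_test_split(
        features.to_numpy(), target.to_numpy(), test_size=0.25, random_state=42
    )

    # Small generations/population_size keep the search short;
    # larger values explore more pipelines but take longer.
    tpot = TPOTClassifier(
        generations=generations,
        population_size=population_size,
        verbosity=2,
        random_state=42,
    )
    tpot.fit(X_train, y_train)
    print("Held-out accuracy:", tpot.score(X_test, y_test))

    # Write the best pipeline found as a standalone Python script.
    tpot.export("titanic_pipeline.py")
    return tpot
```

Calling `find_titanic_pipeline()` from the directory that contains `train.csv` starts the search. The run time grows with `generations` and `population_size`, so it is worth starting with small values.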
For most datasets there is still a lot of data cleaning, feature engineering, and final model selection to do, not to mention the most important step of asking the right questions up front. You might then need to productionize your model. TPOT also does not perform an exhaustive search yet. So TPOT isn't going to replace the data scientist role, but it might help you reach a better final model faster.
If you have any modifications or suggestions, feel free to share them. 👍