Titanic_Survival

Completed by Sonakshi Chauhan.

Overview: This project uses the Titanic dataset to create a model that will:

- return the conditional survival probability of a passenger
- help you compare and contrast classification models based on accuracy
- provide data visualizations given a condition on a numerical variable from the dataset

Problem Statement: Build a model that returns a passenger's chance of survival given the passenger's details as input.

Data: Titanic Kaggle Challenge

Deliverables: Probability

Ahoy! Let's Sail

Topics Covered

Statistical Modeling
Imputation of Missing values
Probability
Various Classification Techniques

Tools Used

Scikit-learn
Google Colab

Installation and Usage

Ensure that the following packages have been installed and imported.

pip install numpy
pip install pandas
pip install seaborn

Jupyter Notebook - to run ipython notebook (.ipynb) project file

Follow the instructions at https://docs.anaconda.com/anaconda/install/ to install Anaconda with Jupyter. Alternatively, VS Code can render Jupyter notebooks.

Notebook Structure

The structure of this notebook is as follows:

- Imports
- Data Loading
- Data Pre-processing
- Data Analysis
- Data Visualization
- Encoding
- Supporting Target and Features
- Splitting Data
- Model Training
- Testing and Prediction

Data Pre-Processing

->Observing the data, we found missing values across several rows and columns.
->We dropped the 'Cabin' column, as it had the highest number of missing values.
->We processed the 'Age' and 'Embarked' columns to handle their missing values.
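The pre-processing steps could look roughly like the sketch below. The tiny DataFrame is a made-up stand-in for the Kaggle data, and the imputation choices (median for 'Age', most frequent value for 'Embarked') are assumptions; the notebook may handle these columns differently.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the Kaggle training data (values are illustrative only).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Embarked": ["S", "C", None, "S", "Q"],
    "Cabin": [None, "C85", None, None, None],
})

# Count missing values per column to see where the gaps are.
missing_counts = df.isnull().sum()

# Drop 'Cabin', the column with the most missing values.
df = df.drop(columns=["Cabin"])

# Assumption: fill 'Age' with the median and 'Embarked' with the most frequent port.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
```

Median imputation is a common default for skewed numeric columns such as age, but it is only one of several reasonable options.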

Data Analysis

->The prediction target is whether a passenger survived.
->Here we analyze the number of survivors across different passenger categories.
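As a rough illustration of this kind of analysis, the sketch below counts survivors per group with pandas. The sample rows are invented, and the choice of 'Pclass' and 'Sex' as grouping columns is an assumption based on the Kaggle column names.

```python
import pandas as pd

# Invented sample rows using the Kaggle column names.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass": [3, 1, 3, 3, 1, 2],
    "Sex": ["male", "female", "female", "male", "female", "male"],
})

# Number of survivors in each passenger class.
survivors_by_class = df.groupby("Pclass")["Survived"].sum()

# Share of passengers who survived, grouped by sex.
survival_rate_by_sex = df.groupby("Sex")["Survived"].mean()
```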

Data Visualizations

->Here we visualize our data to better understand which categories have the highest survival rates.

Categorical Encoding

->Here we encode all categorical values numerically so that every column has a numeric data type.
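One simple way to encode categorical columns is an explicit mapping, sketched below. The specific integer codes are arbitrary assumptions; the notebook may use a different encoder or different codes.

```python
import pandas as pd

# Invented sample of the two text columns in the Kaggle data.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# Replace each category label with an integer code (codes chosen arbitrarily).
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df["Embarked"] = df["Embarked"].map({"S": 0, "C": 1, "Q": 2})
```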

Supporting Target and Features

->Here we divide the data into dependent and independent variables: 'Y' holds the dependent (target) values and 'X' holds the independent (feature) values.

Splitting our Dataset into Train and Test Set

->Using the sklearn library, we split our dataset into train and test sets.
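A minimal sketch of separating the target from the features and splitting the result with scikit-learn follows. The sample rows, the 80/20 ratio, and the `random_state` value are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Invented, already-encoded sample rows.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1, 1, 0],
    "Pclass": [3, 1, 3, 3, 1, 2, 3, 1, 2, 3],
    "Sex": [0, 1, 1, 0, 1, 0, 0, 1, 1, 0],
    "Age": [22.0, 38.0, 26.0, 35.0, 27.0, 54.0, 2.0, 14.0, 58.0, 20.0],
})

# 'Y' is the dependent variable; 'X' holds the independent variables.
X = df.drop(columns=["Survived"])
Y = df["Survived"]

# Assumption: hold out 20% of the rows for testing.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)
```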

Model Training

->First, we scale the train and test set values.
->Then we train multiple classification models to see which one is the most accurate.
->We found Random Forest to be the most accurate and moved ahead with it.
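The scale-then-compare workflow might be sketched as follows. Synthetic data from `make_classification` stands in for the Titanic features, and the three candidate models are assumptions; the notebook's actual model list and scores are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the Titanic dataset).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Candidate classifiers (an assumed selection).
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighbors": KNeighborsClassifier(),
    "RandomForest": RandomForestClassifier(random_state=0),
}

# Train each model and record its accuracy on the held-out test set.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Name of the model with the highest test accuracy.
best_model = max(accuracies, key=accuracies.get)
```

Fitting the scaler on the training data alone keeps information from the test set out of the training step.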

Prediction and Accuracy

->This is the final step, where we make predictions with our model and test its accuracy.
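Since the deliverable is a probability, a prediction for a single passenger could be obtained with `predict_proba`, as in the sketch below. The training rows, feature list, and passenger details are invented, and scaling is omitted for brevity.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Invented, already-encoded training rows (Sex: 0 = male, 1 = female).
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "Pclass": [3, 1, 3, 3, 1, 2, 3, 1],
    "Sex": [0, 1, 1, 0, 1, 0, 0, 1],
    "Age": [22.0, 38.0, 26.0, 35.0, 27.0, 54.0, 2.0, 14.0],
})
features = ["Pclass", "Sex", "Age"]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(train[features], train["Survived"])

# Hypothetical passenger details supplied as input.
passenger = pd.DataFrame([{"Pclass": 1, "Sex": 1, "Age": 30.0}])

# Look up the column for class 1 (survived) and read its probability.
survived_column = list(model.classes_).index(1)
survival_probability = model.predict_proba(passenger[features])[0, survived_column]
```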

Conclusion

->We built a classifier using the Random Forest technique to predict Titanic survival.

Contact: sonakshichauhan1402@gmail.com

Project Continuity

This project is complete.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
