In this project, I apply data engineering techniques to disaster data from Figure Eight and build a classification model for an API that classifies each disaster message into one of the 36 given categories, so that the organization responsible for that category can be contacted to help the people affected.
The data directory contains the dataset of real disaster messages and categories. Updated data is also available from the Figure Eight website. I use these datasets in an ETL process that cleans the data so it can be fed easily into the machine learning model and give better results. In data engineering, the main tasks are to clean the data and to build a pipeline that can try multiple algorithms at once to reach better accuracy.
To make it usable for non-technical people, I'll build a user-friendly web app where users can simply type their message and get a response from the appropriate relief agency.
- Load the datasets ('disaster_messages.csv' and 'disaster_categories.csv')
- Merge the datasets
- Apply data wrangling techniques to clean the data
- Store the data in a SQLite database
- Load the data from the SQLite database
- Split the data into training and testing sets
- Build a pipeline of text processing (NLP) and Machine Learning Model
- Train and tune the model using GridSearchCV
- Predict the output on test data
- Export the final model as a pickle file (modelname.pkl)
- Load the database and model
- Run the command python app/
- The web page opens at the given address
Go to the data directory and run the following command.
python data/ data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- includes the ETL pipeline
- Add the dataset file path
- Make a SQLite database file
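The ETL steps described above (load, merge, clean, store in SQLite) could be sketched roughly as follows. This is not the project's actual script: the column names `id`, `message`, and `categories`, the `name-value;name-value` category format, the table name `messages`, and the helper names `parse_categories` and `run_etl` are all assumptions made for illustration. It uses only the standard library so it runs without extra dependencies; a pandas-based version would be an equally reasonable choice.

```python
import csv
import io
import sqlite3


def parse_categories(raw):
    """Turn a string like 'water-1;roads-0' into {'water': 1, 'roads': 0}."""
    parsed = {}
    for item in raw.split(";"):
        name, _, value = item.rpartition("-")
        # Clamp to 0/1 so every category becomes a binary label (assumed cleaning rule).
        parsed[name] = min(int(value), 1)
    return parsed


def run_etl(messages_file, categories_file, conn, table="messages"):
    """Load both CSVs, merge them on id, clean them, and store the result in SQLite."""
    messages = {row["id"]: row["message"] for row in csv.DictReader(messages_file)}
    merged = {}
    for row in csv.DictReader(categories_file):
        # Inner join on id; a repeated id overwrites the earlier row, dropping duplicates.
        if row["id"] in messages:
            merged[row["id"]] = (messages[row["id"]], parse_categories(row["categories"]))
    if not merged:
        return 0
    # One INTEGER column per category, in a stable (sorted) order.
    names = sorted(next(iter(merged.values()))[1])
    columns = ", ".join(f'"{name}" INTEGER' for name in names)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY, message TEXT, {columns})')
    placeholders = ", ".join("?" * (len(names) + 2))
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [(int(i), msg, *(cats[n] for n in names)) for i, (msg, cats) in merged.items()],
    )
    conn.commit()
    return len(merged)


# Tiny in-memory example (made-up rows, including one duplicate and one value of 2).
messages_csv = io.StringIO(
    "id,message\n1,We need water\n2,Road is blocked\n2,Road is blocked\n"
)
categories_csv = io.StringIO(
    "id,categories\n1,water-1;roads-0\n2,water-0;roads-2\n2,water-0;roads-2\n"
)
conn = sqlite3.connect(":memory:")
stored = run_etl(messages_csv, categories_csv, conn)
rows = conn.execute("SELECT id, message, roads, water FROM messages ORDER BY id").fetchall()
```

With real files, the two `io.StringIO` objects would be replaced by opened CSV files and `":memory:"` by the database path passed on the command line.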
Go to the models directory and run the following command.
python models/ data/DisasterResponse.db models/classifier.pkl
- train_Classifier includes the ML model
- Add the database
- Make a classifier.pkl pickle file
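As a rough illustration of the training step, the sketch below builds a text-processing and classification pipeline with scikit-learn, tunes it with GridSearchCV, predicts on held-out data, and pickles the best model. The choice of `CountVectorizer`, `TfidfTransformer`, `MultiOutputClassifier`, and `RandomForestClassifier`, the parameter grid, and the toy messages and labels are all assumptions for illustration; the project's actual model and features may differ. It also assumes that each category is predicted as its own 0/1 column.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Toy stand-in for the cleaned messages and two category columns (made-up data).
messages = [
    "we need drinking water",
    "the road is blocked by debris",
    "no clean water in the village",
    "bridge and road destroyed",
    "water supply is contaminated",
    "roads are flooded near the river",
    "please send bottled water",
    "the main road has collapsed",
]
labels = np.array(
    [[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]]
)  # columns: water, roads

# Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, random_state=0
)

# Text processing (bag of words + TF-IDF) followed by one classifier per category.
pipeline = Pipeline(
    [
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("clf", MultiOutputClassifier(RandomForestClassifier(random_state=0))),
    ]
)

# Small illustrative grid; a real search would cover more parameters.
param_grid = {"clf__estimator__n_estimators": [5, 10]}
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(X_train, y_train)

# Predict on the test data with the refitted best pipeline.
predictions = search.predict(X_test)

# Serialize the best model; pickle.dump(model, open_file) would write a .pkl file instead.
model_bytes = pickle.dumps(search.best_estimator_)
restored = pickle.loads(model_bytes)
```

Pickling the whole pipeline, rather than only the classifier, keeps the text-processing steps together with the model, so the web app can pass raw message text straight to `predict`.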
Go to the app directory and run the following command.
python app/
The web app opens in the browser.
Type the message into the search bar.
The resulting category is shown below it.
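To show how a prediction could become the categories displayed on the result page, here is a minimal sketch. It assumes the model returns one 0/1 flag per category in the same order as the category names; the helper `flagged_categories` and the three example names are hypothetical and not taken from the project's code.

```python
def flagged_categories(category_names, prediction_row):
    """Return the names of the categories whose predicted flag is 1."""
    if len(category_names) != len(prediction_row):
        raise ValueError("expected one prediction per category")
    return [name for name, flag in zip(category_names, prediction_row) if int(flag) == 1]


# Hypothetical category names and a hypothetical prediction for one message.
category_names = ["water", "roads", "medical_help"]
flagged = flagged_categories(category_names, [1, 0, 1])
```

The length check guards against a model and a category list that have drifted out of sync, which would otherwise silently mislabel results.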
.
├── app
│   ├── FLASK FILE THAT RUNS APP
│   ├── static
│   │   └── favicon.ico---------------# FAVICON FOR THE WEB APP
│   └── templates
│       ├── go.html-------------------# CLASSIFICATION RESULT PAGE OF WEB APP
│       └── master.html---------------# MAIN PAGE OF WEB APP
├── data
│   ├── DisasterResponse.db-----------# DATABASE TO SAVE CLEANED DATA TO
│   ├── disaster_categories.csv-------# DATA TO PROCESS
│   ├── disaster_messages.csv---------# DATA TO PROCESS
│   └── PERFORMS ETL PROCESS
├── img-------------------------------# PLOTS FOR USE IN README AND THE WEB APP
├── models
│   └── PERFORMS CLASSIFICATION TASK
Thanks to Udacity for this course, which helped me sharpen my skills and gave me the chance to work with new technologies and real-world data.