Identify the level of income qualification needed for families in Latin America
The aim of this project is to identify the segment of the population in a given location based on income and living conditions.
Identifying this segment helps distribute aid fairly to the people who need it. Many factors have to be considered to classify a family's poverty level. This machine learning project uses a classification algorithm to determine which of those factors should be considered when assigning a poverty level to a family in Latin America.
- Understand the type of data
- Output variable identification
- Data cleaning, including:
  - Replace null value(s), if any
  - Remove redundant columns
- Check whether all members of a house have the same poverty level:
  - Is there a house without a family head?
  - Set the poverty level of every member to that of the head of the house within a family
- Check if there are any biases in the dataset
- Predict the poverty level using a random forest classifier and measure its accuracy
- Check the accuracy using a random forest with cross-validation
- Identify the most important variables in the random forest model
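
The steps above can be sketched roughly as follows with pandas and scikit-learn. This is a minimal illustration on a small synthetic table, not the project's actual code: the column names (`household_id`, `is_head`, `rooms`, `monthly_rent`, `poverty_level`), the median fill, and the grouped cross-validation are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score, train_test_split

# Synthetic stand-in for the real data; every column name is hypothetical.
rng = np.random.default_rng(0)
n_households, members = 60, 4
df = pd.DataFrame({
    "household_id": np.repeat(np.arange(n_households), members),
    "is_head": np.tile([1, 0, 0, 0], n_households),
    "rooms": rng.integers(1, 8, n_households).repeat(members),
    "monthly_rent": rng.normal(300, 80, n_households).repeat(members),
})
# Assumed label: poverty level 1..3 derived from the number of rooms.
df["poverty_level"] = np.digitize(df["rooms"], [3, 5]) + 1

# Inject the defects that the checklist looks for.
df.loc[[5, 17], "monthly_rent"] = np.nan                    # null values
df.loc[1, "poverty_level"] = 1 if df.loc[0, "poverty_level"] != 1 else 2
df.loc[df["household_id"] == 7, "is_head"] = 0              # house without a head

# Replace null values (median fill is one option among several).
df["monthly_rent"] = df["monthly_rent"].fillna(df["monthly_rent"].median())

# Do all members of a house share one poverty level?
levels = df.groupby("household_id")["poverty_level"].nunique()
inconsistent = levels[levels > 1].index

# Is there a house without a family head?
heads = df.groupby("household_id")["is_head"].sum()
no_head = heads[heads == 0].index

# Give every member the poverty level of the head of the house.
# Houses without a head keep their original values.
head_level = df[df["is_head"] == 1].set_index("household_id")["poverty_level"]
df["poverty_level"] = (
    df["household_id"].map(head_level).fillna(df["poverty_level"]).astype(int)
)

# Check for bias: the share of each class in the output variable.
class_share = df["poverty_level"].value_counts(normalize=True)

# Random forest on a simple hold-out split.
X = df[["rooms", "monthly_rent"]]
y = df["poverty_level"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model = RandomForestClassifier(
    n_estimators=100, random_state=0, class_weight="balanced"
)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# Cross-validated accuracy, keeping each household inside a single fold.
cv_scores = cross_val_score(
    model, X, y, cv=GroupKFold(n_splits=5), groups=df["household_id"]
)

# Rank the variables by their importance in the fitted forest.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
```

Because members of one household share most of their features, the plain hold-out split can place the same household on both sides and give an optimistic accuracy. Grouping the folds by household, as in the cross-validation step, is one way to reduce that effect.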