Costa Rican Household Poverty Level Prediction

An analysis of a dataset of Costa Rican household characteristics, aimed at improving the performance of the Proxy Means Test (PMT).

Here's the Kaggle page for this challenge.

My Approach:

The Kernel is divided into 3 parts:

Data Exploration:

Here I analyse the Target feature: I visualise its relationship with the other features and use the graphs to draw conclusions.

I also identify the types of features and clean up inconsistencies in the data here.

Preprocessing:

I do four important things here:

missing values imputation
generating ordinal features from dummy features
removing redundant features
creating new household-wide features
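
As a rough illustration of two of these steps, the sketch below collapses a group of dummy (one-hot) columns into a single ordinal feature and then aggregates an individual-level column to the household level. It is written in plain Python on a list of dicts to stay dependency-free, and it is not the code from the notebook: the column names (`edu_none`, `edu_primary`, `edu_secondary`, `household_id`, `age`), the helper names and the toy records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical dummy columns, listed from lowest to highest level.
EDU_DUMMIES = ["edu_none", "edu_primary", "edu_secondary"]


def dummies_to_ordinal(row, dummy_cols):
    """Return the position of the dummy column set to 1, or None if none is set."""
    for level, col in enumerate(dummy_cols):
        if row.get(col) == 1:
            return level
    return None


def household_aggregates(rows, key, value_col):
    """Compute the min, max and mean of value_col for each household."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value_col])
    return {
        hh: {"min": min(vals), "max": max(vals), "mean": sum(vals) / len(vals)}
        for hh, vals in groups.items()
    }


# Toy individual-level records, made up for illustration.
people = [
    {"household_id": "a", "age": 40, "edu_none": 0, "edu_primary": 0, "edu_secondary": 1},
    {"household_id": "a", "age": 10, "edu_none": 0, "edu_primary": 1, "edu_secondary": 0},
    {"household_id": "b", "age": 30, "edu_none": 1, "edu_primary": 0, "edu_secondary": 0},
]

# Replace the three dummy columns with one ordinal feature per person.
for person in people:
    person["edu_level"] = dummies_to_ordinal(person, EDU_DUMMIES)

# Household-wide features derived from an individual-level column.
age_stats = household_aggregates(people, "household_id", "age")
```

Once the ordinal column exists, the dummy columns it was built from carry no extra information, which is one way redundant features can arise and be dropped.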

Modelling:

I try 2 gradient boosting machines - LightGBM and XGBoost - to model the data.

I have used a 10-fold cross-validation strategy to get the CV scores of each model for comparison.
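
The splitting logic behind 10-fold cross-validation can be sketched as follows. This is a minimal, dependency-free illustration and not the notebook's code: the function names, the `fit_and_score` callback and the fixed seed are assumptions, and a library splitter would normally be used in practice.

```python
import random


def k_fold_indices(n_samples, n_splits=10, seed=0):
    """Yield (train_idx, valid_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        valid_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, valid_idx
        start += size


def cross_val_scores(n_samples, fit_and_score, n_splits=10):
    """Call fit_and_score(train_idx, valid_idx) on every fold and collect the scores."""
    return [fit_and_score(tr, va) for tr, va in k_fold_indices(n_samples, n_splits)]


folds = list(k_fold_indices(25, n_splits=10))
```

Passing the same folds to each model keeps the comparison fair, since both are then scored on identical validation splits.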

In the end, I decided on the LightGBM model and got an F1-macro score of 0.431 on the test data (currently in the top 10% of the Kaggle leaderboard).

I have included the data in the ./input/ directory, and my kernel is in the ./notebook/ directory.
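
For readers unfamiliar with the metric, F1-macro is the unweighted mean of the per-class F1 scores, so rare classes count as much as common ones. The sketch below computes it from scratch; the function name and the toy labels are made up for illustration, and classes with no true or predicted samples are scored as 0 here by assumption.

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, not taken from the competition data.
y_true = [1, 1, 2, 2, 3, 4]
y_pred = [1, 2, 2, 2, 3, 3]
score = f1_macro(y_true, y_pred)
```

In this toy example class 4 is never predicted, so its F1 of 0 pulls the average down even though only one sample belongs to it.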

nityeshaga/costa_rican_poverty_prediction

Folders and files

Latest commit

History

Repository files navigation

Costa Rican Household Poverty Level Prediction

My Approach:

Data Exploration:

Preprocessing:

Modelling:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages