This GitHub repository contains our code for the American Express Default Prediction competition. Developed in collaboration with Ben See Jian Rong and Younes Siyar, our team used LightGBM to predict credit card defaults. The solution highlights our approach to handling a massive dataset with a complex structure, focusing on efficiency and scalability to meet the challenges posed by real-world financial data.
- Competition Description
- Challenges of the Problem
- Proposed Solution
- Solution Effectiveness
- Conclusion
The competition, hosted by American Express, focused on using machine learning to predict credit default. The challenge involved handling an industrial-scale dataset with complex features including delinquency, spend, payment, balance, and risk variables.
Key challenges included managing the large dataset size, addressing data quality issues, and meeting the intensive computational requirements.
Initial experiments with Random Forest provided useful insights but did not scale to the competition's dataset. We therefore opted for LightGBM, a gradient boosting framework that uses tree-based learning algorithms and is known for its efficiency and scalability on large datasets.
- Data Preprocessing: To handle the vast amounts of data, we employed techniques such as handling missing values, encoding categorical variables, and reducing dimensionality where feasible.
- Feature Engineering: We crafted features that could capture nuances in the data, significantly impacting the model's predictive power.
- Model Configuration: Tuning LightGBM parameters like learning rate was crucial to optimize our model for accuracy and efficiency.
- Validation Strategy: To ensure the robustness of our model, we implemented a cross-validation strategy, which helped in identifying stable and reliable model configurations.
Extensive data preprocessing and hyperparameter tuning led to significant improvements:
- AUC: 0.97966
- Binary Log Loss: 0.167891
- AMEX Score: 0.78410
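For context, the AMEX Score is the competition's official rank metric, M = 0.5 · (G + D): the mean of the normalized weighted Gini coefficient and the default rate captured in the highest-ranked 4% of predictions, with non-default rows weighted 20× to adjust for the downsampling of negatives. The sketch below follows the widely circulated NumPy reimplementation of that metric; it is illustrative, not our exact evaluation code.

```python
# Sketch of the AMEX competition metric, following the commonly shared
# NumPy reimplementation: M = 0.5 * (normalized weighted Gini + default
# rate captured at the top 4% by cumulative weight).
import numpy as np

def amex_metric(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    def _rank_stats(labels: np.ndarray, scores: np.ndarray):
        order = np.argsort(scores)[::-1]            # rank by score, best first
        labels = labels[order]
        weight = np.where(labels == 0, 20.0, 1.0)   # non-defaults weighted 20x
        cum_w = np.cumsum(weight) / weight.sum()
        lorentz = np.cumsum(labels * weight) / (labels * weight).sum()
        gini = np.sum((lorentz - cum_w) * weight)
        return labels, cum_w, gini

    labels, cum_w, gini = _rank_stats(y_true, y_pred)
    # D: fraction of defaults captured in the top 4% by cumulative weight
    d = labels[cum_w <= 0.04].sum() / labels.sum()
    # G: Gini normalized by the Gini of a perfect ranking
    _, _, gini_max = _rank_stats(y_true, y_true.astype(float))
    return 0.5 * (gini / gini_max + d)

# Sanity check on synthetic labels: a perfect ranking scores 1.0
rng = np.random.default_rng(0)
y = (rng.random(10000) < 0.25).astype(int)
print(round(amex_metric(y, y.astype(float)), 4))  # prints 1.0
```

Because D only looks at the top 4% of the ranking, the metric rewards models that concentrate true defaults at the very top, which is why it behaves differently from plain AUC.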
This project provided a profound learning experience in handling real-world data and emphasized the importance of model selection, data quality, preprocessing, and collaborative teamwork.