Kaggle/Santander Customer Satisfaction

Abstract

Santander Customer Satisfaction Competition

Host : Santander Bank, a British bank, wholly owned by the Spanish Santander Group.
Prize : $ 60,000
Problem : Binary Classification
Evaluation : AUC
Period : Mar 2 2016 ~ May 2 2016 (61 days)

Santander Bank is asking Kagglers to help them identify dissatisfied customers early in their relationship. Doing so would allow Santander to take proactive steps to improve a customer's happiness before it's too late.

This competition requires you to deal with combination of sparse, noisy and weak predictors effectively. Since the feature names are provided with anonymity, there is limitation in feature interactions, and general scheme of feature engineering spans from clipping outliers, dropping duplicate and sparse columns, extracting PCA/K-means/t-SNE features and adding count of zeros. Even with enough feature engineering, a single model prediction will have difficulty climbing the ladder of Private LB.

This competition requires an ensemble of multiple models and variety of feature+model combinations as well as more iterated blending techniques to rise to the top of the ladder. 3rd place team had 6 members each with multiple feature+model predictions, they all utilized 5-fold x 20 times cv seed, and ensembled all of their predictions to reach the top place. A solution from one of the 3rd place team ranks 509th place (Top 10%) all by itself. This tells us the importance of ensemble in this competition.

Although time consuming, well structured cv seed and adding diversity in single models by combining multiple feature+model leads to more generalizable result.

My re-implentation includes partial re-implementation of winner's solutions, hence their Private LB rank is not high as their actual rank.

Result

Submission	CV LogLoss	Public LB	Rank	Private LB	Rank
bare_minimum	0.800430	0.797948	3986	0.785805	3958
kweonwooj redux	0.821226	0.836279	3029	0.822967	2917
kweonwooj		0.840797	1667	0.826500	947
toshi_k redux	0.8418	0.840002	2122	0.826658	810
wpppj redux	0.841137	0.839622	2232	0.826814	556
mathias from Team Leustago		0.839441	2302	0.826881	509

Total teams : 5,123

How to Run

[Data]

Place data in input directory. You can download data from here.

[Code]

Above results can be replicated by runinng

python code/main.py

for each of the directories.

Make sure you are on Python 2.7 with library versions same as specified in requirements.txt

Add requirements.txt

[Submit]

Submit the resulting csv file here and verify the score.

Expected Result

for bare minimum

for mathias's solution form Team Leustago (3rd place)

What I've learned

Importance of ensemble in squeezing the private LB score
Techniques of adding diversity to the prediction
- combination of feature sets
- combination of algorithms
- multiple sets of 5-fold cross validation
- power of rank averaging in combining multiple submissions

Winning Methods

3rd place solution on Forum, Github by Dmitry Efimov
7th place solution on Forum by Francisco Javier Díaz Herrera
12th place solution on Forum by Antonio José Navarro Céspedes
13th place solution on Forum by Ouranos
34th place solution on Forum, Github by wpppj
44th place solution on Forum, Github by toshi_k

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
34_wpppj		34_wpppj
3_Leustagos		3_Leustagos
44_toshi		44_toshi
bare_minimum		bare_minimum
input		input
kweonwooj		kweonwooj
README.md		README.md
requirements.txt		requirements.txt
work_diary.numbers		work_diary.numbers

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Kaggle/Santander Customer Satisfaction

Abstract

Result

How to Run

Add requirements.txt

Expected Result

What I've learned

Winning Methods

About

Releases

Packages

Languages

kweonwooj/kaggle_santander_customer_satisfaction

Folders and files

Latest commit

History

Repository files navigation

Kaggle/Santander Customer Satisfaction

Abstract

Result

How to Run

Add requirements.txt

Expected Result

What I've learned

Winning Methods

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages