Sberbank AutoML solution

Dataset preparation

If the dataset is big (>2GB), we calculate the feature correlation matrix and then delete correlated features.
Otherwise, we apply Mean Target Encoding and One-Hot Encoding.
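Below is a minimal sketch of both branches, assuming pandas DataFrames. The function names and the correlation cutoff (`threshold=0.95`) are hypothetical, because the README does not say which threshold was used. The target encoding shown is the plain, unsmoothed per-category mean, with test categories unseen in training falling back to the global mean. The actual solution may regularize it differently.

```python
import numpy as np
import pandas as pd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Big-dataset branch: drop one column from each highly correlated pair.

    Expects numeric columns only. `threshold` is a hypothetical cutoff.
    """
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


def mean_target_encode(train: pd.DataFrame, test: pd.DataFrame, col: str, target: str):
    """Small-dataset branch: replace each category with its mean target value."""
    means = train.groupby(col)[target].mean()
    global_mean = train[target].mean()
    # Assumption: categories unseen in training fall back to the global target mean.
    return train[col].map(means), test[col].map(means).fillna(global_mean)


def one_hot_encode(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Small-dataset branch: expand categorical columns into 0/1 indicator columns."""
    return pd.get_dummies(df, columns=cols, dtype=int)
```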
After that, we select the top-10 features by the coefficients of a linear model (Ridge/LogisticRegression).
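Here is a sketch of the selection step. It assumes features are standardized first so that coefficient magnitudes are comparable across features; the README does not say whether the solution scales them. Features are ranked by absolute coefficient. For multiclass logistic regression, the largest absolute coefficient across classes is used. The function name and the `task` switch are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.preprocessing import StandardScaler


def top_k_by_linear_coef(X, y, feature_names, k=10, task="regression"):
    """Return the k feature names with the largest absolute linear-model coefficients."""
    # Assumption: standardize so coefficient size reflects importance, not feature scale.
    X_scaled = StandardScaler().fit_transform(X)
    model = Ridge() if task == "regression" else LogisticRegression(max_iter=1000)
    model.fit(X_scaled, y)
    # coef_ is 1-D for Ridge with a 1-D target and 2-D for LogisticRegression;
    # atleast_2d unifies the shapes, then take the max |coef| per feature.
    importance = np.abs(np.atleast_2d(model.coef_)).max(axis=0)
    order = np.argsort(-importance)[:k]
    return [feature_names[i] for i in order]
```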
We generate new features by dividing each pair of the top-10 features. This yields 90 new features (10^2 - 10 = 90 ordered pairs), which are concatenated to the dataset.
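The pair-division step can be sketched as follows. For 10 features, `itertools.permutations(features, 2)` produces the 10^2 - 10 = 90 ordered pairs, with a/b and b/a counted as distinct features. Turning zero denominators into NaN is an assumption, since the README does not say how division by zero is handled. LightGBM accepts NaN values natively. The function name is hypothetical.

```python
import itertools

import numpy as np
import pandas as pd


def pairwise_ratio_features(df: pd.DataFrame, top_features: list) -> pd.DataFrame:
    """Append a / b for every ordered pair (a, b) of distinct top features."""
    new_cols = {}
    for a, b in itertools.permutations(top_features, 2):
        # Assumption: a zero denominator yields NaN instead of +/-inf.
        new_cols[f"{a}_div_{b}"] = df[a] / df[b].replace(0, np.nan)
    # Concatenate all new columns at once rather than inserting them one by one.
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)
```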

If the dataset is small, we train three LightGBM models with k-fold cross-validation and then blend the predictions from every fold.
If the dataset is big and the time limit is short (5 minutes), we train only linear models (logistic regression or ridge).
Otherwise, we train one big LightGBM model (n_estimators=800).
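The following sketch covers the training dispatch and the fold-blending step. It rests on three assumptions. First, the "small" dataset cutoff reuses the >2GB threshold from dataset preparation. Second, "three models by k-folds" is read as `n_splits=3`. Third, blending is a plain average of the per-fold test predictions. `make_model` is any zero-argument factory that returns a scikit-learn-style estimator, so the block runs without LightGBM installed. All names here are hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold

BIG_DATASET_BYTES = 2 * 1024**3  # assumption: same ">2GB" cutoff as dataset preparation
SHORT_TIME_LIMIT_S = 5 * 60      # "small" time limit (5 minutes) from the README


def choose_strategy(dataset_bytes: int, time_limit_s: float) -> str:
    """Pick a training strategy from dataset size and time budget."""
    if dataset_bytes <= BIG_DATASET_BYTES:
        return "kfold_lightgbm_blend"
    if time_limit_s <= SHORT_TIME_LIMIT_S:
        return "linear"
    return "single_lightgbm"


def kfold_blend_predict(make_model, X, y, X_test, n_splits=3, seed=0):
    """Fit one model per fold on NumPy arrays and average their test predictions."""
    preds = np.zeros(len(X_test))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, _valid_idx in kf.split(X):
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        # Equal-weight blend: each fold contributes 1/n_splits of the final prediction.
        preds += model.predict(X_test) / n_splits
    return preds
```

In practice `make_model` might be `lambda: lightgbm.LGBMRegressor()`, and the single-model branch would fit one `lightgbm.LGBMRegressor(n_estimators=800)` on the full data. For classification, the per-fold `predict_proba` outputs would be averaged instead. The validation fold is unused here. A fuller version would typically pass it to LightGBM for early stopping.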

5th place on the private leaderboard.
