This repository is related to this Medium post, where I explained that we often take it for granted that machine learning models will meet our expectations and assume that features can predict the dependent variables — at least to some extent. However, sometimes our expectations are not met, and we can't tackle the problem in a routine and straightforward way. In such cases, even simple baselines like the mean can outperform complex models. Can we still work around these issues and apply machine learning models to such data?
You can read the post first, and for more plots and analysis, look at the Jupyter notebook file, car_insurance_premium.ipynb
, which presents the analysis of the car insurance claim datasets freMTPL2freq
and freMTPL2sev
.