This dataset comes from a Kaggle competition and Coursera course on predicting monthly sales for a store with multiple physical locations. There are two solutions in this repository. The first uses lagged, time-based encodings of total item sales, total shop sales, and average values. The second uses lagged, time-based encodings together with stacking of multiple machine learning algorithms.
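As a rough illustration of what a lagged, time-based encoding can look like in pandas, here is a minimal sketch. The toy data, the column names (`month`, `shop_id`, `item_id`, `item_cnt`), and the helper `add_lag` are all hypothetical and are not taken from the notebooks in this repository.

```python
import pandas as pd

# Toy monthly sales. Column names are assumptions, not the competition's schema.
sales = pd.DataFrame({
    "month":    [0, 0, 1, 1, 2, 2],
    "shop_id":  [1, 2, 1, 2, 1, 2],
    "item_id":  [10, 10, 10, 10, 10, 10],
    "item_cnt": [3, 5, 4, 1, 2, 6],
})

def add_lag(df, source, key_cols, lag, name, agg="mean"):
    """Attach `agg` of `source` per (month, *key_cols), taken `lag` months earlier."""
    keys = ["month"] + key_cols
    stats = (
        df.groupby(keys, as_index=False)[source]
          .agg(agg)
          .rename(columns={source: name})
    )
    # Shift the statistic forward so that month t only sees month t - lag.
    stats["month"] = stats["month"] + lag
    return df.merge(stats, on=keys, how="left")

# Total item sales and average shop sales, both from the previous month.
features = add_lag(sales, "item_cnt", ["item_id"], 1, "item_total_lag1", agg="sum")
features = add_lag(features, "item_cnt", ["shop_id"], 1, "shop_mean_lag1")
print(features)
```

Rows from the first month have no earlier month to draw on, so their lag columns are `NaN`. Shifting the month key before the merge keeps each row from seeing its own month's target.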
There are two versions of the model: the first uses mean encodings, and the second combines mean encodings with stacking.
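For readers unfamiliar with stacking, the sketch below shows the general idea using scikit-learn's `StackingRegressor`. It is an assumption-laden illustration, not the stacking scheme used in this repository: the synthetic data, the choice of base models, and the `Ridge` meta-model are all placeholders, and an XGBoost regressor could be swapped in as one of the base estimators.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic stand-in for a table of lag features (assumption, not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

# Base models produce out-of-fold predictions; the final estimator
# learns how to combine those predictions.
stack = StackingRegressor(
    estimators=[
        ("linear", LinearRegression()),
        ("forest", RandomForestRegressor(n_estimators=20, random_state=0)),
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=3,
)

stack.fit(X[:150], y[:150])
preds = stack.predict(X[150:])
print(preds[:5])
```

Note that the plain `cv=3` split here ignores time ordering. For monthly sales data, a holdout based on later months is usually a safer way to build the meta-model's training set.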
You will need to download the dataset here. You will also need to install Anaconda, or at least the Python libraries that come with it (Jupyter, pandas), as well as XGBoost.
There is nothing to install other than what is listed under the prerequisites.
- Python
- Pandas
- XGBoost
- Scikit-learn