I secured 6th place in the competition, and the final model was an xgboost model trained on the data with only a few additional features. Final versions of the models I tried are shared here.
Given 3 years of SKU sales data from 76 stores, the problem is to predict sales for the next 12 weeks.
The evaluation metric was 100*RMSLE (Root Mean Squared Logarithmic Error). This metric is available in xgboost out of the box.
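For reference, here is a minimal sketch of the metric, assuming the standard RMSLE definition scaled by 100:

```python
import numpy as np

def competition_score(y_true, y_pred):
    """100 * RMSLE: root mean squared error on log1p-transformed values."""
    log_diff = np.log1p(y_pred) - np.log1p(y_true)
    return 100 * np.sqrt(np.mean(np.square(log_diff)))
```

In xgboost this corresponds to the built-in rmsle eval metric; the 100x scaling only shifts the leaderboard numbers, not the ranking.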
Data can be downloaded from the contest page, JanataHack: Demand Forecasting - Problem Statement.
Fields in the data (a quick loading sketch follows the list):
- record_ID (unique ID for each row)
- week (starting date of the week)
- store_id
- sku_id
- base_price
- total_price (in case of a discount, total_price < base_price)
- is_featured_sku (flag for whether sku_id is the featured product of the week)
- is_display_sku (flag for whether sku_id is a displayed product of the week; this likely means banners, carousels, etc.)
- units_sold (target)
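A quick sketch of loading and inspecting these fields; the file name train.csv is an assumption based on the usual contest download layout:

```python
import pandas as pd

# Assumed file name from the contest download; adjust as needed.
train = pd.read_csv("train.csv", parse_dates=["week"])

print(train.dtypes)
print(train["store_id"].nunique(), "stores,", train["sku_id"].nunique(), "SKUs")
print("weeks from", train["week"].min(), "to", train["week"].max())
```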
- Experiment Set 1: I used the fbprophet library to fit a time series model for each (store, sku) pair. This gave scores in the range 785-809 (public) and 710-739 (private).
- Experiment Set 2: Log-transformed the dependent variable and fitted a prophet model. Score: 638.649013296102 (public), 577.817887311402 (private).
- Experiment Set 3: Added total_price, base_price, is_featured_sku and is_display_sku as regressors to the model (sketched below). The score jumped to 497.965157591909 (public), 563.605151425418 (private).
- Experiment Set 4: Added discounting, featured and display as special events to the model, plus total price / max total price for the sku as a signal for how discounted the sku was. This model gave scores of 499.384415259399 (public), 566.333485529977 (private).
This model takes ~30 mins to train. I didn't spend more time tuning the prophet model or adding more signals, but there definitely seemed to be scope to improve it further.
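Here is a minimal sketch of the per-(store, sku) prophet fit with a log-transformed target and the extra regressors from Experiment Sets 2-3; the column handling and the lack of tuned hyperparameters are assumptions, not the exact code I ran:

```python
import numpy as np
import pandas as pd
from prophet import Prophet  # from fbprophet import Prophet on older installs

def fit_one_series(history: pd.DataFrame, future: pd.DataFrame) -> pd.Series:
    """Fit prophet on one (store_id, sku_id) pair and predict units_sold."""
    train_df = history.rename(columns={"week": "ds"})
    train_df["y"] = np.log1p(train_df["units_sold"])  # log-transform the target

    m = Prophet()
    for reg in ["total_price", "base_price", "is_featured_sku", "is_display_sku"]:
        m.add_regressor(reg)
    m.fit(train_df)

    future_df = future.rename(columns={"week": "ds"})  # must contain the regressors
    forecast = m.predict(future_df)
    return np.expm1(forecast["yhat"])  # back-transform to unit scale
```

Looping this over all (store_id, sku_id) pairs and concatenating the forecasts gives the full prediction.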
Additional features added to the xgb models were week_of_year, week_num (counted from the start of the data), max total price for the sku, and total price / max total price for the sku. The target variable was log-transformed sales.
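A sketch of that feature engineering, assuming the pandas DataFrame loaded earlier; the derived column names (week_num, max_total_price, price_ratio) are my own labels:

```python
import numpy as np
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values("week").copy()
    df["week_of_year"] = df["week"].dt.isocalendar().week.astype(int)
    # Week number counted from the first week in the data.
    df["week_num"] = (df["week"] - df["week"].min()).dt.days // 7
    # Max total price per sku, and current price relative to that max.
    df["max_total_price"] = df.groupby("sku_id")["total_price"].transform("max")
    df["price_ratio"] = df["total_price"] / df["max_total_price"]
    return df

train = add_features(train)
train["target"] = np.log1p(train["units_sold"])  # log-transformed target
```

The same transformation has to be applied to the test set, ideally with the sku-level max price computed consistently across both.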
- Experiment Set 1: The xgb model performed well right from the start. I added store and sku as features since the test set had the same stores and skus. The baseline model score was 564.961275573224 (public), 569.959964232437 (private).
- Experiment Set 2: I created a model for each store independently, but it didn't do well; there was probably a bug somewhere.
- Experiment Set 3: Added lagged units_sold as features; that didn't do well either, probably due to a bug in the code that I didn't dig into.
- Experiment Set 4: Tuned the first xgb model (with the additional features described above) to get the final model. It scores 386.025005238819 (public), 424.879932387956 (private), good for rank 6 overall in the competition.
The best-performing model for me was xgboost. The training process was as follows (sketched below):
- Train-Eval Split: The last 12 weeks of data for each store and sku were used as the eval set and the rest as the training set.
- Training: Trained xgboost with early stopping on the eval set.
- Final Model: Retrained on the full data with the best iteration count learned in the previous step.
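A minimal sketch of that pipeline with the xgboost sklearn API, assuming the features built above; the hyperparameter values are placeholders rather than my tuned settings, and constructor-level early stopping needs xgboost >= 1.6:

```python
import numpy as np
import pandas as pd
import xgboost as xgb

# store_id / sku_id used as plain numeric IDs here.
FEATURES = ["store_id", "sku_id", "total_price", "base_price",
            "is_featured_sku", "is_display_sku",
            "week_of_year", "week_num", "max_total_price", "price_ratio"]

# Train-eval split: hold out the last 12 weeks of the training data.
cutoff = train["week"].max() - pd.Timedelta(weeks=12)
tr, ev = train[train["week"] <= cutoff], train[train["week"] > cutoff]

# Target is log1p(units_sold), so plain RMSE here tracks RMSLE on the unit scale.
model = xgb.XGBRegressor(
    n_estimators=5000, learning_rate=0.05, max_depth=8,
    subsample=0.8, colsample_bytree=0.8,
    eval_metric="rmse", early_stopping_rounds=50,
)
model.fit(tr[FEATURES], tr["target"],
          eval_set=[(ev[FEATURES], ev["target"])], verbose=100)

# Retrain on the full data with the best iteration count from early stopping.
params = model.get_params()
params.update(n_estimators=model.best_iteration + 1, early_stopping_rounds=None)
final = xgb.XGBRegressor(**params)
final.fit(train[FEATURES], train["target"])

# test is assumed to be the test set run through the same add_features() step.
predicted_units = np.expm1(final.predict(test[FEATURES]))
```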
PS: I write about Data Science, Machine Learning and AI at dilbertai.com, where I share ideas and concepts I have learned over the course of my work, summarize research papers I read, and share practical knowledge gained from the use cases I have worked on.