Part of the Recommender Systems exam at Politecnico di Milano consists of a Kaggle challenge. This repository contains all the files that were used for the competition.
Note that the base (non hybrid) recommenders come from this repository.
The complete description of the problem can be found at the Kaggle competition link (see the top of the README). In short, given the User Rating Matrix (URM) and the Item Content Matrix (ICM), the objective of the competition was to build the best recommender system for a book recommendation service by providing 10 recommended books to each user. In particular, the URM was composed of around 135k interactions, 7947 users and 25975 items; the ICM instead contained, for each book, a subset of 20000 possible tokens (some books had fewer than 10 tokens, others more than 40).
Note that the evaluation metric for this competition was the mean average precision at position 10 (MAP@10).
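As a reference for how this metric behaves, here is a minimal sketch of MAP@10 in plain Python. It assumes one common definition, in which the average precision of each user is normalized by min(number of relevant items, 10); the exact normalization used by the competition is not stated here, so treat that as an assumption. The function names and the dictionary-based inputs are illustrative and do not come from this repository.

```python
def average_precision_at_k(recommended, relevant, k=10):
    """Average precision at cutoff k for a single user.

    `recommended` is a ranked list of item ids (assumed free of duplicates),
    `relevant` is an iterable with the held-out items of the user.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    # Assumed normalization: the best achievable number of hits within k.
    return score / min(len(relevant), k)


def map_at_k(recommendations, ground_truth, k=10):
    """Mean of the average precision over users with at least one relevant item."""
    users = [user for user, items in ground_truth.items() if items]
    if not users:
        return 0.0
    total = sum(
        average_precision_at_k(recommendations.get(user, []), ground_truth[user], k)
        for user in users
    )
    return total / len(users)
```

With this definition, a relevant item placed at rank 1 contributes more than the same item placed at rank 10, which is why the ordering of the 10 recommendations matters and not only their membership.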
The final model used for the best submission is a hybrid recommender created by averaging the predictions of different models. The idea is that, if the component models all perform well and are different enough, the combined predictions will improve, since different models capture different aspects of the problem. The final hybrid is the result of several steps:
Simple hybrids: the item scores of two basic recommenders are normalized and combined, with hyperparameters jointly optimized. P3Alpha + ItemKNNCBF gave the best results (MAP@10 = 0.08856 on the public leaderboard).
Multilevel hybrids: instead of basic recommenders, a hybrid takes two other hybrids as components (basic block: the P3Alpha + ItemKNNCBF hybrid); the scores are simply normalized and mixed. The same hybrids can be reused with different hyperparameters; in addition, some are trained on the URM alone and others on the URM concatenated with the ICM (MAP@10 = 0.09159 on the public leaderboard).
Specialized hybrids: the basic idea is to tune the hyperparameters of some hybrids so that they make better predictions only for cold users or only for warm users. In practice: set a threshold, force a hybrid to make random predictions if the user profile length is below/above it, and run hyperparameter tuning. Then combine the different specialized hybrids in a multilevel way: the final recommender contains specialized hybrids for 4 user groups, created by counting the number of interactions of each user (MAP@10 = 0.09509 on the public leaderboard).
IALS: add an IALS recommender to the final hybrid (a very different model from the previous ones). Using the URM concatenated with the ICM improved the performance of the CF and CBF algorithms, and it also improved this ML model. Since this algorithm is very slow, it was tuned with at most 300 factors, on the assumption that the result carries over to a larger number of factors; the hyperparameter alpha also needs careful tuning.
Best model overall: a hybrid of the previous best multilevel specialized hybrid and IALS with n_factors = 1200 and alpha = 25, MAP@10 = 0.09877 (public), 0.10803 (private).
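The core of the steps above (normalize the scores of two components, mix them, and pick a differently tuned hybrid depending on the user profile length) can be sketched as follows. This is a simplified illustration and not the code in this repository: the max-absolute-value normalization, the single mixing weight per group, the threshold values, and all function names are assumptions made for the example. In the actual hybrids the components are full recommenders, and each of them could itself be a hybrid.

```python
import numpy as np


def normalize_scores(scores):
    """Divide a score vector by its maximum absolute value (one possible choice)."""
    scores = np.asarray(scores, dtype=float)
    max_abs = np.max(np.abs(scores)) if scores.size else 0.0
    return scores / max_abs if max_abs > 0 else scores


def hybrid_scores(scores_a, scores_b, alpha):
    """Weighted mix of two normalized score vectors; alpha is the weight of the first."""
    return alpha * normalize_scores(scores_a) + (1 - alpha) * normalize_scores(scores_b)


def top_k(scores, seen_items, k=10):
    """Indices of the k best-scored items, excluding items the user already interacted with."""
    scores = np.array(scores, dtype=float)  # copy, so the input is not modified
    if seen_items:
        scores[list(seen_items)] = -np.inf
    return np.argsort(-scores, kind="stable")[:k].tolist()


def pick_group(profile_length, thresholds):
    """Index of the user group; len(thresholds) + 1 groups, thresholds sorted ascending."""
    return int(np.searchsorted(thresholds, profile_length, side="right"))


def recommend(scores_a, scores_b, profile_length, seen_items,
              group_alphas, thresholds, k=10):
    """Use the mixing weight tuned for the group the user falls into."""
    alpha = group_alphas[pick_group(profile_length, thresholds)]
    return top_k(hybrid_scores(scores_a, scores_b, alpha), seen_items, k)
```

For example, with the hypothetical thresholds `[5, 10, 20]` users are split into 4 groups (fewer than 5 interactions, 5 to 9, 10 to 19, 20 or more), and `group_alphas` holds one tuned weight per group. Normalizing before mixing matters because the raw scores of different models can live on very different scales.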
In this repo you can find the implementation of different recommender systems; in particular the following models can be found in the Recommenders folder:
- Item and User based Collaborative Filtering
- Item Content Based Filtering
- P3Alpha and RP3Beta Graph Based models
- Pure SVD and Implicit Alternating Least Squares models
- Slim BPR and Slim ElasticNet
- Hybrids and multi-level hybrids used for the final ensemble
The requirements.txt file lists all the libraries that are necessary to run the scripts. Install them using:
pip install -r requirements.txt
Some of the models use Cython implementations. As stated in the original repository, all Cython algorithms have to be compiled before use. To compile them, you must first install gcc and python3-dev. Under Linux these can be installed with the following commands:
sudo apt install gcc
sudo apt-get install python3-dev
If you are using Windows, the installation procedure is a bit more complex. You may refer to the official guide.
Now you can compile all Cython algorithms by running the following command. The script will compile within the current active environment. The code has been developed for Linux and Windows platforms. During the compilation you may see some warnings.
python run_compile_all_cython.py
To see a plot of MAP@10 for the best model and the hybrids composing it on various user groups, you can run the following command:
python HybridFinalParall.py
Note that the script tries to train in parallel as many recommenders as possible, and this may cause problems on machines with less than 16GB of RAM.
- Ranked 2nd among 70 teams
- MAP@10 = 0.10803 on Kaggle's private leaderboard