This repository provides the source code of our paper: DE-RRD: A Knowledge Distillation Framework for Recommender System, accepted at CIKM'20 as a full research paper.
In the paper, we propose two distillation methods:
- Distillation Experts (DE), which distills the teacher's latent knowledge.
- Relaxed Ranking Distillation (RRD), which distills ranking information from the teacher's predictions.
We provide the leave-one-out evaluation protocol used in the paper. The protocol is as follows:
- For each test user
  - randomly sample two positive (observed) items
    - one is used for testing and the other for validation.
  - randomly sample 499 negative (unobserved) items
  - evaluate how well each method can rank the test item higher than these sampled negative items.
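The protocol above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the names `leave_one_out_split` and `hit_rank`, the input format (a dict mapping each user to a list of item ids), and the rule of skipping users with fewer than three interactions are all assumptions made for the example.

```python
import numpy as np


def leave_one_out_split(user_items, num_items, num_negatives=499, seed=0):
    """Hold out one test and one validation item per user and sample negatives.

    user_items: dict {user_id: list of observed item ids} (assumed format).
    """
    rng = np.random.default_rng(seed)
    all_items = np.arange(num_items)
    split = {}
    for user, items in user_items.items():
        observed = np.unique(items)
        # Assumption: keep only users that still have a training item left.
        if len(observed) < 3:
            continue
        # Two positives: the first for testing, the second for validation.
        held_out = rng.choice(observed, size=2, replace=False)
        # Negatives are drawn from items the user never interacted with.
        unobserved = np.setdiff1d(all_items, observed)
        negatives = rng.choice(unobserved, size=num_negatives, replace=False)
        split[user] = {
            "test_item": int(held_out[0]),
            "valid_item": int(held_out[1]),
            "train_items": np.setdiff1d(observed, held_out),
            "negatives": negatives,
        }
    return split


def hit_rank(test_score, negative_scores):
    """1-based position of the test item among itself and the negatives."""
    return 1 + int(np.sum(np.asarray(negative_scores) > test_score))
```

`hit_rank` returns the position of the test item when it is ranked together with its sampled negatives by model score, which is the quantity the ranking metrics are computed from.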
We provide three ranking metrics widely adopted in recent papers: HR@N, NDCG@N, and MRR@N. The hit ratio simply measures whether the test item is present in the top-N list, which is defined as follows:
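Assuming the standard leave-one-out form with one test item per user, the hit ratio can be written as:

$$
\mathrm{HR@}N = \frac{1}{|U_{test}|} \sum_{u \in U_{test}} \delta(p_u \le N)
$$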
where δ is the indicator function, U_test is the set of test users, and p_u is the hit ranking position of the test item for user u. On the other hand, the normalized discounted cumulative gain and the mean reciprocal rank are position-aware ranking metrics that assign higher scores to hits at higher ranks. NDCG@N and MRR@N are defined as follows:
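Assuming a single held-out test item per user, a base-2 logarithm, and a zero contribution when the test item falls outside the top N, a common way to write them is:

$$
\mathrm{NDCG@}N = \frac{1}{|U_{test}|} \sum_{u \in U_{test}} \frac{\delta(p_u \le N)}{\log_2(p_u + 1)}, \qquad
\mathrm{MRR@}N = \frac{1}{|U_{test}|} \sum_{u \in U_{test}} \frac{\delta(p_u \le N)}{p_u}
$$

Under the same assumptions, all three metrics can be computed from the per-user hit positions. The sketch below is illustrative only; the function name `ranking_metrics` is hypothetical and is not taken from this repository.

```python
import numpy as np


def ranking_metrics(hit_ranks, n):
    """Compute HR@N, NDCG@N and MRR@N from 1-based hit positions p_u.

    hit_ranks: one position per test user (1 means ranked first).
    """
    p = np.asarray(hit_ranks, dtype=float)
    hit = p <= n  # delta(p_u <= N)
    # A test item outside the top N contributes zero to every metric.
    hr = hit.mean()
    ndcg = np.where(hit, 1.0 / np.log2(p + 1.0), 0.0).mean()
    mrr = np.where(hit, 1.0 / p, 0.0).mean()
    return {"HR": float(hr), "NDCG": float(ndcg), "MRR": float(mrr)}


if __name__ == "__main__":
    # Three test users whose test items are ranked 1st, 3rd and 12th.
    print(ranking_metrics([1, 3, 12], n=10))
```

For the three positions in the example with N = 10, the third user is a miss, so HR@10 is 2/3, NDCG@10 is (1 + 1/2) / 3 = 0.5, and MRR@10 is (1 + 1/3) / 3 = 4/9.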
A. For DE, run "main_DE.py"
B. For RRD, run "main_URRD.py"
We also provide the training log and the learning curve of each method. You can find them in the /logs folder and the attached Jupyter notebook.
Please note that Topology Distillation (KDD'21), which is a follow-up study of DE, is available at https://github.com/SeongKu-Kang/Topology_Distillation_KDD21.
Also, IR-RRD (Information Sciences'21), which is a follow-up study of RRD, is available at https://github.com/SeongKu-Kang/IR-RRD_INS21.
We found that the sampling processes for top-ranked unobserved items are unnecessary, and that removing them yields considerable performance improvements for ranking-matching KD methods, including RRD [CIKM'20] and DCD [CIKM'21]. We provide a more detailed explanation and experimental results in our new paper: Distillation from Heterogeneous Models for Top-K Recommendation [WWW'23].