RecLearn

简体中文 | English

RecLearn (Recommender Learning) is a recommendation learning framework based on Python and TensorFlow 2.x, aimed at students and beginners. It summarizes the contents of the master branch of Recommender System with TF2.0. If you are more comfortable with the master branch, you can instead clone the entire package, run the algorithms in example, and update or modify the contents of model and layer. The implemented recommendation algorithms are grouped by the two application stages used in industry:

  • matching recommendation stage (Top-k Recommendation)
  • ranking recommendation stage (CTR prediction models)

Update

04/23/2022: updated all matching models.

Installation

Package

RecLearn is on PyPI, so you can use pip to install it.

pip install reclearn

Dependencies:

  • Python 3.8+
  • TensorFlow 2.5+ (GPU or CPU build)
  • scikit-learn 0.23+

Local

Clone RecLearn to your local machine:

git clone -b reclearn git@github.com:ZiyaoGeng/RecLearn.git

Quick Start

The example directory contains a demo for each of the recommendation models.

Matching

1. Divide the dataset.

Set the path of the raw dataset:

file_path = 'data/ml-1m/ratings.dat'

Split the dataset into training, validation, and test sets. If you use movielens-1m, Amazon-Beauty, Amazon-Games, or STEAM, you can call the methods in RecLearn's data/datasets/* directly:

train_path, val_path, test_path, meta_path = ml.split_seq_data(file_path=file_path)

meta_path is the path of the meta file, which stores the maximum user index and the maximum item index.
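A minimal sketch of reading the two maxima back from the meta file. The file layout is an assumption, not something stated above: a single line holding the two values separated by a tab. The helper name `read_meta` is hypothetical and is not part of RecLearn.

```python
import os
import tempfile


def read_meta(meta_path):
    """Read the maximum user and item indexes from a meta file.

    Assumption: the file holds one line, <max_user><TAB><max_item>.
    """
    with open(meta_path) as f:
        max_user_num, max_item_num = (int(x) for x in f.readline().strip().split('\t'))
    return max_user_num, max_item_num


# Demonstration with a throwaway file standing in for the real meta file.
# The values 100 and 200 are arbitrary placeholders.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'meta.txt')
    with open(demo_path, 'w') as f:
        f.write('100\t200\n')
    max_user_num, max_item_num = read_meta(demo_path)
```

These two values are what later steps use when they refer to `max_user_num` and `max_item_num`.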

2. Load the dataset.

Load the training, validation, and test datasets, and generate several negative samples (by random sampling) for each positive sample. The data is stored as a dictionary:

data = {'pos_item':, 'neg_item': , ['user': , 'click_seq': ,...]}
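To make the dictionary format concrete, here is a small sketch of random negative sampling in plain Python. It is not RecLearn's loader: the helper `sample_negatives` is hypothetical, item ids are assumed to run from 1 to `max_item_num`, and only the positive item itself is excluded from the candidates.

```python
import random


def sample_negatives(pos_item, neg_num, max_item_num, rng):
    """Draw neg_num random item ids in [1, max_item_num] that differ from pos_item."""
    negatives = []
    while len(negatives) < neg_num:
        candidate = rng.randint(1, max_item_num)
        if candidate != pos_item:
            negatives.append(candidate)
    return negatives


rng = random.Random(0)  # fixed seed so the sketch is reproducible
users = [1, 1, 2]
pos_items = [10, 11, 12]
neg_num, max_item_num = 4, 50

# One list of neg_num negatives per positive sample.
data = {
    'user': users,
    'pos_item': pos_items,
    'neg_item': [sample_negatives(p, neg_num, max_item_num, rng) for p in pos_items],
}
```

A production loader would typically also avoid sampling items the user has already interacted with; that check is left out here to keep the sketch short.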

A sequential recommendation model additionally needs click sequences. RecLearn provides methods that load the data for the four datasets above:

# general recommendation model
train_data = ml.load_data(train_path, neg_num, max_item_num)
# sequential recommendation model that also uses the user feature
train_data = ml.load_seq_data(train_path, "train", seq_len, neg_num, max_item_num, contain_user=True)
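The `seq_len` argument fixes the length of each click sequence. One common way to get fixed-length sequences is sketched below; it is an assumption about the general technique, not a description of what `load_seq_data` does internally. The helper `pad_click_seq` is hypothetical, and padding on the left with 0 is an illustrative choice.

```python
def pad_click_seq(clicked_items, seq_len, pad_value=0):
    """Keep the most recent seq_len items and left-pad shorter histories.

    Assumes seq_len >= 1.
    """
    recent = clicked_items[-seq_len:]
    return [pad_value] * (seq_len - len(recent)) + recent


short_seq = pad_click_seq([3, 7], seq_len=4)             # history shorter than seq_len
long_seq = pad_click_seq([1, 2, 3, 4, 5, 6], seq_len=4)  # history longer than seq_len
```

Here `short_seq` is `[0, 0, 3, 7]` and `long_seq` is `[3, 4, 5, 6]`.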

3. Set hyper-parameters.

Each model requires its own hyperparameters. Taking the BPR model as an example:

model_params = {
    'user_num': max_user_num + 1,
    'item_num': max_item_num + 1,
    'embed_dim': FLAGS.embed_dim,
    'use_l2norm': FLAGS.use_l2norm,
    'embed_reg': FLAGS.embed_reg
}
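To give a rough sense of what these hyperparameters control, the sketch below computes the BPR objective for a single (user, positive item, negative item) triple in plain Python. It is not RecLearn's implementation. How `use_l2norm` and `embed_reg` are applied here (normalizing the embeddings before scoring, and adding a scaled squared-norm penalty) is an assumption made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def bpr_loss(user_vec, pos_vec, neg_vec, use_l2norm=False, embed_reg=0.0):
    """BPR loss, -log(sigmoid(pos_score - neg_score)), for one triple."""
    # Assumed form of the regularizer: squared L2 norm of the raw embeddings.
    reg = embed_reg * (dot(user_vec, user_vec) + dot(pos_vec, pos_vec) + dot(neg_vec, neg_vec))
    if use_l2norm:
        user_vec, pos_vec, neg_vec = (l2_normalize(v) for v in (user_vec, pos_vec, neg_vec))
    diff = dot(user_vec, pos_vec) - dot(user_vec, neg_vec)
    # -log(sigmoid(diff)) = softplus(-diff), written in a numerically stable form.
    x = -diff
    return max(x, 0.0) + math.log1p(math.exp(-abs(x))) + reg


# The positive item matches the user vector, the negative item does not.
good = bpr_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
# Swapping the items makes the ranking wrong, so the loss is larger.
bad = bpr_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

The vectors have length `embed_dim` (2 in this toy example); the loss shrinks as the positive item's score pulls ahead of the negative item's score.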

4. Build and compile the model.

Select or build the model you need and compile it. Taking BPR as an example:

model = BPR(**model_params)
model.compile(optimizer=Adam(learning_rate=FLAGS.learning_rate))

To inspect the structure of the model, call the summary method after compilation to print it:

model.summary()

5. Train the model and evaluate it on the test dataset.

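from time import time

# Train one epoch at a time so the test metrics can be reported after each epoch.
for epoch in range(1, epochs + 1):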
    t1 = time()
    model.fit(
        x=train_data,
        epochs=1,
        validation_data=val_data,
        batch_size=batch_size
    )
    t2 = time()
    eval_dict = eval_pos_neg(model, test_data, ['hr', 'mrr', 'ndcg'], k, batch_size)
    print('Iteration %d Fit [%.1f s], Evaluate [%.1f s]: HR = %.4f, MRR = %.4f, NDCG = %.4f'
          % (epoch, t2 - t1, time() - t2, eval_dict['hr'], eval_dict['mrr'], eval_dict['ndcg']))
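For reference, the three reported metrics can be sketched as follows, using their standard definitions for the case of one positive item ranked against its sampled negatives. The internals of `eval_pos_neg` are not shown here, so this is an illustration of the metrics rather than of that function; the helper names and the tie-handling rule are assumptions.

```python
import math


def rank_of_positive(pos_score, neg_scores):
    """1-based rank of the positive item among itself and its sampled negatives."""
    # Ties count against the positive item (a pessimistic choice).
    return 1 + sum(1 for s in neg_scores if s >= pos_score)


def hr_mrr_ndcg_at_k(ranks, k):
    """Average HR@k, MRR@k and NDCG@k over test cases with one positive each."""
    hits = [r for r in ranks if r <= k]
    n = len(ranks)
    return {
        'hr': len(hits) / n,
        'mrr': sum(1.0 / r for r in hits) / n,
        'ndcg': sum(1.0 / math.log2(r + 1) for r in hits) / n,
    }


ranks = [
    rank_of_positive(0.9, [0.1, 0.5, 0.3]),  # positive scored highest: rank 1
    rank_of_positive(0.2, [0.8, 0.6, 0.1]),  # two negatives scored higher: rank 3
]
metrics = hr_mrr_ndcg_at_k(ranks, k=2)
```

With `k=2` only the first test case is a hit, so all three metrics come out to 0.5 in this toy example.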

Ranking

Coming soon.

Results

RecLearn's experimental setup differs from that of some papers, so the results may deviate from the published numbers. Please refer to Experiment for details.

Matching

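| Model | ml-1m HR@10 | ml-1m MRR@10 | ml-1m NDCG@10 | Beauty HR@10 | Beauty MRR@10 | Beauty NDCG@10 | STEAM HR@10 | STEAM MRR@10 | STEAM NDCG@10 |
|---|---|---|---|---|---|---|---|---|---|
| BPR | 0.5768 | 0.2392 | 0.3016 | 0.3708 | 0.2108 | 0.2485 | 0.7728 | 0.4220 | 0.5054 |
| NCF | 0.5834 | 0.2219 | 0.3060 | 0.5448 | 0.2831 | 0.3451 | 0.7768 | 0.4273 | 0.5103 |
| DSSM | 0.5498 | 0.2148 | 0.2929 | - | - | - | - | - | - |
| YoutubeDNN | 0.6737 | 0.3414 | 0.4201 | - | - | - | - | - | - |
| MIND (Error) | 0.6366 | 0.2597 | 0.3483 | - | - | - | - | - | - |
| GRU4Rec | 0.7969 | 0.4698 | 0.5483 | 0.5211 | 0.2724 | 0.3312 | 0.8501 | 0.5486 | 0.6209 |
| Caser | 0.7916 | 0.4450 | 0.5280 | 0.5487 | 0.2884 | 0.3501 | 0.8275 | 0.5064 | 0.5832 |
| SASRec | 0.8103 | 0.4812 | 0.5605 | 0.5230 | 0.2781 | 0.3355 | 0.8606 | 0.5669 | 0.6374 |
| AttRec | 0.7873 | 0.4578 | 0.5363 | 0.4995 | 0.2695 | 0.3229 | - | - | - |
| FISSA | 0.8106 | 0.4953 | 0.5713 | 0.5431 | 0.2851 | 0.3462 | 0.8635 | 0.5682 | 0.6391 |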

Ranking

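| Model | 500w (Criteo) Log Loss | 500w (Criteo) AUC | Criteo Log Loss | Criteo AUC |
|---|---|---|---|---|
| FM | 0.4765 | 0.7783 | 0.4762 | 0.7875 |
| FFM | - | - | - | - |
| WDL | 0.4684 | 0.7822 | 0.4692 | 0.7930 |
| Deep Crossing | 0.4670 | 0.7826 | 0.4693 | 0.7935 |
| PNN | - | 0.7847 | - | - |
| DCN | - | 0.7823 | 0.4691 | 0.7929 |
| NFM | 0.4773 | 0.7762 | 0.4723 | 0.7889 |
| AFM | 0.4819 | 0.7808 | 0.4692 | 0.7871 |
| DeepFM | - | 0.7828 | 0.4650 | 0.8007 |
| xDeepFM | 0.4690 | 0.7839 | 0.4696 | 0.7919 |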
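For readers new to the two ranking metrics, here is a small plain-Python sketch of their standard definitions: binary log loss (lower is better) and AUC (higher is better). It does not reproduce how the numbers in the table were computed; the function names and the toy labels and predictions are made up for illustration.

```python
import math


def log_loss(labels, probs, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted click probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def auc(labels, probs):
    """Probability that a random positive is scored above a random negative.

    Ties count as 0.5. This pairwise form is O(n^2) and only meant for small inputs.
    """
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]
toy_log_loss = log_loss(labels, probs)
toy_auc = auc(labels, probs)
```

In this toy example three of the four positive/negative pairs are ordered correctly, so `toy_auc` is 0.75. For real datasets, a library routine such as those in scikit-learn is the more practical choice.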

Model List

1. Matching Stage

| Paper | Model | Published | Author |
|---|---|---|---|
| BPR: Bayesian Personalized Ranking from Implicit Feedback | MF-BPR | UAI, 2009 | Steffen Rendle |
| Neural Collaborative Filtering | NCF | WWW, 2017 | Xiangnan He |
| Learning Deep Structured Semantic Models for Web Search using Clickthrough Data | DSSM | CIKM, 2013 | Po-Sen Huang |
| Deep Neural Networks for YouTube Recommendations | YoutubeDNN | RecSys, 2016 | Paul Covington |
| Session-based Recommendations with Recurrent Neural Networks | GRU4Rec | ICLR, 2016 | Balázs Hidasi |
| Self-Attentive Sequential Recommendation | SASRec | ICDM, 2018 | UCSD |
| Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding | Caser | WSDM, 2018 | Jiaxi Tang |
| Next Item Recommendation with Self-Attentive Metric Learning | AttRec | AAAI, 2019 | Shuai Zhang |
| FISSA: Fusing Item Similarity Models with Self-Attention Networks for Sequential Recommendation | FISSA | RecSys, 2020 | Jing Lin |

2. Ranking Stage

| Paper | Model | Published | Author |
|---|---|---|---|
| Factorization Machines | FM | ICDM, 2010 | Steffen Rendle |
| Field-aware Factorization Machines for CTR Prediction | FFM | RecSys, 2016 | Criteo Research |
| Wide & Deep Learning for Recommender Systems | WDL | DLRS, 2016 | Google Inc. |
| Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features | Deep Crossing | KDD, 2016 | Microsoft Research |
| Product-based Neural Networks for User Response Prediction | PNN | ICDM, 2016 | Shanghai Jiao Tong University |
| Deep & Cross Network for Ad Click Predictions | DCN | ADKDD, 2017 | Stanford University / Google Inc. |
| Neural Factorization Machines for Sparse Predictive Analytics | NFM | SIGIR, 2017 | Xiangnan He |
| Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks | AFM | IJCAI, 2017 | Zhejiang University / National University of Singapore |
| DeepFM: A Factorization-Machine based Neural Network for CTR Prediction | DeepFM | IJCAI, 2017 | Harbin Institute of Technology / Noah’s Ark Research Lab, Huawei |
| xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems | xDeepFM | KDD, 2018 | University of Science and Technology of China |
| Deep Interest Network for Click-Through Rate Prediction | DIN | KDD, 2018 | Alibaba Group |

Discussion

  1. If you have any suggestions or questions about the project, please open an issue.
  2. wechat: