Merge branch 'staging' into simonz/sarplus/20220517/time-now
simonzhaoms authored May 24, 2022
2 parents a179406 + 8f1e287 commit 4aad9fa
Showing 2 changed files with 68 additions and 38 deletions.
101 changes: 63 additions & 38 deletions examples/02_model_collaborative_filtering/lightgcn_deep_dive.ipynb
@@ -15,7 +15,7 @@
"source": [
"# LightGCN - simplified GCN model for recommendation\n",
"\n",
"This notebook serves as an introduction to LightGCN, which is an simple, linear and neat Graph Convolution Network (GCN) model for recommendation."
"This notebook serves as an introduction to LightGCN [1], which is an simple, linear and neat Graph Convolution Network (GCN) [3] model for recommendation."
]
},
{
@@ -52,7 +52,6 @@
"source": [
"import sys\n",
"import os\n",
"import papermill as pm\n",
"import scrapbook as sb\n",
"import pandas as pd\n",
"import numpy as np\n",
@@ -104,15 +103,53 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1 LightGCN model architecture\n",
"## 1 LightGCN model\n",
"\n",
"LightGCN is a simplified design of GCN to make it more concise and appropriate for recommendation. The model architecture is illustrated below.\n",
"LightGCN is a simplified version of Neural Graph Collaborative Filtering (NGCF) [4], which adapts GCNs in recommendation systems.\n",
"\n",
"<img src=\"https://recodatasets.z20.web.core.windows.net/images/lightGCN-model.jpg\">\n",
"### 1.1 Graph Networks in Recommendation Systems\n",
"\n",
"In Light Graph Convolution, only the normalized sum of neighbor embeddings is performed towards next layer; other operations like self-connection, feature transformation, and nonlinear activation are all removed, which largely simplifies GCNs. In Layer Combination, we sum over the embeddings at each layer to obtain the final representations.\n",
"GCN are networks that can learn patterns in graph data. They can be applied in many fields, but they are particularly well suited for Recommendation Systems, because of their ability to encode relationships.\n",
"\n",
"### 1.1 Light Graph Convolution (LGC)\n",
"In traditional models like matrix factorization [5], user and items are represented as embeddings. And the interaction, which is the signal that encodes the behavior, is not part of the embeddings, but it is represented in the loss function, typically as a dot product. \n",
"\n",
"Despite their effectiveness, some authors [1,4] argue that these methods are not sufficient to yield satisfactory embeddings for collaborative filtering. The key reason is that the embedding function lacks an explicit encoding of the crucial collaborative signal, which is latent in user-item interactions to reveal the behavioral similarity between users (or items). \n",
"\n",
"GCNs can be used to encode the interaction signal in the embeddings. Interacted items can be seen as user´s features, because they provide direct evidence on a user’s preference. Similarly, the users that consume an item can be treated as the item’s features and used to measure the collaborative similarity of two items. A natural way to incorporate the interaction signal in the embedding is by exploiting the high-order connectivity from user-item interactions.\n",
"\n",
"In the figure below, the user-item interaction is shown (to the left) as well as the concept of higher-order connectivity (to the right).\n",
"\n",
"<img src=\"https://recodatasets.z20.web.core.windows.net/images/High_order_connectivity.png\" width=500 style=\"display:block; margin-left:auto; margin-right:auto;\">\n",
"\n",
"The high-order connectivity shows the collaborative signal in a graph form. For example, the path $u_1 ← i_2 ← u2$ indicates the behavior\n",
"similarity between $u_1$ and $u_2$, as both users have interacted with $i_2$; the longer path $u_1 ← i_2 ← u_2 ← i_4$ suggests that $u_1$ is likely to adopt $i_4$, since her similar user $u_2$ has consumed $i_4$ before. Moreover, from the holistic view of $l = 3$, item $i_4$ is more likely to be of interest to $u_1$ than item $i_5$, since there are two paths connecting $<i_4,u_1>$, while only one path connects $<i_5,u_1>$.\n",
"\n",
"Based on this high-order connectivity, NGCF [4] defines an embedding propagation layer, which refines a user’s (or an item’s) embedding by aggregating the embeddings of the interacted items (or users). By stacking multiple embedding propagation layers, we can enforce the embeddings\n",
"to capture the collaborative signal in high-order connectivities.\n",
"\n",
"More formally, let $\\mathbf{e}_{u}^{0}$ denote the original embedding of user $u$ and $\\mathbf{e}_{i}^{0}$ denote the original embedding of item $i$. The embedding propagation can be computed recursively as:\n",
"\n",
"$$\n",
"\\begin{array}{l}\n",
"\\mathbf{e}_{u}^{(k+1)}=\\sigma\\bigl( \\mathbf{W}_{1}\\mathbf{e}_{u}^{(k)} + \\sum_{i \\in \\mathcal{N}_{u}} \\frac{1}{\\sqrt{\\left|\\mathcal{N}_{u}\\right|} \\sqrt{\\left|\\mathcal{N}_{i}\\right|}} (\\mathbf{W}_{1}\\mathbf{e}_{i}^{(k)} + \\mathbf{W}_{2}(\\mathbf{e}_{i}^{(k)}\\cdot\\mathbf{e}_{u}^{(k)}) ) \\bigr)\n",
"\\\\\n",
"\\mathbf{e}_{i}^{(k+1)}=\\sigma\\bigl( \\mathbf{W}_{1}\\mathbf{e}_{i}^{(k)} +\\sum_{u \\in \\mathcal{N}_{i}} \\frac{1}{\\sqrt{\\left|\\mathcal{N}_{i}\\right|} \\sqrt{\\left|\\mathcal{N}_{u}\\right|}} (\\mathbf{W}_{1}\\mathbf{e}_{u}^{(k)} + \\mathbf{W}_{2}(\\mathbf{e}_{u}^{(k)}\\cdot\\mathbf{e}_{i}^{(k)}) ) \\bigr)\n",
"\\end{array}\n",
"$$\n",
"\n",
"where $\\mathbf{W}_{1}$ and $\\mathbf{W}_{2}$ are trainable weight matrices, $\\frac{1}{\\sqrt{\\left|\\mathcal{N}_{i}\\right|} \\sqrt{\\left|\\mathcal{N}_{u}\\right|}}$ is a discount factor expressed as the graph Laplacian norm, $\\mathcal{N}_{u}$ and $\\mathcal{N}_{i}$ denote the first-hop neighbors of user $u$ and item $i$, and $\\sigma$ is a non-linearity that in the paper is set as a LeakyReLU. \n",
"\n",
"To obtain the final representation, each propagated embedding is concatenated (i.e., $\\mathbf{e}_{u}^{(*)}=\\mathbf{e}_{u}^{(0)}||...||\\mathbf{e}_{u}^{(l)}$), and then the final user's preference over an item is computed as a dot product: $\\hat y_{u i} = \\mathbf{e}_{u}^{(*)T}\\mathbf{e}_{i}^{(*)}$.\n",
"\n",
"### 1.2 LightGCN architecture\n",
"\n",
"LightGCN is a simplified version of NGCF [4] to make it more concise and appropriate for recommendations. The model architecture is illustrated below.\n",
"\n",
"<img src=\"https://recodatasets.z20.web.core.windows.net/images/lightGCN-model.jpg\" width=600 style=\"display:block; margin-left:auto; margin-right:auto;\">\n",
"\n",
"In Light Graph Convolution, only the normalized sum of neighbor embeddings is performed towards next layer; other operations like self-connection, feature transformation via weight matrices, and nonlinear activation are all removed, which largely simplifies NGCF. In the layer combination step, instead of concatenating the embeddings, we sum over the embeddings at each layer to obtain the final representations.\n",
"\n",
"### 1.3 Light Graph Convolution (LGC)\n",
"\n",
"In LightGCN, we adopt the simple weighted sum aggregator and abandon the use of feature transformation and nonlinear activation. The graph convolution operation in LightGCN is defined as:\n",
"\n",
@@ -126,7 +163,7 @@
"The symmetric normalization term $\\frac{1}{\\sqrt{\\left|\\mathcal{N}_{u}\\right|} \\sqrt{\\left|\\mathcal{N}_{i}\\right|}}$ follows the design of standard GCN, which can avoid the scale of embeddings increasing with graph convolution operations.\n",
"\n",
"\n",
"### 1.2 Layer Combination and Model Prediction\n",
"### 1.4 Layer Combination and Model Prediction\n",
"\n",
"In LightGCN, the only trainable model parameters are the embeddings at the 0-th layer, i.e., $\\mathbf{e}_{u}^{(0)}$ for all users and $\\mathbf{e}_{i}^{(0)}$ for all items. When they are given, the embeddings at higher layers can be computed via LGC. After $K$ layers LGC, we further combine the embeddings obtained at each layer to form the final representation of a user (an item):\n",
"\n",
@@ -145,7 +182,7 @@
"which is used as the ranking score for recommendation generation.\n",
"\n",
"\n",
"### 1.3 Matrix Form\n",
"### 1.5 Matrix Form\n",
"\n",
"Let the user-item interaction matrix be $\\mathbf{R} \\in \\mathbb{R}^{M \\times N}$ where $M$ and $N$ denote the number of users and items, respectively, and each entry $R_{ui}$ is 1 if $u$ has interacted with item $i$ otherwise 0. We then obtain the adjacency matrix of the user-item graph as\n",
"\n",
@@ -173,7 +210,7 @@
"\n",
"where $\\tilde{\\mathbf{A}}=\\mathbf{D}^{-\\frac{1}{2}} \\mathbf{A} \\mathbf{D}^{-\\frac{1}{2}}$ is the symmetrically normalized matrix.\n",
"\n",
"### 1.4 Model Training\n",
"### 1.6 Model Training\n",
"\n",
"We employ the Bayesian Personalized Ranking (BPR) loss which is a pairwise loss that encourages the prediction of an observed entry to be higher than its unobserved counterparts:\n",
"\n",
@@ -188,20 +225,13 @@
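"As a sketch, the BPR objective on a batch of (user, positive item, negative item) triplets reduces to a few lines (the scores below are made up for illustration):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical scores y_ui for observed (positive) and y_uj for\n",
"# unobserved (negative) user-item pairs in one batch.\n",
"y_ui = np.array([2.1, 0.3, 1.5])\n",
"y_uj = np.array([0.4, 0.9, 1.2])\n",
"\n",
"sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))\n",
"\n",
"# BPR minimizes -ln sigma(y_ui - y_uj), pushing positives above negatives.\n",
"bpr_loss = -np.mean(np.log(sigmoid(y_ui - y_uj)))\n",
"```\n",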
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2 TensorFlow implementation of LightGCN\n",
"## 2 TensorFlow implementation of LightGCN with MovieLens dataset\n",
"\n",
"We will use the MovieLens dataset, which is composed of integer ratings from 1 to 5.\n",
"\n",
"We convert MovieLens into implicit feedback for model training and evaluation.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3 TensorFlow LightGCN movie recommender\n",
"We convert MovieLens into implicit feedback for model training and evaluation.\n",
"\n",
"### 3.1 Load and split data\n",
"### 2.1 Load and split data\n",
"\n",
"We split the full dataset into a `train` and `test` dataset to evaluate performance of the algorithm against a held-out set not seen during training. Because SAR generates recommendations based on user preferences, all users that are in the test set must also exist in the training set. For this case, we can use the provided `python_stratified_split` function which holds out a percentage (in this case 25%) of items from each user, but ensures all users are in both `train` and `test` datasets. Other options are available in the `dataset.python_splitters` module which provide more control over how the split occurs."
]
@@ -318,7 +348,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.2 Process data\n",
"### 2.2 Process data\n",
"\n",
"`ImplicitCF` is a class that intializes and loads data for the training process. During the initialization of this class, user IDs and item IDs are reindexed, ratings greater than zero are converted into implicit positive interaction, and adjacency matrix $R$ of user-item graph is created. Some important methods of `ImplicitCF` are:\n",
"\n",
@@ -342,7 +372,7 @@
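"As a sketch, instantiating the class could look like this (import path per this repo; `SEED` assumed defined earlier in the notebook):\n",
"\n",
"```python\n",
"from recommenders.models.deeprec.DataModel.ImplicitCF import ImplicitCF\n",
"\n",
"# Reindexes IDs, binarizes ratings, and builds the interaction matrix R.\n",
"data = ImplicitCF(train=train, test=test, seed=SEED)\n",
"```\n",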
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.3 Prepare hyper-parameters\n",
"### 2.3 Prepare hyper-parameters\n",
"\n",
"Important parameters of `LightGCN` model are:\n",
"\n",
@@ -379,7 +409,7 @@
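"As a sketch, the hyper-parameters could be prepared as follows (the yaml path and the specific values are assumptions following this repo's conventions):\n",
"\n",
"```python\n",
"from recommenders.models.deeprec.deeprec_utils import prepare_hparams\n",
"\n",
"hparams = prepare_hparams(\n",
"    \"lightgcn.yaml\",  # hypothetical path to the model's config file\n",
"    n_layers=3,        # number of LGC layers K\n",
"    batch_size=1024,\n",
"    epochs=50,\n",
"    learning_rate=0.005,\n",
"    eval_epoch=5,      # evaluate on the test set every 5 epochs\n",
"    top_k=10,\n",
")\n",
"```\n",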
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.4 Create and train model\n",
"### 2.4 Create and train model\n",
"\n",
"With data and parameters prepared, we can create the LightGCN model.\n",
"\n",
@@ -479,7 +509,7 @@
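"A sketch of the create-and-train step (class path per this repo; exact arguments may differ):\n",
"\n",
"```python\n",
"from recommenders.models.deeprec.models.graphrec.lightgcn import LightGCN\n",
"\n",
"model = LightGCN(hparams, data, seed=SEED)  # SEED assumed defined above\n",
"model.fit()  # trains the 0-th layer embeddings with the BPR loss\n",
"```\n",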
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.4 Recommendation and Evaluation"
"### 2.5 Recommendation and Evaluation"
]
},
{
@@ -493,7 +523,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 3.4.1 Recommendation\n",
"#### 2.5.1 Recommendation\n",
"\n",
"We can call `recommend_k_items` to recommend k items for each user passed in this function. We set `remove_seen=True` to remove the items already seen by the user. The function returns a dataframe, containing each user and top k items recommended to them and the corresponding ranking scores."
]
@@ -588,7 +618,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 3.4.2 Evaluation\n",
"#### 2.5.2 Evaluation\n",
"\n",
"With `topk_scores` predicted by the model, we can evaluate how LightGCN performs on this test set."
]
@@ -713,7 +743,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.5 Infer embeddings\n",
"### 2.6 Infer embeddings\n",
"\n",
"With `infer_embedding` method of LightGCN model, we can export the embeddings of users and items in the training set to CSV files for future use."
]
@@ -731,7 +761,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.6 Compare with SAR and NCF\n",
"## 3. Compare LightGCN with SAR and NCF\n",
"\n",
"Here there are the performances of LightGCN compared to [SAR](../00_quick_start/sar_movielens.ipynb) and [NCF](../00_quick_start/ncf_movielens.ipynb) on MovieLens dataset of 100k and 1m. The method of data loading and splitting is the same as that described above and the GPU used was a GeForce GTX 1080Ti.\n",
"\n",
@@ -759,18 +789,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Reference: \n",
"### References: \n",
"1. Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang & Meng Wang, LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation, 2020, https://arxiv.org/abs/2002.02126\n",
"\n",
"2. LightGCN implementation [TensorFlow]: https://github.com/kuandeng/lightgcn"
"2. LightGCN implementation [TensorFlow]: https://github.com/kuandeng/lightgcn\n",
"3. Thomas N. Kipf and Max Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR, 2017, https://arxiv.org/abs/1609.02907\n",
"4. Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua, Neural Graph Collaborative Filtering, SIGIR, 2019, https://arxiv.org/abs/1905.08108\n",
"5. Y. Koren, R. Bell and C. Volinsky, \"Matrix Factorization Techniques for Recommender Systems\", in Computer, vol. 42, no. 8, pp. 30-37, Aug. 2009, doi: 10.1109/MC.2009.263. url: https://datajobs.com/data-science-repo/Recommender-Systems-%5BNetflix%5D.pdf"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
5 changes: 5 additions & 0 deletions tests/unit/recommenders/utils/test_gpu_utils.py
@@ -46,6 +46,11 @@ def test_get_cudnn_version():
assert get_cudnn_version() > "7.0.0"


@pytest.mark.gpu
def test_cudnn_enabled():
    assert torch.backends.cudnn.enabled


@pytest.mark.gpu
def test_tensorflow_gpu():
assert tf.test.is_gpu_available()
