Updated tabular, sequence examples in docs to use xGPR 0.1.2.3. Updated HISTORY and init for 0.1.2.3 release. Fixed bug that caused error when switching from cpu to gpu or vice versa on trained model. Updated complete pipeline tests to do functionality test for variance calc.
jlparkI committed Jun 27, 2023
1 parent 95ca2b8 commit bbed4af
Showing 6 changed files with 121 additions and 528 deletions.
5 changes: 5 additions & 0 deletions HISTORY.md
@@ -127,3 +127,8 @@ Updated dataset builder so that different batches with different
 xdim[1] are now accepted when building a dataset. This obviates
 the need to zero-pad data (although note that zero-padding is
 generally advisable for consistency).
+
+### Version 0.1.2.3
+
+Fixed a bug that caused an error when changing the device from gpu
+to cpu (or vice versa) on a fitted model.
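The device-switch fix above concerns moving a fitted model's weights along with the model. Below is a minimal, illustrative sketch of that pattern only: the class and attribute names (`SimpleModel`, `weights_`, `set_device`) are hypothetical and are not xGPR's actual internals, and ordinary least squares stands in for the real fitting routine.

```python
import numpy as np


class SimpleModel:
    """Toy model illustrating fitted-weight transfer between devices."""

    def __init__(self):
        self.device = "cpu"
        self.weights_ = None  # populated by fit()

    def fit(self, X, y):
        # Ordinary least squares stands in for the real fitting routine.
        self.weights_, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self

    def set_device(self, new_device):
        if new_device not in ("cpu", "gpu"):
            raise ValueError("device must be 'cpu' or 'gpu'")
        # The fitted weights must move with the model; leaving them on the
        # old device is the kind of error the fix above addresses.
        if self.weights_ is not None and new_device != self.device:
            import cupy as cp  # only imported when a GPU is involved
            if new_device == "gpu":
                self.weights_ = cp.asarray(self.weights_)
            else:
                self.weights_ = cp.asnumpy(self.weights_)
        self.device = new_device
```

The key design point is that the setter validates the target device and converts any already-fitted state, so switching devices after `fit` is safe rather than an error.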
59 changes: 28 additions & 31 deletions docs/notebooks/sequence_example.ipynb
@@ -19,7 +19,7 @@
 "match or outperform the deep learning baselines without too\n",
 "much effort.\n",
 "\n",
-"This was originally run on an A6000 GPU, using xGPR 0.1.0.0."
+"This was originally run on a GTX1070 GPU, using xGPR 0.1.2.3."
]
},
{
@@ -54,7 +54,8 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"Cloning into 'FLIP'...\n"
+"Cloning into 'FLIP'...\n",
+"Checking out files: 100% (59/59), done.\n"
]
}
],
@@ -84,7 +85,7 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"/tmp/ipykernel_3931463/4195622896.py:1: DtypeWarning: Columns (12) have mixed types. Specify dtype option on import or set low_memory=False.\n",
+"/tmp/ipykernel_21623/4195622896.py:1: DtypeWarning: Columns (12) have mixed types. Specify dtype option on import or set low_memory=False.\n",
 " raw_data = pd.read_csv(\"full_data.csv\")\n"
]
}
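The `DtypeWarning` in the hunk above names its own two remedies: pass `low_memory=False` or pin the ambiguous column's dtype. A small self-contained sketch of both, using an inline stand-in for `full_data.csv` (the column name `flag` is invented for illustration; only the two `read_csv` options come from the warning):

```python
import io

import pandas as pd

# Tiny stand-in for full_data.csv, with one mixed-type column.
csv_text = "seq,score,flag\nAAA,1.5,0\nCCC,2.5,x\n"

# Option 1: read the whole file in one pass so pandas infers each
# column's dtype once, instead of chunk by chunk.
df1 = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Option 2: pin the ambiguous column's dtype explicitly.
df2 = pd.read_csv(io.StringIO(csv_text), dtype={"flag": str})

print(df2["flag"].tolist())  # ['0', 'x']
```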
@@ -361,22 +362,22 @@
 "Grid point 9 acquired.\n",
 "New hparams: [-0.8298276]\n",
 "Additional acquisition 10.\n",
-"New hparams: [-0.773694]\n",
+"New hparams: [-0.838633]\n",
 "Additional acquisition 11.\n",
-"New hparams: [-0.8054813]\n",
+"New hparams: [-0.8383481]\n",
 "Additional acquisition 12.\n",
-"New hparams: [-0.7899]\n",
+"New hparams: [-0.8468422]\n",
 "Additional acquisition 13.\n",
-"New hparams: [-0.7798819]\n",
+"New hparams: [-0.8321154]\n",
 "Additional acquisition 14.\n",
-"New hparams: [-0.777553]\n",
+"New hparams: [-0.8445669]\n",
 "Additional acquisition 15.\n",
 "New hparams: [-0.8072598]\n",
-"Best score achieved: 120578.24\n",
-"Best hyperparams: [-0.852417 -1.6094379 -0.7899 ]\n",
+"Best score achieved: 121310.218\n",
+"Best hyperparams: [-0.852417 -1.6094379 -0.8298276]\n",
 "Tuning complete.\n",
-"Best estimated negative marginal log likelihood: 120578.24\n",
-"Wallclock: 210.5481152534485\n"
+"Best estimated negative marginal log likelihood: 121310.218\n",
+"Wallclock: 663.9832231998444\n"
]
}
],
@@ -465,7 +466,7 @@
 "Chunk 70 complete.\n",
 "Chunk 80 complete.\n",
 "Chunk 90 complete.\n",
-"Wallclock: 18.054260969161987\n"
+"Wallclock: 52.33980679512024\n"
]
}
],
@@ -493,8 +494,11 @@
 "Iteration 10\n",
 "Iteration 15\n",
 "Iteration 20\n",
+"Iteration 25\n",
+"Iteration 30\n",
+"Now performing variance calculations...\n",
 "Fitting complete.\n",
-"Wallclock: 53.852142572402954\n"
+"Wallclock: 66.21576309204102\n"
]
}
],
@@ -516,7 +520,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"Wallclock: 2.0166425704956055\n"
+"Wallclock: 9.966946601867676\n"
]
}
],
@@ -543,7 +547,7 @@
 {
 "data": {
 "text/plain": [
-"SpearmanrResult(correlation=0.7658444861113735, pvalue=0.0)"
+"SpearmanrResult(correlation=0.7664973351782205, pvalue=0.0)"
]
},
"execution_count": 13,
@@ -562,20 +566,13 @@
 "id": "a6e3d321",
 "metadata": {},
 "source": [
-"Notice we're already at 0.766, outperforming a CNN for sequences.\n",
-"Not bad, given that we are merely using one-hot encoded input. It is of course possible\n",
-"to try to use another representation (e.g. the output of a language model)\n",
-"as the input to a GP. The FHTConv1d kernel used here measures the similarity\n",
-"of two sequences as the similarity between k-mers, as measured by an\n",
-"RBF kernel assessed across each pair of k-mers. It is therefore\n",
-"likely that information from some position-specific scoring matrix (PSSM)\n",
-"(e.g. BLOSUM) would if used instead of one hot encoding improve performance\n",
-"as well.\n",
-"\n",
-"Perhaps the most interesting result is the poor performance of the\n",
-"pretrained model, which in this case (and on many other of the FLIP\n",
-"benchmarks) loses both to a GP and a 1d CNN despite having access\n",
-"to a large corpus for unsupervised pretraining."
+"Notice we're already at 0.766, outperforming a CNN for sequences also trained\n",
+"on one-hot encoded input. It is of course possible\n",
+"to use another representation (e.g. the output of a language model)\n",
+"as the input to a GP. We discuss some other\n",
+"possible representations and show that for some tasks using a language\n",
+"model embedding can improve results (although for this dataset, interestingly,\n",
+"not by very much)."
]
},
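The revised cell above compares against a CNN "also trained on one-hot encoded input." For readers unfamiliar with that representation, here is a minimal sketch of one-hot encoding an amino-acid sequence; the 20-letter alphabet is standard, but the helper name is ours, not an xGPR function:

```python
import numpy as np

# Standard 20-letter amino acid alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(seq):
    """Return a (len(seq), 20) array with exactly one 1.0 per row."""
    encoded = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        encoded[pos, AA_INDEX[aa]] = 1.0
    return encoded


x = one_hot_encode("ACD")
print(x.shape)  # (3, 20)
print(x.sum())  # 3.0 -- one active unit per position
```

Sequences of differing length then yield arrays with differing first dimension, which is exactly the variable `xdim[1]` situation the dataset-builder change in HISTORY.md accommodates.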
{
@@ -634,7 +631,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.16"
+"version": "3.9.10"
}
},
"nbformat": 4,