Updated tabular, sequence examples in docs to use xGPR 0.1.2.3. Updated HISTORY and init for 0.1.2.3 release. Fixed bug that caused error when switching from cpu to gpu or vice versa on trained model. Updated complete pipeline tests to do functionality test for variance calc.
jlparkI committed Jun 27, 2023
1 parent 95ca2b8 commit bbed4af
Showing 6 changed files with 121 additions and 528 deletions.
5 changes: 5 additions & 0 deletions HISTORY.md
@@ -127,3 +127,8 @@ Updated dataset builder so that different batches with different
 xdim[1] are now accepted when building a dataset. This obviates
 the need to zero-pad data (although note that zero-padding is
 generally advisable for consistency).
+
+### Version 0.1.2.3
+
+Fixed a bug that caused an error when changing the device from gpu
+to cpu (or vice versa) on a fitted model.
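The device-switch fix above concerns moving a fitted model's weights along with the model. Below is a minimal, illustrative sketch of that pattern only: the class and attribute names (`SimpleModel`, `weights_`, `set_device`) are hypothetical and are not xGPR's actual internals, and ordinary least squares stands in for the real fitting routine.

```python
import numpy as np


class SimpleModel:
    """Toy model illustrating fitted-weight transfer between devices."""

    def __init__(self):
        self.device = "cpu"
        self.weights_ = None  # populated by fit()

    def fit(self, X, y):
        # Ordinary least squares stands in for the real fitting routine.
        self.weights_, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self

    def set_device(self, new_device):
        if new_device not in ("cpu", "gpu"):
            raise ValueError("device must be 'cpu' or 'gpu'")
        # The fitted weights must move with the model; leaving them on the
        # old device is the kind of error the fix above addresses.
        if self.weights_ is not None and new_device != self.device:
            import cupy as cp  # only imported when a GPU is involved
            if new_device == "gpu":
                self.weights_ = cp.asarray(self.weights_)
            else:
                self.weights_ = cp.asnumpy(self.weights_)
        self.device = new_device
```

The key design point is that the setter validates the target device and converts any already-fitted state, so switching devices after `fit` is safe rather than an error.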
59 changes: 28 additions & 31 deletions docs/notebooks/sequence_example.ipynb
@@ -19,7 +19,7 @@
 "match or outperform the deep learning baselines without too\n",
 "much effort.\n",
 "\n",
-"This was originally run on an A6000 GPU, using xGPR 0.1.0.0."
+"This was originally run on a GTX1070 GPU, using xGPR 0.1.2.3."
]
},
{
@@ -54,7 +54,8 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"Cloning into 'FLIP'...\n"
+"Cloning into 'FLIP'...\n",
+"Checking out files: 100% (59/59), done.\n"
]
}
],
@@ -84,7 +85,7 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"/tmp/ipykernel_3931463/4195622896.py:1: DtypeWarning: Columns (12) have mixed types. Specify dtype option on import or set low_memory=False.\n",
+"/tmp/ipykernel_21623/4195622896.py:1: DtypeWarning: Columns (12) have mixed types. Specify dtype option on import or set low_memory=False.\n",
 " raw_data = pd.read_csv(\"full_data.csv\")\n"
]
}
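The `DtypeWarning` in the hunk above names its own two remedies: pass `low_memory=False` or pin the ambiguous column's dtype. A small self-contained sketch of both, using an inline stand-in for `full_data.csv` (the column name `flag` is invented for illustration; only the two `read_csv` options come from the warning):

```python
import io

import pandas as pd

# Tiny stand-in for full_data.csv, with one mixed-type column.
csv_text = "seq,score,flag\nAAA,1.5,0\nCCC,2.5,x\n"

# Option 1: read the whole file in one pass so pandas infers each
# column's dtype once, instead of chunk by chunk.
df1 = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Option 2: pin the ambiguous column's dtype explicitly.
df2 = pd.read_csv(io.StringIO(csv_text), dtype={"flag": str})

print(df2["flag"].tolist())  # ['0', 'x']
```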
@@ -361,22 +362,22 @@
 "Grid point 9 acquired.\n",
 "New hparams: [-0.8298276]\n",
 "Additional acquisition 10.\n",
-"New hparams: [-0.773694]\n",
+"New hparams: [-0.838633]\n",
 "Additional acquisition 11.\n",
-"New hparams: [-0.8054813]\n",
+"New hparams: [-0.8383481]\n",
 "Additional acquisition 12.\n",
-"New hparams: [-0.7899]\n",
+"New hparams: [-0.8468422]\n",
 "Additional acquisition 13.\n",
-"New hparams: [-0.7798819]\n",
+"New hparams: [-0.8321154]\n",
 "Additional acquisition 14.\n",
-"New hparams: [-0.777553]\n",
+"New hparams: [-0.8445669]\n",
 "Additional acquisition 15.\n",
 "New hparams: [-0.8072598]\n",
-"Best score achieved: 120578.24\n",
-"Best hyperparams: [-0.852417 -1.6094379 -0.7899 ]\n",
+"Best score achieved: 121310.218\n",
+"Best hyperparams: [-0.852417 -1.6094379 -0.8298276]\n",
 "Tuning complete.\n",
-"Best estimated negative marginal log likelihood: 120578.24\n",
-"Wallclock: 210.5481152534485\n"
+"Best estimated negative marginal log likelihood: 121310.218\n",
+"Wallclock: 663.9832231998444\n"
]
}
],
@@ -465,7 +466,7 @@
 "Chunk 70 complete.\n",
 "Chunk 80 complete.\n",
 "Chunk 90 complete.\n",
-"Wallclock: 18.054260969161987\n"
+"Wallclock: 52.33980679512024\n"
]
}
],
@@ -493,8 +494,11 @@
 "Iteration 10\n",
 "Iteration 15\n",
 "Iteration 20\n",
+"Iteration 25\n",
+"Iteration 30\n",
+"Now performing variance calculations...\n",
 "Fitting complete.\n",
-"Wallclock: 53.852142572402954\n"
+"Wallclock: 66.21576309204102\n"
]
}
],
@@ -516,7 +520,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"Wallclock: 2.0166425704956055\n"
+"Wallclock: 9.966946601867676\n"
]
}
],
@@ -543,7 +547,7 @@
 {
 "data": {
 "text/plain": [
-"SpearmanrResult(correlation=0.7658444861113735, pvalue=0.0)"
+"SpearmanrResult(correlation=0.7664973351782205, pvalue=0.0)"
]
},
"execution_count": 13,
@@ -562,20 +566,13 @@
 "id": "a6e3d321",
 "metadata": {},
 "source": [
-"Notice we're already at 0.766, outperforming a CNN for sequences.\n",
-"Not bad, given that we are merely using one-hot encoded input. It is of course possible\n",
-"to try to use another representation (e.g. the output of a language model)\n",
-"as the input to a GP. The FHTConv1d kernel used here measures the similarity\n",
-"of two sequences as the similarity between k-mers, as measured by an\n",
-"RBF kernel assessed across each pair of k-mers. It is therefore\n",
-"likely that information from some position-specific scoring matrix (PSSM)\n",
-"(e.g. BLOSUM) would if used instead of one hot encoding improve performance\n",
-"as well.\n",
-"\n",
-"Perhaps the most interesting result is the poor performance of the\n",
-"pretrained model, which in this case (and on many other of the FLIP\n",
-"benchmarks) loses both to a GP and a 1d CNN despite having access\n",
-"to a large corpus for unsupervised pretraining."
+"Notice we're already at 0.766, outperforming a CNN for sequences also trained\n",
+"on one-hot encoded input. It is of course possible\n",
+"to use another representation (e.g. the output of a language model)\n",
+"as the input to a GP. We discuss some other\n",
+"possible representations and show that for some tasks using a language\n",
+"model embedding can improve results (although for this dataset, interestingly,\n",
+"not by very much)."
]
},
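The revised cell above compares against a CNN "also trained on one-hot encoded input." For readers unfamiliar with that representation, here is a minimal sketch of one-hot encoding an amino-acid sequence; the 20-letter alphabet is standard, but the helper name is ours, not an xGPR function:

```python
import numpy as np

# Standard 20-letter amino acid alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(seq):
    """Return a (len(seq), 20) array with exactly one 1.0 per row."""
    encoded = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        encoded[pos, AA_INDEX[aa]] = 1.0
    return encoded


x = one_hot_encode("ACD")
print(x.shape)  # (3, 20)
print(x.sum())  # 3.0 -- one active unit per position
```

Sequences of differing length then yield arrays with differing first dimension, which is exactly the variable `xdim[1]` situation the dataset-builder change in HISTORY.md accommodates.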
{
@@ -634,7 +631,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.9.16"
+"version": "3.9.10"
}
},
"nbformat": 4,