Document ability to export cuML RF to predict on other machines #3890

Merged
merged 2 commits into from
May 28, 2021
66 changes: 65 additions & 1 deletion docs/source/pickling_cuml_models.ipynb
@@ -183,6 +183,70 @@
"source": [
"single_gpu_model.cluster_centers_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exporting cuML Random Forest models for inferencing on machines without GPUs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Starting with cuML version 21.06, you can export cuML Random Forest models and run predictions with them on machines without an NVIDIA GPUs. The [Treelite](https://github.com/dmlc/treelite) package defines an efficient exchange format that lets you portably move the cuML Random Forest models to other machines. We will refer to the exchange format as \"checkpoints.\"\n",
Member


I think adding a link to this page from the main docstring of the RF classes (with a note such as "RF classes can be exported to Treelite for inference on machines without GPUs") would be useful for people reading the API docs.

Contributor Author


Done

"\n",
"Here are the steps to export the model:\n",
"\n",
"1. Call `to_treelite_checkpoint()` to obtain the checkpoint file from the cuML Random Forest model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from cuml.ensemble import RandomForestClassifier as cumlRandomForestClassifier\n",
"from sklearn.datasets import load_iris\n",
"import numpy as np\n",
"\n",
"X, y = load_iris(return_X_y=True)\n",
"X, y = X.astype(np.float32), y.astype(np.int32)\n",
"clf = cumlRandomForestClassifier(max_depth=3, random_state=0, n_estimators=10)\n",
"clf.fit(X, y)\n",
"\n",
"checkpoint_path = './checkpoint.tl'\n",
"# Export cuML RF model as Treelite checkpoint\n",
"clf.convert_to_treelite_model().to_treelite_checkpoint(checkpoint_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Copy the generated checkpoint file `checkpoint.tl` to another machine on which you'd like to run predictions.\n",
"\n",
"3. On the target machine, install Treelite by running `pip install treelite` or `conda install -c conda-forge treelite`. The machine does not need to have an NVIDIA GPUs and does not need to have cuML installed.\n",
"\n",
"4. You can now load the model from the checkpoint, by running the following on the target machine:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import treelite\n",
"\n",
"# The checkpoint file has been copied over\n",
"checkpoint_path = './checkpoint.tl'\n",
"tl_model = treelite.Model.deserialize(checkpoint_path)\n",
"out_prob = treelite.gtil.predict(tl_model, X, pred_margin=True)\n",
"print(out_prob)"
]
}
],
"metadata": {
@@ -201,7 +265,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
"version": "3.8.8"
}
},
"nbformat": 4,
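Step 2 of the notebook text copies `checkpoint.tl` to another machine. One optional safeguard, which is not part of this PR, is to compare a checksum of the file before and after the transfer. The sketch below uses only the Python standard library. The helper names `sha256_of` and `copy_and_verify` are hypothetical, and a local `shutil.copyfile` stands in for whatever transfer tool is actually used.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, then report whether both files hash identically."""
    shutil.copyfile(src, dst)
    return sha256_of(src) == sha256_of(dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder bytes only; this is not a real Treelite checkpoint.
        src = Path(tmp) / "checkpoint.tl"
        src.write_bytes(b"placeholder bytes standing in for a checkpoint")
        dst = Path(tmp) / "checkpoint_copy.tl"
        print("copy intact:", copy_and_verify(src, dst))
```

For a real transfer between machines, the same idea applies by running `sha256_of` on each side and comparing the two digests by hand.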
5 changes: 5 additions & 0 deletions python/cuml/ensemble/randomforestclassifier.pyx
@@ -134,6 +134,11 @@ class RandomForestClassifier(BaseRandomForestModel,
histogram-based algorithm to determine splits, rather than an exact
count. You can tune the size of the histograms with the n_bins parameter.
.. note:: You can export cuML Random Forest models and run predictions
with them on machines without an NVIDIA GPU. See
https://docs.rapids.ai/api/cuml/nightly/pickling_cuml_models.html
for more details.
**Known Limitations**: This is an early release of the cuML
Random Forest code. It contains a few known limitations:
5 changes: 5 additions & 0 deletions python/cuml/ensemble/randomforestregressor.pyx
@@ -117,6 +117,11 @@ class RandomForestRegressor(BaseRandomForestModel,
histogram-based algorithm to determine splits, rather than an exact
count. You can tune the size of the histograms with the n_bins parameter.
.. note:: You can export cuML Random Forest models and run predictions
with them on machines without an NVIDIA GPU. See
https://docs.rapids.ai/api/cuml/nightly/pickling_cuml_models.html
for more details.
**Known Limitations**: This is an early release of the cuML
Random Forest code. It contains a few known limitations:
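The note added to both docstrings says exported models can run without an NVIDIA GPU, and the notebook adds that cuML is not required on the target machine. A small preflight check along those lines might look like the sketch below. The helpers `has_module` and `preflight` are hypothetical. They rely only on `importlib.util.find_spec` from the standard library and check top-level package names without importing them.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


def preflight():
    """Summarize whether this machine is ready to load a checkpoint."""
    return {
        "treelite_installed": has_module("treelite"),  # needed to load the checkpoint
        "cuml_installed": has_module("cuml"),          # optional on the target machine
    }


if __name__ == "__main__":
    status = preflight()
    if not status["treelite_installed"]:
        print("Install Treelite first, e.g. `pip install treelite`.")
    print(status)
```

Because the check never imports the packages, it is safe to run on a machine without a GPU driver.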