-
Notifications
You must be signed in to change notification settings - Fork 448
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add Kubeflow Pipelines Examples (#1632)
* Init commit with e2e example * Add Early Stopping and MPI Examples * Add MPI to README * Modify SDK for MPI example * Modify doc * Update Early Stopping example * Finish e2e example * Modify links for KFP guide
- Loading branch information
1 parent
e5d7636
commit 195db29
Showing
5 changed files
with
1,295 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,34 @@ | ||
# Using Katib with Kubeflow Pipelines | ||
|
||
The following examples show how to use Katib with | ||
[Kubeflow Pipelines](https://github.com/kubeflow/pipelines). | ||
|
||
You can find the Katib Component source code for the Kubeflow Pipelines | ||
[here](https://github.com/kubeflow/pipelines/tree/master/components/kubeflow/katib-launcher). | ||
|
||
## Prerequisites | ||
|
||
You have to install the following Python SDK to run these examples: | ||
|
||
- [`kfp`](https://pypi.org/project/kfp/) >= 1.8.4 | ||
- [`kubeflow-katib`](https://pypi.org/project/kubeflow-katib/) >= 0.12.0 | ||
|
||
## Multi-User Pipelines Setup | ||
|
||
The Notebooks examples run Pipelines in multi-user mode and your Kubeflow Notebook | ||
must have the appropriate `PodDefault` with the `pipelines.kubeflow.org` audience. | ||
|
||
Please follow [this guide](https://www.kubeflow.org/docs/components/pipelines/sdk/connect-api/#multi-user-mode) | ||
to give an access Kubeflow Notebook to run Kubeflow Pipelines. | ||
|
||
## List of Examples | ||
|
||
The following Pipelines are deployed from Kubeflow Notebook: | ||
|
||
- [Kubeflow E2E MNIST](kubeflow-e2e-mnist.ipynb) | ||
|
||
- [Katib Experiment with Early Stopping](early-stopping.ipynb) | ||
|
||
The following Pipelines have to be compiled and uploaded to the Kubeflow Pipelines UI: | ||
|
||
- [MPIJob Horovod](mpi-job-horovod.py) |
336 changes: 336 additions & 0 deletions
336
examples/v1beta1/kubeflow-pipelines/early-stopping.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,336 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Kubeflow Pipelines with Katib component\n", | ||
"\n", | ||
"In this notebook you will:\n", | ||
"- Create Katib Experiment using random algorithm.\n", | ||
"- Use median stopping rule as an early stopping algorithm.\n", | ||
"- Use Kubernetes Job with mxnet mnist training container as a Trial template.\n", | ||
"- Create Pipeline to get the optimal hyperparameters.\n", | ||
"\n", | ||
"Reference documentation:\n", | ||
"- https://kubeflow.org/docs/components/katib/experiment/#random-search\n", | ||
"- https://kubeflow.org/docs/components/katib/early-stopping/\n", | ||
"- https://kubeflow.org/docs/pipelines/overview/concepts/component/\n", | ||
"\n", | ||
"**Note**: This Pipeline runs in the multi-user mode. Follow [this guide](https://github.com/kubeflow/katib/tree/master/examples/v1beta1/kubeflow-pipelines#multi-user-pipelines-setup) to give your Notebook access to Kubeflow Pipelines." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Install required packages (Kubeflow Pipelines and Katib SDK).\n", | ||
"!pip install kfp==1.8.4\n", | ||
"!pip install kubeflow-katib==0.12.0" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import kfp\n", | ||
"import kfp.dsl as dsl\n", | ||
"from kfp import components\n", | ||
"\n", | ||
"from kubeflow.katib import ApiClient\n", | ||
"from kubeflow.katib import V1beta1ExperimentSpec\n", | ||
"from kubeflow.katib import V1beta1AlgorithmSpec\n", | ||
"from kubeflow.katib import V1beta1EarlyStoppingSpec\n", | ||
"from kubeflow.katib import V1beta1EarlyStoppingSetting\n", | ||
"from kubeflow.katib import V1beta1ObjectiveSpec\n", | ||
"from kubeflow.katib import V1beta1ParameterSpec\n", | ||
"from kubeflow.katib import V1beta1FeasibleSpace\n", | ||
"from kubeflow.katib import V1beta1TrialTemplate\n", | ||
"from kubeflow.katib import V1beta1TrialParameterSpec" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Define an Experiment\n", | ||
"\n", | ||
"You have to create an Experiment object before deploying it. This Experiment is similar to [this](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/early-stopping/median-stop.yaml) YAML." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Experiment name and namespace.\n", | ||
"experiment_name = \"median-stop\"\n", | ||
"experiment_namespace = \"kubeflow-user-example-com\"\n", | ||
"\n", | ||
"# Trial count specification.\n", | ||
"max_trial_count = 18\n", | ||
"max_failed_trial_count = 3\n", | ||
"parallel_trial_count = 2\n", | ||
"\n", | ||
"# Objective specification.\n", | ||
"objective=V1beta1ObjectiveSpec(\n", | ||
" type=\"maximize\",\n", | ||
" goal= 0.99,\n", | ||
" objective_metric_name=\"Validation-accuracy\",\n", | ||
" additional_metric_names=[\n", | ||
" \"Train-accuracy\"\n", | ||
" ]\n", | ||
")\n", | ||
"\n", | ||
"# Algorithm specification.\n", | ||
"algorithm=V1beta1AlgorithmSpec(\n", | ||
" algorithm_name=\"random\",\n", | ||
")\n", | ||
"\n", | ||
"# Early Stopping specification.\n", | ||
"early_stopping=V1beta1EarlyStoppingSpec(\n", | ||
" algorithm_name=\"medianstop\",\n", | ||
" algorithm_settings=[\n", | ||
" V1beta1EarlyStoppingSetting(\n", | ||
" name=\"min_trials_required\",\n", | ||
" value=\"2\"\n", | ||
" )\n", | ||
" ]\n", | ||
")\n", | ||
"\n", | ||
"\n", | ||
"# Experiment search space.\n", | ||
"# In this example we tune learning rate, number of layer and optimizer.\n", | ||
"# Learning rate has bad feasible space to show more early stopped Trials.\n", | ||
"parameters=[\n", | ||
" V1beta1ParameterSpec(\n", | ||
" name=\"lr\",\n", | ||
" parameter_type=\"double\",\n", | ||
" feasible_space=V1beta1FeasibleSpace(\n", | ||
" min=\"0.01\",\n", | ||
" max=\"0.3\"\n", | ||
" ),\n", | ||
" ),\n", | ||
" V1beta1ParameterSpec(\n", | ||
" name=\"num-layers\",\n", | ||
" parameter_type=\"int\",\n", | ||
" feasible_space=V1beta1FeasibleSpace(\n", | ||
" min=\"2\",\n", | ||
" max=\"5\"\n", | ||
" ),\n", | ||
" ),\n", | ||
" V1beta1ParameterSpec(\n", | ||
" name=\"optimizer\",\n", | ||
" parameter_type=\"categorical\",\n", | ||
" feasible_space=V1beta1FeasibleSpace(\n", | ||
" list=[\n", | ||
" \"sgd\", \n", | ||
" \"adam\",\n", | ||
" \"ftrl\"\n", | ||
" ]\n", | ||
" ),\n", | ||
" ),\n", | ||
"]\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Define a Trial template\n", | ||
"\n", | ||
"In this example, the Trial's Worker is the Kubernetes Job." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# JSON template specification for the Trial's Worker Kubernetes Job.\n", | ||
"trial_spec={\n", | ||
" \"apiVersion\": \"batch/v1\",\n", | ||
" \"kind\": \"Job\",\n", | ||
" \"spec\": {\n", | ||
" \"template\": {\n", | ||
" \"metadata\": {\n", | ||
" \"annotations\": {\n", | ||
" \"sidecar.istio.io/inject\": \"false\"\n", | ||
" }\n", | ||
" },\n", | ||
" \"spec\": {\n", | ||
" \"containers\": [\n", | ||
" {\n", | ||
" \"name\": \"training-container\",\n", | ||
" \"image\": \"docker.io/kubeflowkatib/mxnet-mnist:v1beta1-45c5727\",\n", | ||
" \"command\": [\n", | ||
" \"python3\",\n", | ||
" \"/opt/mxnet-mnist/mnist.py\",\n", | ||
" \"--batch-size=64\",\n", | ||
" \"--lr=${trialParameters.learningRate}\",\n", | ||
" \"--num-layers=${trialParameters.numberLayers}\",\n", | ||
" \"--optimizer=${trialParameters.optimizer}\"\n", | ||
" ]\n", | ||
" }\n", | ||
" ],\n", | ||
" \"restartPolicy\": \"Never\"\n", | ||
" }\n", | ||
" }\n", | ||
" }\n", | ||
"}\n", | ||
"\n", | ||
"# Configure parameters for the Trial template.\n", | ||
"# We set the retain parameter to \"True\" to not clean-up the Trial Job's Kubernetes Pods.\n", | ||
"trial_template=V1beta1TrialTemplate(\n", | ||
" retain=True,\n", | ||
" primary_container_name=\"training-container\",\n", | ||
" trial_parameters=[\n", | ||
" V1beta1TrialParameterSpec(\n", | ||
" name=\"learningRate\",\n", | ||
" description=\"Learning rate for the training model\",\n", | ||
" reference=\"lr\"\n", | ||
" ),\n", | ||
" V1beta1TrialParameterSpec(\n", | ||
" name=\"numberLayers\",\n", | ||
" description=\"Number of training model layers\",\n", | ||
" reference=\"num-layers\"\n", | ||
" ),\n", | ||
" V1beta1TrialParameterSpec(\n", | ||
" name=\"optimizer\",\n", | ||
" description=\"Training model optimizer (sdg, adam or ftrl)\",\n", | ||
" reference=\"optimizer\"\n", | ||
" ),\n", | ||
" ],\n", | ||
" trial_spec=trial_spec\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Define an Experiment specification\n", | ||
"\n", | ||
"Create an Experiment specification from the above parameters." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"experiment_spec=V1beta1ExperimentSpec(\n", | ||
" max_trial_count=max_trial_count,\n", | ||
" max_failed_trial_count=max_failed_trial_count,\n", | ||
" parallel_trial_count=parallel_trial_count,\n", | ||
" objective=objective,\n", | ||
" algorithm=algorithm,\n", | ||
" early_stopping=early_stopping,\n", | ||
" parameters=parameters,\n", | ||
" trial_template=trial_template\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Create a Pipeline using Katib component\n", | ||
"\n", | ||
"The best hyperparameters are printed after Experiment is finished.\n", | ||
"The Experiment is not deleted after the Pipeline is finished." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Get the Katib launcher.\n", | ||
"katib_experiment_launcher_op = components.load_component_from_url(\n", | ||
" \"https://raw.githubusercontent.com/kubeflow/pipelines/master/components/kubeflow/katib-launcher/component.yaml\")\n", | ||
"\n", | ||
"@dsl.pipeline(\n", | ||
" name=\"Launch Katib early stopping Experiment\",\n", | ||
" description=\"An example to launch Katib Experiment with early stopping\"\n", | ||
")\n", | ||
"\n", | ||
"def median_stop():\n", | ||
"\n", | ||
" # Katib launcher component.\n", | ||
" # Experiment Spec should be serialized to a valid Kubernetes object.\n", | ||
" op = katib_experiment_launcher_op(\n", | ||
" experiment_name=experiment_name,\n", | ||
" experiment_namespace=experiment_namespace,\n", | ||
" experiment_spec=ApiClient().sanitize_for_serialization(experiment_spec),\n", | ||
" experiment_timeout_minutes=60,\n", | ||
" delete_finished_experiment=False)\n", | ||
"\n", | ||
" # Output container to print the results.\n", | ||
" op_out = dsl.ContainerOp(\n", | ||
" name=\"best-hp\",\n", | ||
" image=\"library/bash:4.4.23\",\n", | ||
" command=[\"sh\", \"-c\"],\n", | ||
" arguments=[\"echo Best HyperParameters: %s\" % op.output],\n", | ||
" )" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Run the Kubeflow Pipeline\n", | ||
"\n", | ||
"You can check the Katib Experiment info in the Katib UI." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"scrolled": true | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Run the Kubeflow Pipeline in the user's namespace.\n", | ||
"kfp.Client().create_run_from_pipeline_func(median_stop, namespace=experiment_namespace, arguments={})" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.8.10" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |
Binary file not shown.
Oops, something went wrong.