Skip to content

Commit

Permalink
Add Kubeflow Pipelines Examples (#1632)
Browse files Browse the repository at this point in the history
* Init commit with e2e example

* Add Early Stopping and MPI Examples

* Add MPI to README

* Modify SDK for MPI example

* Modify doc

* Update Early Stopping example

* Finish e2e example

* Modify links for KFP guide
  • Loading branch information
andreyvelich authored Oct 12, 2021
1 parent e5d7636 commit 195db29
Show file tree
Hide file tree
Showing 5 changed files with 1,295 additions and 0 deletions.
33 changes: 33 additions & 0 deletions examples/v1beta1/kubeflow-pipelines/README.md
Original file line number Diff line number Diff line change
@@ -1 +1,34 @@
# Using Katib with Kubeflow Pipelines

The following examples show how to use Katib with
[Kubeflow Pipelines](https://github.com/kubeflow/pipelines).

You can find the Katib Component source code for the Kubeflow Pipelines
[here](https://github.com/kubeflow/pipelines/tree/master/components/kubeflow/katib-launcher).

## Prerequisites

You have to install the following Python SDK to run these examples:

- [`kfp`](https://pypi.org/project/kfp/) >= 1.8.4
- [`kubeflow-katib`](https://pypi.org/project/kubeflow-katib/) >= 0.12.0

## Multi-User Pipelines Setup

The Notebooks examples run Pipelines in multi-user mode and your Kubeflow Notebook
must have the appropriate `PodDefault` with the `pipelines.kubeflow.org` audience.

Please follow [this guide](https://www.kubeflow.org/docs/components/pipelines/sdk/connect-api/#multi-user-mode)
to give an access Kubeflow Notebook to run Kubeflow Pipelines.

## List of Examples

The following Pipelines are deployed from Kubeflow Notebook:

- [Kubeflow E2E MNIST](kubeflow-e2e-mnist.ipynb)

- [Katib Experiment with Early Stopping](early-stopping.ipynb)

The following Pipelines have to be compiled and uploaded to the Kubeflow Pipelines UI:

- [MPIJob Horovod](mpi-job-horovod.py)
336 changes: 336 additions & 0 deletions examples/v1beta1/kubeflow-pipelines/early-stopping.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,336 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Kubeflow Pipelines with Katib component\n",
"\n",
"In this notebook you will:\n",
"- Create Katib Experiment using random algorithm.\n",
"- Use median stopping rule as an early stopping algorithm.\n",
"- Use Kubernetes Job with mxnet mnist training container as a Trial template.\n",
"- Create Pipeline to get the optimal hyperparameters.\n",
"\n",
"Reference documentation:\n",
"- https://kubeflow.org/docs/components/katib/experiment/#random-search\n",
"- https://kubeflow.org/docs/components/katib/early-stopping/\n",
"- https://kubeflow.org/docs/pipelines/overview/concepts/component/\n",
"\n",
"**Note**: This Pipeline runs in the multi-user mode. Follow [this guide](https://github.com/kubeflow/katib/tree/master/examples/v1beta1/kubeflow-pipelines#multi-user-pipelines-setup) to give your Notebook access to Kubeflow Pipelines."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Install required packages (Kubeflow Pipelines and Katib SDK).\n",
"!pip install kfp==1.8.4\n",
"!pip install kubeflow-katib==0.12.0"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import kfp\n",
"import kfp.dsl as dsl\n",
"from kfp import components\n",
"\n",
"from kubeflow.katib import ApiClient\n",
"from kubeflow.katib import V1beta1ExperimentSpec\n",
"from kubeflow.katib import V1beta1AlgorithmSpec\n",
"from kubeflow.katib import V1beta1EarlyStoppingSpec\n",
"from kubeflow.katib import V1beta1EarlyStoppingSetting\n",
"from kubeflow.katib import V1beta1ObjectiveSpec\n",
"from kubeflow.katib import V1beta1ParameterSpec\n",
"from kubeflow.katib import V1beta1FeasibleSpace\n",
"from kubeflow.katib import V1beta1TrialTemplate\n",
"from kubeflow.katib import V1beta1TrialParameterSpec"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define an Experiment\n",
"\n",
"You have to create an Experiment object before deploying it. This Experiment is similar to [this](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/early-stopping/median-stop.yaml) YAML."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Experiment name and namespace.\n",
"experiment_name = \"median-stop\"\n",
"experiment_namespace = \"kubeflow-user-example-com\"\n",
"\n",
"# Trial count specification.\n",
"max_trial_count = 18\n",
"max_failed_trial_count = 3\n",
"parallel_trial_count = 2\n",
"\n",
"# Objective specification.\n",
"objective=V1beta1ObjectiveSpec(\n",
" type=\"maximize\",\n",
" goal= 0.99,\n",
" objective_metric_name=\"Validation-accuracy\",\n",
" additional_metric_names=[\n",
" \"Train-accuracy\"\n",
" ]\n",
")\n",
"\n",
"# Algorithm specification.\n",
"algorithm=V1beta1AlgorithmSpec(\n",
" algorithm_name=\"random\",\n",
")\n",
"\n",
"# Early Stopping specification.\n",
"early_stopping=V1beta1EarlyStoppingSpec(\n",
" algorithm_name=\"medianstop\",\n",
" algorithm_settings=[\n",
" V1beta1EarlyStoppingSetting(\n",
" name=\"min_trials_required\",\n",
" value=\"2\"\n",
" )\n",
" ]\n",
")\n",
"\n",
"\n",
"# Experiment search space.\n",
"# In this example we tune learning rate, number of layer and optimizer.\n",
"# Learning rate has bad feasible space to show more early stopped Trials.\n",
"parameters=[\n",
" V1beta1ParameterSpec(\n",
" name=\"lr\",\n",
" parameter_type=\"double\",\n",
" feasible_space=V1beta1FeasibleSpace(\n",
" min=\"0.01\",\n",
" max=\"0.3\"\n",
" ),\n",
" ),\n",
" V1beta1ParameterSpec(\n",
" name=\"num-layers\",\n",
" parameter_type=\"int\",\n",
" feasible_space=V1beta1FeasibleSpace(\n",
" min=\"2\",\n",
" max=\"5\"\n",
" ),\n",
" ),\n",
" V1beta1ParameterSpec(\n",
" name=\"optimizer\",\n",
" parameter_type=\"categorical\",\n",
" feasible_space=V1beta1FeasibleSpace(\n",
" list=[\n",
" \"sgd\", \n",
" \"adam\",\n",
" \"ftrl\"\n",
" ]\n",
" ),\n",
" ),\n",
"]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define a Trial template\n",
"\n",
"In this example, the Trial's Worker is the Kubernetes Job."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# JSON template specification for the Trial's Worker Kubernetes Job.\n",
"trial_spec={\n",
" \"apiVersion\": \"batch/v1\",\n",
" \"kind\": \"Job\",\n",
" \"spec\": {\n",
" \"template\": {\n",
" \"metadata\": {\n",
" \"annotations\": {\n",
" \"sidecar.istio.io/inject\": \"false\"\n",
" }\n",
" },\n",
" \"spec\": {\n",
" \"containers\": [\n",
" {\n",
" \"name\": \"training-container\",\n",
" \"image\": \"docker.io/kubeflowkatib/mxnet-mnist:v1beta1-45c5727\",\n",
" \"command\": [\n",
" \"python3\",\n",
" \"/opt/mxnet-mnist/mnist.py\",\n",
" \"--batch-size=64\",\n",
" \"--lr=${trialParameters.learningRate}\",\n",
" \"--num-layers=${trialParameters.numberLayers}\",\n",
" \"--optimizer=${trialParameters.optimizer}\"\n",
" ]\n",
" }\n",
" ],\n",
" \"restartPolicy\": \"Never\"\n",
" }\n",
" }\n",
" }\n",
"}\n",
"\n",
"# Configure parameters for the Trial template.\n",
"# We set the retain parameter to \"True\" to not clean-up the Trial Job's Kubernetes Pods.\n",
"trial_template=V1beta1TrialTemplate(\n",
" retain=True,\n",
" primary_container_name=\"training-container\",\n",
" trial_parameters=[\n",
" V1beta1TrialParameterSpec(\n",
" name=\"learningRate\",\n",
" description=\"Learning rate for the training model\",\n",
" reference=\"lr\"\n",
" ),\n",
" V1beta1TrialParameterSpec(\n",
" name=\"numberLayers\",\n",
" description=\"Number of training model layers\",\n",
" reference=\"num-layers\"\n",
" ),\n",
" V1beta1TrialParameterSpec(\n",
" name=\"optimizer\",\n",
" description=\"Training model optimizer (sdg, adam or ftrl)\",\n",
" reference=\"optimizer\"\n",
" ),\n",
" ],\n",
" trial_spec=trial_spec\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define an Experiment specification\n",
"\n",
"Create an Experiment specification from the above parameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment_spec=V1beta1ExperimentSpec(\n",
" max_trial_count=max_trial_count,\n",
" max_failed_trial_count=max_failed_trial_count,\n",
" parallel_trial_count=parallel_trial_count,\n",
" objective=objective,\n",
" algorithm=algorithm,\n",
" early_stopping=early_stopping,\n",
" parameters=parameters,\n",
" trial_template=trial_template\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create a Pipeline using Katib component\n",
"\n",
"The best hyperparameters are printed after Experiment is finished.\n",
"The Experiment is not deleted after the Pipeline is finished."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get the Katib launcher.\n",
"katib_experiment_launcher_op = components.load_component_from_url(\n",
" \"https://raw.githubusercontent.com/kubeflow/pipelines/master/components/kubeflow/katib-launcher/component.yaml\")\n",
"\n",
"@dsl.pipeline(\n",
" name=\"Launch Katib early stopping Experiment\",\n",
" description=\"An example to launch Katib Experiment with early stopping\"\n",
")\n",
"\n",
"def median_stop():\n",
"\n",
" # Katib launcher component.\n",
" # Experiment Spec should be serialized to a valid Kubernetes object.\n",
" op = katib_experiment_launcher_op(\n",
" experiment_name=experiment_name,\n",
" experiment_namespace=experiment_namespace,\n",
" experiment_spec=ApiClient().sanitize_for_serialization(experiment_spec),\n",
" experiment_timeout_minutes=60,\n",
" delete_finished_experiment=False)\n",
"\n",
" # Output container to print the results.\n",
" op_out = dsl.ContainerOp(\n",
" name=\"best-hp\",\n",
" image=\"library/bash:4.4.23\",\n",
" command=[\"sh\", \"-c\"],\n",
" arguments=[\"echo Best HyperParameters: %s\" % op.output],\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Run the Kubeflow Pipeline\n",
"\n",
"You can check the Katib Experiment info in the Katib UI."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Run the Kubeflow Pipeline in the user's namespace.\n",
"kfp.Client().create_run_from_pipeline_func(median_stop, namespace=experiment_namespace, arguments={})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Binary file added examples/v1beta1/kubeflow-pipelines/images/9.bmp
Binary file not shown.
Loading

0 comments on commit 195db29

Please sign in to comment.