Add Kubeflow Pipelines Examples (#1632)

* Init commit with e2e example * Add Early Stopping and MPI Examples * Add MPI to README * Modify SDK for MPI example * Modify doc * Update Early Stopping example * Finish e2e example * Modify links for KFP guide
kubeflow · Oct 12, 2021 · 195db29 · 195db29
1 parent e5d7636
commit 195db29
Show file tree

Hide file tree

Showing 5 changed files with 1,295 additions and 0 deletions.
diff --git a/examples/v1beta1/kubeflow-pipelines/README.md b/examples/v1beta1/kubeflow-pipelines/README.md
@@ -1 +1,34 @@
 # Using Katib with Kubeflow Pipelines
+
+The following examples show how to use Katib with
+[Kubeflow Pipelines](https://github.com/kubeflow/pipelines).
+
+You can find the Katib Component source code for the Kubeflow Pipelines
+[here](https://github.com/kubeflow/pipelines/tree/master/components/kubeflow/katib-launcher).
+
+## Prerequisites
+
+You have to install the following Python SDK to run these examples:
+
+- [`kfp`](https://pypi.org/project/kfp/) >= 1.8.4
+- [`kubeflow-katib`](https://pypi.org/project/kubeflow-katib/) >= 0.12.0
+
+## Multi-User Pipelines Setup
+
+The Notebooks examples run Pipelines in multi-user mode and your Kubeflow Notebook
+must have the appropriate `PodDefault` with the `pipelines.kubeflow.org` audience.
+
+Please follow [this guide](https://www.kubeflow.org/docs/components/pipelines/sdk/connect-api/#multi-user-mode)
+to give an access Kubeflow Notebook to run Kubeflow Pipelines.
+
+## List of Examples
+
+The following Pipelines are deployed from Kubeflow Notebook:
+
+- [Kubeflow E2E MNIST](kubeflow-e2e-mnist.ipynb)
+
+- [Katib Experiment with Early Stopping](early-stopping.ipynb)
+
+The following Pipelines have to be compiled and uploaded to the Kubeflow Pipelines UI:
+
+- [MPIJob Horovod](mpi-job-horovod.py)
diff --git a/examples/v1beta1/kubeflow-pipelines/early-stopping.ipynb b/examples/v1beta1/kubeflow-pipelines/early-stopping.ipynb
@@ -0,0 +1,336 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Kubeflow Pipelines with Katib component\n",
+    "\n",
+    "In this notebook you will:\n",
+    "- Create Katib Experiment using random algorithm.\n",
+    "- Use median stopping rule as an early stopping algorithm.\n",
+    "- Use Kubernetes Job with mxnet mnist training container as a Trial template.\n",
+    "- Create Pipeline to get the optimal hyperparameters.\n",
+    "\n",
+    "Reference documentation:\n",
+    "- https://kubeflow.org/docs/components/katib/experiment/#random-search\n",
+    "- https://kubeflow.org/docs/components/katib/early-stopping/\n",
+    "- https://kubeflow.org/docs/pipelines/overview/concepts/component/\n",
+    "\n",
+    "**Note**: This Pipeline runs in the multi-user mode. Follow [this guide](https://github.com/kubeflow/katib/tree/master/examples/v1beta1/kubeflow-pipelines#multi-user-pipelines-setup) to give your Notebook access to Kubeflow Pipelines."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Install required packages (Kubeflow Pipelines and Katib SDK).\n",
+    "!pip install kfp==1.8.4\n",
+    "!pip install kubeflow-katib==0.12.0"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import kfp\n",
+    "import kfp.dsl as dsl\n",
+    "from kfp import components\n",
+    "\n",
+    "from kubeflow.katib import ApiClient\n",
+    "from kubeflow.katib import V1beta1ExperimentSpec\n",
+    "from kubeflow.katib import V1beta1AlgorithmSpec\n",
+    "from kubeflow.katib import V1beta1EarlyStoppingSpec\n",
+    "from kubeflow.katib import V1beta1EarlyStoppingSetting\n",
+    "from kubeflow.katib import V1beta1ObjectiveSpec\n",
+    "from kubeflow.katib import V1beta1ParameterSpec\n",
+    "from kubeflow.katib import V1beta1FeasibleSpace\n",
+    "from kubeflow.katib import V1beta1TrialTemplate\n",
+    "from kubeflow.katib import V1beta1TrialParameterSpec"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Define an Experiment\n",
+    "\n",
+    "You have to create an Experiment object before deploying it. This Experiment is similar to [this](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/early-stopping/median-stop.yaml) YAML."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Experiment name and namespace.\n",
+    "experiment_name = \"median-stop\"\n",
+    "experiment_namespace = \"kubeflow-user-example-com\"\n",
+    "\n",
+    "# Trial count specification.\n",
+    "max_trial_count = 18\n",
+    "max_failed_trial_count = 3\n",
+    "parallel_trial_count = 2\n",
+    "\n",
+    "# Objective specification.\n",
+    "objective=V1beta1ObjectiveSpec(\n",
+    "    type=\"maximize\",\n",
+    "    goal= 0.99,\n",
+    "    objective_metric_name=\"Validation-accuracy\",\n",
+    "    additional_metric_names=[\n",
+    "        \"Train-accuracy\"\n",
+    "    ]\n",
+    ")\n",
+    "\n",
+    "# Algorithm specification.\n",
+    "algorithm=V1beta1AlgorithmSpec(\n",
+    "    algorithm_name=\"random\",\n",
+    ")\n",
+    "\n",
+    "# Early Stopping specification.\n",
+    "early_stopping=V1beta1EarlyStoppingSpec(\n",
+    "    algorithm_name=\"medianstop\",\n",
+    "    algorithm_settings=[\n",
+    "        V1beta1EarlyStoppingSetting(\n",
+    "            name=\"min_trials_required\",\n",
+    "            value=\"2\"\n",
+    "        )\n",
+    "    ]\n",
+    ")\n",
+    "\n",
+    "\n",
+    "# Experiment search space.\n",
+    "# In this example we tune learning rate, number of layer and optimizer.\n",
+    "# Learning rate has bad feasible space to show more early stopped Trials.\n",
+    "parameters=[\n",
+    "    V1beta1ParameterSpec(\n",
+    "        name=\"lr\",\n",
+    "        parameter_type=\"double\",\n",
+    "        feasible_space=V1beta1FeasibleSpace(\n",
+    "            min=\"0.01\",\n",
+    "            max=\"0.3\"\n",
+    "        ),\n",
+    "    ),\n",
+    "    V1beta1ParameterSpec(\n",
+    "        name=\"num-layers\",\n",
+    "        parameter_type=\"int\",\n",
+    "        feasible_space=V1beta1FeasibleSpace(\n",
+    "            min=\"2\",\n",
+    "            max=\"5\"\n",
+    "        ),\n",
+    "    ),\n",
+    "    V1beta1ParameterSpec(\n",
+    "        name=\"optimizer\",\n",
+    "        parameter_type=\"categorical\",\n",
+    "        feasible_space=V1beta1FeasibleSpace(\n",
+    "            list=[\n",
+    "                \"sgd\", \n",
+    "                \"adam\",\n",
+    "                \"ftrl\"\n",
+    "            ]\n",
+    "        ),\n",
+    "    ),\n",
+    "]\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Define a Trial template\n",
+    "\n",
+    "In this example, the Trial's Worker is the Kubernetes Job."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# JSON template specification for the Trial's Worker Kubernetes Job.\n",
+    "trial_spec={\n",
+    "    \"apiVersion\": \"batch/v1\",\n",
+    "    \"kind\": \"Job\",\n",
+    "    \"spec\": {\n",
+    "        \"template\": {\n",
+    "            \"metadata\": {\n",
+    "                \"annotations\": {\n",
+    "                     \"sidecar.istio.io/inject\": \"false\"\n",
+    "                }\n",
+    "            },\n",
+    "            \"spec\": {\n",
+    "                \"containers\": [\n",
+    "                    {\n",
+    "                        \"name\": \"training-container\",\n",
+    "                        \"image\": \"docker.io/kubeflowkatib/mxnet-mnist:v1beta1-45c5727\",\n",
+    "                        \"command\": [\n",
+    "                            \"python3\",\n",
+    "                            \"/opt/mxnet-mnist/mnist.py\",\n",
+    "                            \"--batch-size=64\",\n",
+    "                            \"--lr=${trialParameters.learningRate}\",\n",
+    "                            \"--num-layers=${trialParameters.numberLayers}\",\n",
+    "                            \"--optimizer=${trialParameters.optimizer}\"\n",
+    "                        ]\n",
+    "                    }\n",
+    "                ],\n",
+    "                \"restartPolicy\": \"Never\"\n",
+    "            }\n",
+    "        }\n",
+    "    }\n",
+    "}\n",
+    "\n",
+    "# Configure parameters for the Trial template.\n",
+    "# We set the retain parameter to \"True\" to not clean-up the Trial Job's Kubernetes Pods.\n",
+    "trial_template=V1beta1TrialTemplate(\n",
+    "    retain=True,\n",
+    "    primary_container_name=\"training-container\",\n",
+    "    trial_parameters=[\n",
+    "        V1beta1TrialParameterSpec(\n",
+    "            name=\"learningRate\",\n",
+    "            description=\"Learning rate for the training model\",\n",
+    "            reference=\"lr\"\n",
+    "        ),\n",
+    "        V1beta1TrialParameterSpec(\n",
+    "            name=\"numberLayers\",\n",
+    "            description=\"Number of training model layers\",\n",
+    "            reference=\"num-layers\"\n",
+    "        ),\n",
+    "        V1beta1TrialParameterSpec(\n",
+    "            name=\"optimizer\",\n",
+    "            description=\"Training model optimizer (sdg, adam or ftrl)\",\n",
+    "            reference=\"optimizer\"\n",
+    "        ),\n",
+    "    ],\n",
+    "    trial_spec=trial_spec\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Define an Experiment specification\n",
+    "\n",
+    "Create an Experiment specification from the above parameters."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "experiment_spec=V1beta1ExperimentSpec(\n",
+    "    max_trial_count=max_trial_count,\n",
+    "    max_failed_trial_count=max_failed_trial_count,\n",
+    "    parallel_trial_count=parallel_trial_count,\n",
+    "    objective=objective,\n",
+    "    algorithm=algorithm,\n",
+    "    early_stopping=early_stopping,\n",
+    "    parameters=parameters,\n",
+    "    trial_template=trial_template\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Create a Pipeline using Katib component\n",
+    "\n",
+    "The best hyperparameters are printed after Experiment is finished.\n",
+    "The Experiment is not deleted after the Pipeline is finished."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Get the Katib launcher.\n",
+    "katib_experiment_launcher_op = components.load_component_from_url(\n",
+    "    \"https://raw.githubusercontent.com/kubeflow/pipelines/master/components/kubeflow/katib-launcher/component.yaml\")\n",
+    "\n",
+    "@dsl.pipeline(\n",
+    "    name=\"Launch Katib early stopping Experiment\",\n",
+    "    description=\"An example to launch Katib Experiment with early stopping\"\n",
+    ")\n",
+    "\n",
+    "def median_stop():\n",
+    "\n",
+    "    # Katib launcher component.\n",
+    "    # Experiment Spec should be serialized to a valid Kubernetes object.\n",
+    "    op = katib_experiment_launcher_op(\n",
+    "        experiment_name=experiment_name,\n",
+    "        experiment_namespace=experiment_namespace,\n",
+    "        experiment_spec=ApiClient().sanitize_for_serialization(experiment_spec),\n",
+    "        experiment_timeout_minutes=60,\n",
+    "        delete_finished_experiment=False)\n",
+    "\n",
+    "    # Output container to print the results.\n",
+    "    op_out = dsl.ContainerOp(\n",
+    "        name=\"best-hp\",\n",
+    "        image=\"library/bash:4.4.23\",\n",
+    "        command=[\"sh\", \"-c\"],\n",
+    "        arguments=[\"echo Best HyperParameters: %s\" % op.output],\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Run the Kubeflow Pipeline\n",
+    "\n",
+    "You can check the Katib Experiment info in the Katib UI."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "# Run the Kubeflow Pipeline in the user's namespace.\n",
+    "kfp.Client().create_run_from_pipeline_func(median_stop,  namespace=experiment_namespace, arguments={})"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/examples/v1beta1/kubeflow-pipelines/images/9.bmp b/examples/v1beta1/kubeflow-pipelines/images/9.bmp