opendatahub-io · openshift-merge-robot · Oct 10, 2022 · Oct 6, 2022 · Oct 7, 2022 · Oct 7, 2022
diff --git a/README.md b/README.md
@@ -27,7 +27,7 @@ Open Data Hub is an end-to-end AI/ML platform on top of OpenShift Container Plat
     * [Thrift Server](thriftserver/README.md)
 * [Trino](trino/README.md)
 * [ODH Notebook Controller](odh-notebook-controller/README.md)
-* [ML Pipelines](ml-pipelines/README.md)
+* [Data Science Pipelines](data-science-pipelines/README.md)
 
 ## Deploy
 

diff --git a/ml-pipelines/OWNERS → data-science-pipelines/OWNERS b/ml-pipelines/OWNERS → data-science-pipelines/OWNERS
diff --git a/data-science-pipelines/README.md b/data-science-pipelines/README.md
@@ -0,0 +1,78 @@
+# Data Science Pipelines
+
+Data Science Pipelines is Open Data Hub's pipeline solution for data scientists. It is built on top of the upstream [Kubeflow Pipelines](https://github.com/kubeflow/pipelines) and [kfp-tekton](https://github.com/kubeflow/kfp-tekton) projects. The Open Data Hub community maintains a [fork](https://github.com/opendatahub-io/data-science-pipelines) of this upstream under the Open Data Hub org.
+
+
+## Installation
+
+### Prerequisites
+
+1. The cluster needs to be OpenShift 4.9 or higher
+2. OpenShift Pipelines 1.7.2 or higher needs to be installed on the cluster
+3. The Open Data Hub operator needs to be installed
+4. The default installation namespace for Data Science Pipelines is `odh-applications`. This namespace needs to be created. To install into a different namespace, create that namespace and update the kfdef as documented below.
+
+### Installation Steps
+
+1. Ensure that the prerequisites are met.
+2. Apply the kfdef at [kfctl_openshift_ds-pipelines.yaml](https://github.com/opendatahub-io/odh-manifests/blob/master/kfdef/kfctl_openshift_ds-pipelines.yaml). To deploy in a namespace other than `odh-applications`, first update the `namespace` field under `metadata`.
+3. To find the URL for Data Science Pipelines, run the following command:
+    ```bash
+    $ oc get route -n <kfdef_namespace> ds-pipeline-ui -o jsonpath='{.spec.host}'
+    ```
+    The value of `<kfdef_namespace>` should match the namespace field of the kfdef that you applied.
+4. Alternatively, you can access the route via the console. To do so:
+
+    1. Go to `<kfdef_namespace>`
+    2. Click on `Networking` in the sidebar on the left side.
+    3. Click on `Routes`. It will take you to a new page in the console.
+    4. Click the URL under the `Location` column in the row for `ds-pipeline-ui`.
+
+
+## Directory Structure
+
+### Base
+
+This directory contains artifacts for deploying all backend components of Data Science Pipelines. This deployment currently includes the kfp-tekton backend as well as a Minio deployment to act as an object store. The Minio deployment will be moved to an overlay at some point in the near future.
+
+### Overlays
+
+1. metadata-store-mysql: This overlay contains artifacts for deploying a MySQL database. MySQL is currently the only supported backend for Data Science Pipelines, so if you don't have an existing MySQL database deployed, this overlay needs to be applied.
+2. metadata-store-postgresql: This overlay contains artifacts for deploying a PostgreSQL database. Data Science Pipelines does not currently support PostgreSQL as a backend, so deploying this overlay will not actually modify Data Science Pipelines behaviour.
+3. ds-pipeline-ui: This overlay contains deployment artifacts for the Data Science Pipelines UI. Deploying Data Science Pipelines without this overlay will result in only the backend artifacts being created.
+4. object-store-minio: This overlay contains artifacts for deploying Minio as the Object Store to store Pipelines artifacts.
+
+### Prometheus
+
+This directory contains the service monitor definition for Data Science Pipelines. It is always deployed by base, so this will eventually be moved into the base directory itself.
+
+## Parameters
+
+You can customize the Data Science Pipelines deployment by overriding the defaults of the following parameters:
+
+* **pipeline_install_configuration**: The ConfigMap name that contains the values to install the Data Science Pipelines environment. This parameter defaults to `pipeline-install-config` and you can find an example in the [repository](./base/configmaps/pipeline-install-config.yaml).
+* **ds_pipelines_configuration**: The ConfigMap name that contains the values to integrate Data Science Pipelines with the underlying components (Database and Object Store). This parameter defaults to `ds-pipeline-config` and you can find an example in the [repository](./base/configmaps/ds-pipeline-config.yaml).
+* **database_secret**: The secret that contains the credentials for the Data Science Pipelines Database. It defaults to `mysql-secret` if using the `metadata-store-mysql` overlay or `postgresql-secret` if using the `metadata-store-postgresql` overlay.
+* **ds_pipelines_ui_configuration**: The ConfigMap that contains the values to customize the UI. It defaults to `ds-pipeline-ui-configmap`.
+
+## Configuration
+
+* It is possible to configure what S3 storage is being used by Pipeline Runs. Detailed instructions on how to configure this will be added once Minio is moved to an overlay.
+
+## Usage
+
+### These instructions will be updated once Data Science Pipelines has a tile available in odh-dashboard
+
+1. Go to the `ds-pipeline-ui` route.
+2. Click on `Pipelines` on the left side.
+3. There will be a `[Demo] flip-coin` Pipeline already available. Click on it.
+4. Click on the blue `Create run` button towards the top of the screen.
+5. You can leave all the fields untouched. If desired, you can create a new experiment to link the pipeline run to, or rename the run itself.
+6. Click on the blue `Start` button.
+7. You will be taken to the `Runs` page. You will see a row matching the `Run name` you previously picked. Click on the `Run name` in that row.
+8. Once the Pipeline is done running, you can see a graph of all the pods that were created as well as the paths that were followed.
+9. For further verification, you can view all the pods that were created as part of the Pipeline Run in the `<kfdef_namespace>`. They will all show up as `Completed`.
+
+## Data Science Pipelines Architecture
+
+A complete architecture can be found at [ODH Data Science Pipelines Architecture and Design](https://docs.google.com/document/d/1o-JS1uZKLZsMY3D16kl5KBdyBb-aV-kyD_XycdJOYpM/edit#heading=h.3aocw3evrps0). This document will be moved to GitHub once the corresponding ML Ops SIG repos are created.
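
The installation and verification steps described in the README above could be scripted roughly as follows. This is a minimal sketch, not part of the PR: `KFDEF_FILE` is a hypothetical local copy of the kfdef whose `metadata.namespace` already matches `KFDEF_NAMESPACE`, and the `run` helper is an illustrative dry-run wrapper. By default the script only prints the `oc` commands; set `DRY_RUN=0` to execute them against a cluster.

```bash
#!/usr/bin/env sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
set -eu

# Assumed defaults. KFDEF_FILE is a hypothetical local copy of
# kfctl_openshift_ds-pipelines.yaml, already edited if a custom namespace is used.
KFDEF_NAMESPACE="${KFDEF_NAMESPACE:-odh-applications}"
KFDEF_FILE="${KFDEF_FILE:-kfctl_openshift_ds-pipelines.yaml}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_ds_pipelines() {
  # Prerequisite 4: the target namespace must exist.
  run oc create namespace "$KFDEF_NAMESPACE"
  # Installation step 2: apply the kfdef.
  run oc apply -f "$KFDEF_FILE"
  # Installation step 3: print the host of the UI route.
  run oc get route -n "$KFDEF_NAMESPACE" ds-pipeline-ui -o jsonpath='{.spec.host}'
}

list_completed_run_pods() {
  # Usage step 9: pods displayed as Completed are in the Succeeded phase.
  run oc get pods -n "$KFDEF_NAMESPACE" --field-selector=status.phase=Succeeded
}

install_ds_pipelines
```

The route may take a short while to appear after the kfdef is applied, so the last command may need to be retried. `list_completed_run_pods` is meant to be called after the demo pipeline run has finished.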
diff --git a/...es/base/configmaps/kfp-tekton-config.yaml → ...s/base/configmaps/ds-pipeline-config.yaml b/...es/base/configmaps/kfp-tekton-config.yaml → ...s/base/configmaps/ds-pipeline-config.yaml
@@ -5,7 +5,7 @@ data:
   artifact_bucket: mlpipeline
   artifact_endpoint: minio-service:9000
   artifact_endpoint_scheme: http://
-  artifact_image: quay.io/thoth-station/document-sync-job:v0.1.0
+  artifact_image: quay.io/opendatahub/ml-pipelines-artifact-manager:latest
   artifact_script: |-
     #!/usr/bin/env sh
     push_artifact() {
@@ -32,5 +32,5 @@ data:
 kind: ConfigMap
 metadata:
   labels:
-    application-crd-id: kubeflow-pipelines
-  name: kfp-tekton-config
+    application-crd-id: data-science-pipelines
+  name: ds-pipeline-config
diff --git a/...e/configmaps/pipeline-install-config.yaml → ...e/configmaps/pipeline-install-config.yaml b/...e/configmaps/pipeline-install-config.yaml → ...e/configmaps/pipeline-install-config.yaml
@@ -5,14 +5,14 @@ data:
   appVersion: 1.7.0
   autoUpdatePipelineDefaultVersion: "true"
   bucketName: mlpipeline
-  cacheDb: cachedb
+  cacheDb: mlpipeline
   cacheImage: registry.access.redhat.com/ubi8/ubi-minimal
   cacheNodeRestrictions: "false"
   cronScheduleTimezone: UTC
   dbHost: mysql
   dbPort: "3306"
   defaultPipelineRoot: ""
-  mlmdDb: metadb
+  mlmdDb: mlpipeline
   pipelineDb: mlpipeline
   warning: |
     1. Do not use kubectl to edit this configmap, because some values are used
@@ -24,5 +24,5 @@ data:
 kind: ConfigMap
 metadata:
   labels:
-    application-crd-id: kubeflow-pipelines
+    application-crd-id: data-science-pipelines
   name: pipeline-install-config
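
After these two ConfigMap changes are deployed, a quick check along the following lines could confirm that the renamed objects and the consolidated database names are in place. This is a hedged sketch: the namespace default and the `run` dry-run helper are assumptions for illustration, and `mysql-secret` only exists when the `metadata-store-mysql` overlay is used. The script prints the commands by default; set `DRY_RUN=0` to run them.

```bash
#!/usr/bin/env sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
set -eu

# Assumed default namespace, taken from the README in this PR.
KFDEF_NAMESPACE="${KFDEF_NAMESPACE:-odh-applications}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

check_pipeline_config() {
  # Both ConfigMaps touched by this diff should exist under these names.
  for cm in pipeline-install-config ds-pipeline-config; do
    run oc get configmap "$cm" -n "$KFDEF_NAMESPACE"
  done
  # Everything relabelled in this PR carries the new application-crd-id value.
  run oc get configmap -n "$KFDEF_NAMESPACE" -l application-crd-id=data-science-pipelines
  # pipelineDb, cacheDb and mlmdDb are all expected to read "mlpipeline" after this change.
  run oc get configmap pipeline-install-config -n "$KFDEF_NAMESPACE" \
    -o jsonpath='{.data.pipelineDb} {.data.cacheDb} {.data.mlmdDb}'
  # Database credentials (assumes the metadata-store-mysql overlay).
  run oc get secret mysql-secret -n "$KFDEF_NAMESPACE"
}

check_pipeline_config
```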
diff --git a/...sourcedefinitions/scheduledworkflows.yaml → ...sourcedefinitions/scheduledworkflows.yaml b/...sourcedefinitions/scheduledworkflows.yaml → ...sourcedefinitions/scheduledworkflows.yaml
@@ -2,7 +2,7 @@ apiVersion: apiextensions.k8s.io/v1
 kind: CustomResourceDefinition
 metadata:
   labels:
-    application-crd-id: kubeflow-pipelines
+    application-crd-id: data-science-pipelines
     kubeflow/crd-install: "true"
   name: scheduledworkflows.kubeflow.org
 spec:

diff --git a/...se/customresourcedefinitions/viewers.yaml → ...se/customresourcedefinitions/viewers.yaml b/...se/customresourcedefinitions/viewers.yaml → ...se/customresourcedefinitions/viewers.yaml
@@ -2,7 +2,7 @@ apiVersion: apiextensions.k8s.io/v1
 kind: CustomResourceDefinition
 metadata:
   labels:
-    application-crd-id: kubeflow-pipelines
+    application-crd-id: data-science-pipelines
     kubeflow/crd-install: "true"
   name: viewers.kubeflow.org
 spec:

diff --git a/data-science-pipelines/base/deployments/ds-pipeline-persistenceagent.yaml b/data-science-pipelines/base/deployments/ds-pipeline-persistenceagent.yaml
@@ -0,0 +1,60 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  labels:
+    app: ds-pipeline-persistenceagent
+    application-crd-id: data-science-pipelines
+  name: ds-pipeline-persistenceagent
+spec:
+  selector:
+    matchLabels:
+      app: ds-pipeline-persistenceagent
+      application-crd-id: data-science-pipelines
+  template:
+    metadata:
+      annotations:
+        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
+      labels:
+        app: ds-pipeline-persistenceagent
+        application-crd-id: data-science-pipelines
+    spec:
+      containers:
+        - env:
+            - name: NAMESPACE
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.namespace
+          image: persistenceagent
+          imagePullPolicy: IfNotPresent
+          name: ds-pipeline-persistenceagent
+          command:
+            - persistence_agent
+            - "--logtostderr=true"
+            - "--namespace=$(namespace)"
+            - "--ttlSecondsAfterWorkflowFinish=86400"
+            - "--numWorker=2"
+            - "--mlPipelineAPIServerName=ds-pipeline"
+          livenessProbe:
+            exec:
+              command:
+                - pidof
+                - persistence_agent
+            initialDelaySeconds: 30
+            periodSeconds: 5
+            timeoutSeconds: 2
+          readinessProbe:
+            exec:
+              command:
+                - pidof
+                - persistence_agent
+            initialDelaySeconds: 3
+            periodSeconds: 5
+            timeoutSeconds: 2
+          resources:
+            requests:
+              cpu: 120m
+              memory: 500Mi
+            limits:
+              cpu: 250m
+              memory: 1Gi
+      serviceAccountName: ds-pipeline-persistenceagent
diff --git a/data-science-pipelines/base/deployments/ds-pipeline-scheduledworkflow.yaml b/data-science-pipelines/base/deployments/ds-pipeline-scheduledworkflow.yaml
@@ -0,0 +1,58 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  labels:
+    app: ds-pipeline-scheduledworkflow
+    application-crd-id: data-science-pipelines
+  name: ds-pipeline-scheduledworkflow
+spec:
+  selector:
+    matchLabels:
+      app: ds-pipeline-scheduledworkflow
+      application-crd-id: data-science-pipelines
+  template:
+    metadata:
+      annotations:
+        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
+      labels:
+        app: ds-pipeline-scheduledworkflow
+        application-crd-id: data-science-pipelines
+    spec:
+      containers:
+        - env:
+            - name: NAMESPACE
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.namespace
+            - name: CRON_SCHEDULE_TIMEZONE
+              valueFrom:
+                configMapKeyRef:
+                  key: cronScheduleTimezone
+                  name: $(pipeline_install_configuration)
+          image: scheduledworkflow
+          imagePullPolicy: IfNotPresent
+          name: ds-pipeline-scheduledworkflow
+          livenessProbe:
+            exec:
+              command:
+                - pidof
+                - controller
+            initialDelaySeconds: 30
+            periodSeconds: 5
+            timeoutSeconds: 2
+          readinessProbe:
+            exec:
+              command:
+                - pidof
+                - controller
+            initialDelaySeconds: 3
+            periodSeconds: 5
+            timeoutSeconds: 2
+          resources:
+            requests:
+              cpu: 120m
+              memory: 100Mi
+            limits:
+              cpu: 250m
+              memory: 250Mi
+      serviceAccountName: ds-pipeline-scheduledworkflow
diff --git a/data-science-pipelines/base/deployments/ds-pipeline-viewer-crd.yaml b/data-science-pipelines/base/deployments/ds-pipeline-viewer-crd.yaml
@@ -0,0 +1,59 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  labels:
+    app: ds-pipeline-viewer-crd
+    application-crd-id: data-science-pipelines
+  name: ds-pipeline-viewer-crd
+spec:
+  selector:
+    matchLabels:
+      app: ds-pipeline-viewer-crd
+      application-crd-id: data-science-pipelines
+  template:
+    metadata:
+      annotations:
+        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
+      labels:
+        app: ds-pipeline-viewer-crd
+        application-crd-id: data-science-pipelines
+    spec:
+      containers:
+        - env:
+            - name: MAX_NUM_VIEWERS
+              value: "50"
+            - name: NAMESPACE
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.namespace
+            - name: MINIO_NAMESPACE
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.namespace
+          image: viewer-crd-controller
+          imagePullPolicy: Always
+          name: ds-pipeline-viewer-crd
+          livenessProbe:
+            exec:
+              command:
+                - pidof
+                - controller
+            initialDelaySeconds: 30
+            periodSeconds: 5
+            timeoutSeconds: 2
+          readinessProbe:
+            exec:
+              command:
+                - pidof
+                - controller
+            initialDelaySeconds: 3
+            periodSeconds: 5
+            timeoutSeconds: 2
+          resources:
+            requests:
+              cpu: 120m
+              memory: 100Mi
+            limits:
+              cpu: 250m
+              memory: 500Mi
+      serviceAccountName: ds-pipeline-viewer-crd-service-account
diff --git a/...ents/ml-pipeline-visualizationserver.yaml → ...ents/ds-pipeline-visualizationserver.yaml b/...ents/ml-pipeline-visualizationserver.yaml → ...ents/ds-pipeline-visualizationserver.yaml
@@ -2,25 +2,29 @@ apiVersion: apps/v1
 kind: Deployment
 metadata:
   labels:
-    app: ml-pipeline-visualizationserver
-    application-crd-id: kubeflow-pipelines
-  name: ml-pipeline-visualizationserver
+    app: ds-pipeline-visualizationserver
+    application-crd-id: data-science-pipelines
+  name: ds-pipeline-visualizationserver
 spec:
   selector:
     matchLabels:
-      app: ml-pipeline-visualizationserver
-      application-crd-id: kubeflow-pipelines
+      app: ds-pipeline-visualizationserver
+      application-crd-id: data-science-pipelines
   template:
     metadata:
       annotations:
         cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
       labels:
-        app: ml-pipeline-visualizationserver
-        application-crd-id: kubeflow-pipelines
+        app: ds-pipeline-visualizationserver
+        application-crd-id: data-science-pipelines
     spec:
       containers:
         - image: visualization-server
-          imagePullPolicy: IfNotPresent
+          imagePullPolicy: Always
+          name: ds-pipeline-visualizationserver
+          ports:
+            - containerPort: 8888
+              name: http
           livenessProbe:
             exec:
               command:
@@ -30,13 +34,9 @@ spec:
                 - -O
                 - '-'
                 - http://localhost:8888/
-            initialDelaySeconds: 3
+            initialDelaySeconds: 30
             periodSeconds: 5
             timeoutSeconds: 2
-          name: ml-pipeline-visualizationserver
-          ports:
-            - containerPort: 8888
-              name: http
           readinessProbe:
             exec:
               command:
@@ -53,4 +53,7 @@ spec:
             requests:
               cpu: 30m
               memory: 500Mi
-      serviceAccountName: ml-pipeline-visualizationserver
+            limits:
+              cpu: 250m
+              memory: 1Gi
+      serviceAccountName: ds-pipeline-visualizationserver
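
To confirm that the four Deployments shown in this diff come up under their new `ds-pipeline-*` names, something like the following could be used. It is a sketch under assumptions: the namespace default, the `ROLLOUT_TIMEOUT` value, and the `run` dry-run helper are illustrative and not part of the PR, and it assumes kustomize applies no additional name prefix. The script prints the commands by default; set `DRY_RUN=0` to run them.

```bash
#!/usr/bin/env sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
set -eu

# Assumed values for illustration.
KFDEF_NAMESPACE="${KFDEF_NAMESPACE:-odh-applications}"
ROLLOUT_TIMEOUT="${ROLLOUT_TIMEOUT:-120s}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

wait_for_deployments() {
  # Deployment names as they appear in the manifests above.
  for d in ds-pipeline-persistenceagent ds-pipeline-scheduledworkflow \
           ds-pipeline-viewer-crd ds-pipeline-visualizationserver; do
    # Blocks until the rollout completes or the timeout expires.
    run oc rollout status "deployment/$d" -n "$KFDEF_NAMESPACE" --timeout="$ROLLOUT_TIMEOUT"
  done
}

wait_for_deployments
```

Because the liveness probes use an `initialDelaySeconds` of 30, a timeout well above that value leaves room for the first probe to succeed.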