Skip to content
This repository has been archived by the owner on Jan 31, 2024. It is now read-only.

Rename to Data Science Pipelines #685

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ Open Data Hub is an end-to-end AI/ML platform on top of OpenShift Container Plat
* [Thrift Server](thriftserver/README.md)
* [Trino](trino/README.md)
* [ODH Notebook Controller](odh-notebook-controller/README.md)
* [ML Pipelines](ml-pipelines/README.md)
* [Data Science Pipelines](data-science-pipelines/README.md)

## Deploy

Expand Down
File renamed without changes.
78 changes: 78 additions & 0 deletions data-science-pipelines/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
# Data Science Pipelines

Data Science Pipelines is the Open Data Hub's pipeline solution for data scientists. It is built on top of the upstream [Kubeflow Piplines](https://github.com/kubeflow/pipelines) and [kfp-tekton](https://github.com/kubeflow/kfp-tekton) projects. The Open Data Hub community has a [fork](https://github.com/opendatahub-io/data-science-pipelines) of this upstream under the Open Data Hub org.


## Installation

### Prerequisites

1. The cluster needs to be OpenShift 4.9 or higher
2. OpenShift Pipelines 1.7.2 or higher needs to be installed on the cluster
3. The Open Data Hub operator needs to be installed
4. The default installation namespace for Data Science Pipelines is `odh-applications`. This namespace will need to be created. In case you wish to install in a custom location, create it and update the kfdef as documented below.

### Installation Steps

1. Ensure that the prerequisites are met.
2. Apply the kfdef at [kfctl_openshift_ds-pipelines.yaml](https://github.com/opendatahub-io/odh-manifests/blob/master/kfdef/kfctl_openshift_ds-pipelines.yaml). You may need to update the `namespace` field under `metadata` in case you want to deploy in a namespace that isn't `odh-applications`.
3. To find the url for Data Science pipelines, you can run the following command.
```bash
$ oc get route -n <kdef_namespace> ds-pipeline-ui -o jsonpath='{.spec.host}'
```
The value of `<kfdef_namespace>` should match the namespace field of the kfdef that you applied.
4. Alternatively, you can access the route via the console. To do so:

1. Go to `<kfdef_namespace>`
2. Click on `Networking` in the sidebar on the left side.
3. Click on `Routes`. It will take you to a new page in the console.
4. Click the url under the `Location` column for the row item matching `ds-pipeline-ui`


## Directory Structure

### Base

This directory contains artifacts for deploying all backend components of Data Science Pipelines. This deployment currently includes the kfp-tekton backend as well as a Minio deployment to act as an object store. The Minio deployment will be moved to an overlay at some point in the near future.

### Overlays

1. metadata-store-mysql: This overlay contains artifacts for deploying a MySQL database. MySQL is currently the only supported backend for Data Science Pipelines, so if you don't have an existing MySQL database deployed, this overlay needs to be applied.
2. metadata-store-postgresql: This overlay contains artifacts for deploying a PostgreSQL database. Data Science Pipelines does not currently support PostgreSQL as a backend, so deploying this overlay will not actually modify Data Science Pipelines behaviour.
3. ds-pipeline-ui: This overlay contains deployment artifacts for the Data Science Pipelines UI. Deploying Data Science Pipelines without this overlay will result in only the backend artifacts being created.
4. object-store-minio: This overlay contains artifacts for deploying Minio as the Object Store to store Pipelines artifacts.

### Prometheus

This directory contains the service monitor definition for Data Science Pipelines. It is always deployed by base, so this will eventually be moved into the base directory itself.

## Parameters

You can customize the Data Science Pipelines deployment by injecting custom parameters to change the default deployment. The following parameters can be used:

* **pipeline_install_configuration**: The ConfigMap name that contains the values to install the Data Science Pipelines environment. This parameter defaults to `pipeline-install-config` and you can find an example in the [repository](./base/configmaps/pipeline-install-config.yaml).
* **ds_pipelines_configuration**: The ConfigMap name that contains the values to integrate Data Science Pipelines with the underlying components (Database and Object Store). This parameter defaults to `kfp-tekton-config` and you can find an example in the [repository](./base/configmaps/kfp-tekton-config.yaml).
* **database_secret**: The secret that contains the credentials for the Data Science Pipelines Databse. It defaults to `mysql-secret` if using the `metadata-store-mysql` overlay or `postgresql-secret` if using the `metadata-store-postgresql` overlay.
* **ds_pipelines_ui_configuration**: The ConfigMap that contains the values to customize UI. It defaults to `ds-pipeline-ui-configmap`.

## Configuration

* It is possible to configure what S3 storage is being used by Pipeline Runs. Detailed instructions on how to configure this will be added once Minio is moved to an overlay.

## Usage

### These instructions will be updated once Data Science Pipelines has a tile available in odh-dashboard

1. Go to the ds-pipelines-ui route.
2. Click on `Pipelines` on the left side.
3. There will be a `[Demo] flip-coin` Pipeline already available. Click on it.
4. Click on the blue `Create run` button towards the top of the screen.
5. You can leave all the fields untouched. If desired, you can create a new experiment to link the pipeline run to, or rename the run itself.
6. Click on the blue `Start` button.
7. You will be taken to the `Runs` page. You will see a row matching the `Run name` you previously picked. Click on the `Run name` in that row.
8. Once the Pipeline is done running, you can see a graph of all the pods that were created as well as the paths that were followed.
9. For further verification, you can view all the pods that were created as part of the Pipeline Run in the `<kfdef_namespace>`. They will all show up as `Completed`.

## Data Science Pipelines Architecture

A complete architecture can be found at [ODH Data Science Pipelines Architecture and Design](https://docs.google.com/document/d/1o-JS1uZKLZsMY3D16kl5KBdyBb-aV-kyD_XycdJOYpM/edit#heading=h.3aocw3evrps0). This document will be moved to GitHub once the corresponding ML Ops SIG repos are created.
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ data:
artifact_bucket: mlpipeline
artifact_endpoint: minio-service:9000
artifact_endpoint_scheme: http://
artifact_image: quay.io/thoth-station/document-sync-job:v0.1.0
artifact_image: quay.io/opendatahub/ml-pipelines-artifact-manager:latest
artifact_script: |-
#!/usr/bin/env sh
push_artifact() {
Expand All @@ -32,5 +32,5 @@ data:
kind: ConfigMap
metadata:
labels:
application-crd-id: kubeflow-pipelines
name: kfp-tekton-config
application-crd-id: data-science-pipelines
name: ds-pipeline-config
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,14 @@ data:
appVersion: 1.7.0
autoUpdatePipelineDefaultVersion: "true"
bucketName: mlpipeline
cacheDb: cachedb
cacheDb: mlpipeline
cacheImage: registry.access.redhat.com/ubi8/ubi-minimal
cacheNodeRestrictions: "false"
cronScheduleTimezone: UTC
dbHost: mysql
dbPort: "3306"
defaultPipelineRoot: ""
mlmdDb: metadb
mlmdDb: mlpipeline
pipelineDb: mlpipeline
warning: |
1. Do not use kubectl to edit this configmap, because some values are used
Expand All @@ -24,5 +24,5 @@ data:
kind: ConfigMap
metadata:
labels:
application-crd-id: kubeflow-pipelines
application-crd-id: data-science-pipelines
name: pipeline-install-config
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
labels:
application-crd-id: kubeflow-pipelines
application-crd-id: data-science-pipelines
kubeflow/crd-install: "true"
name: scheduledworkflows.kubeflow.org
spec:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
labels:
application-crd-id: kubeflow-pipelines
application-crd-id: data-science-pipelines
kubeflow/crd-install: "true"
name: viewers.kubeflow.org
spec:
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: ds-pipeline-persistenceagent
application-crd-id: data-science-pipelines
name: ds-pipeline-persistenceagent
spec:
selector:
matchLabels:
app: ds-pipeline-persistenceagent
application-crd-id: data-science-pipelines
template:
metadata:
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
labels:
app: ds-pipeline-persistenceagent
application-crd-id: data-science-pipelines
spec:
containers:
- env:
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
image: persistenceagent
imagePullPolicy: IfNotPresent
name: ds-pipeline-persistenceagent
command:
- persistence_agent
- "--logtostderr=true"
- "--namespace=$(namespace)"
- "--ttlSecondsAfterWorkflowFinish=86400"
- "--numWorker=2"
- "--mlPipelineAPIServerName=ds-pipeline"
livenessProbe:
exec:
command:
- pidof
- persistence_agent
initialDelaySeconds: 30
periodSeconds: 5
timeoutSeconds: 2
readinessProbe:
exec:
command:
- pidof
- persistence_agent
initialDelaySeconds: 3
periodSeconds: 5
timeoutSeconds: 2
resources:
requests:
cpu: 120m
memory: 500Mi
limits:
cpu: 250m
memory: 1Gi
serviceAccountName: ds-pipeline-persistenceagent
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: ds-pipeline-scheduledworkflow
application-crd-id: data-science-pipelines
name: ds-pipeline-scheduledworkflow
spec:
selector:
matchLabels:
app: ds-pipeline-scheduledworkflow
application-crd-id: data-science-pipelines
template:
metadata:
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
labels:
app: ds-pipeline-scheduledworkflow
application-crd-id: data-science-pipelines
spec:
containers:
- env:
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: CRON_SCHEDULE_TIMEZONE
valueFrom:
configMapKeyRef:
key: cronScheduleTimezone
name: $(pipeline_install_configuration)
image: scheduledworkflow
imagePullPolicy: IfNotPresent
name: ds-pipeline-scheduledworkflow
livenessProbe:
exec:
command:
- pidof
- controller
initialDelaySeconds: 30
periodSeconds: 5
timeoutSeconds: 2
readinessProbe:
exec:
command:
- pidof
- controller
initialDelaySeconds: 3
periodSeconds: 5
timeoutSeconds: 2
resources:
requests:
cpu: 120m
memory: 100Mi
limits:
cpu: 250m
memory: 250Mi
serviceAccountName: ds-pipeline-scheduledworkflow
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: ds-pipeline-viewer-crd
application-crd-id: data-science-pipelines
name: ds-pipeline-viewer-crd
spec:
selector:
matchLabels:
app: ds-pipeline-viewer-crd
application-crd-id: data-science-pipelines
template:
metadata:
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
labels:
app: ds-pipeline-viewer-crd
application-crd-id: data-science-pipelines
spec:
containers:
- env:
- name: MAX_NUM_VIEWERS
value: "50"
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: MINIO_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
image: viewer-crd-controller
imagePullPolicy: Always
name: ds-pipeline-viewer-crd
livenessProbe:
exec:
command:
- pidof
- controller
initialDelaySeconds: 30
periodSeconds: 5
timeoutSeconds: 2
readinessProbe:
exec:
command:
- pidof
- controller
initialDelaySeconds: 3
periodSeconds: 5
timeoutSeconds: 2
resources:
requests:
cpu: 120m
memory: 100Mi
limits:
cpu: 250m
memory: 500Mi
serviceAccountName: ds-pipeline-viewer-crd-service-account
Original file line number Diff line number Diff line change
Expand Up @@ -2,25 +2,29 @@ apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: ml-pipeline-visualizationserver
application-crd-id: kubeflow-pipelines
name: ml-pipeline-visualizationserver
app: ds-pipeline-visualizationserver
application-crd-id: data-science-pipelines
name: ds-pipeline-visualizationserver
spec:
selector:
matchLabels:
app: ml-pipeline-visualizationserver
application-crd-id: kubeflow-pipelines
app: ds-pipeline-visualizationserver
application-crd-id: data-science-pipelines
template:
metadata:
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
labels:
app: ml-pipeline-visualizationserver
application-crd-id: kubeflow-pipelines
app: ds-pipeline-visualizationserver
application-crd-id: data-science-pipelines
spec:
containers:
- image: visualization-server
imagePullPolicy: IfNotPresent
imagePullPolicy: Always
name: ds-pipeline-visualizationserver
ports:
- containerPort: 8888
name: http
livenessProbe:
exec:
command:
Expand All @@ -30,13 +34,9 @@ spec:
- -O
- '-'
- http://localhost:8888/
initialDelaySeconds: 3
initialDelaySeconds: 30
periodSeconds: 5
timeoutSeconds: 2
name: ml-pipeline-visualizationserver
ports:
- containerPort: 8888
name: http
readinessProbe:
exec:
command:
Expand All @@ -53,4 +53,7 @@ spec:
requests:
cpu: 30m
memory: 500Mi
serviceAccountName: ml-pipeline-visualizationserver
limits:
cpu: 250m
memory: 1Gi
serviceAccountName: ds-pipeline-visualizationserver
Loading