Refactor external add-ons section #3836

Merged
Changes from 3 commits
2 changes: 1 addition & 1 deletion content/en/_index.html
Original file line number Diff line number Diff line change
@@ -141,7 +141,7 @@ <h5 class="card-title text-white section-head">Model Training</h5>
<div class="card-body bg-primary-dark">
<h5 class="card-title text-white section-head">Model Serving</h5>
<p class="card-text text-white">
<a href="https://kserve.github.io/website/" target="_blank" rel="noopener" >KServe</a> <small>(previously <a href="/docs/external-add-ons/kserve/kserve/" target="_blank" rel="noopener" >KFServing</a>)</small> solves production model serving on Kubernetes.
<a href="/docs/external-add-ons/kserve/introduction/" target="_blank" rel="noopener">KServe</a> <small>(previously <em>KFServing</em>)</small> solves production model serving on Kubernetes.
It delivers high-abstraction and performant interfaces for frameworks like Tensorflow, XGBoost, ScikitLearn, PyTorch, and ONNX.
</p>
</div>
12 changes: 12 additions & 0 deletions content/en/_redirects
@@ -202,6 +202,18 @@ docs/started/requirements/ /docs/started/getting-started/
# rename customize dashboard page
/docs/components/central-dash/customizing-menu /docs/components/central-dash/customize

# rename feature-store to feast
/docs/external-add-ons/feature-store/overview/ /docs/external-add-ons/feast/introduction
/docs/external-add-ons/feature-store/getting-started/ /docs/external-add-ons/feast/introduction
/docs/external-add-ons/feature-store/ /docs/external-add-ons/feast

# rename kserve/kserve to kserve/introduction
/docs/external-add-ons/kserve/kserve/ /docs/external-add-ons/kserve/introduction

# redirect kserve pages to kserve website
/docs/external-add-ons/kserve/first_isvc_kserve/ https://kserve.github.io/website/latest/get_started/first_isvc/
/docs/external-add-ons/kserve/migration/ https://kserve.github.io/website/latest/admin/migration/

# ===============
# IMPORTANT NOTE:
# Catch-all redirects should be added at the end of this file as redirects happen from top to bottom
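Because rules are matched from top to bottom, a catch-all placed early would shadow every more specific rule after it. A minimal sketch of first-match-wins resolution (plain Python for illustration, not Netlify's actual matcher; the catch-all target shown is hypothetical):

```python
def resolve_redirect(rules, path):
    """First-match-wins, like a _redirects file: rules are scanned
    top to bottom and the first matching source decides the target."""
    for source, target in rules:
        if source.endswith("/*"):  # simplified catch-all prefix rule
            if path.startswith(source[:-1]):
                return target
        elif path == source:
            return target
    return None  # no rule matched; serve the path as-is

rules = [
    ("/docs/external-add-ons/kserve/kserve/", "/docs/external-add-ons/kserve/introduction"),
    ("/docs/external-add-ons/*", "/docs/external-add-ons/"),  # catch-all goes LAST
]

print(resolve_redirect(rules, "/docs/external-add-ons/kserve/kserve/"))
# → /docs/external-add-ons/kserve/introduction
```

If the catch-all were listed first, the specific `kserve/kserve` rule would never be reached.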
@@ -6,7 +6,7 @@ weight = 90

Katib offers a few installation options to install control plane. This page describes the options
and the features available with each option. Check
[the installation guide](/docs/components/katib/installation/#katib-control-plane-components) to
[the installation guide](/docs/components/katib/installation/#installing-control-plane) to
understand the Katib control plane components.

## The Default Katib Standalone Installation
@@ -8,7 +8,7 @@ This guide describes
[the Katib Config](https://github.com/kubeflow/katib/blob/19268062f1b187dde48114628e527a2a35b01d64/manifests/v1beta1/installs/katib-standalone/katib-config.yaml) —
the main configuration file for every Katib component. We use Kubernetes
[ConfigMap](https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/) to
fetch that config into [the Katib control plane components](/docs/components/katib/installation/#katib-control-plane-components).
fetch that config into [the Katib control plane components](/docs/components/katib/installation/#installing-control-plane).

The ConfigMap must be deployed in the
[`KATIB_CORE_NAMESPACE`](/docs/components/katib/user-guides/env-variables/#katib-controller)
8 changes: 6 additions & 2 deletions content/en/docs/external-add-ons/_index.md
@@ -1,5 +1,9 @@
+++
title = "External Add-Ons"
description = "Additional tools that may be integrated with a Kubeflow deployment or distribution."
description = "Externally developed projects that integrate with Kubeflow"
weight = 30
+++
+++

{{% alert title="Ownership of External Add-Ons" color="dark" %}}
These add-ons are <strong>not owned or maintained</strong> by the Kubeflow project; they are developed and supported by their respective maintainers.
{{% /alert %}}
18 changes: 2 additions & 16 deletions content/en/docs/external-add-ons/elyra/_index.md
@@ -1,19 +1,5 @@
+++
title = "Elyra"
description = "Elyra enables data scientists to visually create end-to-end machine learning (ML) workflows."
description = "Elyra | JupyterLab UI for Kubeflow Pipelines"
weight = 30
+++

Elyra aims to help data scientists, machine learning engineers and AI developers
through the model development life cycle complexities. Elyra integrates with JupyterLab
providing a Pipeline visual editor that enables low code/no code creation of Pipelines
that can be executed in a Kubeflow environment.

Below is an example of a Pipeline created with Elyra, you can identify the components/tasks
and related properties that are all managed in the visual editor.

<img src="/docs/external-add-ons/elyra/elyra-pipeline-covid-scenario.png" alt="A pipeline example created using Elyra Pipeline Visual Editor" class="mt-3 mb-3 p-3 border border-info rounded" />

To learn more about Elyra, visit <a href="https://github.com/elyra-ai/elyra" target="_blank">Elyra GitHub project</a>

To enable Elyra in your Kubeflow Environment, visit <a href="https://elyra.readthedocs.io/en/stable/recipes/using-elyra-with-kubeflow-notebook-server.html" target="_blank">Using Elyra with the Kubeflow Notebook Server</a>
+++
9 changes: 9 additions & 0 deletions content/en/docs/external-add-ons/elyra/github.md
@@ -0,0 +1,9 @@
+++
title = "GitHub Repository"
description = "LINK | GitHub repository for Elyra"
weight = 999
manualLink = "https://github.com/elyra-ai/elyra"
icon = "fa-brands fa-github"
+++

Elyra is developed in the [`elyra-ai/elyra`](https://github.com/elyra-ai/elyra) repository.
30 changes: 30 additions & 0 deletions content/en/docs/external-add-ons/elyra/introduction.md
@@ -0,0 +1,30 @@
+++
title = "Introduction"
description = "A brief introduction to Elyra"
weight = 10
+++

## What is Elyra?

[Elyra](https://elyra.readthedocs.io/en/stable/index.html) is an [open-source](https://github.com/elyra-ai/elyra) tool to reduce model development life cycle complexities.
Elyra is a _JupyterLab extension_ that provides a _visual pipeline editor_ to enable low-code creation of pipelines that can be executed with Kubeflow Pipelines.

Below is an example of a pipeline created with Elyra; the components/tasks and their related properties are all managed in the visual editor.

<img src="/docs/external-add-ons/elyra/elyra-pipeline-covid-scenario.png" alt="A pipeline example created using Elyra Pipeline Visual Editor" class="mt-3 mb-3 p-3 border border-info rounded" />

## How to use Elyra with Kubeflow?

Elyra can be used with Kubeflow to create and run Pipelines in a Kubeflow environment.

You may create a [custom Kubeflow Notebook image](/docs/components/notebooks/container-images/#custom-images) based on any of our pre-built Jupyter Notebook images and install Elyra in it.
The Elyra project has an [example in their documentation](https://elyra.readthedocs.io/en/stable/recipes/using-elyra-with-kubeflow-notebook-server.html) and a [`Dockerfile`](https://github.com/elyra-ai/elyra/blob/main/etc/docker/kubeflow/Dockerfile) that you can use as a reference.
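As a sketch of what such a custom image might look like (the base image tag, Elyra version, and user name below are assumptions, not tested pins — check the Kubeflow Notebooks and Elyra release pages for current values; a pre-Kubeflow-1.9 base is pinned here because of the JupyterLab 4.0 caveat noted on this page):

```dockerfile
# Illustrative sketch only: image tag, Elyra version, and user are assumptions.
FROM kubeflownotebookswg/jupyter-scipy:v1.8.0

USER root
# Install Elyra with all optional dependencies into the notebook image
RUN pip install --no-cache-dir "elyra[all]~=3.15.0"

# Kubeflow notebook images run as a non-root user
USER jovyan
```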

{{% alert title="Elyra and JupyterLab 4.0" color="warning" %}}
Elyra [`3.15.0`](https://github.com/elyra-ai/elyra/releases/tag/v3.15.0) may not properly support JupyterLab 4.0, which has been included in the default Kubeflow Notebook images since Kubeflow 1.9.0.
{{% /alert %}}

## Next steps

- Visit the <a href="https://github.com/elyra-ai/elyra" target="_blank">Elyra GitHub Repository</a>
- <a href="https://elyra.readthedocs.io/en/stable/recipes/using-elyra-with-kubeflow-notebook-server.html" target="_blank">Use Elyra in Kubeflow Notebooks</a>
9 changes: 9 additions & 0 deletions content/en/docs/external-add-ons/elyra/website.md
@@ -0,0 +1,9 @@
+++
title = "Elyra Website"
description = "LINK | Elyra Documentation Website"
weight = 998
manualLink = "https://elyra.readthedocs.io/"
icon = "fa-solid fa-arrow-up-right-from-square"
+++

Elyra has its own documentation website hosted at [`elyra.readthedocs.io/`](https://elyra.readthedocs.io/).
5 changes: 5 additions & 0 deletions content/en/docs/external-add-ons/feast/_index.md
@@ -0,0 +1,5 @@
+++
title = "Feast"
description = "Feast | Feature Store"
weight = 20
+++
9 changes: 9 additions & 0 deletions content/en/docs/external-add-ons/feast/github.md
@@ -0,0 +1,9 @@
+++
title = "GitHub Repository"
description = "LINK | GitHub repository for Feast"
weight = 999
manualLink = "https://github.com/feast-dev/feast"
icon = "fa-brands fa-github"
+++

Feast is developed in the [`feast-dev/feast`](https://github.com/feast-dev/feast) repository.
99 changes: 99 additions & 0 deletions content/en/docs/external-add-ons/feast/introduction.md
@@ -0,0 +1,99 @@
+++
title = "Introduction"
description = "A brief introduction to Feast and Feature Stores"
weight = 10
+++

## What is Feast?

[Feast](https://docs.feast.dev/) is an [open-source](https://github.com/feast-dev/feast) feature store that helps teams operate ML systems at scale by allowing them to define, manage, validate, and serve features to models in production.

Feast provides the following functionality:

- __Load streaming and batch data__: Feast is built to ingest data from a variety of bounded or unbounded sources.
Feast allows users to ingest data from streams, object stores, databases, or notebooks.
Data ingested into Feast is persisted in both the online and historical stores, which in turn are used for the creation of training datasets and for serving features to online systems.

- __Standardized definitions__: Feast becomes the single source of truth for all feature definitions and data within an organization.
Teams are able to capture documentation, metadata, and metrics about features.
This allows teams to communicate clearly about features, test feature data, and determine if a feature is both safe and relevant to their use cases.

- __Historical serving__: Features that are persisted in Feast can be retrieved through its feature serving APIs to produce training datasets.
Feast is able to produce massive training datasets that are agnostic of the data source originally used to ingest the data.
Feast is also able to ensure point-in-time correctness when joining these data sources, which in turn ensures the quality and consistency of features reaching models.

- __Online serving__: Feast exposes low latency serving APIs for all data that has been ingested into the system.
This allows all production ML systems to use Feast as the primary data source when looking up real-time features.

- __Consistency between training and serving__: Feast provides a consistent view of feature data through the use of a unified ingestion layer, a unified serving API, and canonical feature references.
By building ML systems on feature references, teams abstract away the underlying data infrastructure and make it possible to safely move models between training and serving without a drop in data consistency.

- __Feature sharing and reuse__: Feast provides a discovery and metadata API that allows teams to track, share, and reuse features across projects.
Feast also decouples the process of creating features from the process of consuming them, meaning teams that start new projects can begin by simply consuming features that already exist in the store, instead of starting from scratch.

- __Statistics and validation__: Feast allows for the generation of statistics based on features within the system.
Feast is compatible with TFDV, meaning statistics generated by Feast can be validated using TFDV.
Feast also allows teams to capture TFDV schemas as part of feature definitions, allowing domain experts to define data properties that can be used to validate these features in other production settings like training, ingestion, or serving.
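The dual online/historical persistence described above can be made concrete with a toy in-memory sketch (plain Python for illustration only — this is not the Feast API, and the entity and feature names are made up):

```python
from collections import defaultdict

class ToyFeatureStore:
    """Toy illustration of dual persistence: an append-only offline log
    for building training datasets, plus an online map that holds only
    the latest feature values per entity for low-latency serving."""

    def __init__(self):
        self.offline_log = defaultdict(list)  # entity_id -> [(ts, features), ...]
        self.online_store = {}                # entity_id -> latest features

    def ingest(self, entity_id, timestamp, features):
        # Every ingested row lands in the offline (historical) store...
        self.offline_log[entity_id].append((timestamp, features))
        # ...while the online store keeps only the freshest value per entity,
        # so out-of-order (late) rows never overwrite newer data.
        latest_ts = max(ts for ts, _ in self.offline_log[entity_id])
        if timestamp == latest_ts:
            self.online_store[entity_id] = features

    def get_online_features(self, entity_id):
        return self.online_store[entity_id]

store = ToyFeatureStore()
store.ingest("driver_1001", 1, {"avg_daily_trips": 12})
store.ingest("driver_1001", 2, {"avg_daily_trips": 15})
print(store.get_online_features("driver_1001"))  # → {'avg_daily_trips': 15}
```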

### What is a feature store?

Feature stores are systems that reduce challenges faced by ML teams when productionizing features.

Some of the key challenges that feature stores help to address include:

- __Feature sharing and reuse__: Engineering features is one of the most time-consuming activities in building an end-to-end ML system, yet many teams continue to develop features in silos.
This leads to a high amount of re-development and duplication of work across teams and projects.

- __Serving features at scale__: Models need data that can come from a variety of sources, including event streams, data lakes, warehouses, or notebooks.
ML teams need to be able to store and serve all these data sources to their models in a performant and reliable way.
The challenge is producing massive datasets of features for model training at scale, while also providing access to real-time feature data at low latency and high throughput during serving.

- __Consistency between training and serving__: The separation between data science and engineering teams often leads to the re-development of feature transformations when moving from training to online serving.
Inconsistencies that arise from discrepancies between training and serving implementations frequently lead to a drop in model performance in production.

- __Point-in-time correctness__: General-purpose data systems are not built with ML use cases in mind and, by extension, don't provide point-in-time correct lookups of feature data.
Without a point-in-time correct view of data, models are trained on datasets that are not representative of what is found in production, leading to a drop in accuracy.

- __Data quality and validation__: Features are business-critical inputs to ML systems. Teams need to be confident in the quality of the data served in production and must be able to react when there is any drift in the underlying data.
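The point-in-time correctness problem can be illustrated with a small sketch (plain Python, not the Feast API): for each training row, only the newest feature value at or before the row's event timestamp may be used, otherwise future information leaks into the training set.

```python
def point_in_time_lookup(feature_history, event_ts):
    """Return the latest feature value at or before event_ts, or None.

    feature_history: list of (timestamp, value) pairs, in any order.
    """
    eligible = [(ts, v) for ts, v in feature_history if ts <= event_ts]
    if not eligible:
        return None  # no feature value existed yet at event_ts
    return max(eligible)[1]  # value with the greatest eligible timestamp

# Feature values observed over time for one entity
history = [(10, 0.2), (20, 0.5), (30, 0.9)]

# A label observed at t=25 must only see values from t<=25, even though
# t=30 already exists in the store by the time the dataset is built.
print(point_in_time_lookup(history, 25))  # → 0.5
print(point_in_time_lookup(history, 5))   # → None
```

A naive join that simply takes the latest value (0.9) would train the model on data it could never have seen at serving time.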

## How to use Feast with Kubeflow?

### Requirements

- A Kubernetes cluster with [Kubeflow installed](/docs/started/installing-kubeflow/)
- A database to use as an [offline store](https://docs.feast.dev/reference/offline-stores/overview) _(BigQuery, Snowflake, Redshift, etc.)_
- A database to use as an [online store](https://docs.feast.dev/reference/online-stores/overview) _(Redis, Datastore, DynamoDB, etc.)_
- A bucket _(S3, GCS, Minio, etc.)_ or SQL Database _(Postgres, MySQL, etc.)_ to use as the [feature registry](https://docs.feast.dev/getting-started/concepts/registry)
- A workflow engine _(Airflow, Kubeflow Pipelines, etc.)_ to [materialize data](https://docs.feast.dev/getting-started/concepts/data-ingestion) and run other Feast jobs
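These choices come together in the `feature_store.yaml` file of a Feast feature repository. The fragment below is an illustrative sketch only — the project name, store types, and connection details are assumptions; see the Feast reference documentation for the exact schema of each store type:

```yaml
# Illustrative feature_store.yaml — values are assumptions, not a tested config
project: my_ml_project
provider: local
registry: s3://my-bucket/feast/registry.db   # feature registry in a bucket
online_store:
  type: redis
  connection_string: "redis.feast.svc.cluster.local:6379"
offline_store:
  type: bigquery
```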

### Installation

To use Feast with Kubeflow, follow these steps:

1. [__Install the Feast Python package__](https://docs.feast.dev/how-to-guides/feast-snowflake-gcp-aws/install-feast)
1. [__Create a feature repository__](https://docs.feast.dev/how-to-guides/feast-snowflake-gcp-aws/create-a-feature-repository)
1. [__Deploy your feature store__](https://docs.feast.dev/how-to-guides/feast-snowflake-gcp-aws/deploy-a-feature-store)
1. [__Create a training dataset__](https://docs.feast.dev/how-to-guides/feast-snowflake-gcp-aws/build-a-training-dataset)
1. [__Load features into the online store__](https://docs.feast.dev/how-to-guides/feast-snowflake-gcp-aws/load-data-into-the-online-store)
1. [__Read features from the online store__](https://docs.feast.dev/how-to-guides/feast-snowflake-gcp-aws/read-features-from-the-online-store)

Please see their [production usage](https://docs.feast.dev/how-to-guides/running-feast-in-production) guide for best practices when running Feast in production.

### Accessing Feast from Kubeflow

Once Feast is installed within the same Kubernetes cluster as Kubeflow, users can access its APIs directly without any additional steps.

Feast APIs can roughly be grouped into the following sections:

- __Feature definition and management__:
- Feast provides both a [Python SDK](https://docs.feast.dev/getting-started/quickstart) and [CLI](https://docs.feast.dev/reference/feast-cli-commands) for interacting with Feast Core.
Feast Core allows users to define and register features and entities and their associated metadata and schemas.
The Python SDK is typically used from within a Jupyter notebook by end users to administer Feast, but ML teams may opt to version control feature specifications in order to follow a GitOps based approach.
- __Model training__:
- The Feast Python SDK can be used to trigger the [creation of training datasets](https://docs.feast.dev/how-to-guides/feast-snowflake-gcp-aws/build-a-training-dataset).
The most natural place to use this SDK is to create a training dataset as part of a [Kubeflow Pipeline](/docs/components/pipelines/overview/) prior to model training.
- __Model serving__:
- The Feast Python SDK can also be used for [online feature retrieval](https://docs.feast.dev/how-to-guides/feast-snowflake-gcp-aws/read-features-from-the-online-store).
This client is used to retrieve feature values for inference with [Model Serving](/docs/components/pipelines/overview/) systems like KFServing, TFX, or Seldon.
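The online retrieval call used at serving time has roughly the following shape. The sketch below mimics the signature of the Feast SDK's `get_online_features` with a plain-Python stand-in, so the pattern is clear without a running Feast deployment (the feature names and entity keys are illustrative, and this is not the real Feast client):

```python
class FakeOnlineStore:
    """Plain-Python stand-in that mimics the shape of Feast's
    get_online_features(features=..., entity_rows=...) call."""

    def __init__(self, data):
        # data: {entity_key_tuple: {"feature_view:feature": value}}
        self.data = data

    def get_online_features(self, features, entity_rows):
        rows = []
        for row in entity_rows:
            key = tuple(sorted(row.items()))
            stored = self.data.get(key, {})
            # Unknown entities come back with None values, not an error
            rows.append({f: stored.get(f) for f in features})
        return rows

store = FakeOnlineStore({
    (("driver_id", 1001),): {"driver_hourly_stats:avg_daily_trips": 15},
})

# At inference time, the model server looks up features for the request's entities
features = store.get_online_features(
    features=["driver_hourly_stats:avg_daily_trips"],
    entity_rows=[{"driver_id": 1001}],
)
print(features)  # → [{'driver_hourly_stats:avg_daily_trips': 15}]
```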

Please see their [tutorials page](https://docs.feast.dev/tutorials/tutorials-overview) for more information on how to use Feast.
9 changes: 9 additions & 0 deletions content/en/docs/external-add-ons/feast/website.md
@@ -0,0 +1,9 @@
+++
title = "Feast Website"
description = "LINK | Feast Documentation Website"
weight = 998
manualLink = "https://docs.feast.dev/"
icon = "fa-solid fa-arrow-up-right-from-square"
+++

Feast has its own documentation website hosted at [`docs.feast.dev`](https://docs.feast.dev/).
3 changes: 0 additions & 3 deletions content/en/docs/external-add-ons/feature-store/OWNERS

This file was deleted.

5 changes: 0 additions & 5 deletions content/en/docs/external-add-ons/feature-store/_index.md

This file was deleted.
