Update Kubeflow Installation with Standalone Mode #3724

Merged
merged 26 commits into from
May 29, 2024
e4afb10
Update Kubeflow Installation with Standalone Components
andreyvelich Apr 28, 2024
72ef593
Add helpful message for manifests installation
andreyvelich Apr 29, 2024
4110705
Add explanation for Kubeflow Platform and Kubeflow Standalone Components
andreyvelich May 8, 2024
7420e03
Move Kubeflow explanation to introduction guide
andreyvelich May 10, 2024
26b262a
Add Spark Operator
andreyvelich May 10, 2024
6cf8f38
Add links to the introduction
andreyvelich May 10, 2024
15ed4c7
Remove Manifests WG link
andreyvelich May 13, 2024
a022b34
Modify table column
andreyvelich May 14, 2024
f9d5a52
Order components alphabeticaly
andreyvelich May 14, 2024
8fbad11
Update introduction
andreyvelich May 14, 2024
29391f8
Update Kubeflow intro
andreyvelich May 15, 2024
db5c542
Fix KFP install link
andreyvelich May 20, 2024
5c80be7
Review comments
andreyvelich May 20, 2024
78f683e
Add Kubeflow Platform Header
andreyvelich May 21, 2024
35ea615
Modify headers and text
andreyvelich May 21, 2024
a438b5a
Add Kubeflow Notebooks to Kubeflow Platform tools
andreyvelich May 22, 2024
2495604
Modify the install headers
andreyvelich May 22, 2024
cf7c50b
Update what are Kubeflow Standalone Components
andreyvelich May 22, 2024
224a638
Change to Standalone Kubeflow Components
andreyvelich May 22, 2024
fb14fb1
Add link to Kubeflow Platform
andreyvelich May 22, 2024
e18ac08
Rename Raw Manifests to Kubeflow Manifests
andreyvelich May 23, 2024
7df289c
Change H3
andreyvelich May 24, 2024
90df53d
Rename to quick and easy
andreyvelich May 25, 2024
63c3a61
Remove new line in Introduction
andreyvelich May 25, 2024
b0de894
Modify the install standalone section
andreyvelich May 27, 2024
7e7b91c
Fix install guide
andreyvelich May 28, 2024
184 changes: 161 additions & 23 deletions content/en/docs/started/installing-kubeflow.md
@@ -5,30 +5,165 @@ weight = 20

+++

## What is Kubeflow?
This guide describes how to install Kubeflow standalone components or Kubeflow Platform using package
distributions or raw manifests.
Contributor

this might be rewritten or removed

Member Author

@jbottum Any suggestions on how we should rewrite it? I just tried to be consistent with the other installation pages that we updated with @StefanoFioravanzo and @hbelmiro, e.g. Training Operator or Katib.

Member

Suggested change
distributions or raw manifests.
distributions or Kustomize manifests.


Kubeflow is an end-to-end Machine Learning (ML) platform for Kubernetes, it provides components for each stage in the ML lifecycle, from exploration through to training and deployment.
Operators can choose what is best for their users, there is no requirement to deploy every component.
Read [the introduction guide](/docs/started/introduction) to understand what are Kubeflow
standalone components and what is Kubeflow Platform.
Member

I think we can simplify this wording for clarity:

Suggested change
Read [the introduction guide](/docs/started/introduction) to understand what are Kubeflow
standalone components and what is Kubeflow Platform.
Read [the introduction](/docs/started/introduction) to learn more about Kubeflow, standalone components, and the Kubeflow Platform.

Member

This is resolved.


Learn more about Kubeflow in the [Introduction](/docs/started/introduction/) and
[Architecture](/docs/started/architecture/) pages.
You can install Kubeflow using one of these methods:

## How to install Kubeflow?
- [**Install Kubeflow Components Standalone**](#install-kubeflow-components-standalone)

Anywhere you are running Kubernetes, you should be able to run Kubeflow.
There are two primary ways to install Kubeflow:
- Install Kubeflow Platform
- [**From Packaged Distributions**](#from-packaged-distributions)
- [**From Raw Manifests**](#from-raw-manifests) <sup>(advanced users)</sup>
andreyvelich marked this conversation as resolved.

1. [**Packaged Distributions**](#packaged-distributions-of-kubeflow)
1. [**Raw Manifests**](#raw-kubeflow-manifests) <sup>(advanced users)</sup>
## Install Kubeflow Components Standalone
andreyvelich marked this conversation as resolved.

<a id="packaged-distributions"></a>
<a id="install-a-packaged-kubeflow-distribution"></a>
Some components in the [Kubeflow ecosystem](/docs/started/architecture/#conceptual-overview) may be
deployed as standalone services, without the need to install the full platform. You might integrate
these services as part of your existing AI/ML platform or use them independently.

## Packaged Distributions of Kubeflow
This is a quick and easier method to get started with Kubeflow ecosystem since those components usually
don't require additional management tools used in a Kubeflow Platform.
Member

Rather than saying "easier" we can say "quicker" because that is objectively true, whereas "easier" is subjective.

We can also use this line to explain that users should look at the component docs for specific differences (e.g. Pipelines using embedded MinIO and not supporting multi-user):

Suggested change
This is a quick and easier method to get started with Kubeflow ecosystem since those components usually
don't require additional management tools used in a Kubeflow Platform.
Standalone components provide a quick way to get started with the Kubeflow ecosystem.
However, there are some differences in functionality compared to a full [Kubeflow Platform](#kubeflow-platform), please refer to each component documentation for more details.

Member (@thesuperzapper, May 22, 2024)

This still needs to be discussed.

In the community call people were happy to say "a quick way" rather than "easier" as it's objectively true, rather than being a comparative statement.

It's also important that we ask users to review the component-specific docs for any differences.

See https://github.com/kubeflow/website/pull/3724/files#r1608934544 for my proposal.

Member Author

I've already asked the community, distributions, and users to comment on this PR if they have concerns with the word "easier".
We can wait until the end of this week for other objections.

As @jbottum said on the community call, it is obvious that starting with a single component will be easier than installing the whole Kubeflow Platform.

Member

We can't know what is "easier" for any particular user as this is subjective and depends on what they are trying to do.

@rareddy @kimwnasptd can you please clarify what the positions of Red Hat and Canonical are here? (Or any other distributions)


The specific wording is not important, and it's much less controversial if we just say standalone components are "a quick way to get started with Kubeflow" (note, this is not "quicker").

@andreyvelich I hope you understand why I am opposed to making superlative statements, and prefer we stick to factual ones.

Although I doubt we will get many responses here because this PR has SOOO much activity on it, and this comment thread will get lost.


The following table lists Kubeflow components that may be deployed in a standalone mode. It also
lists their associated GitHub repository and
corresponding [ML lifecycle stage](/docs/started/architecture/#kubeflow-components-in-the-ml-lifecycle).

<div class="table-responsive distributions-table">
Member

We need to discuss the ordering of this table.

My proposal is we go by popularity/stars:

  1. Kubeflow Pipelines
  2. Kubeflow Spark Operator
  3. Kubeflow Training Operator
  4. Kubeflow Katib
  5. Kubeflow MPI Operator
  6. Kubeflow Model Registry (we should wait until the first public release)
  7. KServe (should be last, because it's an external add-on)

Member Author

What is the user value of ordering them by popularity, and how are we going to track component popularity in the future?
E.g. from my point of view, the ML lifecycle makes more sense since the order will always be the same, and we can link this table with the Kubeflow ML Lifecycle: #3728.
If we don't like that, we can order them alphabetically.
@thesuperzapper @kubeflow/kubeflow-steering-committee @juliusvonkohout Thoughts?

Member

Lifecycle or alphabetically is low maintenance

Member

Agree on alphabetical ordering - politically correct and low maintenance.
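
As a quick check of the alphabetical option, the sketch below sorts the component names from this thread while ignoring the shared `Kubeflow ` prefix. The prefix-stripping rule and the `sort_key` helper are assumptions made for illustration; the PR does not specify a sorting rule. Under that assumption the result matches the row order of the table in this diff.

```python
# Component names as listed in the popularity proposal in this thread.
components = [
    "Kubeflow Pipelines",
    "Kubeflow Spark Operator",
    "Kubeflow Training Operator",
    "Kubeflow Katib",
    "Kubeflow MPI Operator",
    "Kubeflow Model Registry",
    "KServe",
]

PREFIX = "Kubeflow "  # assumption: the shared prefix is ignored when sorting


def sort_key(name: str) -> str:
    """Return a case-insensitive key with the shared prefix removed."""
    stripped = name[len(PREFIX):] if name.startswith(PREFIX) else name
    return stripped.casefold()


alphabetical = sorted(components, key=sort_key)

for name in alphabetical:
    print(name)
```

A fixed key like this needs no upkeep, which is the "low maintenance" argument made above; a popularity ordering would need to be revisited as star counts change.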

<table class="table table-bordered">
<thead>
<tr>
<th>Component</th>
<th>ML Lifecycle Stage</th>
<th>Source Code</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<a href="/docs/components/katib/installation/#installing-katib">
Kubeflow Katib
</a>
</td>
<td>
Model Optimization and AutoML
</td>
<td>
<a href="https://github.com/kubeflow/katib">
<code>kubeflow/katib</code>
</a>
</td>
</tr>
<tr>
<td>
<a href="https://kserve.github.io/website/master/admin/serverless/serverless">
KServe
</a>
</td>
<td>
Model Serving
</td>
<td>
<a href="https://github.com/kserve/kserve">
<code>kserve/kserve</code>
</a>
</td>
</tr>
andreyvelich marked this conversation as resolved.
<tr>
<td>
<a href="/docs/components/model-registry/installation/#installing-model-registry">
Kubeflow Model Registry
</a>
</td>
<td>
Model Registry
</td>
<td>
<a href="https://github.com/kubeflow/model-registry">
<code>kubeflow/model-registry</code>
</a>
</td>
</tr>
<tr>
<td>
<a href="/docs/components/training/user-guides/mpi/#installation">
Kubeflow MPI Operator
</a>
</td>
<td>
All-Reduce Model Training
</td>
<td>
<a href="https://github.com/kubeflow/mpi-operator">
<code>kubeflow/mpi-operator</code>
</a>
</td>
</tr>
<tr>
<td>
<a href="/docs/components/pipelines/v2/installation/quickstart/">
Kubeflow Pipelines
</a>
</td>
<td>
ML Workflows and Schedules
</td>
<td>
<a href="https://github.com/kubeflow/pipelines">
<code>kubeflow/pipelines</code>
</a>
</td>
</tr>
<tr>
<td>
<a href="https://github.com/kubeflow/spark-operator/tree/master?tab=readme-ov-file#installation">
Kubeflow Spark Operator
</a>
</td>
<td>
Data Preparation
</td>
<td>
<a href="https://github.com/kubeflow/spark-operator">
<code>kubeflow/spark-operator</code>
</a>
</td>
</tr>
<tr>
<td>
<a href="/docs/components/training/installation/#installing-training-operator">
Kubeflow Training Operator
</a>
</td>
<td>
Model Training and Fine-Tuning
</td>
<td>
<a href="https://github.com/kubeflow/training-operator">
<code>kubeflow/training-operator</code>
</a>
</td>
</tr>
</tbody>
</table>
</div>

**Note**. Currently, Kubeflow Notebooks can't be deployed as a standalone application, but Notebooks
WG is working on that as part of [this issue](https://github.com/kubeflow/kubeflow/issues/7549).
andreyvelich marked this conversation as resolved.

Member

We should add a H2 heading which separates the "Kubeflow Platform" section:

Suggested change
## Kubeflow Platform
When deployed as a platform, Kubeflow provides a comprehensive set of tools for the entire ML lifecycle.
The key difference from standalone components is the [Kubeflow Central Dashboard](/docs/components/central-dash/overview/), which provides a multi-user interface for the entire platform.

Member Author

@thesuperzapper Do you want to move the Raw Manifests and Packaged Distributions installation under a Kubeflow Platform sub-section?
We don't need to explain what Kubeflow Platform is, since it is already explained here: https://deploy-preview-3724--competent-brattain-de2d6d.netlify.app/docs/started/introduction/#what-is-kubeflow-platform-.

Member

@andreyvelich we can mark this one as resolved, since we already did this in another commit.

However, we should still discuss what the "intro" text for this heading is, see https://github.com/kubeflow/website/pull/3724/files#r1608938065

## Install Kubeflow Platform

You can use one of the following methods to install Kubeflow Platform to get full suite of Kubeflow
components bundled together with additional integration and management tools.
Member

We can use this text to explain what the differences are.

We can also highlight that Notebooks is only available as part of Kubeflow Platform.

Suggested change
## Install Kubeflow Platform
You can use one of the following methods to install Kubeflow Platform to get full suite of Kubeflow
components bundled together with additional integration and management tools.
## Kubeflow Platform
When deployed as a platform, Kubeflow provides a comprehensive set of tools for the entire ML lifecycle.
The core difference from the standalone components is the [Central Dashboard](/docs/components/central-dash/overview/),
a multi-user web interface and its associated [profiles](/docs/components/central-dash/profiles/) feature.
Furthermore, [Kubeflow Notebooks](/docs/components/notebooks/overview/) is _only available_ in platform mode.

Member Author

Member (@thesuperzapper, May 21, 2024)

I strongly believe we need to have a descriptive introduction under this heading, if we don't include the information about Notebooks here, we would need to include it under the standalone components section.

Member Author

> I strongly believe we need to have a descriptive introduction under this heading

Why can't we just redirect users to the introduction section and then to the components installation section if they want to understand the differences?

> we would need to include it under the standalone components section.

We are planning to add Kubeflow Notebooks to the Standalone Kubeflow Components table once it supports standalone deployment.
I can modify this Note if @kubeflow/wg-notebooks-leads doesn't like it:

Note. Currently, Kubeflow Notebooks can’t be deployed as a standalone application,
but Notebooks WG is working on that as part of this issue.

Member

At this stage, there are no plans to support a standalone Kubeflow Notebooks as I explained in #3724 (comment).

Furthermore, a high-level explanation to help users understand that Dashboard + Profiles + Notebooks are the key differentiators when using Kubeflow Platform is critical.

@kimwnasptd can confirm this, as we maintain the Dashboard and Notebooks.


Also, I don't want the "negative" idea of "not being standalone"; rather, I want the positive idea of "you get Notebooks with Kubeflow Platform", which better reflects the reality of needing things like multi-user support for Notebooks to make sense in the first place.

Member (@thesuperzapper, May 22, 2024)

We can link people to the introduction "What is Kubeflow Platform?" section, to help them understand the differences:

Suggested change
You can use one of the following methods to install Kubeflow Platform to get full suite of Kubeflow
components bundled together with additional integration and management tools.
You can use one of the following methods to install the Kubeflow Platform.
When deployed as a platform, Kubeflow provides a comprehensive set of tools for the entire ML lifecycle.
For more information about the platform, please see this [introduction](/docs/started/introduction/#what-is-kubeflow-platform).

Member Author

I can modify this as follows:

You can use one of the following methods to install [Kubeflow Platform](/docs/started/introduction/#what-is-kubeflow-platform)
to get full suite of standalone Kubeflow components bundled together with additional tools.

We already linked the introduction page at the beginning of this page.

Member

I think an active tense is better.

You can use one of the following methods to install the [Kubeflow Platform](/docs/started/introduction/#what-is-kubeflow-platform).
When deployed as a platform, Kubeflow provides a comprehensive set of tools for the entire ML lifecycle. 

Which is more similar to what you originally proposed anyway, just with a link to the introduction page.

Member Author

We have to say that all Standalone Kubeflow Components are included in the Kubeflow Platform.

Member

@andreyvelich I think that is implied. Also, it's explicitly stated at the /docs/started/introduction/#what-is-kubeflow-platform link if the user clicks.

The key "selling point" of the platform is the sentence "When deployed as a platform, Kubeflow provides a comprehensive set of tools for the entire ML lifecycle."

Talking about standalone components under this heading detracts from the idea of using Kubeflow as a platform. I get that your interest is in Kubeflow not being a platform, but many users want a platform, and that's why we have two sections on this page.

Member Author

OK, can you rephrase this message to explain that all Standalone Kubeflow Components will be included in Kubeflow Platform?


### From Packaged Distributions
Member

Suggested change
### From Packaged Distributions
### Packaged Distributions


Packaged distributions are maintained by various organizations and typically aim to provide
a simplified installation and management experience for Kubeflow. Some distributions can be
deployed on [all certified Kubernetes distributions](https://kubernetes.io/partners/#conformance),
a simplified installation and management experience for your **Kubeflow Platform**. Some distributions
can be deployed on [all certified Kubernetes distributions](https://kubernetes.io/partners/#conformance),
while others target a specific platform (e.g. EKS or GKE).

{{% alert title="Note" color="warning" %}}
@@ -200,12 +335,16 @@ The following table lists distributions which are <em>maintained</em> by their r
</table>
</div>

## Raw Kubeflow Manifests
### From Raw Manifests
Member

Suggested change
### From Raw Manifests
### Raw Manifests


The raw Kubeflow Manifests are aggregated by the [Manifests Working Group](https://github.com/kubeflow/community/tree/master/wg-manifests)
and are intended to be used as the **base of packaged distributions**.
The raw Kubeflow Manifests are aggregated by the Manifests Working Group and are intended to be
Member

Suggested change
The raw Kubeflow Manifests are aggregated by the Manifests Working Group and are intended to be
The Kustomize Kubeflow Manifests are aggregated by the Manifests Working Group and are intended to be

Contributor

I am concerned about using the term "Raw Manifests". I don't believe that we use it in the Manifests docs. Maybe just "Manifests"?

Member

The reason we chose "raw manifests" historically is because it highlights that they are not ready for production out of the box, and are not the "official" way to deploy Kubeflow (as required by the Manifests WG charter).

The core purpose of the manifests has always been to enable the creation of distributions that are more targeted to specific environments/users.

Obviously, some advanced users have chosen to effectively make "bespoke" distributions for their company, and that's fine, but it's important for users to not think that the manifests are an "out of the box" experience.

Member (@juliusvonkohout, May 22, 2024)

@thesuperzapper I do not see how the name "Kustomize manifests" (since they are actually kustomize manifests) changes that. I did not add "official way to install" or something similar.

@jbottum I am also fine with just "manifests".

In the end Kustomize manifests is the proper technical term.

Member

@juliusvonkohout I really think we should discuss this in a separate PR, where we can make a decision on that branding specifically.

The website currently calls them "raw manifests" and adding more changes to this PR will only make it take longer to merge.

Member

I don't think we should call it "kustomize manifests" since that's an implementation detail. ICYMI, kustomize now ships with kubectl as a sub-command.

I don't think this is blocking the PR. Maybe a separate discussion.

Member Author

Can we just call them Kubeflow Manifests @juliusvonkohout @terrytangyuan @jbottum @thesuperzapper ?
E.g.

The Kubeflow Manifests are aggregated by the Manifests Working Group and are intended to be used as the
base of packaged distributions.

Member

SGTM

Member

@andreyvelich @terrytangyuan I am more ok with using Kubeflow Manifests, but we should do that change in a separate PR, to allow others to discuss.

This PR is already doing a lot, and is very hard to review for stakeholders.

Member Author

From my point of view, the final decision will be up to the Manifests WG since they maintain this section.
I don't think renaming Raw Manifests to Kubeflow Manifests is a big deal, tbh, but I am happy to hear other community members' objections.

used as the **base of packaged distributions**.

Advanced users may choose to install the manifests for a specific Kubeflow version by following the
Kubeflow Manifests contain all Kubeflow Components, Kubeflow Central Dashboard, and other Kubeflow
applications which makes **Kubeflow Platform**. This installation is helpful when you want to try
Member

It's not valid to use "makes" in this context:

Suggested change
applications which makes **Kubeflow Platform**. This installation is helpful when you want to try
applications that comprise the **Kubeflow Platform**. This installation is helpful when you want to try

out the end-to-end Kubeflow Platform capabilities.

Users may choose to install the manifests for a specific Kubeflow version by following the
instructions in the `README` of the [`kubeflow/manifests`](https://github.com/kubeflow/manifests) repository.

- [**Kubeflow 1.8:**](/docs/releases/kubeflow-1.8/)
Member

- [**Kubeflow 1.8:**](/docs/releases/kubeflow-1.8/)
  - [`v1.8-branch`](https://github.com/kubeflow/manifests/tree/v1.8-branch#installation) <sup>(development branch)</sup>
  - [`v1.8.0`](https://github.com/kubeflow/manifests/tree/v1.8.0#installation)
- [**Kubeflow 1.9:**](/docs/releases/kubeflow-1.9/)
  - [`v1.9-branch`](https://github.com/kubeflow/manifests/tree/v1.9-branch#installation) <sup>(development branch)</sup>
  - [`v1.9.0`](https://github.com/kubeflow/manifests/tree/v1.9.0#installation)

I think we should either reference the master or 1.9 branch, but 1.7 is not supported anymore.

Member Author

@juliusvonkohout Can we update it once we release Kubeflow 1.9?

Member

We should not link to unreleased versions.

@@ -224,9 +363,8 @@ If you need support, please consider using a [packaged distribution](#packaged-d
Nevertheless, we welcome contributions and bug reports very much.
{{% /alert %}}

<a id="next-steps"></a>
Member (@juliusvonkohout, May 22, 2024)

{{% alert title="Warning" color="warning" %}}
Kubeflow is a complex system with many components and dependencies.
Using the Kubeflow manifests requires some understanding of Kubernetes, Istio, and Kubeflow itself.
The Kubeflow community support for Kubeflow manifests is best-effort for environment-specific issues or custom configurations.
Nevertheless, we welcome contributions and bug reports very much.
{{% /alert %}}

@andreyvelich I was not able to make a direct code suggestion here, so I am just pasting the full alert.


## Next steps

- Review the Kubeflow <a href="/docs/components/">component documentation</a>
- Explore the <a href="/docs/components/pipelines/sdk/">Kubeflow Pipelines SDK</a>
- Review our [introduction to Kubeflow](/docs/started/introduction/).
- Explore the [architecture of Kubeflow](/docs/started/architecture).
- Learn more about the [components of Kubeflow](/docs/components/).
72 changes: 43 additions & 29 deletions content/en/docs/started/introduction.md
@@ -4,11 +4,44 @@ description = "An introduction to Kubeflow"
weight = 1
+++

The Kubeflow project is dedicated to making deployments of machine learning (ML)
workflows on Kubernetes simple, portable and scalable. Our goal is not to
recreate other services, but to provide a straightforward way to deploy
best-of-breed open-source systems for ML to diverse infrastructures. Anywhere
you are running Kubernetes, you should be able to run Kubeflow.
## What is Kubeflow ?
Member

This is clearly a typo, and it breaks the # anchor links:

Suggested change
## What is Kubeflow ?
## What is Kubeflow?

Member Author

I just removed all question marks from the headers. Does it look better?

Member

I think having question marks makes it easier to read.

But also, the issue is not the question marks; it is the space before them, which results in the anchor link having a `-` at the end.
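
The trailing hyphen can be reproduced with a simplified, GitHub-style anchor function. This is only a sketch: `slugify` is a hypothetical helper written for illustration, and the site generator's real anchor rules may differ in edge cases. It is consistent with the deploy-preview link ending in `#what-is-kubeflow-platform-` quoted earlier in this review.

```python
import re


def slugify(heading: str) -> str:
    """Approximate a heading anchor: lowercase, drop punctuation, spaces to hyphens."""
    lowered = heading.strip().lower()
    # Keep word characters, hyphens, and spaces; drop everything else (e.g. "?").
    kept = re.sub(r"[^\w\- ]", "", lowered)
    # Each remaining space becomes a hyphen, including the one left before the "?".
    return kept.replace(" ", "-")


print(slugify("What is Kubeflow ?"))  # what-is-kubeflow-
print(slugify("What is Kubeflow?"))   # what-is-kubeflow
```

Removing the space before the question mark, or removing the question mark together with that space, both give a clean anchor under this approximation.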

Member Author

I think we discussed with @StefanoFioravanzo and @hbelmiro before that it is not necessary to add question marks to the doc headers.

Member


Kubeflow is a community and ecosystem of open-source projects to address each stage in the
machine learning (ML) lifecycle. It makes ML on Kubernetes simple, portable, and scalable.
The goal of Kubeflow is to facilitate the orchestration of Kubernetes ML workloads and to empower
users to deploy best-in-class open-source systems to any Cloud infrastructure.
andreyvelich marked this conversation as resolved.
Whether you’re a researcher, data scientist, ML engineer, or a team of developers, Kubeflow offers
modular and scalable tools that cater to all aspects of the ML lifecycle: from building ML models to
deploying them to production for AI applications.

## What are Kubeflow Standalone Components?

Kubeflow is composed of multiple, independent open-source projects which address different aspects
of a ML lifecycle. These standalone components are designed to be usable both within the Kubeflow
Platform and independently. These components can be installed independently on a Kubernetes cluster,
providing flexibility to users who may not require the full capabilities of Kubeflow Platform but
wish to leverage specific ML functionalities.
Member

I think we can simplify this wording for clarity:

Suggested change
- Kubeflow is composed of multiple, independent open-source projects which address different aspects
- of a ML lifecycle. These standalone components are designed to be usable both within the Kubeflow
- Platform and independently. These components can be installed independently on a Kubernetes cluster,
- providing flexibility to users who may not require the full capabilities of Kubeflow Platform but
- wish to leverage specific ML functionalities.
+ Kubeflow is composed of multiple open-source projects that address different aspects
+ of the ML lifecycle. Most components may be used both within the Kubeflow Platform or independently.
+ These components can be installed standalone on a Kubernetes cluster, providing flexibility to users
+ who may not require the full Kubeflow Platform but wish to leverage specific ML functionalities.

Member Author

@andreyvelich andreyvelich May 21, 2024

Most components may be used both within the Kubeflow Platform or independently.

Why do we need this sentence? In this paragraph we are clearly speaking only about Kubeflow Standalone Components.

I added other suggestions.

Member

In my mind, the first sentence is talking about "all" components, including the ones that can't be used standalone, like Dashboard and Notebooks, and the following sentences are trying to introduce the concept of "standalone" components.

How about this alternative:

Kubeflow is composed of multiple open-source projects that address different aspects of the ML lifecycle. 
Many of these components can be installed standalone on a Kubernetes cluster without the rest of the Kubeflow platform. 
This provides flexibility to users who may not require the full platform but wish to leverage specific ML functionalities.

Member Author

@andreyvelich andreyvelich May 22, 2024

What do you think about this, @thesuperzapper @StefanoFioravanzo @hbelmiro?

Kubeflow ecosystem is composed of multiple open-source projects that address different aspects of
the ML lifecycle. Many of these projects are designed to be usable both within the
Kubeflow Platform and independently.

These Kubeflow components can be installed standalone on a Kubernetes cluster. It provides
flexibility to users who may not require the full Kubeflow Platform capabilities but wish to
leverage specific ML functionalities such as model training or model serving.

Member

@andreyvelich if we change "Kubeflow ecosystem" to "The Kubeflow ecosystem", and remove the unnecessary paragraph break, that proposal is fine.


## What is Kubeflow Platform ?
Member

This is clearly a typo, and it breaks # anchor links:

Suggested change
- ## What is Kubeflow Platform ?
+ ## What is Kubeflow Platform?


The Kubeflow Platform refers to the full suite of Kubeflow components bundled together with
additional integration and management tools. Installing Kubeflow as a platform means deploying a
comprehensive ML toolkit that integrates these components into a cohesive system, optimized for
managing the end-to-end ML lifecycle. This includes the standalone components coupled with these
integrations and management tools:
Member

I think we can simplify this wording for clarity:

Suggested change
- The Kubeflow Platform refers to the full suite of Kubeflow components bundled together with
- additional integration and management tools. Installing Kubeflow as a platform means deploying a
- comprehensive ML toolkit that integrates these components into a cohesive system, optimized for
- managing the end-to-end ML lifecycle. This includes the standalone components coupled with these
- integrations and management tools:
+ The Kubeflow Platform refers to the full suite of Kubeflow components bundled together with
+ additional integration and management tools. Using Kubeflow as a platform means deploying a
+ comprehensive ML toolkit for the entire ML lifecycle.

Member Author

Updated this.


- Central Dashboard for easy navigation and management.
- Multi-user capabilities and access management.
- Additional tooling and services for data management, visualization, and more.
Member

Why not mention that we also deploy integrated third-party components such as KServe?

Member Author

@StefanoFioravanzo we discussed with @kubeflow/kubeflow-steering-committee that we want to include KServe in the components table in the installation guide given the history behind KServe/KFServing. Other contrib components don't have the same history.
Also, we keep KServe as part of Core Kubeflow Components in our docs.

If you want to say that users can deploy other contrib components in Kubeflow Platform like Ray, BentoML, Seldon, I can add it here.
FYI, we don't include these contrib components in the example installation in Raw Manifests.

Member

Sounds good! Let's keep the list as-is then.

Member

I propose we link to the components which we are talking about, so users can read more about them:

Suggested change
- - Central Dashboard for easy navigation and management.
- - Multi-user capabilities and access management.
- - Additional tooling and services for data management, visualization, and more.
+ In addition to the standalone components, the Kubeflow Platform includes:
+ - [__Central Dashboard__](/docs/components/central-dash/overview/) for easy navigation and management, with [profiles](/docs/components/central-dash/profiles/) for access control.
+ - [__Kubeflow Notebooks__](/docs/components/notebooks/overview/) for interactive data exploration and model development.
+ - Additional tooling for data management (PVC Viewer), visualization (TensorBoards), and more.

Member Author

Updated it, please take a look. Do we need to include Kubeflow Notebooks here as well, @kubeflow/wg-notebooks-leads?

Member

@thesuperzapper thesuperzapper May 21, 2024

@andreyvelich we definitely need to include notebooks; it's one of the main reasons people use the "Kubeflow Platform", and it is not available standalone.

Also, it is cleaner to combine the profiles/dashboard into one bullet once we have all three:

- [__Central Dashboard__](/docs/components/central-dash/overview/) for easy navigation and management, with [profiles](/docs/components/central-dash/profiles/) for access control.
- [__Kubeflow Notebooks__](/docs/components/notebooks/overview/) for interactive data exploration and model development.
- Additional tooling for data management (PVC Viewer), visualization (TensorBoards), and more.

PS: don't put extra newlines between bullet points; it makes the formatting weird.


This integrated environment ensures that all the different pieces work together seamlessly,
providing a more robust and streamlined user experience.
Member

I think we don't need this extra paragraph, or if we keep it, it needs to be less "flowery":

Suggested change
- This integrated environment ensures that all the different pieces work together seamlessly,
- providing a more robust and streamlined user experience.

Member Author

Ok, removed for now.


Kubeflow Platform can be installed via
[Packaged Distributions](/docs/started/installing-kubeflow/#install-kubeflow-platform-from-packaged-distributions) or
[Raw Manifests](/docs/started/installing-kubeflow/#install-kubeflow-platform-from-raw-manifests).
Member

If we rename the headings on the other page, the links need to change:

Suggested change
- Kubeflow Platform can be installed via
- [Packaged Distributions](/docs/started/installing-kubeflow/#install-kubeflow-platform-from-packaged-distributions) or
- [Raw Manifests](/docs/started/installing-kubeflow/#install-kubeflow-platform-from-raw-manifests).
+ The Kubeflow Platform can be installed via
+ [Packaged Distributions](/docs/started/installing-kubeflow/#packaged-distributions) or
+ [Raw Manifests](/docs/started/installing-kubeflow/#raw-manifests).


## Getting started with Kubeflow

The following diagram shows the main Kubeflow components to cover each step of ML lifecycle
on top of Kubernetes.
@@ -17,8 +50,6 @@ on top of Kubernetes.
alt="Kubeflow overview"
class="mt-3 mb-3">

## Getting started with Kubeflow

Read the [architecture overview](/docs/started/architecture/) for an
introduction to the architecture of Kubeflow and to see how you can use Kubeflow
to manage your ML workflow.
@@ -30,28 +61,6 @@ Watch the following video which provides an introduction to Kubeflow.

{{< youtube id="cTZArDgbIWw" title="Introduction to Kubeflow">}}

## What is Kubeflow?

Kubeflow is _the machine learning toolkit for Kubernetes_.

To use Kubeflow, the basic workflow is:

- Download and run the Kubeflow deployment binary.
- Customize the resulting configuration files.
- Run the specified script to deploy your containers to your specific
environment.

You can adapt the configuration to choose the platforms and services that you
want to use for each stage of the ML workflow:

1. data preparation
2. model training,
3. prediction serving
4. service management

You can choose to deploy your Kubernetes workloads locally, on-premises, or to
a cloud environment.

## The Kubeflow mission

Our goal is to make scaling machine learning (ML) models and deploying them to
@@ -94,3 +103,8 @@ The following components also have roadmaps:
There are many ways to contribute to Kubeflow, and we welcome contributions!

Read the [contributor's guide](/docs/about/contributing/) to get started on the code, and learn about the community on the [community page](/docs/about/community/).

## Next Steps

- Follow [the installation guide](/docs/started/installing-kubeflow) to deploy Kubeflow standalone
components or Kubeflow Platform.