Upgrade guide #2216

Closed
rustam-ashurov-mcx opened this issue May 27, 2022 · 14 comments · Fixed by #2717
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@rustam-ashurov-mcx

Hello,

An upgrade guide would be really useful for us. It would serve not only as documentation for someone who needs guidance on how to actually upgrade a Kubeflow cluster while preserving existing data, but also as a resource for teams who are deciding whether to try Kubeflow or propose it, and who need to understand the complexity of such maintenance in the future: the possible time investment to update an existing cluster to a new version, the risk of a broken cluster, and ways to minimize that risk or have a plan B in case of error.

Many thanks

@juliusvonkohout
Member

juliusvonkohout commented Aug 24, 2023

This is currently the task of the distributions and professional consulting

@pgpx

pgpx commented Aug 24, 2023

That doesn't really fit with the Kubeflow project description though:

Kubeflow is an open, community driven project to make it easy to deploy and manage an ML stack on Kubernetes

@rustam-ashurov-mcx
Author

This is the task of the distributions and professional consulting

So, there is no reason to upgrade Kubeflow installation without professional consulting? Sounds strange. Kubeflow has its default distribution (no AWS, GCP, Azure-oriented installation) that works on-premises or locally. Why not at least have a guide on how to update such a setup?

The original post is from 2022, and fortunately, after many mistakes, I was able to find a balanced way to update my clusters and deal with typical issues and data loss. Still, every new version brings its own changes, and no one knows better than the maintainers how to upgrade Kubeflow and what issues to expect, so I'm still sure such a guide is required, as for any open source project (even though I personally don't need it much anymore, at least until the next upgrade breaks something unusual).

@juliusvonkohout
Member

juliusvonkohout commented Aug 25, 2023

@pgpx @rustam-ashurov-mcx

"So, there is no reason to upgrade Kubeflow installation without professional consulting? Sounds strange. Kubeflow has its default distribution (no AWS, GCP, Azure-oriented installation) that works on-premises or locally. Why not at least have a guide on how to update such a setup?"

You can do the upgrade on your own, no one is stopping you from doing that. Open source is free speech, not free beer. Users are looking for a way to do in place upgrades. I have some experience with such upgrades and it is possible, but there is no official way, especially no guarantee so far. So it is possible with advanced knowledge, but neither documented nor supported.
The same goes for uninstalling.

Nevertheless, we are always looking for new volunteers to improve that situation. If you want to help with the documentation, I am offering free mentoring for new Kubeflow contributors. Please reach out on Slack as well if you are interested in actually improving the situation.

Maybe if the subscribers from #2177 join as well, we will actually get someone willing to spend their personal time on improving the documentation.

@juliusvonkohout
Member

juliusvonkohout commented Aug 25, 2023

#2299 is also related. Let us focus our efforts here.

@juliusvonkohout
Member

#2323 is also related. @jbottum since you are the community manager, can you step in here? There are too many duplicated issues regarding that topic.

@N214

N214 commented Aug 29, 2023

Well, it seems that people are all facing the same issue.
If we have available resources, I'm willing to volunteer my time to document the upgrade process.

@juliusvonkohout
Member

Well, it seems that people are all facing the same issue. If we have available resource, I'm willing to volunteer my time to document the upgrading process.

Then please join the manifests WG meeting to discuss it.

@thesuperzapper
Member

thesuperzapper commented Sep 1, 2023

I recommend checking out deployKF, as this kind of issue is going to be very difficult to fix upstream. This is because raw Kubeflow is distributed as YAML manifests, so any customizations you make would have to be manually reapplied to new versions.

deployKF on the other hand uses a values config file (similar to helm), and the structure of these configs is forwards and backwards compatible, meaning that any customizations you make (for example, to connect your external identity provider or S3 bucket), won't have to be redone for new versions.
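The difference between patching manifests and keeping a values file can be illustrated with a minimal sketch. This is not deployKF's implementation; the key names, defaults, and the `merge_values` helper are hypothetical and only show why overrides kept separate from the defaults can be reused when the defaults change.

```python
from copy import deepcopy


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with `overrides` applied recursively."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars, lists, and new keys: the override wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical defaults shipped with two releases of a distribution.
defaults_v1 = {"pipelines": {"bucket": "local-minio", "version": "1.0"}}
defaults_v2 = {"pipelines": {"bucket": "local-minio", "version": "2.0", "cache": True}}

# The user's customizations live in one small file and are reused unchanged.
user_values = {"pipelines": {"bucket": "s3://my-team-bucket"}}

rendered_v1 = merge_values(defaults_v1, user_values)
rendered_v2 = merge_values(defaults_v2, user_values)
```

In this sketch the custom bucket survives the move from `defaults_v1` to `defaults_v2` without the user touching their file, whereas a hand-edited manifest would need the same edit reapplied to the new release.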

Similarly, if a resource is removed or renamed between versions, because deployKF uses ArgoCD Applications, you will be able to easily clean up resources that need to be "pruned" because they are no longer part of the new version.

Finally, uninstalling is as simple as just deleting those Argo CD applications (which will delete their child resources).

I'm interested to hear your feedback!

https://www.deploykf.org/

@pgpx

pgpx commented Sep 1, 2023

@thesuperzapper thanks - that looks really interesting. Is the idea that this will support in-place upgrades to subsequent Kubeflow versions as well, or will it just simplify creating configuration for a new cluster with the new Kubeflow version?

@thesuperzapper
Member

@thesuperzapper thanks - that looks really interesting. Is the idea that this will support in-place upgrades to subsequent Kubeflow versions as well, or will it just simplify creating configuration for a new cluster with the new Kubeflow version?

@pgpx the idea is that you upgrade in-place.

Upgrading to a new version of deployKF has a very small number of steps:

  1. Take your existing values file, and use it with a new version of deployKF:
    • How to do this will depend on which "mode" you use deployKF in:
      • For "ArgoCD Plugin" mode, this only requires updating the definition of your "app-of-apps application"
      • For "Manifests Repo" mode, you use our CLI to render and commit the updated manifests to your repo
    • NOTE: This will update the "desired" state of the ArgoCD applications, possibly adding/removing/patching resources in each app, but you still have to "sync" the changes in ArgoCD before they are applied.
  2. Use ArgoCD to "sync" the new state of the applications
  3. If needed, rolling back an application should be as simple as either:
    1. Using ArgoCD's built-in rollback feature, or
    2. Actually reverting the "desired" state back, like in step 1

Note, there are many behind-the-scenes things happening in deployKF to make in-place upgrades possible. For example, we automatically restart Pods when configs that affect them are changed.

A specific example is that when a secret is updated, any Pod that mounts it as an environment variable is restarted. We do this with Kyverno Policies that add annotations to the PodSpec. For example, here is one that updates the shared client secrets for dex OAUTH.
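The general idea of restarting Pods through an annotation can be sketched as follows. This is not the Kyverno policy mentioned above; it is a minimal illustration assuming a checksum-style annotation, and the annotation key `example.com/secret-checksum` and both helper functions are hypothetical. It relies on the fact that Kubernetes rolls out new Pods when the pod template of a workload such as a Deployment changes.

```python
import hashlib
import json


def secret_checksum(secret_data: dict) -> str:
    """Hash the secret content in a stable key order."""
    canonical = json.dumps(secret_data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def annotate_pod_template(pod_template: dict, secret_data: dict) -> dict:
    """Return a copy of the pod template carrying the secret's checksum."""
    metadata = pod_template.get("metadata", {})
    annotations = dict(metadata.get("annotations", {}))
    # Hypothetical annotation key, used only for this illustration.
    annotations["example.com/secret-checksum"] = secret_checksum(secret_data)
    return {**pod_template, "metadata": {**metadata, "annotations": annotations}}


template = {"metadata": {"labels": {"app": "dex"}}, "spec": {"containers": []}}

before = annotate_pod_template(template, {"client-secret": "old"})
after = annotate_pod_template(template, {"client-secret": "new"})

# A changed annotation means a changed pod template, which triggers a rollout.
needs_restart = before != after
```

If the secret content is unchanged, the annotation is identical and nothing restarts, so the restart is tied to an actual config change rather than to every sync.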

@juliusvonkohout juliusvonkohout added help wanted Extra attention is needed good first issue Good for newcomers labels Feb 29, 2024
@juliusvonkohout
Member

@diegolovison I think we are now at a state where we can probably describe a rough upgrade guide in 5-10 bullet points or so. It is definitely possible, because most people I know just use kustomize overlays and components on top of the manifests, so they do not have to start from scratch. Modifying the manifests directly is very bad. Instead, they just modify the example file at example/kustomization.yaml. Pruning is a different question, but there you can use kubectl apply --prune, even with a dry run to just list prunable resources. For advanced users that should really be doable.
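The pruning step can be sketched roughly like this. It is an illustration, not an official upgrade procedure: the resource names, the overlay path, and the label selector are hypothetical, and it assumes the applied resources share a label that can be passed to `kubectl apply --prune -l`. The command is only built as an argument list here, not executed.

```python
def prunable(old_resources, new_resources):
    """Resources present in the old release but absent from the new one."""
    return sorted(set(old_resources) - set(new_resources))


def prune_dry_run_command(overlay_dir, selector):
    """Arguments for a client-side dry run that reports what would be pruned."""
    return [
        "kubectl", "apply",
        "-k", overlay_dir,    # render the kustomize overlay
        "--prune",            # drop labelled resources missing from the render
        "-l", selector,       # assumed label shared by the applied resources
        "--dry-run=client",   # report only, change nothing
    ]


# Hypothetical (kind, namespace, name) inventories of two rendered releases.
old_release = [
    ("Deployment", "kubeflow", "ml-pipeline"),
    ("ConfigMap", "kubeflow", "legacy-config"),
]
new_release = [
    ("Deployment", "kubeflow", "ml-pipeline"),
    ("ConfigMap", "kubeflow", "pipeline-config"),
]

to_prune = prunable(old_release, new_release)
command = prune_dry_run_command("./my-overlay", "app.kubernetes.io/part-of=kubeflow")
```

Reviewing the dry-run output before applying for real gives a chance to confirm that nothing holding data, such as a PersistentVolumeClaim, is in the list.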

@juliusvonkohout
Member

Will be closed when #2717 is merged.

@juliusvonkohout juliusvonkohout linked a pull request May 21, 2024 that will close this issue
@juliusvonkohout
Member

Closed by #2717

Please open a new issue if you need refinements.
