Upgrade guide #2216
"This is currently the task of the distributions and professional consulting"

That doesn't really fit with the Kubeflow project description, though.
So, there is no reason to upgrade a Kubeflow installation without professional consulting? That sounds strange. Kubeflow has its default distribution (not the AWS-, GCP-, or Azure-oriented installations) that works on-premises or locally. Why not at least have a guide on how to update such a setup? The original post is from 2022, and fortunately, after many mistakes, I was able to find a reasonably balanced way to update my clusters and deal with the typical issues and data loss. Once again, every new version brings its own changes, and no one knows better how to upgrade Kubeflow and what issues to expect, so I'm still sure such a guide is required, as for any open source project (even though I personally don't need it much anymore, at least until the next upgrade breaks something unusual).
"So, there is no reason to upgrade Kubeflow installation without professional consulting? Sounds strange. Kubeflow has its default distribution (no AWS, GCP, Azure-oriented installation) that works on-premises or locally. Why not at least have a guide on how to update such a setup?" You can do the upgrade on your own, no one is stopping you from doing that. Open source is free speech, not free beer. Users are looking for a way to do in place upgrades. I have some experience with such upgrades and it is possible, but there is no official way, especially no guarantee so far. So it is possible with advanced knowledge, but neither documented nor supported. Nevertheless we are always looking for new volunteers to improve that situation. If you want to help with the documentation, i am offering free mentoring for new Kubeflow contributors. Please reach out on Slack as well if you are interested on actually improving the situation. Maybe if the subscribers from #2177 join as well, we will actually get someone willing to spend his personal time on improving the documentation. |
#2299 is also related. Let us focus our efforts here.
Well, it seems that people are all facing the same issue.
Then please join the Manifests WG meeting to discuss it.
I recommend checking out deployKF, as this kind of issue is going to be very difficult to fix upstream. That is because raw Kubeflow is distributed as a set of YAML manifests, so any customizations you make would have to be manually reapplied to new versions. deployKF, on the other hand, uses a values config file (similar to Helm), and the structure of these configs is forwards and backwards compatible, meaning that any customizations you make (for example, to connect your external identity provider or S3 bucket) won't have to be redone for new versions. Similarly, if a resource is removed or renamed between versions, because deployKF uses Argo CD Applications, you will be able to easily clean up resources that need to be "pruned" because they are no longer part of the new version. Finally, uninstalling is as simple as deleting those Argo CD Applications (which will delete their child resources). I'm interested to hear your feedback!
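To make the idea concrete, a values-style config might look something like the sketch below. The key names here are hypothetical placeholders rather than deployKF's actual schema (check the deployKF documentation for the real structure); the point is that one such file carries the site-specific settings forward across version upgrades.

```yaml
# values.yaml -- illustrative only; these key names are hypothetical
# placeholders, NOT the real deployKF schema (see the deployKF docs).
auth:
  dex:
    connectors: []               # e.g. wire up your external identity provider here
pipelines:
  objectStore:
    bucket: my-team-pipelines    # e.g. point pipelines at an existing S3 bucket
    endpoint: s3.amazonaws.com
```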
@thesuperzapper thanks - that looks really interesting. Is the idea that this will support in-place upgrades to subsequent Kubeflow versions as well, or will it just simplify creating configuration for a new cluster with the new Kubeflow version?
@pgpx the idea is that you upgrade in-place. Upgrading to a new version of deployKF only takes a small number of steps.
Note: there are many behind-the-scenes things happening in deployKF to make in-place upgrades possible. For example, we automatically restart Pods when configs that affect them are changed. A specific example is that when a Secret is updated, any Pod that mounts it as an environment variable is restarted. We do this with Kyverno policies that add annotations to the PodSpec; for example, there is one that updates the shared client secrets for Dex OAuth.
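A minimal sketch of that kind of Kyverno policy is shown below. It is not the actual deployKF policy; the policy name, Secret name, and annotation key are placeholders. The idea is to watch a specific Secret and stamp its resourceVersion into the pod template annotations of Deployments in the same namespace, which makes Kubernetes roll the Pods whenever the Secret changes.

```yaml
# Sketch only: not the real deployKF policy; names are placeholders.
# Uses Kyverno's "mutate existing resources" feature, so the Kyverno
# background controller needs RBAC permission to update Deployments.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restart-on-oauth-secret-change      # hypothetical policy name
spec:
  rules:
    - name: stamp-secret-version-on-deployments
      match:
        any:
          - resources:
              kinds:
                - Secret
              names:
                - oauth-client-secret        # hypothetical Secret name
      preconditions:
        all:
          - key: "{{ request.operation }}"
            operator: Equals
            value: UPDATE
      mutate:
        # Patch existing Deployments in the namespace where the Secret changed.
        targets:
          - apiVersion: apps/v1
            kind: Deployment
            namespace: "{{ request.object.metadata.namespace }}"
        patchStrategicMerge:
          spec:
            template:
              metadata:
                annotations:
                  # Changing this annotation value triggers a rolling restart.
                  example.com/secret-version: "{{ request.object.metadata.resourceVersion }}"
```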
@diegolovison I think we are now at a state where we can probably describe the rough upgrade guide in 5-10 bullet points or so. It is definitely possible, because most people I know just use kustomize overlays and components on top of the manifests, so that they do not have to start from scratch. Modifying the manifests directly is very bad; they just modify the example/kustomization.yaml file instead. Pruning is a different question, but there too you can use kubectl --prune, even with a dry run, to just list prunable resources. For advanced users that should really be doable.
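As a rough sketch of that workflow (with a placeholder release tag, label key, and patch file name), such an overlay could look like this:

```yaml
# kustomization.yaml -- minimal overlay sketch; the ref tag, the label,
# and the patch file are placeholders for your own setup.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  # Pull the upstream example kustomization at a pinned release tag
  # instead of copying and editing the manifests themselves.
  - https://github.com/kubeflow/manifests//example?ref=v1.8.0

# Label everything so it can later be selected for pruning.
labels:
  - pairs:
      example.com/prune-group: kubeflow

# Keep site-specific changes as patches so an upgrade is mostly a ref bump.
patches:
  - path: patches/istio-gateway.yaml   # hypothetical local patch

# Upgrade flow (sketch): bump ?ref=..., then something like
#   kustomize build . | kubectl apply -f - --prune -l example.com/prune-group=kubeflow --dry-run=server
# to preview what would be created, changed, or pruned, and run it again
# without --dry-run=server to actually apply.
```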
Will be closed when #2717 is merged. |
Closed by #2717. Please open a new issue if you need refinements.
Hello,
An upgrade guide would be really useful for us. It would not only serve as documentation for someone who needs guidance and does not yet understand how to actually upgrade a Kubeflow cluster while preserving existing data, but also as a resource for teams who are just deciding to try Kubeflow, or proposing it for adoption, and need to understand the complexity of such maintenance in the future, the likely time investment to update an existing cluster to a new version, the risks of ending up with a broken cluster, and ways to minimize those risks or have a plan B in case of error.
Many thanks