Rootless Kubeflow #2528

Open
Tracked by #2763
juliusvonkohout opened this issue Sep 13, 2023 · 10 comments · Fixed by #2819, #2836, #2757, #2768 or #2787
Labels: help wanted, lifecycle/frozen

Comments

@juliusvonkohout
Member

juliusvonkohout commented Sep 13, 2023

Rootless Kubeflow

This is an issue to track the progress of the following proposal https://github.com/kubeflow/manifests/blob/master/proposals/20200913-rootlessKubeflow.md.

Goals

We want to run Kubeflow as rootless as possible according to CNCF/Kubernetes best practices.
Most enterprise environments will require this as well.

Implementation details

The main steps are adding an additional profile for istio-cni and later ambient mesh, updating the documentation and manifest generation process.
Only istio-cni or istio ambient mesh can run rootless as explained here https://istio.io/latest/docs/setup/additional-setup/cni/.
Istio-cni will still need a DaemonSet in kube-system, but that is completely isolated from user workloads.
The ambient mesh should get rid of this as well and also has the benefit of removing the Istio init containers and sidecars altogether.
After that, we add the baseline and restricted PSS as a kustomize component to /contrib and extend the profile controller to label user namespaces with configurable PSS labels.

We want to use a staged approach.

First Stage:

  1. Implement Istio 1.17.5 and use it by default, because 1.17 is what we have planned to use for Kubeflow 1.8.
  2. Implement istio-cni (--set components.cni.enabled=true --set components.cni.namespace=kube-system) as a second option.
  3. Add simple tests for istio-cni, similar to tests/gh-actions/install_istio.sh and tests/gh-actions/install_knative.sh. Support both rootful and rootless Istio at the same time and give users one release to test.
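
For illustration, the istio-cni option from step 2 could be assembled into an `istioctl install` invocation roughly like this. This is a minimal sketch, not the project's install script: the helper name `istio_cni_install_command` is hypothetical, the command is only built and printed rather than executed, and the actual manifests may be generated differently.

```python
import shlex

# Flags copied from step 2 above; everything else is an illustrative assumption.
CNI_FLAGS = {
    "components.cni.enabled": "true",
    "components.cni.namespace": "kube-system",
}


def istio_cni_install_command(extra_flags=None):
    """Build (but do not run) an istioctl command line that enables istio-cni."""
    flags = dict(CNI_FLAGS)
    flags.update(extra_flags or {})
    command = ["istioctl", "install", "--skip-confirmation"]
    for key, value in flags.items():
        command += ["--set", f"{key}={value}"]
    return command


if __name__ == "__main__":
    # Print the command so it can be reviewed before anyone runs it.
    print(shlex.join(istio_cni_install_command()))
```

Keeping the flags in one dictionary would let a test compare the rootful and rootless variants without duplicating the command line.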

Second stage:

  4. Add pod security standards (https://kubernetes.io/docs/concepts/security/pod-security-standards/) baseline/restricted to manifests/contrib.
  5. Enforce PSS baseline (here you can still build OCI images via Podman and Buildah). This works with any Istio setup.
  6. Enable warnings for violations of PSS restricted.
  7. Add tests to make sure that the PSS are used and tested in CI/CD.
  8. Optionally enforce PSS restricted (this is where minor corner cases are affected).
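
As a rough sketch of steps 5 and 6, the namespace labels that Kubernetes Pod Security Admission reads could be generated as follows. The label keys are the standard `pod-security.kubernetes.io/*` ones; the function name `pss_namespace_labels` and its defaults are assumptions for illustration and do not describe how the profile controller is or will be implemented.

```python
# Label prefix and levels defined by Kubernetes Pod Security Admission.
PSS_PREFIX = "pod-security.kubernetes.io"
PSS_LEVELS = ("privileged", "baseline", "restricted")


def pss_namespace_labels(enforce="baseline", warn="restricted", version="latest"):
    """Return namespace labels that enforce one PSS level and warn on another."""
    for level in (enforce, warn):
        if level not in PSS_LEVELS:
            raise ValueError(f"unknown Pod Security Standard level: {level!r}")
    return {
        f"{PSS_PREFIX}/enforce": enforce,
        f"{PSS_PREFIX}/enforce-version": version,
        f"{PSS_PREFIX}/warn": warn,
        f"{PSS_PREFIX}/warn-version": version,
    }


if __name__ == "__main__":
    # Print the labels in a form that can be pasted into namespace metadata.
    for key, value in pss_namespace_labels().items():
        print(f"{key}: {value}")
```

Calling `pss_namespace_labels(enforce="restricted")` would correspond to the optional step 8.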

Third stage:

  9. Upgrade Istio to 1.19 to make the ambient mesh available.
  10. Add istio-ambient as an option to the next Kubeflow release.

Fourth stage:

  11. Use the ambient service mesh by default in Kubeflow 1.10.

Non-Goals

This does not cover Application level CVEs, only cluster level security.

Does this break any existing functionality?

Not so far. Only PSS restricted may block the incredibly dangerous and unprofessional Docker in Docker.
This is a rarely used feature of the KFP SDK.
With PSS baseline you can still build OCI images, for example with Podman.
We should replace Docker with the CLI-compatible Podman in the KFP SDK: https://kubeflow-pipelines.readthedocs.io/en/1.8.22/source/kfp.containers.html?highlight=kfp.containers.build_image_from_working_dir#kfp.containers.build_image_from_working_dir.
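
To make the kind of workload that PSS restricted rejects more concrete, the sketch below checks a single container's `securityContext` against a few of the fields the restricted profile constrains. It is a simplified, hypothetical helper (`restricted_violations` is not part of Kubeflow or Kubernetes), it covers only a subset of the standard and ignores pod-level settings; the real checks are performed by Kubernetes Pod Security Admission.

```python
def restricted_violations(container):
    """List violations of a subset of PSS restricted for one container dict."""
    ctx = container.get("securityContext") or {}
    violations = []
    if ctx.get("privileged", False):
        violations.append("privileged must not be true")
    if ctx.get("allowPrivilegeEscalation") is not False:
        violations.append("allowPrivilegeEscalation must be false")
    if ctx.get("runAsNonRoot") is not True:
        violations.append("runAsNonRoot must be true")
    dropped = (ctx.get("capabilities") or {}).get("drop") or []
    if "ALL" not in dropped:
        violations.append("capabilities.drop must include ALL")
    seccomp = (ctx.get("seccompProfile") or {}).get("type")
    if seccomp not in ("RuntimeDefault", "Localhost"):
        violations.append("seccompProfile.type must be RuntimeDefault or Localhost")
    return violations


if __name__ == "__main__":
    # A privileged builder container, as an example of a rejected workload.
    builder = {"name": "builder", "securityContext": {"privileged": True}}
    for problem in restricted_violations(builder):
        print(problem)
```

A check like this could also serve as a starting point for the CI tests mentioned in the second stage, under the same caveats.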

Does this fix/solve any outstanding issues?

We are currently not following best practices: running workloads as root is forbidden in most enterprise environments.
The progress is tracked in #2528

@juliusvonkohout
Member Author

The first stage has been implemented in #2455

@juliusvonkohout
Member Author

Supersedes #2014

@kimwnasptd
Member

Adding a comment after our live discussion with @juliusvonkohout. Let's confirm that we can't run baseline PSS with the default Istio, to further evaluate how/if to enable Istio CNI by default for 1.9.

@juliusvonkohout added the help wanted label on Jan 11, 2024
@peterj

peterj commented Feb 13, 2024

I have experience with Istio/ambient mesh and would love to help out if Kubeflow is planning to adopt ambient in the future.

@juliusvonkohout
Member Author

@peterj please reach out on the Kubeflow Slack or LinkedIn.

@diegolovison
Contributor

Should we also have a definition of done for this issue, such as being able to deploy using kind and Podman?

@juliusvonkohout
Member Author

juliusvonkohout commented May 13, 2024

Istio 1.22 #2714 is related

@juliusvonkohout
Member Author

juliusvonkohout commented May 16, 2024

@biswajit-9776 is working on this issue as part of GSoC.


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@juliusvonkohout
Member Author

/lifecycle frozen
