Create deploy job #673
/assign @jlewi
bootstrap/Dockerfile
ARG github_token
RUN export GITHUB_TOKEN=${github_token} && cd / && ks init kubeflow && \
You should probably pin to a version of Kubeflow
I think we should pin in release - like in kubeless (example below) - either way works for me
bootstrap/deploy_job.yaml
@@ -0,0 +1,31 @@
# Deploy kubeflow to namespace "kubeflow"
Why do we need a job?
Why isn't kubectl run sufficient?
kubectl run would require the user to deal with problems that the job avoids:
- GITHUB_TOKEN: the first hurdle, and one that's hard to debug
- ksonnet -> k8s configuration: it's not trivial to run ksonnet from within Kubernetes
- multiple commands: first kubectl run <> bash, then ks generate and ks apply (again, you have to figure out how to run ks in the cluster)
vs
kubectl create -f https://raw.githubusercontent.com/kubeflow/kubeflow/master/bootstrap/deploy_job.yaml
I think you misunderstand me. All a K8s job does is run a Docker container, so everything you are doing with a K8s job you could probably do with kubectl run.
Either way you end up running the same container. So if the container has the Kubeflow registry baked in, users don't need to worry about GITHUB_TOKEN.
All your deploy_job does is run a shell script, so the same thing could be done via kubectl run.
The reason I like kubectl run as opposed to kubectl create -f .... is that I think a YAML file is an invitation to end up back where we started; i.e. with a YAML file it's tempting to keep adding environment variables to expose more parameter options. That eventually leads to us picking an appropriate template and packaging solution, and we're back where we started.
If we do kubectl run and/or docker run then I think it's more likely that we will keep it simple; i.e. restrict the number of parameters to the point where it's easy to type out the command line.
For anything more complex, I think the solution should be "here's the ksonnet app created by the bootstrapper; now you can customize it a hundred different ways".
So instead of having the bootstrapper expose more parameters to the user (e.g. via a config map), I would like to make the bootstrapper
- better at picking good parameters for the app based on the user's setup
- make the resulting ksonnet app available to the user (e.g. by pushing to git or a shared filesystem)
There is a difference between kubectl run and a job: kubectl run is defined 100% by the user, whereas the job is defined by us and is therefore easier to use.
It won't invalidate the kubectl run path, but it will provide a very quick, no-sweat way to deploy a default environment, which is perfect for a quickstart. kubectl run will require a few non-trivial steps:
For kubectl run, it's connecting ks with the underlying cluster.
For docker run, it's 1. having Docker locally and 2. exposing Kubernetes creds to Docker.
The job definition is not a replacement for your bootstrapper approach; it's an alternative or an improvement.
If we run the bootstrapper to do what it's meant to do, determining params for the deploy, then we can put these params into a configmap or something and run this job with it. That's still fewer clicks than kubectl run, ks init, ..., especially if you don't want to have ksonnet repos baked in.
My contention is there is no good default environment.
The default environment is dependent on the cluster. This is the lesson of trying to set the PV for JupyterHub: not all clusters have a default storage class.
So the point is to run the bootstrapper to determine what the optimal defaults should be.
Please see my other comment. If you want to create a job/statefulset to automatically deploy a cluster I'm fine with it, but subject to the following:
- The job should emit the ksonnet app to allow advanced customization
- The ksonnet app should be accessible after the job finishes
- The ksonnet app should be created by running the bootstrapper.
bootstrap/deploy_job.yaml
- name: kubeflow-deploy
  image: gcr.io/kubeflow-images-staging/bootstraper:latest
  env:
  - name: PVC_MOUNT_PATH
Isn't this just a layer of indirection? E.g. you're taking the parameters exposed by ksonnet and just turning them into environment variables. Why not just tell users to run ks param set if they want to set them?
The point of the bootstrapper is to solve the problem "Kubeflow exposes X parameters; how do I set those parameters optimally for my deployment?"
Forcing users to manually select those parameters (whether as environment variables or ks parameters) doesn't solve that problem. All this is doing is telling users that instead of doing ks param set they need to edit key-value pairs in a YAML file.
The bootstrapper is trying to do something fundamentally different, to create a different experience. In particular, the bootstrapper figures out, based on the user's setup, how to set different values.
The current bootstrapper has a pretty simplistic example: it checks whether the user has a PVC and, depending on the result, sets the value of jupyterNotebookPVCMount accordingly.
So setting jupyterNotebookPVCMount should be unnecessary except when the user wants to override it for some reason.
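To make that check concrete, here is a rough Python sketch of the decision logic (the real bootstrapper is a Go program, so this is illustration only). It assumes that "has PVC" means the cluster has a StorageClass carrying one of the standard default-class annotations; the mount path `/home/jovyan` and the helper names are assumptions, not something taken from this PR.

```python
# Sketch only: illustrates the decision logic, not the bootstrapper's code.
# Annotations Kubernetes uses to mark a StorageClass as the cluster default.
DEFAULT_CLASS_ANNOTATIONS = (
    "storageclass.kubernetes.io/is-default-class",
    "storageclass.beta.kubernetes.io/is-default-class",
)

ASSUMED_MOUNT_PATH = "/home/jovyan"  # assumption, not taken from the PR


def has_default_storage_class(storage_classes):
    """Return True if any StorageClass is annotated as the cluster default."""
    for sc in storage_classes:
        annotations = sc.get("metadata", {}).get("annotations") or {}
        if any(annotations.get(key) == "true" for key in DEFAULT_CLASS_ANNOTATIONS):
            return True
    return False


def pick_pvc_mount(storage_classes, override=None):
    """Choose a value for jupyterNotebookPVCMount.

    An explicit override wins; otherwise a PVC is mounted only when the
    cluster can provision one by default. An empty string means "no PVC".
    """
    if override is not None:
        return override
    return ASSUMED_MOUNT_PATH if has_default_storage_class(storage_classes) else ""
```

The `storage_classes` argument is the list of StorageClass objects as plain dicts (for example the `items` of `kubectl get storageclass -o json`), so the function itself needs no cluster access.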
There are a few options for how to approach that. We can create a set of ENVs like here, which is the easiest. A better way would be to create a configmap plus a shim layer to translate options from the configmap to params. The best way would be to have a ksonnet app definition with all the configurations available (it can be created by the bootstrapper) and then consume it in the job, so the user can do something like this:
1. bootstrapper - create template ksonnet app in app.yaml
2. edit app.yaml to your taste
3. kubectl create -f app.yaml
4. kubectl create -f https://raw.githubusercontent.com/kubeflow/kubeflow/master/bootstrap/deploy_job.yaml
And, in case you just want to roll with defaults, only step 4 is needed.
Why do we need another shim layer?
What is the long term plan? Would there be a 1:1 mapping from environment variables to ksonnet parameters?
Is the fact that the environment variable name is different from the ksonnet parameter intentional? Or is it just an oversight?
It was an oversight; I was trying to make it look like an ENV variable rather than a camel-cased one.
Envs are just one way to pass params to the deployment, and I'm by no means saying it's the best one. I think the best one would be a bootstrapper-driven CRD or configmap.
The end goal, I think, would be something like this:
- run the bootstrapper locally or in a container (since it's a golang app, it can totally be just wget -> run).
- the bootstrapper will create a CRD or configmap or any other k8s resource definition (ksonnet/app?) that will contain all the params it figures out are best. I'd say it should output an editable file or something to allow the operator to modify them prior to running the deployment.
- the operator runs deploy_job, which will look for the bootstrapper-driven configmap and deploy accordingly.
Now if we also add safe defaults to deploy_job, the bootstrapper steps are optional for a super-quickstart.
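As a rough illustration of that last step, the Python sketch below overlays values from a bootstrapper-written configmap on top of safe defaults and renders the matching `ks param set` command lines. The default values, the component name `kubeflow-core` and the function names are assumptions made for the example; only the `ks param set <component> <param> <value>` shape comes from ksonnet itself.

```python
import shlex

# Hypothetical safe defaults baked into deploy_job; placeholder values only.
SAFE_DEFAULTS = {"jupyterNotebookPVCMount": ""}

# Assumed name of the ksonnet component that owns the parameters.
COMPONENT = "kubeflow-core"


def merge_params(configmap_data=None, defaults=SAFE_DEFAULTS):
    """Overlay bootstrapper-provided values on top of the safe defaults."""
    merged = dict(defaults)  # copy so the defaults are never mutated
    merged.update(configmap_data or {})
    return merged


def ks_param_commands(params, component=COMPONENT):
    """Render one `ks param set` command line per parameter, sorted by name."""
    return [
        "ks param set {} {} {}".format(component, name, shlex.quote(str(value)))
        for name, value in sorted(params.items())
    ]
```

With no configmap present, `merge_params()` returns just the defaults, which is the "super-quickstart" path; with one, the bootstrapper's choices win and the operator can still edit them before the job runs.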
bootstrap/Dockerfile
RUN export GITHUB_TOKEN=${github_token} && cd / && ks init kubeflow && \
    cd kubeflow && \
    ks registry add kubeflow github.com/kubeflow/kubeflow/tree/master/kubeflow && \
I don't think we should bake the ksonnet app dir into the container. We ultimately want to expose the app to users so they can do more advanced customization and check into source.
I think a better solution is that ksonnet 0.10 will support using a file location as a repository. Should we just wait for ksonnet 0.10?
I think we should try to eliminate the need for people to manually install additional packages as much as possible. For that reason I'm also going to push back on "why don't we just tell them to set ksonnet variables".
Maybe, but right now GITHUB_TOKEN is a huge hurdle, I think. It also makes it virtually impossible to deploy Kubeflow in non-internet-connected environments (financial, military...); we might not have these users yet, but we might in the future.
Pulling stuff at runtime is slow and really an antipattern. I think if we can bake static files into the container, we should. If users want to create their own customized image, they can always build FROM this one and override only this one dir.
I'm not suggesting we don't bake the Kubeflow registry into the container; I'm just suggesting that doing it by creating a ksonnet app may not be the best approach.
@nickchase We are providing them a docker container that contains ksonnet so regardless of whether they run ks param set or you use environment variables they don't have to install any additional tools.
@inc0 @nickchase regarding "why don't we just tell them to set ksonnet variables":
My goal with the bootstrapper is to create a different experience than what we have today.
Specifically, right now a user has 12 different parameters for Kubeflow and this will only grow.
I think the big problem for users is "What is the correct value for these parameters".
I agree there's friction over ksonnet and we should fix that and I like a lot of what this PR does in that regard (e.g. by eliminating GITHUB_TOKEN and baking in the registry).
But whether a user sets 12 parameters via ks param set or by setting 12 different environment variables, that's syntactic sugar (perhaps ks param should just support taking a file), and absent that maybe we should write a shell script to do that.
But for the bootstrapper, the problem I'd like to solve is figuring out the correct values for the parameters automatically based on the user's setup, so the user doesn't have to think about it.
If we're going to get info from the user I think it should be of a fundamentally different kind (from what the ksonnet parameters currently expose) so that we create a fundamentally different experience from what the user gets by calling ks param set.
For example, instead of just turning the ksonnet parameters into environment variables, we should provide a wizard that asks a series of questions:
- Are you working in a team or single user?
- How big is your data?
- Do you use deep learning?
The bootstrapper can then tune the ksonnet parameters based on the user input.
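A toy Python sketch of what such a wizard's back end might look like is below. Every parameter name, threshold and value in it is hypothetical (none of them are real Kubeflow ksonnet parameters); the only point is that the user answers questions about their work, and the tool translates the answers into parameter values.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    team: bool            # "Are you working in a team or single user?"
    data_size_gb: float   # "How big is your data?"
    deep_learning: bool   # "Do you use deep learning?"


def tune_params(answers):
    """Translate wizard answers into parameter values.

    All parameter names and the sizing rule are hypothetical placeholders.
    """
    return {
        "multiUser": "true" if answers.team else "false",
        # Hypothetical rule: twice the data size, with a 10 GiB floor.
        "notebookDiskGi": str(max(10, int(answers.data_size_gb * 2))),
        "gpuNotebooks": "true" if answers.deep_learning else "false",
    }
```

The resulting dict could then be applied to the ksonnet app in one pass, so the user never sees the individual parameters unless they want to.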
I think that it would be terrific to have a wizard that asks questions rather than having people manually tune parameters, but in the meantime I think it's even BETTER to have something that can install Kubeflow with a single command -- at least in the short run. Ideally, you could install it with a single command, and THEN we can add the ability to tune things.
For the data scientist end user, grad student in a lab, or ML user with little devops experience, simplicity is key, as is avoiding jargon related to k8s and systems. These users need a solution that: a) lets them get to work with ML tools quickly, b) does not place an upfront burden on them to make decisions or settings to get a running environment, c) allows for additional configuration after a while or, alternatively, an easy way to spin up a new system tailored to their growing needs and familiarity with the systems that run beneath the ML tools.
@willingc I agree and that was basis for my environment proposal - kubeflow/community#62 - Operator creates environments for students from common template and just gives namespace for them
@jlewi I agree we shouldn't do ks param set like this. It's a quick PoC and we should solve it asap. Look at my comment above for one potential solution
Resolved
bootstrap/deploy_job.yaml
# TODO(inc0): Create rbac role bindings
---
apiVersion: v1
Running in cluster as a job seems like it creates RBAC issues which is one of the current problems I'd like the bootstrapper to deal with.
https://github.com/kubeflow/kubeflow/blob/master/user_guide.md#rbac-clusters
Deploying Kubeflow requires high level permissions e.g. because you need to create things like RBAC roles. You also might want to create keys to use as secrets for GCP/S3.
So if you run on cluster asynchronously then you're forcing users to go through the hurdle of creating appropriate service accounts and provisioning them; which adds the kind of complexity we're trying to avoid.
The bootstrapper aims to avoid this by being designed to run interactively. If it runs interactively then the bootstrapper can use the end user's credentials to bootstrap the cluster.
For example, in the current execution mode the bootstrapper runs locally in docker on the user's computer so we can have the user go through whatever login flow is necessary to get a credential and then use that. So we can use the user's credential to create cluster roles, Cloud secrets etc.
We can eventually allow this to be run on cluster e.g. via kubectl exec -ti but we need to be careful about how we handle the user credential because we wouldn't want to store the user credential in a pod unless the pod is properly locked down.
One solution is to make it a web app. In that case the refresh token can be stored client side and the web app can just get short lived access tokens to forward on to various services.
Current pattern is to create RBAC bindings in bootstrap. Look at this: https://github.com/kubeless/kubeless/releases/download/v0.6.0/kubeless-v0.6.0.yaml
I think kubeless is good example of what I'd like to see - they specify required ClusterRoles and make it work.
@inc0 @jlewi We've been doing a lot of prototyping on separating bootstrapper into specific phases:
- create cluster level resources: clusterroles, clusterbindings, namespace, pod security contexts, service accounts, PVs. In short, isolate ksonnet libs into those which require cluster-admin equivalent RBAC roles.
- create a single-signon that uses IAP or other authentication providers. This auth provider will use web-token authentication (we've experimented with oauth/openid tokens as well) and will likely be based on or use https://github.com/appscode/guard.
- define RBAC roles based on the provider. For example roles for 'data scientist' users, leads, etc. One thing guard does is import github users in an organization including teams. Depending on the auth provider, a RBAC mapping of teams could be automated so that data scientists have a 'zero-admin' experience.
We see bootstrapper as having to address 2 concerns: 'cloud-providers' and 'authentication providers'. Each are distinct and could be bundled as ksonnet libs similarly to IAP. Having a framework in bootstrapper that can mix and match cloud providers with auth providers is how we've been thinking about this problem. Part of the difficulty we've been having is under GKE it's not straightforward to update an api-server with the equivalent of boot flags like --authentication-token-webhook-config-file or --oidc-issuer-url. We've done a fair amount of prototyping using kops, dex, and guard. @jlewi it looks like currently there is an option to use cloud endpoints and ESP to delegate to a provider that does IAP under GKE. We've gone down this path as well to use other providers. We're working on a document you both can review (and others) but wanted to have enough concrete prototyping done to have answers for questions. Is there a particular preference for how the bootstrapper is implemented? It seems like the go implementation has the most potential to get at features not normally exposed in CLI tools like ks, gcloud, kubectl, etc.
Also another problem is on-prem, which can be any mixture of this. I'm not sure if we should go as far as trying to autodetect this. It's going to be really tricky, as every on-prem setup is different.
RBAC is a slightly different problem though (imho). One problem is managing users (I'm user X) and another is managing roles (user X is allowed to do Y). The first is external to k8s (and we shouldn't solve it ourselves; there are projects doing this); the other is just k8s RBAC.
In any case I don't think we should discuss it here; it's a broader discussion that we need to have, and it will affect much more (auth to JupyterHub?). I'd just say that for now we can create a namespace and rolebindings to this namespace. As for which users get these roles, let's discuss that separately.
@kkasravi I agree with you there are multiple phases. My current thinking is
- Cluster prep
- tune ksonnet parameters
- deploy
What's the point of dividing manifests into ones requiring cluster admin and ones that don't? My expectation is that whoever is deploying the app has appropriate privileges to deploy Kubeflow in its entirety.
Why is creating RBAC roles a separate phase? RBAC roles for datascientists/users/ etc... should be defined in your ksonnet app and deployed when you deploy Kubeflow.
I think Auth (e.g. IAP) should be handled as a combination of the cluster prep and ksonnet parameter stages.
Ideally to enable certain kinds of auth e.g. IAP you would just add appropriate manifests to your ksonnet APP and these would then be deployed when you deploy your app. This is what we currently do for IAP. i.e if you set the appropriate parameters for IAP we will deploy envoy and some other endpoints.
But there are likely some things that you can't manage as K8s resources. For example, on GCP you might want to create a static IP. So in bootstrapper we'd just make an RPC to the appropriate GCP API.
I think this matches what you are saying about cloud providers and auth providers and mixing and matching. If you have an OAuth provider that is sufficiently general that it can run on multiple clouds then a user could select it and ksonnet parameters would be set to enable that manifest.
To clarify a bit on @jlewi's suggestion to make sure that I am understanding it correctly:
- kubectl create -f kubeflow_cli.yaml where kubeflow_cli.yaml has defaults so a data scientist does not have to decide them to deploy.
- kubectl exec -ti kubeflow-cli-0 opens a shell for the user to interact with kubectl or ks.
a) If the user doesn't wish to interact with the shell, what is displayed to them?
b) Kubeflow is fully deployed at this point (i.e. I can open a notebook or start an ML job)
c) ks is still available if I am familiar with how to use it to make modifications
Forgive me, I'm not sure what "PD" stands for. I'm assuming it's some sort of persistent storage or persistent disk (PD?).
kubectl create -f kubeflow_cli.yaml where kubeflow_cli.yaml has defaults so a data scientist does not have to decide them to deploy.
Correct in that user doesn't have to specify anything.
a) If the user doesn't wish to interact with the shell, what is displayed to them?
I think it depends on what we decide to do.
We could either require the user to always log in via kubectl exec and then run "quick_start.sh" to deploy Kubeflow with automatic defaults.
Or we could just have kubectl create -f kubeflow_cli.yaml automatically run quick_start.sh, so the user would only have to start a shell and use kubectl/ks if they weren't happy with the auto-configured deployment.
Forgive me, I'm not sure what "PD" stands for. Assuming its some sort of persistent storage or persistent disk (PD?).
Correct; persistent disk.
I'm basically just suggesting a couple of modifications to @inc0's original PR:
1. We should by default store the ksonnet app on the K8s cluster using the cluster default PVC (if the cluster doesn't have a default, that could be handled as an exceptional case requiring the user to make a change to the YAML file).
2. Instead of running the container as a job and exiting, we should have the container enter an idle loop so that users can just do kubectl exec to start a shell and interact with the app and Kubeflow deployment after the initial deployment is complete.
3. I don't think we should expose ksonnet parameters in kubeflow_cli.yaml. Instead, we should try to make the bootstrap go program more intelligent and pick better defaults for the ksonnet app. If users want to customize this deployment then we should tell them how to do this using ksonnet.
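The first two modifications could be sketched roughly as below, in Python purely for illustration (the container's actual entrypoint is not shown in this thread). The function names are made up for the example, and any paths passed in would depend on where the image keeps the app and where the PVC is mounted.

```python
import shutil
import time
from pathlib import Path


def preserve_app(image_app_dir, pvc_app_dir):
    """Copy the ksonnet app onto the PVC unless a copy is already there.

    Returns True when a copy was made and False when an existing app was
    kept, so that user customizations survive pod restarts.
    """
    target = Path(pvc_app_dir)
    if target.exists():
        return False
    shutil.copytree(image_app_dir, target)
    return True


def idle_forever(poll_seconds=3600):
    """Keep the container alive so users can `kubectl exec` into it."""
    while True:
        time.sleep(poll_seconds)
```

An entrypoint would call `preserve_app(...)` once, run the initial deployment, and then call `idle_forever()` instead of exiting, which is what lets a later `kubectl exec` find both the running container and the preserved app.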
Thanks @jlewi for the clarifications.
I'll let you iterate on the details. If possible, #3 does seem appealing for an end user. Overall, simplicity and stability are key for the end user data scientist persona. By stability, I mean what the user needs to do to troubleshoot when/if something goes wrong on initial install.
@jlewi for 2. above, could kubectl exec ... also act as a kind of REPL? That is, the data scientist doesn't have to explicitly enter a shell but could do something like kubectl exec ... tfjob start job1. This would operate as a type of CLI if the bootstrap go program is given the REPL command tfjob start job1 and would know that it needs to call the ks api to generate and apply the tfjob package. Maybe this is what you had in mind with 3., making the go program more intelligent so that it could accommodate customizations.
@inc0 Did you see my comment above?
@kkasravi That's an interesting idea. It seems like that would just work since kubectl exec allows executing binaries in a container that is already running. So it would work for any CLI we install.
Thanks for iterating on this PR folks :-)
So I'm thinking about how to incorporate the bootstrapper into this. The problem I have with the bootstrapper (as it stands today) is that it pulls packages etc. at runtime. That means you can't have an immutable image that will always deploy the same env. How about we move the bootstrapper to a different role? The next step would be to run the deploy job, which would use this image, have Kubeflow pulled down already, then for every param in the configmap run ks param set and run ks apply.
Thoughts?
If you always want to get the same environment you do this by checking in the ksonnet app into source control. If you want to consistently deploy the same Kubeflow environment you would do that by pulling the app from source control and deploying it. This is GitOps. There are a number of tools attempting to solve it (e.g. Argo CD, Weave etc...). I don't think consistently deploying the same environment is the goal of the bootstrapper; there are other tools for that problem. The goal of the bootstrapper is to make it as simple as possible for users to get an optimized Kubeflow deployment. For this there are two problems we want to solve
We solve 1. by giving users a YAML manifest that runs a prebuilt image that invokes "quick_start.sh". We solve 2. by having quick_start.sh invoke the bootstrapper to produce a ksonnet app. At this point if users want to consistently get the same environment they can and should check their ksonnet app into source control.
I think there are more friction points though:
Now, what will be pretty hard to model is config file management. If the whole app is essentially a config, it's going to be really hard to maintain; imagine diffing 2 setups of Kubeflow... I think we should be able to spawn Kubeflow from a single config file which can be extended and maintained by the operator. That's what configmaps are for. Having to maintain a local git repo just to have repeatable builds seems like overkill, to be honest.
I think you are conflating issues. Nothing I'm suggesting requires GitHub
We're not building anything. What we have is a set of config files describing the Kubeflow deployment as ksonnet. This is checked into source control. Those manifests are fully specified; they do not have any dependencies on tools outside source control. You do not need to check in
This is expanding the scope of the PR well beyond what I'm suggesting. You are effectively using a different mechanism from ksonnet to specify Kubeflow deployments. This is going beyond what I'm suggesting of
This isn't hard for me to imagine at all. Since the K8s community is embracing GitOps, I expect there are or will be numerous tools that will support intelligent diffs of YAML manifests specifying applications. WeaveWorks for example provides kubediff. These tools should operate on the final YAML output, not on whatever templates (jsonnet, jinja2, go templates) were used to generate them.
See comment above Here are the changes I'd like to see so we can check it in and iterate
I suggest moving the following changes to follow on PRs just to scope it down and narrow the discussion.
@jlewi a compromise please ;) This is a statefulset, so the workflow will look like this:
That's temporary, because when #720 happens (blocked on a ksonnet issue) we will inject the bootstrapper into this workflow. Does this sound better?
Also, I didn't add a PVC here since it doesn't make sense without the bootstrapper: I'd need to pre-populate the PVC with an existing ksonnet app. This seems like overkill for what is a temporary solution anyway.
I can live with that, but can we still use a PVC? Even without the use of the bootstrapper you still have a ksonnet app that is worth preserving. So you could just copy the ksonnet app from the container filesystem to the PVC if it doesn't exist. I'm fine submitting just that and not using the bootstrapper in this PR but I wonder if the following would work
Reviewed 1 of 7 files at r1, 1 of 7 files at r4.
deploy.sh, line 1 at r5 (raw file):
Why do we have deploy.sh & bootstraph.sh?
bootstrap/Dockerfile, line 45 at r1 (raw file):
Previously, inc0 (Michał Jastrzębski) wrote…
What do you mean in release?
bootstrap/Dockerfile, line 47 at r1 (raw file):
Previously, inc0 (Michał Jastrzębski) wrote…
Resolving this thread because I think we've reached consensus in other channels.
bootstrap/Dockerfile, line 52 at r5 (raw file):
Let's add a TODO to come up with a better solution #722
bootstrap/Dockerfile, line 61 at r5 (raw file):
Since Kunming updated the ksonnet version does file repository work?
bootstrap/start.sh, line 13 at r5 (raw file):
Add a TODO for issue #722 to come up with a valid solution for creating a .kube/config file in cluster
bootstrap/start.sh, line 15 at r5 (raw file):
Why do we need a proxy?
bootstrap/start.sh, line 19 at r5 (raw file):
Why do we need to use the PROXY? Can't we use the downward API to set an environment variable with the master IP? (I think TF job operator does this)
Reviewable causes me issues, so I'll reply here:)
publishing containers with baked in releases - bootstrapper:0.1.0 for example
I'll check, but regardless we still need #722 to be fixed.
I think the proper solution would be to use this: https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#accessing-the-api-from-a-pod - there is a golang binding that would do this, but we need to handle it in the bootstrapper or ksonnet. I'd suggest moving everything to InClusterConfig if .kube/config is not present.
I'll check; the proxy was straightforward, that's why I did that. It's easy if we add a mechanism to the bootstrapper code, though.
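A minimal sketch of the "fall back to in-cluster credentials when .kube/config is absent" idea, written in shell rather than through the golang binding mentioned above. The function names `auth_mode` and `api_server_url` are hypothetical. The sketch relies on two things Kubernetes provides to pods: the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables, and the service-account token mounted under `/var/run/secrets/kubernetes.io/serviceaccount`.

```shell
#!/usr/bin/env bash
# Sketch only: decide how to authenticate against the API server.
SA_DIR_DEFAULT="/var/run/secrets/kubernetes.io/serviceaccount"

# Print "kubeconfig" when a kubeconfig file exists, "in-cluster" when only
# the service-account token is mounted, and "none" otherwise.
# Both arguments are optional overrides (hypothetical, for testing).
auth_mode() {
  local kubeconfig="${1:-${HOME}/.kube/config}"
  local sa_dir="${2:-${SA_DIR_DEFAULT}}"
  if [ -f "${kubeconfig}" ]; then
    echo "kubeconfig"
  elif [ -f "${sa_dir}/token" ]; then
    echo "in-cluster"
  else
    echo "none"
  fi
}

# Build the API server URL from the variables injected into the pod,
# which would make a local proxy unnecessary for locating the master.
api_server_url() {
  echo "https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}"
}
```

Whether ksonnet itself can consume these values without a kubeconfig file is the open question tracked in #722; the sketch only shows the detection step.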
A couple minor nits but I think this is largely ready to go. |
clusterIP: "None"
---
apiVersion: v1
@inc0 I ran into problems with a missing PV. I removed the PersistentVolumeClaim and changed the StatefulSet below. See https://cloud.google.com/kubernetes-engine/docs/how-to/stateful-apps
PV is there for resiliency of app - this is meant to be long running pod that should survive restart of node - hence need for PV
storage: 5Gi
---
apiVersion: apps/v1
@inc0 I used the following def (ignore the image path which uses an AIPG project)
---
apiVersion: apps/v1beta2
kind: StatefulSet
metadata:
  name: kubeflow-toolbox
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: kubeflow-toolbox
  serviceName: kubeflow-toolbox
  template:
    metadata:
      name: kubeflow-toolbox
      labels:
        app: kubeflow-toolbox
    spec:
      containers:
      - name: kubeflow-toolbox
        image: gcr.io/constant-cubist-173123/bootstrapper:v20180426-v0.1.1-10-g02b7d5a-e3b0c4
        env:
        - name: NAMESPACE
          value: "kubeflow"
        - name: DEPLOY_JOB
          value: "1"
        volumeMounts:
        - name: kubeflow-toolbox-pvc
          mountPath: /opt/kubeflow/app
  volumeClaimTemplates:
  - metadata:
      name: kubeflow-toolbox-pvc
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
deploy.sh
Outdated
@@ -0,0 +1 @@
ks apply default --token=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
Why do we have deploy.sh & bootstrap/deploy.sh?
Do either of these scripts actually get used anywhere?
@@ -0,0 +1 @@
ks apply default --token=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
Please add a shebang line and a description of what this is for.
Well, the reason I created deploy.sh is just that this ks apply command is pretty long and non-trivial to figure out. It's a shortcut: ./deploy.sh is easy to remember :)
btw I'll remove bootstrap/deploy one, I think it's leftover. but good point about shebang
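For reference, a deploy.sh with the requested shebang and description might look like the sketch below. The `ks apply` invocation is the one from the diff; the `deploy` function wrapper and the `KS` and `TOKEN_FILE` override variables are hypothetical additions for illustration and are not part of this PR.

```shell
#!/usr/bin/env bash
# deploy.sh: apply the ksonnet "default" environment from inside the cluster,
# authenticating with the pod's service-account token.

# KS and TOKEN_FILE are hypothetical overrides added for this sketch.
KS="${KS:-ks}"
TOKEN_FILE="${TOKEN_FILE:-/var/run/secrets/kubernetes.io/serviceaccount/token}"

deploy() {
  # Fail early with a readable message when the token is not mounted.
  if [ ! -r "${TOKEN_FILE}" ]; then
    echo "service-account token not readable: ${TOKEN_FILE}" >&2
    return 1
  fi
  "${KS}" apply default --token="$(cat "${TOKEN_FILE}")"
}

# A real script would end with a call to: deploy
```

Checking for the token first gives a clearer error than letting `cat` fail inside the command substitution.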
bootstrap/start.sh
Outdated
cp -R /kubeflow /opt/kubeflow/app
fi
cd /opt/kubeflow/app/kubeflow
sleep 1000000000
I think tail -f /dev/null is better for running forever
yeah can do, this runs for years so well...tomato tomato
- name: NAMESPACE
  value: "kubeflow"
- name: DEPLOY_JOB
  value: "1"
Why 1? Why not true/false?
It has to be a string with anything in it. It can be true.
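The thread doesn't show how start.sh reads DEPLOY_JOB. Assuming it only checks that the variable is non-empty, which is what the reply above suggests, the check would behave like this sketch (the function name `deploy_job_enabled` is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch only: treat DEPLOY_JOB as enabled when it holds any non-empty string.
deploy_job_enabled() {
  [ -n "${DEPLOY_JOB:-}" ]
}

# Example:
#   if deploy_job_enabled; then echo "deploying"; fi
```

Under that assumption a value of "false" would also enable the job, so the only way to disable it is to leave the variable unset or empty.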
metadata:
  name: kubeflow
# Headless service because StatefulSet requires it
I thought we only needed a service for a stateful set if we wanted to associate stable DNS names for them.
To deploy a StatefulSet you need a service, so I had to create this mock service. Look at the serviceName field in the StatefulSet definition; it won't run without it.
# TODO(inc0): Create rbac role bindings
---
apiVersion: v1
kind: Namespace
I don't think the Namespace should be inside the YAML file. This is too dangerous IMO.
If you do kubectl delete -f kubeflow_toolbox.yaml, I don't think we should automatically delete the namespace and everything in it. I think we should only delete the StatefulSet.
I think users should manually create the namespace and manually delete it if they want to tear all resources down.
well point is to be able to quickly pave whole thing. This definition doesn't allow parameters so it will be either kubeflow namespace or default namespace.
I think it's beneficial to keep it until we make it fully parametrized (for example via bootstrapper).
You can still deploy it into namespace kubeflow and just tell the users to create the namespace kubeflow manually.
kubectl create namespace kubeflow
kubectl create -f https://.../toolbox.yaml
Consider the case where the user had a preexisting namespace kubeflow in their cluster and now they do kubectl delete -f yourmanifest
The idea that this could delete a preexisting namespace in my cluster is not at all expected.
Fair enough, I'll deploy toolbox in default ns and show how to define own ns
I think it's much better to deploy in the non-default namespace. This way it's easy for users to clean up. Plus if you specify I really don't think telling users to create the namespace manually is that big a deal. I'll leave it to you to decide. /lgtm Cancel the hold if you want to submit as is without using the non-default namespace.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jlewi
/hold cancel |
* Create deploy job
* move job to examples and remove options
* make job a statefulset
* move app to pvc
* add TODOs
* address review
Once the image is available somewhere (for testing you can use inc0/bootstraper:latest, just edit the job def), the whole setup of Kubeflow will be as easy as kubectl create -f deploy_job.yaml.
One missing thing is RBAC role bindings; to avoid them for now, you can use https://kubernetes.io/docs/admin/authorization/rbac/#permissive-rbac-permissions