Proposal: Isolate kubelet from etcd #860
## Preferred solution:
Implement the first parts of option 3 - an efficient watch API for the pod, service, and endpoints data for the Kubelet and Kube Proxy. Authorization and authentication are planned in the future - when a solution is available, implement a custom authorization scope that allows API access to be restricted to only the data about a single minion or the service endpoint data. Replace the event publishing mechanism in the kubelet with a polling mechanism or a simple API endpoint and guard it similarly to the other minion specific requests, and ensure the data is correctly attributed to the source. Make the apiserver stateless - this is already a desirable outcome.
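As a rough illustration of the custom authorization scope described above, here is a minimal sketch in Go. The `Scope` and `Request` types and the resource names `minionPods` and `endpoints` are assumptions made up for this example; the proposal does not define them, and a real implementation would depend on whatever auth solution lands.

```go
package main

import "fmt"

// Scope is a hypothetical authorization scope bound to a single minion.
type Scope struct {
	Minion string
}

// Request is a hypothetical, simplified description of an API read.
type Request struct {
	Resource string // illustrative names: "minionPods" or "endpoints"
	Minion   string // minion the data belongs to; empty for shared data
}

// Allows reports whether the scope permits the request: shared service
// endpoint data is always readable, pod data only for the scope's own
// minion, and everything else is denied.
func (s Scope) Allows(r Request) bool {
	switch r.Resource {
	case "endpoints":
		return true
	case "minionPods":
		return r.Minion == s.Minion
	default:
		return false
	}
}

func main() {
	s := Scope{Minion: "minion-1"}
	fmt.Println(s.Allows(Request{Resource: "minionPods", Minion: "minion-1"}))
	fmt.Println(s.Allows(Request{Resource: "minionPods", Minion: "minion-2"}))
	fmt.Println(s.Allows(Request{Resource: "endpoints"}))
}
```

Denying by default keeps any resource not explicitly listed out of reach of a compromised minion, which is the isolation property the proposal is after.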
I'm in favor of this approach as well. @jbeda and I were just suggesting migrating to watch on top of the API server today in a different context.
I'm in favor of the result. I can't help but feel that etcd will need to solve this ANYWAY and the idea that we're both going to arrive at very similar results is somewhat annoying. I mean, etcd is an API, just a bit lower level. Can that be fixed with less net work?
3. Implement efficient "watch for changes over HTTP" to offer comparable function with etcd
4. Ensure that the apiserver can scale at or above the capacity of the etcd system.
5. Implement authorization scoping for the minions that limits the data they can view
- Teach etcd how to do identity and access-control directly.
I've updated with option 4, teach etcd how to do identity and acl. Of note is that it would be a different identity and acl solution than the apiserver, which means that auth(z|n) would have to be managed differently for those calls. It's also less advantageous for isolating the details of the data store from the implementation of the clients. I suspect option 3 is going to be useful in general for other people anyway since it allows apiserver clients to watch for changes to pods on minions for accounting / monitoring / service discovery purposes.
Is there any further feedback on this proposal? @lavalamp and I have sorted through most of the remaining issues for Pods, and before that final push I want agreement on the pattern.
One open question to @thockin - should what the kubelet "sees" of a pod be fundamentally different than what the client sets for a pod? We discussed in the past having two different ways of looking at pods, but then in other conversations I keep seeing things that are relevant for the kubelet to care about (labels, resourceVersion, id, etc). Assuming this is accepted, perhaps we should do a quick proposal on what the kubelet -> apiserver interface should be and get agreement on that as well prior to me closing out my issue.
Just going to add this rather than wait for comments... :) Modeling the interface that the Kubelet would see to determine the block of pods scheduled for it:
There are two primary ways to model that:
The subtle difference between the two is that under the covers we are modeling the scheduled pods using a single atomic key so that we can impose constraints atomically. I think that's an important distinction: we guarantee under the covers that you see an atomic set of pods at all times. The latter proposal therefore has to convert an atomic list into a stream of update notifications, which requires it to keep the previous state in memory and compute a delta between the two. The latter also assumes that the Pods in the API are the same as the pods that are scheduled, and if there is ever any delta between the two, then the Pods API would return different values for different states. After having tried both implementations, I lean much more strongly towards the former. It allows the pods on the minion to be clearly versioned and watched (watch returns a PodList or similar, rather than a stream of individual pod updates). It more directly exposes the fact that scheduled pods are an atomic block, and matches the underlying model more closely. It also allows the API of /minionPods to vary from /pods and to be API versioned differently.
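To make the cost of the second approach concrete, here is a minimal sketch in Go of the delta step it would need: the server holds the previous atomic snapshot and diffs it against the new one to produce per-pod notifications. The `Pod` and `Event` types, the event names, and the `Diff` function are all hypothetical simplifications for illustration, not the actual apiserver code.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, minimal stand-in for a scheduled pod.
type Pod struct {
	ID              string
	ResourceVersion uint64
}

// Event is one update notification derived from two snapshots.
type Event struct {
	Type string // "ADDED", "MODIFIED" or "DELETED" (illustrative names)
	Pod  Pod
}

// Diff converts two atomic snapshots of a minion's pods into a stream
// of events. The caller must retain prev in memory between updates.
func Diff(prev, next []Pod) []Event {
	old := make(map[string]Pod, len(prev))
	for _, p := range prev {
		old[p.ID] = p
	}
	var events []Event
	for _, p := range next {
		q, seen := old[p.ID]
		switch {
		case !seen:
			events = append(events, Event{Type: "ADDED", Pod: p})
		case q.ResourceVersion != p.ResourceVersion:
			events = append(events, Event{Type: "MODIFIED", Pod: p})
		}
		delete(old, p.ID)
	}
	// Anything still in old is absent from next; sort for stable output.
	gone := make([]string, 0, len(old))
	for id := range old {
		gone = append(gone, id)
	}
	sort.Strings(gone)
	for _, id := range gone {
		events = append(events, Event{Type: "DELETED", Pod: old[id]})
	}
	return events
}

func main() {
	prev := []Pod{{ID: "a", ResourceVersion: 1}, {ID: "b", ResourceVersion: 1}, {ID: "c", ResourceVersion: 1}}
	next := []Pod{{ID: "a", ResourceVersion: 1}, {ID: "b", ResourceVersion: 2}, {ID: "d", ResourceVersion: 1}}
	for _, e := range Diff(prev, next) {
		fmt.Println(e.Type, e.Pod.ID)
	}
}
```

With the first approach none of this bookkeeping is needed on the server: a watch simply delivers the new list, and the client decides how to reconcile it.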
Reading now, sorry this one got lost.
re: API, just a quick sketch of how my brain wants the API to be factored. https://github.com/thockin/kubernetes/compare/api_proposal
Can't comment directly, doing it here. This seems like PodTemplate. Agree this is a good separation. I think I'd been leaning this way as well, in which case we're talking about a distinct resource for pods anyway and we need to clearly call it out as separate. I don't see pods being bound to multiple servers in the Kube system (disagreement?). In general I don't disagree with an alignment like that.
See also #1178 -- API support for diff'ing, which also includes some cleanup/refactoring. It covers all the objects, not just pods, and proposes nesting of JSONBase within a field and moving labels there, among other things. The pod template issue is #170. I agree that PodSpec looks similar. I don't get why it includes JSONBase. It should be embedded into both Pod and PodTemplate. Why is the manifest in PodStatus? Does BoundPod really need to be different from Pod? I suppose eventually we'll want Kubelet to return different CurrentStatus for the pod than the apiserver. If we split desired and current state into completely separate messages/types, we'd only need to fork the current state types, but I suppose that would be less RESTful. Ideally, Kubelet would follow the same API conventions as our other APIs, however. I'd like it to be possible to target "free-range" Kubelets with the config system.
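For readers following along, the embedding suggested here could look roughly like the Go sketch below. The field sets are heavily simplified and hypothetical (the real types carry far more), and `NewPod` is an invented helper; the point is only that `PodSpec` carries no `JSONBase` of its own and is shared by `Pod` and `PodTemplate`.

```go
package main

import "fmt"

// JSONBase is a simplified stand-in for the shared object metadata.
type JSONBase struct {
	ID              string
	ResourceVersion uint64
}

// PodSpec holds only the desired state; it has no identity of its own.
type PodSpec struct {
	Containers []string // container image names, simplified
}

// Pod embeds both the metadata and the shared spec.
type Pod struct {
	JSONBase
	PodSpec
}

// PodTemplate reuses the same spec type by embedding it.
type PodTemplate struct {
	JSONBase
	PodSpec
}

// NewPod is a hypothetical helper that stamps out a Pod from a template,
// copying the container list so the pod and template do not share memory.
func (t PodTemplate) NewPod(id string) Pod {
	spec := PodSpec{Containers: append([]string(nil), t.Containers...)}
	return Pod{JSONBase: JSONBase{ID: id}, PodSpec: spec}
}

func main() {
	tmpl := PodTemplate{
		JSONBase: JSONBase{ID: "web-template"},
		PodSpec:  PodSpec{Containers: []string{"nginx"}},
	}
	pod := tmpl.NewPod("web-1")
	fmt.Println(pod.ID, pod.Containers)
}
```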
It feels like we are converging on a largish API overhaul. Who is going to be responsible for owning the process of collecting changes, weighing options, and writing up a proposal? I'll do it, but I'm afraid I am already stretched thin...
I've got some bandwidth to do so. Right now I see the diff issue, name and namespace, pod -> minion, and resource version on all operations. Others?
And pod template
If we're doing a large API overhaul, we should cut version v1beta2 right now with our current api. (so that these big changes will land in v1beta3.) I can improve the conversion functions to add any needed functionality, right now it's not super easy to move info between hierarchical levels.
@lavalamp What changes would go into v1beta2? @dchen1107 also urgently needs some API changes.
OK, I anoint @smarterclayton as the cat-herder for API proposals for On my list, so far
We should set a relatively short time horizon to get ideas written up, some
Discussion on each issue started next week, with references in Kubernetes-dev. Rough concrete examples of API syntax for folks to debate next week. Identification of anything out of scope next week. Debate and refinement into week after?
Can one of the admins verify this patch?
Could we update this doc with the current plan and put it in the design doc directory?
Yes, will do.
Discusses the current security risks posed by the kubelet->etcd pattern and discusses some options. Triggered by kubernetes#846 and referenced in kubernetes#859
Updated (minor edits, can be more drastic if needed).
Does not have to be merged, for discussion and review