
KEP-127: graduate to Beta for 1.30 #4439

Merged: 10 commits, Feb 8, 2024

Conversation

giuseppe (Member)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 25, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 25, 2024
@giuseppe mentioned this pull request Jan 25, 2024
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
So it is rendered properly on github.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
@giuseppe giuseppe force-pushed the rata/userns branch 2 times, most recently from 91377aa to d7e437c Compare January 26, 2024 14:46
@thockin thockin self-assigned this Jan 31, 2024
thockin (Member) commented Jan 31, 2024

FTR - PRs with no assignee often fall through the cracks. :(

rata (Member) commented Jan 31, 2024

/assign @mrunalp @wojtek-t

@thockin thanks, let's see if this command works, then :)

Comment on lines +596 to +635
- Get review from VM container runtimes maintainers (not blocker, as VM runtimes should just ignore
the field, but nice to have)
Contributor:

@fidencio @littlejawa have you or anyone from the Kata community taken a look?

- Detection: How can it be detected via metrics? Stated another way:
how can an operator troubleshoot without logging into a master or worker node?

In this case the Kubelet will fail to start with a clear error message.
Contributor:

Assuming this is through the subuids binary, don't we fall back to the default range?

Member (author):

I think we should fall back to the default ranges only if there is no kubelet user configured, or there is no getsubids binary present.

If the kubelet user is defined and we fail to read its configuration, an error should be reported.
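For reference, the check being discussed here can be reproduced by hand with the getsubids tool from shadow-utils; a minimal sketch, assuming the node is configured with a kubelet user literally named `kubelet` (the user name and the example output are illustrative):

$ # Print the subordinate UID and GID ranges configured for the kubelet user.
$ # If getsubids is missing, or no kubelet user is configured, the kubelet would
$ # fall back to the default range, per the proposal above.
$ getsubids kubelet       # subordinate UIDs, e.g. "0: kubelet 65536 2147483648"
$ getsubids -g kubelet    # same, for subordinate GIDs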

If one API server is upgraded while others aren't, the pod will be accepted (if the apiserver is >=
1.25). If it is scheduled to a node that the kubelet has the feature flag activated and the node
meets the requirements to use user namespaces, then the pod will be created with the namespace. If
it is scheduled to a node that has the feature disabled, it will be scheduled without the user
Contributor:

scheduled or created?

Member (author):

created

wojtek-t (Member) left a comment:

@rata @giuseppe - please take a look at my comments from the PRR

- Should we allow any way for users to ask for "more" IDs mapped? If yes, how many more and how?
- Should we allow the user to ask for specific mappings?
- Get review from VM container runtimes maintainers
- Gather and address feedback from the community
Member:

Should we resolve the remaining unresolved part of this KEP (namely support for Windows and runtimes that don't support Linux namespaces)?

Member:

Windows support was resolved several months ago. Windows maintainers commented that they don't want this feature in Windows, and it looked fine anyway from their point of view.

We will try to gather feedback from VM runtimes, yes

Member:

I'll try to update with the links later

Member:

"don't want this feature in windows"

Love it.

Member:

That SGTM - just please update the KEP to reflect it :)

@@ -678,7 +686,7 @@ well as the [existing list] of feature gates.
-->

- [x] Feature gate (also fill in values in `kep.yaml`)
- Feature gate name: UserNamespacesStatelessPodsSupport
- Feature gate name: UserNamespacesPodsSupport
Member:

Can't comment on unchanged lines, so commenting here:

  1. Please fill in the Upgrade/Downgrade strategy

  2. For version skew, can you please also refer to the case where kube-apiserver understands the new field (spec.hostUsers) and kubelet doesn't [and vice-versa]

  3. "Old runtime and new kubelet" - the fact that we create the pod in host namespace is somewhat counter-intuitive - as an operator I would expect that if I turned on the FG it works and is not silently dropped. Can we at least emit some error / event / metric / sth - in that case?

  4. "Are there any tests for feature enablement/disablement?"
    Were those added? If not can you please prioritize and add sth like that explicitly as beta criteria?

Member:

The gate seems to actually be named "UserNamespacesSupport"


Member:

  1. Update/Downgrade is not filled in at all
  2. For version skew I explicitly wrote - please describe what happens if the feature is enabled in kube-apiserver and not in kubelet, and vice versa.

Member:

  1. Oh, that is one of the few sections that doesn't have any comments; it is such a small piece of text that I didn't see it! We will add it now, sorry!
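For reference on the enablement mechanics discussed in this thread, a minimal sketch of turning the gate on for both components, using the gate name noted above (UserNamespacesSupport); the gate stays disabled by default while the feature is in beta, and exactly how flags are passed depends on how the cluster is deployed:

$ # Enable the gate on the control plane and on each node:
$ kube-apiserver --feature-gates=UserNamespacesSupport=true   # plus the usual flags
$ kubelet --feature-gates=UserNamespacesSupport=true          # plus the usual flags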

@@ -753,6 +761,18 @@ This section must be completed when targeting beta to a release.

###### How can a rollout or rollback fail? Can it impact already running workloads?

The rollout is just a feature flag on the kubelet and the kube-apiserver.

If one API server is upgraded while others aren't, the pod will be accepted (if the apiserver is >=
Member:

I don't understand this sentence - it doesn't matter how many apiservers were upgraded, rather if the one I'm talking to now was, right?

Member:

Yes, sorry. This should be something like:

If one APIserver is upgraded while other's aren't and you are talking to a not upgraded APIServer...

namespace.

On a rollback, pods created while the feature was active (created with user namespaces) will have to
be restarted to be re-created without user namespaces. Just a re-creation of the pod will do the
Member:

This is slightly misleading to me - what does "restarted" mean here?
If we recommend recreation, let's make that explicit and remove the reference to "restarting"

Member:

Good point!

On Kubernetes side, the kubelet should start correctly.

On the node runtime side, a pod created with pod.spec.HostUsers=false should be on RUNNING state if
all node requirements are met.
Member:

I'm not a fan of this answer - if I'm managing 1000s of clusters, it's not something I can reasonably monitor. Can we rely on easier to monitor conditions - ideally some metrics?

Member:

We can add metrics to signal on the errors listed here in the KEP (in the section "What are other known failure modes?").

This can potentially cover a lot at the kubelet layer. But things like one of the filesystems used by the pod not supporting idmap mounts on the running kernel will not be possible to detect from the kubelet. The runtime will return a clear error in that case, which is shown to the user, but I don't think we can have a metric for that.

Does doing that sound like a good plan to you?

I'm not convinced about this, though. All errors are returned to the user, so it's very visible if it fails. And having a proliferation of metrics can also do harm, especially long-term. Do you think this is definitely a "yes, these metrics are useful" or do you have mixed feelings about it too?

Also, do you think this is needed for beta graduation or can be added while the feature is in beta and still disabled by default?

Member:

Sorry - let me clarify my concern as I think I wasn't clear enough.

For anything that returns a clear error to the user, we're good - I don't think we need anything else.

What I'm worried about is silently not fulfilling user intent.
If (as a user) I'm requesting to run in a user namespace (by setting the new podspec field) and my pod goes to RUNNING state, I expect it to be running in a user namespace. If it's running in the host namespace, it's a big problem and at the very least I would like to see that exposed in a way that is easy for the operator to digest (a metric).

Member:

Okay, so forget those metrics, then :-)

If we do what we suggested here, namely that the pod with userns will only be created if the CRI runtime knows about userns (an error will be returned otherwise), will that do it?

@@ -803,6 +839,11 @@ logs or events for this purpose.

###### How can someone using this feature know that it is working for their instance?

Check if any pod has the pod.spec.HostUsers field set to false and is on RUNNING state on a node
that meets all the requirements.
Member:

As an operator, how can I check if the node meets all the requirements?

Member:

Check if any pod has the pod.spec.HostUsers field set to false and is on RUNNING state on a node
that meets all the requirements.

There are step-by-step examples in the Kubernetes documentation too.
Member:

Link?

Member:

I can add a link to this: https://kubernetes.io/docs/tasks/configure-pod-container/user-namespaces/

I didn't do it because links here can easily become stale over time.

Member:

Added
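To make the check described in this section concrete, a sketch of how an operator could list running pods that explicitly opted into user namespaces, assuming jq is available (hostUsers defaults to true when unset, so only pods that set it to false match):

$ kubectl get pods --all-namespaces -o json \
    | jq -r '.items[]
        | select(.spec.hostUsers == false and .status.phase == "Running")
        | "\(.metadata.namespace)/\(.metadata.name)"'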

keps/sig-node/127-user-namespaces/README.md (resolved)
levels that could help debug the issue?
Not required until feature graduated to beta.

The error is returned on pod-creation, no need to search for logs.
Member:

Rather on an attempt to start the pod?

IIUC, Kubelet will get an error from the runtime and will propagate it to the pod, right?

Member:

Yes, exactly

Member:

Can you clarify in the text?

Member:

Oh, sorry. Added it now! Here and for the rest of the errors, I noted that this is visible in the pod events.

- Testing: Are there any tests for failure mode? If not, describe why.

It is part of the system configuration.

Member:

This is a great answer / playbook entry - the best I've seen so far - thanks!

@@ -803,6 +839,11 @@ logs or events for this purpose.

###### How can someone using this feature know that it is working for their instance?

Check if any pod has the pod.spec.HostUsers field set to false and is on RUNNING state on a node
Contributor:

Suggested change
Check if any pod has the pod.spec.HostUsers field set to false and is on RUNNING state on a node
Check if any pod has the `.spec.hostUsers` field set to false and is on RUNNING state on a node

- Condition name:
- Other field:
- [x] Other (treat as last resort)
- Details: check pods with pod.spec.HostUsers field set to false, and see if they are in RUNNING
Contributor:

Suggested change
- Details: check pods with pod.spec.HostUsers field set to false, and see if they are in RUNNING
- Details: check pods with `.spec.hostUsers` field set to false, and see if they are in RUNNING

- Detection: How can it be detected via metrics? Stated another way:
how can an operator troubleshoot without logging into a master or worker node?

In this case the Kubelet will fail to start with a clear error message.
Contributor:

(nit)

Suggested change
In this case the Kubelet will fail to start with a clear error message.
In this case the kubelet will fail to start with a clear error message.

pod had to be unique for every pod in the cluster, easily reaching a limit when the cluster is "big
enough" and the UID space runs out. However, with idmap mounts the IDs assigned to a pod just needs
to be unique within the node (and with 64k ranges we have 64k pods possible in the node, so not
really an issue). IOW, by using idmap mounts, we changed the IDs limit to be node-scoped instead of
Contributor:

(nit)

Suggested change
really an issue). IOW, by using idmap mounts, we changed the IDs limit to be node-scoped instead of
really an issue). In other words: by using idmap mounts, we changed the IDs limit to be node-scoped instead of
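For illustration, the arithmetic behind the node-scoped claim quoted above: each pod gets a 64Ki (2^16) ID range out of a 32-bit ID space, so a single node can hand out up to 2^32 / 2^16 = 65536 non-overlapping ranges:

$ echo $(( (1 << 32) / (1 << 16) ))   # 65536 ranges of 64Ki IDs fit in the 32-bit space
65536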

Comment on lines +1375 to +1438
There are no known use cases for longer mappings that we know of. The 16bit range (0-65535) is what
is assumed by all POSIX tools that we are aware of. If the need arises, longer mapping can be
considered in a future KEP.
Contributor:

no known use cases?

Two examples come to mind:

  • running a container tool inside a Pod, where that container tool wants to use a UID range
  • running an application inside a Pod where the application uses UIDs above 65535 by default; I've worked with some of these, usually avoiding the 0-65535 range to avoid the risk of conflict with a local system user.

Member:

  1. The first example doesn't necessarily need bigger mappings.
  2. We haven't heard of any such application that couldn't be configured otherwise. The point stands: that is out of scope for this KEP; it can be worked on in a future KEP.


There are no known use cases for longer mappings that we know of. The 16bit range (0-65535) is what
is assumed by all POSIX tools that we are aware of. If the need arises, longer mapping can be
considered in a future KEP.

### Allow runtimes to pick the mapping?
Contributor:

(nit)

Suggested change
### Allow runtimes to pick the mapping?
### Allow runtimes to pick the mapping

Alternatives aren't questions, they're things we have ruled out and can provide a reason for that exclusion.

@@ -1149,15 +1358,23 @@ KEPs can explore this path if they so want to.

### 64k mappings?
Contributor:

We should frame this as an alternative that we have ruled out.

rata (Member) commented Feb 2, 2024

@wojtek-t thanks for the detailed review! I'm on my train to FOSDEM now, so I'm not sure I'll be able to answer all now

rata and others added 2 commits February 7, 2024 15:21
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
rata (Member) commented Feb 7, 2024

@wojtek-t PTAL, updated the KEP

keps/sig-node/127-user-namespaces/README.md (resolved)
keps/sig-node/127-user-namespaces/README.md (resolved)
will be accepted (if the apiserver is >= 1.25). If it is scheduled to a node that the kubelet has
the feature flag activated and the node meets the requirements to use user namespaces, then the
pod will be created with the namespace. If it is scheduled to a node that has the feature disabled,
it will be created without the user namespace.
Member:

I hope it's no longer true after our discussion - the pod should fail to start in the second case, right?

Member:

@wojtek-t No, this is still true. If the feature is disabled in the kubelet, the kubelet completely ignores that field. The pod will only fail if the kubelet has the feature activated and the container runtime doesn't support it.

I don't think the kubelet can act on a field that should only act if the feature is enabled. We use this a lot on the error cases mentioned in detail in some other section, where you can just disable the feature gate to stop any bleeding that there might be related to this.

@@ -753,6 +815,17 @@ This section must be completed when targeting beta to a release.

###### How can a rollout or rollback fail? Can it impact already running workloads?

The rollout is just a feature flag on the kubelet and the kube-apiserver.

If one APIserver is upgraded while other's aren't and you are talking to a not upgraded the pod
Member:

I would remove this first sentence - I don't think it matters really and it doesn't bring value.

Member:

Okay. Sorry, I thought it would help as context, as there are a lot of KEPs that added a field already and we all rely on existing plumbing to handle this correctly. But removed :)

it will be created without the user namespace.

On a rollback, pods created while the feature was active (created with user namespaces) will have to
be re-created without user namespaces.
Member:

... If those weren't recreated, they will continue to run in user namespace.

Member:

Added, thanks!


On the node runtime side, a pod created with pod.spec.hostUsers=false should be on RUNNING state if
all node requirements are met. If the CRI runtime or the handler do not support the feature, the kubelet
returns an error.
Member:

This puts the pod into Failed state, right?

So the number of Failed pods is effectively what we can look at, right?

[Not a native metric, but something measurable]

Member:

I've added the following to clarify.


When a pod hits this error returned by the kubelet, the status in kubectl is shown as ContainerCreating and the pod events show:

  Warning  FailedCreatePodSandBox  12s (x23 over 5m6s)  kubelet            Failed to create pod sandbox: user namespaces is not supported by the runtime

The following kubelet metrics are useful to check:

  • kubelet_running_pods: Shows the actual number of pods running
  • kubelet_desired_pods: The number of pods the kubelet is trying to run

If these metrics are very different, it means there are desired pods that can't be set to running.
If that is the case, checking the pod events to see if they are failing for user namespace reasons
(like the errors shown in this KEP) is advised, in which case it is recommended to roll back or
disable the feature gate.
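For illustration, those two kubelet metrics can be compared without logging into the node by going through the API server's node proxy (the node name is a placeholder):

$ kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" \
    | grep -E '^kubelet_(desired|running)_pods'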

@@ -803,6 +899,10 @@ logs or events for this purpose.

###### How can someone using this feature know that it is working for their instance?

If the runtime doesn't support user namespaces an error is returned by the kubelet.
Member:

So the pod will effectively transition to Failed state, right?

Member:

Yes, there is a FailedCreatePodSandBox event. The pod status in kubectl is shown as ContainerCreating and the pod events show:

  Warning  FailedCreatePodSandBox  12s (x23 over 5m6s)  kubelet            Failed to create pod sandbox: user namespaces is not supported by the runtime
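To make that concrete, a hypothetical minimal pod that opts into a user namespace, plus one way to surface that failure cluster-wide without node access (the pod name and image are illustrative):

$ cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  hostUsers: false
  containers:
  - name: shell
    image: busybox
    command: ["sleep", "3600"]
EOF
$ # On a node whose runtime lacks userns support, the pod stays in ContainerCreating
$ # and the event above is emitted:
$ kubectl get events -A --field-selector reason=FailedCreatePodSandBox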

keps/sig-node/127-user-namespaces/README.md (resolved)
levels that could help debug the issue?
Not required until feature graduated to beta.

The error is returned on pod-creation, no need to search for logs.
Member:

Can you clarify in the text?

@@ -579,14 +628,11 @@ use container runtime versions that have the needed changes.

##### Beta
Member:

I think critest should be updated during Beta:

Member:

Added, thanks!

rata (Member) commented Feb 8, 2024

@wojtek-t Fixed the latest review comments. PTAL! I've added it as a separate commit, so you can easily see the diff

Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
pod.spec.hostUsers and has the feature gate enabled.

If the kube-apiserver doesn't support the feature at all (< 1.25), a pod with userns will be
rejected.
Member:

This is not true - kube-apiserver is not rejecting the pod, but rather is silently dropping this field.

Also - this isn't only about the release <1.25, but rather also when the feature gate is disabled.

Member:

No, it is rejecting the pod. This is what happens when a field is known, for example:

$ kubectl apply -f mypod2.yaml 
Error from server (BadRequest): error when creating "mypod2.yaml": Pod in version "v1" cannot be handled as a Pod: strict decoding error: unknown field "spec.nonExistingField"

I added the pod.spec.nonExistingField: true to the pod.

Am I missing something? Or are we talking about different things?

Regarding the feature gate disabled, that is mentioned a few lines above: 3766329#diff-a9ca9fbce1538447b92f03125ca2b8474e2d875071f1172d2afa0b1e8cadeabaR684-R686

Member:

This is strange - the whole point of having the dropDisabledFields pattern is to handle and drop the field if the FG is disabled.
@liggitt - can you please shed some light on it?

Member:

For context, this is the apiserver as started with the local-up-cluster.sh

Member:

  1. kubectl opts into strict mode, where unknown fields are flagged / rejected. kubectl apply ... --validate=false shows what normal API clients (like controllers) would experience, where unknown fields are just dropped.
  2. emulating the behavior where unknown fields are dropped on create when a feature is disabled is the standard approach taken.
  3. rarely, for a security-related feature, we will reject attempts to use the new security-related field if the feature is disabled rather than dropping the field. The only one I can remember off the top of my head was the windows hostProcess field
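For illustration of the difference described here, building on the earlier kubectl example with the made-up spec.nonExistingField:

$ # kubectl defaults to strict validation, so the unknown field is rejected with the
$ # "strict decoding error" shown earlier in this thread:
$ kubectl apply -f mypod2.yaml
$ # Most API clients (and kubectl with validation relaxed) instead see the unknown
$ # field silently dropped on create:
$ kubectl apply --validate=false -f mypod2.yaml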

rata (Member), Feb 8, 2024:

@liggitt does that mean that, then, this is handled as expected by the apiserver? We aren't doing anything special in the apiserver for this field (pod.spec.hostUsers) on the apiserver.

Contributor:

do we even need this sentence? 1.25 has been out of support for a while, and we'd also not support a 1.30->1.25 skew between kubelet and KAS

liggitt (Member), Feb 8, 2024:

We aren't doing anything special in the apiserver for this field (pod.spec.hostUsers) on the apiserver.

Once you add the field, you need to start doing things with it in the apiserver when the feature is disabled (either drop the field data or reject the request)

Contributor:

Updated to specify the kube-apiserver will drop the field if it's unknown

Contributor:

How does it look, @liggitt @wojtek-t?

If the kube-apiserver doesn't support the feature at all (< 1.25), a pod with userns will be
rejected.

If the kubelet doesn't support the feature (< 1.25), it will ignore the pod.spec.hostUsers field.
Member:

If Kubelet is on old version not knowing the field - this is fine.

I'm more interested in what happens if the kubelet is new enough and understands this field (say even 1.30), but has the feature gate disabled.
Will it ignore the field completely? [this isn't expected]
Or rather, will it reject the pod because it can't fulfill it? - which is what I would expect.

Contributor:

We should reject the pod in that case.

Member:

The current proposal is that it will ignore it completely. So feature disabled and an old kubelet that doesn't even know about this field, behave the same way.

We can change it to reject the pod in that case too (it is trivial). I proposed it this way, so by disabling the feature gate you can really stop any bleeding related to user namespaces. It seems more risky in case there is some bug, IMHO.

We can change it to that if you prefer, of course. As I said, it is trivial to do that too.

Member:

I believe that we should reject (as I wrote above).

Member:

Perfect, changed the text to do that instead. Thanks! :)

If it is scheduled to a node where the kubelet has the feature flag activated
and the node meets the requirements to use user namespaces, then the pod will be
created with the namespace. If it is scheduled to a node that has the feature
disabled, it will be created without the user namespace.
Member:

Same comment as for version skew.

Member:

Idem.

Member:

Fixed to that!

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Signed-off-by: Peter Hunt <pehunt@redhat.com>
wojtek-t (Member) commented Feb 8, 2024

OK - with the recent changes, I'm fine with it from PRR pov.

/lgtm
/approve PRR

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 8, 2024
k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, mrunalp, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 8, 2024
@k8s-ci-robot k8s-ci-robot merged commit 07953da into kubernetes:master Feb 8, 2024
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Feb 8, 2024
Labels: approved, cncf-cla: yes, kind/kep, lgtm, sig/node, size/L

10 participants