
KEP-3085: Add KEP for Pod network ready condition #3087

Merged 1 commit into kubernetes:master on Jun 22, 2022

Conversation

@ddebroy (Member) commented Dec 14, 2021

  • One-line PR description:
    KEP to surface a PodHasNetwork condition on pods, indicating when pod sandbox creation and network configuration through the CRI runtime are complete.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 14, 2021
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels Dec 14, 2021
@ddebroy ddebroy changed the title KEP-3085: Initial KEP draft for pod sandbox conditions KEP-3085: Initial KEP draft for pod sandbox conditions [WIP] Dec 14, 2021
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 4, 2022
@ddebroy ddebroy changed the title KEP-3085: Initial KEP draft for pod sandbox conditions [WIP] KEP-3085: Initial KEP draft for pod sandbox conditions Jan 4, 2022
@ddebroy ddebroy changed the title KEP-3085: Initial KEP draft for pod sandbox conditions KEP-3085: KEP draft for pod sandbox conditions Jan 4, 2022
@marosset (Contributor) commented Jan 4, 2022

I think this would be useful since it can be hard to determine why pods are not running if there are errors during CreatePodSandbox - kubernetes/kubernetes#104635

@ehashman (Member) commented Jan 4, 2022

/assign

@jsturtevant (Contributor) commented:

> I think this would be useful since it can be hard to determine why pods are not running if there are errors during CreatePodSandbox - kubernetes/kubernetes#104635

Some more info here: kubernetes/kubernetes#105984

@ddebroy ddebroy changed the title KEP-3085: KEP draft for pod sandbox conditions KEP-3085: KEP for pod sandbox conditions Jan 5, 2022
@ddebroy ddebroy force-pushed the pod-status branch 4 times, most recently from ff5fc47 to cd53e91 on January 5, 2022 22:10
@fujitatomoya left a comment:

It sounds reasonable and useful; a few minor nitpicks, though.

@ddebroy (Member, Author) commented Jan 9, 2022

/assign @derekwaynecarr
(based on sig-node discussion)

@sftim (Contributor) commented Jun 16, 2022

Some flavors of PodHasNetwork might need more values than just true and false. We could define the condition to be true if-and-only-if the networking is fully up, or we could use an unknown condition (😬) to cover cases such as “IPv6 is set up already but IPv4 still coming up”.

We might also want to define how to represent this condition when using a CNI plugin that implements NetworkPolicy. That might be better left to a separate condition, though, to avoid holding up this enhancement?

@ddebroy (Member, Author) commented Jun 17, 2022

> Some flavors of PodHasNetwork might need more values than just true and false. We could define the condition to be true if-and-only-if the networking is fully up, or we could use an unknown condition (😬) to cover cases such as “IPv6 is set up already but IPv4 still coming up”.

Setting up the network concludes the pod "sandbox" creation process coordinated between Kubelet => CRI runtime => CNI plugins. Thus, in the context of this KEP, the new PodHasNetwork condition essentially surfaces the state of the pod "sandbox" (whether the Kubelet => CRI runtime => CNI plugin invocation chain completed successfully) while allowing for other similar conditions in the future (like PodHasVolumes, indicating all Kubelet => in-tree/CSI plugin invocations have successfully configured and mounted the volumes) if needed.

If we want more granular network-related conditions (which is not within the scope of this KEP), individual CNI plugins (with an API server client) can mark any necessary condition on the pod using extended conditions - for example, cni.kubernetes.io/pod-has-ipv4 and cni.kubernetes.io/pod-has-ipv6.

Note that until all CNI plugins in the configured chain succeed, the CRI runtime won't return an "in-process"/intermediate outcome to Kubelet. Either all networking configuration and IP allocation succeeded (i.e. networking is fully up for the pod), or network configuration failed and the entire CRI sandbox creation + network configuration has to be retried.
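For illustration only (not part of the KEP; the namespace and pod name below are placeholders), a minimal client-go sketch of how a consumer could read the proposed condition once Kubelet reports it:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// podHasNetwork reports whether the proposed PodHasNetwork condition is
// True, i.e. sandbox creation and network setup completed successfully.
func podHasNetwork(pod *corev1.Pod) bool {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == "PodHasNetwork" {
			return cond.Status == corev1.ConditionTrue
		}
	}
	return false // condition not reported (yet)
}

func main() {
	// Assumes a kubeconfig at the default location (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// "default" and "my-pod" are placeholder names.
	pod, err := clientset.CoreV1().Pods("default").Get(context.TODO(), "my-pod", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("pod %s has network: %v\n", pod.Name, podHasNetwork(pod))
}
```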

@ddebroy (Member, Author) commented Jun 17, 2022

@derekwaynecarr I have updated the KEP with the suggested PodHasNetwork condition. PTAL when you get a chance.

@sftim (Contributor) commented Jun 17, 2022

/retitle KEP-3085: Add KEP for Pod network ready condition

Hope that's OK.

@k8s-ci-robot k8s-ci-robot changed the title KEP-3085: KEP for pod sandbox ready condition KEP-3085: Add KEP for Pod network ready condition Jun 17, 2022
@sftim (Contributor) commented Jun 17, 2022

@ddebroy consider omitting “draft” from the PR description at this point.

Signed-off-by: Deep Debroy <ddebroy@gmail.com>
@johnbelamaric (Member) commented:
PRR looks fine; I will wait until there is SIG approval though, because I am root in this repo.

> …becomes especially relevant in multi-tenant clusters where individual tenants
> own the pod specs (including the set of init containers) while the cluster
> administrators are in charge of storage plugins, networking plugins and
> container runtime handlers.
A Member commented:

Thanks for pointing out the mismatched interpretation of the Initialized condition between pods with init containers and without. But the issue isn't obvious to users, since we hide the pod sandbox creation completely in the implementation. To the end user, it reads as: all init containers have run successfully, so now it is time to start the app containers. With the newly proposed PodCondition PodHasNetwork, shouldn't the issue surface to end users now?

With init containers: PodHasNetwork -> Initialized -> ... -> Ready
Without init containers: Initialized -> PodHasNetwork -> ... -> Ready

Can we consolidate the above flows? We could move Initialized after PodHasNetwork to indicate that Kubelet can now start the app containers.

@ddebroy (Member, Author) replied:

> But the issue isn't obvious to users, since we hide the pod sandbox creation completely in the implementation. To the end user, it reads as: all init containers have run successfully, so now it is time to start the app containers. With the newly proposed PodCondition PodHasNetwork, shouldn't the issue surface to end users now?

The inconsistency already surfaces today when CSI/CRI/CNI is unable to mount volumes or launch the pod sandbox upon encountering errors: without init containers, the pod status reports the Initialized condition as true, but Kubelet also reports error events like FailedCreatePodSandBox, FailedMount, etc. against the pod (which are surfaced by kubectl describe, etc.).
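As a hedged illustration of this (namespace and pod name are placeholders), those error events can be listed with client-go even while Initialized already reads true:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// List events attached to the pod; sandbox/mount failures show up here
	// (e.g. reason FailedCreatePodSandBox or FailedMount) even though the
	// Initialized condition may already be true for pods without init containers.
	events, err := clientset.CoreV1().Events("default").List(context.TODO(),
		metav1.ListOptions{FieldSelector: "involvedObject.name=my-pod"})
	if err != nil {
		panic(err)
	}
	for _, e := range events.Items {
		fmt.Printf("%s: %s\n", e.Reason, e.Message)
	}
}
```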

> Can we consolidate the above flows? We could move Initialized after PodHasNetwork to indicate that Kubelet can now start the app containers.

We can certainly consider that. I will add that to the KEP post-merge of this PR.

@ddebroy (Member, Author) replied:

We had a discussion on this in sig-node on July 5th. The suggestion was to carry on with the PodHasNetwork condition as described in this KEP, since the order in which pod conditions transition should not be important. We can address the situation with the Initialized condition for pods without init containers (mentioned above) in a separate KEP, as that feels like an independent problem.

> - Modify the meaning of the existing `Initialized` condition
A Member commented:

Please see my above comment.

@dchen1107 (Member) commented:
I added some more comments, but overall the KEP is well written and lgtm. To unblock progress on such a useful feature, I will approve this to meet the deadline.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 22, 2022
@johnbelamaric (Member) commented:
/approve

@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, ddebroy, ehashman, johnbelamaric, qiutongs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 22, 2022
@ddebroy (Member, Author) commented Jun 22, 2022

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 22, 2022
@k8s-ci-robot k8s-ci-robot merged commit 201eb5b into kubernetes:master Jun 22, 2022
@wojtek-t (Member) left a comment:

@ddebroy - Nice feature - I just added two minor comments. If you could adjust them in a follow-up PR, that would be great.


> No changes to any default behavior should result from enabling the feature.
@wojtek-t (Member) commented:

I don't fully agree with this - by default each pod will get a new condition set.
So if someone is doing equality comparisons, they will start seeing changes.
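A small hypothetical sketch of that concern, using apimachinery's semantic equality: automation that compares pod statuses for equality starts observing a diff once the new condition appears.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/equality"
)

func main() {
	before := corev1.PodStatus{Conditions: []corev1.PodCondition{
		{Type: corev1.PodInitialized, Status: corev1.ConditionTrue},
	}}

	after := *before.DeepCopy()
	// With the feature enabled, kubelet additionally reports the new condition.
	after.Conditions = append(after.Conditions, corev1.PodCondition{
		Type: "PodHasNetwork", Status: corev1.ConditionTrue,
	})

	// Automation doing strict equality on status now observes a change.
	fmt.Println("statuses equal:", equality.Semantic.DeepEqual(before, after))
}
```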


> No
@wojtek-t (Member) commented:

It may potentially increase pod-startup-time (which includes reporting the state) if kubelet is starved on QPS limits.

[This is fine - I'm just pointing this out for completeness...]

@ddebroy (Member, Author) replied:

One thing to note is that the Kubelet status manager caches all status updates and queues and sends them to the API server asynchronously. So updates to the pod conditions (existing ones as well as the new one added in this KEP) do not happen synchronously with the actual pod creation and initialization activities. Given this, do you think the above may still be a concern @wojtek-t?

@wojtek-t (Member) replied:

Yeah - I knew that. But still - if the conditions are not changing fast enough and individual phases take a bit of time, we may need to make more API calls. And in that case, it may increase pod startup time if kubelet is cpu-starved.

As I mentioned - I don't treat it as a concern or anything blocking - I just wanted it to be added for completeness.
