storage: CSIStorageCapacity #21634
Deploy preview for kubernetes-io-vnext-staging processing. Building with commit 19b8f84 https://app.netlify.com/sites/kubernetes-io-vnext-staging/deploys/5f0ff929e708250008a533a4
/milestone 1.19
/assign
Some early feedback
4fd0b37 to f188ed2
Pinging:
f188ed2 to a76ae73
- pohly
title: Ephemeral Volumes
content_type: concept
weight: 20
I need to check the weight....
... and remove this file! It's supposed to be merged via #22438
a76ae73 to 247d9ac
I had a quick look. I wonder if this actually belongs inside https://kubernetes.io/docs/concepts/scheduling-eviction/ (with lots of inbound signposting from the storage concepts section).
@pohly what do you think?
I will also try to find time to take a longer look at this PR.
/sig storage
I'm undecided. We could have used storage topology support as guiding example, but I couldn't find any documentation for that. Also related are node-specific volume limits (https://kubernetes.io/docs/concepts/storage/storage-limits/), which is documented under storage although it is a scheduler feature. Following that example and because the implementation is owned by SIG-Storage, I'd prefer to keep storage capacity under concepts/storage. Just my 2 cents, I'm also fine with moving it.
OK, makes sense. Please consider signposting to this from any relevant pages about scheduling!
referenced in a Pod via a `PersistentVolumeClaim` object.
A `csi` volume can be used in a pod in three different ways:
- through a reference to a [`persistentVolumeClaim`](#persistentvolumeclaim)
- with a [generic ephemeral volume](/docs/concepts/storage/ephemeral-volumes/)
This looks like a chunk that should be in generic ephemeral volume PR. And generic ephemeral volumes are not limited to CSI, in-tree volumes can be used too.
Hmm, right. Removed.
This page describes how Kubernetes keeps track of storage capacity and
how the scheduler uses that information to schedule pods.
I miss some "why" - why does the scheduler need to use the capacity information? To provision volumes on nodes (or topology segment) that actually have some free space.
I added ".... scheduler uses that information to schedule pods onto nodes
that have access to enough storage capacity for the remaining missing
volumes. Without storage capacity tracking, it is random whether the selected
node can run the Pod and multiple scheduling retries may be needed."
includes the node. Without storage capacity tracking, nodes are picked
without this check.
Please describe consequence of missing this check - like pods can be scheduled to nodes that do not have any free space for dynamic provisioning of a new volume, resulting in the pod not running.
After extending the introduction, this became a bit redundant. I just removed the "Without..." part here entirely.
Ack
247d9ac to 7953775
how the scheduler uses that information to schedule Pods onto nodes
that have access to enough storage capacity for the remaining missing
volumes. Without storage capacity tracking, it is random whether the
selected node has enough storage for the Pod and multiple scheduling retries may be
nit: "it is random"? Could this be reworded?
"Without storage capacity tracking, the way a node is selected ..."?
How about
Without storage capacity tracking, the scheduler may choose a node that doesn't have enough capacity to provision a volume.
👍
Changed.
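For readers following this thread, the check being described can be sketched roughly as below. This is only an illustration of the idea, not the scheduler's actual implementation: `CapacityReport` and `nodes_with_enough_capacity` are hypothetical names, and the real CSIStorageCapacity objects describe topology with label selectors rather than an explicit set of node names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityReport:
    """Hypothetical stand-in for the capacity a driver reports for one topology segment."""
    storage_class: str
    nodes: frozenset        # assumption: the segment is modelled as a plain set of node names
    available_bytes: int


def nodes_with_enough_capacity(candidates, reports, storage_class, requested_bytes):
    """Keep only the candidate nodes covered by a report with enough free capacity."""
    fitting = []
    for node in candidates:
        for report in reports:
            if (report.storage_class == storage_class
                    and node in report.nodes
                    and report.available_bytes >= requested_bytes):
                fitting.append(node)
                break  # one matching report is enough for this node
    return fitting


GIB = 1024 ** 3
reports = [
    CapacityReport("fast", frozenset({"node-a"}), 5 * GIB),
    CapacityReport("fast", frozenset({"node-b"}), 50 * GIB),
]

# node-a has too little space and node-c has no report at all.
print(nodes_with_enough_capacity(["node-a", "node-b", "node-c"], reports, "fast", 10 * GIB))
# prints ['node-b']
```

Without such a filter, any of the three nodes could be picked, which is the situation the reworded sentence describes.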
another volume. Manual intervention is necessary to recover from this,
for example by increasing capacity or deleting the volume that was
already created. [Further
work](https://github.com/kubernetes/enhancements/pull/1703) is needed
Is it necessary to list a link to a pull request? This link is not stable.
Could you list this issue in the release notes?
Because the KEP hasn't even been merged as provisional yet, all we have is this link to the proposal.
I'll make sure to list it in the release notes.
- For more information on the design, see the
[Storage Capacity Constraints for Pod Scheduling KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1472-storage-capacity-tracking/README.md).
- For more information on further development of this feature, see the [enhancement tracking issue #1472](https://github.com/kubernetes/enhancements/issues/1472).
Since this is a concept page, linking to the enhancement tracking issue does not add much.
I beg to differ here. For alpha features, one key question for users is when the feature will graduate or how it will change during future development. The enhancement issue is where they can find answers to those questions.
I can remove it in a follow-up PR if this argument is convincing, but let's merge this PR first, okay?
👍
needed.

Tracking storage capacity is supported for [{{< glossary_tooltip
text="Container Storage Interface" term_id="csi" >}} (CSI) drivers](https://kubernetes-csi.github.io/docs/drivers.html) and
nit: Did the link work out as expected? The tooltip is working.
I clicked on "CSI" and was forwarded to https://deploy-preview-21634--kubernetes-io-vnext-staging.netlify.app/docs/concepts/storage/volumes/#csi
Indeed, it looks like the tooltip overrides the link destination. I've dropped the link. We can add it back under "enabling" once that table has information about drivers that support the feature (assuming that we consider it worth calling out there).
## API

There are two API extensions for this feature:
- [CSIStorageCapacity](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#csistoragecapacity-v1alpha1-storage-k8s-io) objects:
I assume these resource links will work once v1.19 is released.
Correct.
Thanks @pohly. Noted a few questions. Generally, looks good.
Hi @pohly 👋 1.19 docs shadow here, a friendly reminder that the docs deadline is tomorrow. Pinging you to address the review suggestions above so we can get approvals from both tech and docs and get this merged. Thank you!
05e9530 to 7f06adc
/lgtm
@pohly, would you rebase? I am looking again.
This is the initial documentation for one new feature:
- kubernetes/enhancements#1472

Co-authored-by: Tim Bannister <tim@scalefactory.com>
7f06adc to 19b8f84
@kbhawkey: rebased.
@pohly, thanks!
@savitharaghunathan, ready to merge? Thanks!
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: savitharaghunathan. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This is the documentation for CSIStorageCapacity, which was merged for 1.19 today:
The PR also used to contain documentation for GenericEphemeralVolume, but I took that out because the documentation wasn't as closely tied as I first thought and the feature hasn't been merged yet.
Therefore this PR is now ready for review and (eventually) merging.