Update huge pages KEP for container isolation of huge pages #1199
/ok-to-test
keps/sig-node/20190129-hugepages.md
Outdated
- [Phase 2](#Phase-2)
- [Support container isolation of huge pages](#support-container-isolation-of-huge-pages)
- [Enhance Node Allocatable feature to reserve huge pages for system](#enhance-node-allocatable-feature-to-reserve-huge-pages-for-system)
- [cAdviser changes(Phase 2)](#cAdviser-changes(Phase-2))
s/cAdviser/cAdvisor/g
😄
fixed :)
keps/sig-node/20190129-hugepages.md
Outdated
To support NUMA, `cAdviser` should discover and store pre-allocated huge pages per NUMA node. The `v3` version of `MachineInfo` will be introduced.

#### Enhance Node Allocatable feature to reserve huge pages for system
Some system services like `OVS-DPDK` consume huge pages per NUMA node. To determine the allocatable number of huge pages in `kubelet`, the `Node Allocatable feature` should support reserving huge pages per NUMA node.
Not sure if I understand this correctly. Do you mean that huge pages per NUMA node should be node resources in k8s, in the same way that huge pages (and cpu, memory, and ephemeral storage) are today? Or do you mean that the data should be stored internally, so that it can be utilized by the topology manager?
At this point, the data that represents pre-allocated hugepages per NUMA node is additional information for future use (by the Memory Manager, or maybe the Topology Manager).
So I mean the data should be stored internally.
And I believe that the node resources you mentioned are tightly coupled with the node scheduler.
@derekwaynecarr
/ok-to-test
Oh... it seems that CI does not allow updating the table of contents of an existing KEP.
Handling memory restrictions per NUMA node is a necessary feature for DPDK-like applications. But it is not so clear how to guarantee, in the common case, that memory is allocated from a specific NUMA node (e.g. in the case of anonymous hugepages). The parent process of the container can bind a NUMA node for memory allocation (e.g. with libnuma), but the process itself can rebind it.
keps/sig-node/20190129-hugepages.md
Outdated

#### Support container isolation of huge pages

Container isolation of huge pages should be supported to avoid competition between containers to consume huge pages. Currently, `kubelet` sets the aggregated huge pages limits on pod's cgroup of hugetlb subsystem. This should be enhanced to set limits on container's cgroup.
What kind of cgroup are you going to use? Current memory cgroup doesn't have support of limiting NUMA nodes.
Container isolation of hugepages does not mean limiting NUMA nodes; it is a related issue, but a totally different one.
At this point, Kubelet supports hugepages with pod isolation, not container isolation.
This is mentioned on the official website (see the future section).
And this issue provides more details.
Pod isolation of hugepages means that the situation below can happen:
- The pod has two containers (container A and B).
- Each of the containers requests two 1GB hugepages as a resource (container A: 2GB / container B: 2GB).
- But Kubelet sets a limit of 4GB on the pod's cgroup in the hugetlb subsystem.
- This means a single container in the pod can consume up to 4GB.
- If each container consumes at most 2GB, there is no issue. But if one container consumes 4GB, the other one cannot consume any hugepages.
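The scenario above can be sketched as a small simulation. This is a toy model, not kubelet or kernel code: `HugetlbCgroup`, `charge`, and `run` are hypothetical names that only mimic how a hierarchical limit rejects an allocation once usage would exceed the limit of the group or of any ancestor.

```python
GIB = 1024 ** 3


class HugetlbCgroup:
    """Toy model of a hugetlb cgroup: a byte limit, a usage counter, an optional parent."""

    def __init__(self, limit=None, parent=None):
        # limit=None models "no limit set at this level".
        self.limit = float("inf") if limit is None else limit
        self.usage = 0
        self.parent = parent

    def _chain(self):
        # This group followed by all of its ancestors.
        node = self
        while node is not None:
            yield node
            node = node.parent

    def charge(self, nbytes):
        """Charge nbytes here and in every ancestor; refuse if any limit would be exceeded."""
        chain = list(self._chain())
        if any(group.usage + nbytes > group.limit for group in chain):
            return False
        for group in chain:
            group.usage += nbytes
        return True


def run(container_limit):
    """Container A tries to take four 1GiB pages, then container B tries to take two."""
    pod = HugetlbCgroup(limit=4 * GIB)
    a = HugetlbCgroup(limit=container_limit, parent=pod)
    b = HugetlbCgroup(limit=container_limit, parent=pod)
    a_pages = sum(a.charge(GIB) for _ in range(4))
    b_pages = sum(b.charge(GIB) for _ in range(2))
    return a_pages, b_pages


if __name__ == "__main__":
    print(run(container_limit=None))     # pod-level limit only: (4, 0)
    print(run(container_limit=2 * GIB))  # per-container limits:  (2, 2)
```

With only the pod-level limit, container A can take all four pages and container B gets none. With a 2GiB limit on each container, both get the two pages they requested.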
What kind of cgroup are you going to use?
=> To limit NUMA nodes, I am going to use the cpuset subsystem (cpuset.mems).
But this KEP and its updates will not cover it; the Memory Manager KEP will cover it.
The PR below updates cAdvisor to discover and store pre-allocated huge pages per NUMA node, as mentioned in the KEP updates.
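For illustration, per-NUMA-node discovery of pre-allocated huge pages can be sketched as below. This is not the cAdvisor change itself, only a minimal sketch under the assumption that the counts are read from the Linux sysfs layout `/sys/devices/system/node/node<N>/hugepages/hugepages-<size>kB/nr_hugepages`. The function name `hugepages_per_numa_node` and the shape of the returned dictionary are hypothetical.

```python
import re
from pathlib import Path

_NODE_RE = re.compile(r"node(\d+)$")
_SIZE_RE = re.compile(r"hugepages-(\d+)kB$")


def hugepages_per_numa_node(sysfs_root="/sys/devices/system/node"):
    """Return {numa_node_id: {page_size_kb: nr_hugepages}} read from a sysfs-like tree."""
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result
    for node_dir in sorted(root.iterdir()):
        node_match = _NODE_RE.match(node_dir.name)
        if node_match is None or not node_dir.is_dir():
            continue  # skip entries such as "online" or "possible"
        sizes = {}
        hugepages_dir = node_dir / "hugepages"
        if hugepages_dir.is_dir():
            for size_dir in sorted(hugepages_dir.iterdir()):
                size_match = _SIZE_RE.match(size_dir.name)
                if size_match is None:
                    continue
                count = int((size_dir / "nr_hugepages").read_text().strip())
                sizes[int(size_match.group(1))] = count
        result[int(node_match.group(1))] = sizes
    return result


if __name__ == "__main__":
    for node, sizes in hugepages_per_numa_node().items():
        print(f"NUMA node {node}: {sizes}")
```

Taking the root directory as a parameter keeps the sketch testable against a temporary directory instead of the real `/sys`.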
It seems that to set a limit of hugepages over CRI message, But, I'm not sure whether the additional update of This week, I will check it then I will leave the result of this work.
@derekwaynecarr, @bart0sh, @kad, @odinuge
thanks for the update. merging the kep update, and we can discuss implementation separately.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bg-chun, derekwaynecarr The full list of commands accepted by this bot can be found here. The pull request process is described here
@bg-chun for hugetlbfs, the writing container will be charged for its usage (similar to memory). assuming more than one container wants to write to hugetlbfs, requests alone should be sufficient, as limits should have been limiting the local container.
update cri-api vendor to include hugepages changes KEP: kubernetes/enhancements#1199 CRI: kubernetes/kubernetes#83614 Signed-off-by: Byonggon Chun <bg.chun@samsung.com>
`hugepages-2Mi: 4Mi`, but invalid to request `hugepages-2Mi: 3Mi`.

The request and limit for `hugepages-<hugepagesize>` must match. Similar to
memory, an application that requests `hugepages-<hugepagesize>` resource is at
minimum in the `Burstable` QoS class.
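
The two rules quoted above can be sketched as a small check. This is an illustrative sketch, not the Kubernetes API validation code: `parse_quantity` and `validate_hugepages` are hypothetical helpers, only the `Ki`/`Mi`/`Gi` suffixes are handled, and the multiple-of-page-size rule is inferred from the `4Mi` versus `3Mi` example.

```python
import re

_UNITS = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_quantity(text):
    """Parse a quantity such as '2Mi' into bytes (binary suffixes only)."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi)", text)
    if match is None:
        raise ValueError(f"unsupported quantity: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def validate_hugepages(resource_name, request, limit):
    """Return a list of problems for one hugepages-<size> entry (empty when valid)."""
    prefix = "hugepages-"
    if not resource_name.startswith(prefix):
        raise ValueError(f"not a huge page resource: {resource_name!r}")
    page_size = parse_quantity(resource_name[len(prefix):])
    requested = parse_quantity(request)
    limited = parse_quantity(limit)
    problems = []
    if requested % page_size != 0:
        problems.append(f"request {request} is not a multiple of the page size")
    if requested != limited:
        problems.append(f"request {request} and limit {limit} must match")
    return problems


if __name__ == "__main__":
    print(validate_hugepages("hugepages-2Mi", "4Mi", "4Mi"))  # no problems
    print(validate_hugepages("hugepages-2Mi", "3Mi", "3Mi"))  # not a multiple of 2Mi
```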

If multiple containers consume huge pages in the same pod, the request must be
I think we would want to differentiate between init containers and regular containers, since the init containers are all guaranteed to be done when the regular containers start up.
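One way to express that distinction is sketched below. It is not taken from the KEP; it assumes the rule Kubernetes uses for effective pod requests of other resources, where init containers run one at a time before the regular containers and so contribute only their maximum, while regular containers run together and contribute their sum. The function name `pod_hugepages_bytes` is hypothetical.

```python
GIB = 1024 ** 3


def pod_hugepages_bytes(init_requests, container_requests):
    """Huge page bytes a pod needs at its peak.

    init_requests: per-init-container requests in bytes (run sequentially).
    container_requests: per-regular-container requests in bytes (run concurrently).
    """
    largest_init = max(init_requests, default=0)
    regular_total = sum(container_requests)
    return max(largest_init, regular_total)


if __name__ == "__main__":
    # Two regular containers at 2GiB each and one 1GiB init container.
    print(pod_hugepages_bytes([1 * GIB], [2 * GIB, 2 * GIB]) // GIB)
```

Under this assumption an init container only raises the pod-level figure when its own request is larger than the sum of the regular containers' requests.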
Propose a new enhancement for hugepages.
More information is available in the issue below.
kubernetes/kubernetes#80716