Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CPU Manager related configs #756

Merged
merged 2 commits into from
Aug 23, 2024

Conversation

brandboat
Copy link
Contributor

@brandboat brandboat commented Jun 13, 2024

Problem:
This is a prerequisite pr for implementing CPU Pinning. We need to add the cpu manager configs first. system-reserved, kube-reserved, cpu-manager-policy.

Solution:
Introduced two configs files:

  • rke2-99-z00-harvester-reserved-resources.yaml
    • include two configssystem-reserved and kube-reserved, both of which follow GKE CPU reservation formula for calculating reserved CPU resources.
  • rke2-99-z01-harvester-cpu-manager.yaml
    • include cpu manager policy setting, currently the value is "none" (which is the default value)

Note

Here I add these files instead of adding configs in 90-harvester-server.yaml or 90-harvester-worker.yaml since we will have 99-max-pods.yaml (after upgrade to a newer harvester version) which overrides kubelet args to "max-pods=200".
https://github.com/harvester/harvester/blob/226f72aa58532071232ad548e27970c1cec55733/package/upgrade/upgrade_node.sh#L490-L495

Related Issue:
related to harvester/harvester#2305
HEP: harvester/harvester#5803

Test Plan:
Setup harvester with one node which has 8 CPUs and check if the two config files exist under /etc/rancher/rke2/config.yaml.d/ and have correct settings.
in 99-z00-harvester-reserved-resources.yaml. The content should be

kubelet-arg+:
- "system-reserved=cpu=490m"
- "kube-reserved=cpu=490m"

in 99-z01-harvester-cpu-manager.yaml. The content should be

kubelet-arg+:
- "cpu-manager-policy=none"

@brandboat brandboat force-pushed the add-cpu-pinning-conf branch 3 times, most recently from 2dfd038 to 3645ce8 Compare June 20, 2024 06:38
@brandboat brandboat changed the title [DRAFT] Add cpu pinning related configs Add cpu pinning related configs Jun 24, 2024
@brandboat brandboat marked this pull request as ready for review June 24, 2024 07:55
@brandboat brandboat changed the title Add cpu pinning related configs Add CPU Manager related configs Jul 3, 2024
Copy link
Contributor

@tserong tserong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @brandboat! I have a few questions/comments, just to make sure I understand everything :-)

  • We're setting system-reserved and kube-reserved to the same value. AIUI these are cumulative, so in the case of (for example) a 2 core system, this would reserve 60 + 10 + 400 == 470 for each of those values, meaning a total of 940 mcpu reserved for system + kubernetes usage?

  • Those two config files (/etc/rancher/rke2/config.yaml.d/99-z00-harvester-reserved-resources.yaml and /etc/rancher/rke2/config.yaml.d/99-z01-harvester-cpu-manager.yaml) will be embedded in /oem/90_custom.yaml, which means they'll be rewritten on every system boot, i.e. each host will always start with system-reserved and kube-reserved set as they were during initial installation, and cpu-manager-policy=none. I know we need to later set cpu-manager-policy=static to use CPU pinning, but how do we persist that over a reboot?

  • I assume we'll also need a separate PR for harvester to handle the upgrade case, to add those two config files for existing systems?

@brandboat
Copy link
Contributor Author

Thanks for the review !

We're setting system-reserved and kube-reserved to the same value. AIUI these are cumulative, so in the case of (for example) a 2 core system, this would reserve 60 + 10 + 400 == 470 for each of those values, meaning a total of 940 mcpu reserved for system + kubernetes usage?

Yes you are right, like you mentioned under a two core system, harvester reserved almost one core.

Those two config files (/etc/rancher/rke2/config.yaml.d/99-z00-harvester-reserved-resources.yaml and /etc/rancher/rke2/config.yaml.d/99-z01-harvester-cpu-manager.yaml) will be embedded in /oem/90_custom.yaml, which means they'll be rewritten on every system boot, i.e. each host will always start with system-reserved and kube-reserved set as they were during initial installation, and cpu-manager-policy=none. I know we need to later set cpu-manager-policy=static to use CPU pinning, but how do we persist that over a reboot?

Good catch, I will remove 99-z01-harvester-cpu-manager.yaml from this PR as it shouldn't be override after reboot like you mentioned.

I assume we'll also need a separate PR for harvester to handle the upgrade case, to add those two config files for existing systems?

Yep we do have another PR handle the upgrade path, see https://github.com/harvester/harvester/pull/6190/files#diff-43150fc991c91f07946cd41d2d0311c58a9f5deb4c061e1cf3ed11b9cd3dde3b

@brandboat brandboat marked this pull request as draft August 7, 2024 08:39
@Vicente-Cheng
Copy link
Contributor

Vicente-Cheng commented Aug 8, 2024

Thanks for the review !

We're setting system-reserved and kube-reserved to the same value. AIUI these are cumulative, so in the case of (for example) a 2 core system, this would reserve 60 + 10 + 400 == 470 for each of those values, meaning a total of 940 mcpu reserved for system + kubernetes usage?

Yes you are right, like you mentioned under a two core system, harvester reserved almost one core.

I also have some question about that, just check with below case:
2 cores -> 470 for system/kubernetes -> total 940
8 cores -> 60 + 10 + 10 + 10 + 400 = 490 for both system/kubernetes > total 980
16 cores -> 60 + 10 + 10 + 30 + 400 = 520 for both system/kubernetes > total 1040

Seems most common configuration is reserved almost 1 core for the amount of system/kube service (ratio, 1:1).
And, I checked some info from https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/
Looks the system service could be lower than kube service, because the system service looks only include like ssh/udev.
The kube service, like kubelet, might need more CPU resources.

Maybe we could adjust the system: kube service ratio to 2:3?
WDYT?

Those two config files (/etc/rancher/rke2/config.yaml.d/99-z00-harvester-reserved-resources.yaml and /etc/rancher/rke2/config.yaml.d/99-z01-harvester-cpu-manager.yaml) will be embedded in /oem/90_custom.yaml, which means they'll be rewritten on every system boot, i.e. each host will always start with system-reserved and kube-reserved set as they were during initial installation, and cpu-manager-policy=none. I know we need to later set cpu-manager-policy=static to use CPU pinning, but how do we persist that over a reboot?

Good catch, I will remove 99-z01-harvester-cpu-manager.yaml from this PR as it shouldn't be override after reboot like you mentioned.

And who will handle the config cpu-manager-policy=static when we enable the CPU pinning?
harvester controller?

I assume we'll also need a separate PR for harvester to handle the upgrade case, to add those two config files for existing systems?

Yep we do have another PR handle the upgrade path, see https://github.com/harvester/harvester/pull/6190/files#diff-43150fc991c91f07946cd41d2d0311c58a9f5deb4c061e1cf3ed11b9cd3dde3b

@brandboat
Copy link
Contributor Author

brandboat commented Aug 8, 2024

Thanks for the comment @Vicente-Cheng !

I also have some question about that, just check with below case:
2 cores -> 470 for system/kubernetes -> total 940
8 cores -> 60 + 10 + 10 + 10 + 400 = 490 for both system/kubernetes > total 980
16 cores -> 60 + 10 + 10 + 30 + 400 = 520 for both system/kubernetes > total 1040

Seems most common configuration is reserved almost 1 core for the amount of system/kube service (ratio, 1:1).
And, I checked some info from https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/
Looks the system service could be lower than kube service, because the system service looks only include like ssh/udev.
The kube service, like kubelet, might need more CPU resources.

Maybe we could adjust the system: kube service ratio to 2:3?
WDYT?

Fair enough. I'll make systmReserved = kubeReserved * (2/3). c.c. @bk201, @ibrokethecloud, @tserong

And who will handle the config cpu-manager-policy=static when we enable the CPU pinning?
harvester controller?

Yes, cpu manager controller will submit job to write cpu manager policy config to /etc/rancher/rke2/config.yaml.d/99-z01-harvester-cpu-manager.yaml

@brandboat brandboat force-pushed the add-cpu-pinning-conf branch 2 times, most recently from d620ffc to 24ce709 Compare August 14, 2024 12:21
@brandboat brandboat marked this pull request as ready for review August 15, 2024 03:59
@brandboat brandboat requested review from tserong and Vicente-Cheng and removed request for Vicente-Cheng August 15, 2024 03:59
@bk201 bk201 self-requested a review August 22, 2024 03:06
Copy link
Contributor

@Vicente-Cheng Vicente-Cheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, for the ratio between system/kubernetes, we could keep the 1:1 at this moment.
Sorry, I noticed you already changed the ratio.
Now, the system will have 2/3 kube reserved, but the kube reserved did not gain the extra 1/3 from the system.
Could you recheck it?

@Vicente-Cheng Vicente-Cheng self-requested a review August 22, 2024 09:19
@tserong
Copy link
Contributor

tserong commented Aug 23, 2024

Now, the system will have 2/3 kube reserved, but the kube reserved did not gain the extra 1/3 from the system.

Yeah, that's my reading too. Working through a previous example, we started with a system:kube ratio of 1:1:

2 cores -> 470 for system/kubernetes -> total 940

With the most recent change it looks like this gives us a system:kube ration of ~1:1.5, but the overall total is lower than before:

2 cores -> 470 * 2/3 = 313 system + 470 kube -> total 783

I really don't know what the best value or ratio is here... :-/

Add new config file:
- rke2-99-z00-harvester-reserved-resources.yaml:
  which includes system-reserved and kube-reserved CPU resources,
  kube-reserved follow the GKE CPU reservation formula for calculating
  reserved CPU resources, and system-reserved is 2/3 kube-reserved cpu
  resource.

Signed-off-by: Cooper Tseng <cooper.tseng@suse.com>
Signed-off-by: Cooper Tseng <cooper.tseng@suse.com>
Copy link
Contributor

@Vicente-Cheng Vicente-Cheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks!

Copy link
Member

@bk201 bk201 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our supported CPU requirement is 16 for production and 8 for testing. I guess reserving ~1 core for kube/system is OK and we can fine-tune the value if needed.

@brandboat brandboat merged commit 7940f0d into harvester:master Aug 23, 2024
7 checks passed
@brandboat
Copy link
Contributor Author

@Mergifyio backport v1.4

Copy link

mergify bot commented Aug 23, 2024

backport v1.4

✅ Backports have been created

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants