-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libcontainer: intelrdt: add support for Intel RDT/MBA in runc #1632
Conversation
204b4d2
to
e4e20ea
Compare
e0e1f8f
to
8b46289
Compare
@crosbymichael @cyphar @mrunalp @hqhq @rjnagal @vmarmol @dqminh Hi Maintainers, This PR is the runc part of #1596 |
8b46289
to
b087922
Compare
@crosbymichael @cyphar @mrunalp @hqhq @rjnagal @vmarmol @dqminh I have rebased this PR to current master branch. This PR is the runc part of proposal #1596 |
LGTM but needs rebase, @xiaochenshen |
Thank you very much for helping code review! |
@crosbymichael @hqhq Thank you for helping code review. This PR has vendor dependency of runtime-spec for Intel RDT/MBA config (opencontainers/runtime-spec#932), But the runtime-spec change has not been updated in runc master branch yet. Currently, this commit is to address runtime-spec dependency temporarily: 45ff600 [Don't merge] vendor: specs-go: update runtime-spec for Intel RDT/MBA Do you have any suggestion to make the vendor dependency ready in runc? To update
Thank you for help. |
You need to update the vendor in runc |
Update runtime-spec to get Intel RDT/MBA Linux configs which will be used in successive commits. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature of Intel Resource Director Technology (RDT) which is supported on some Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate throttle over memory bandwidth for the software. A user controls the resource by indicating the percentage of maximum memory bandwidth. Hardware details of Intel RDT/MBA can be found in section 17.18 of Intel Software Developer Manual: https://software.intel.com/en-us/articles/intel-sdm In Linux 4.12 kernel and newer, Intel RDT/MBA is enabled by kernel config CONFIG_INTEL_RDT. If hardware support, CPU flags `rdt_a` and `mba` will be set in /proc/cpuinfo. Intel RDT "resource control" filesystem hierarchy: mount -t resctrl resctrl /sys/fs/resctrl tree /sys/fs/resctrl /sys/fs/resctrl/ |-- info | |-- L3 | | |-- cbm_mask | | |-- min_cbm_bits | | |-- num_closids | |-- MB | |-- bandwidth_gran | |-- delay_linear | |-- min_bandwidth | |-- num_closids |-- ... |-- schemata |-- tasks |-- <container_id> |-- ... |-- schemata |-- tasks For MBA support for `runc`, we will reuse the infrastructure and code base of Intel RDT/CAT which implemented in opencontainers#1279. We could also make use of `tasks` and `schemata` configuration for memory bandwidth resource constraints. The file `tasks` has a list of tasks that belongs to this group (e.g., <container_id>" group). Tasks can be added to a group by writing the task ID to the "tasks" file (which will automatically remove them from the previous group to which they belonged). New tasks created by fork(2) and clone(2) are added to the same group as their parent. The file `schemata` has a list of all the resources available to this group. Each resource (L3 cache, memory bandwidth) has its own line and format. Memory bandwidth schema: It has allocation values for memory bandwidth on each socket, which contains L3 cache id and memory bandwidth percentage. Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..." The minimum bandwidth percentage value for each CPU model is predefined and can be looked up through "info/MB/min_bandwidth". The bandwidth granularity that is allocated is also dependent on the CPU model and can be looked up at "info/MB/bandwidth_gran". The available bandwidth control steps are: min_bw + N * bw_gran. Intermediate values are rounded to the next control step available on the hardware. For more information about Intel RDT kernel interface: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt An example for runc: Consider a two-socket machine with two L3 caches where the minimum memory bandwidth of 10% with a memory bandwidth granularity of 10%. Tasks inside the container may use a maximum memory bandwidth of 20% on socket 0 and 70% on socket 1. "linux": { "intelRdt": { "memBwSchema": "MB:0=20;1=70" } } Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Double check if Intel RDT sub-features are available in "resource control" filesystem. Intel RDT sub-features can be selectively disabled or enabled by kernel command line (e.g., rdt=!l3cat,mba) in 4.14 and newer kernel. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
@crosbymichael @hqhq I have rebased this PR against latest master branch with two changes:
$ git log --pretty=oneline |
This PR is the runc part of #1596
The runtime-spec part: opencontainers/runtime-spec#932
Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature
of Intel Resource Director Technology (RDT) which is supported on some
Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate
throttle over memory bandwidth for the software. A user controls the
resource by indicating the percentage of maximum memory bandwidth.
Hardware details of Intel RDT/MBA can be found in section 17.18 of
Intel Software Developer Manual:
https://software.intel.com/en-us/articles/intel-sdm
In Linux 4.12 kernel and newer, Intel RDT/MBA is enabled by kernel
config CONFIG_INTEL_RDT. If hardware support, CPU flags
rdt_a
andmba
will be set in /proc/cpuinfo.Intel RDT "resource control" filesystem hierarchy:
For MBA support for
runc
, we will reuse the infrastructure and codebase of Intel RDT/CAT which implemented in #1279. We could also make
use of
tasks
andschemata
configuration for memory bandwidthresource constraints.
The file
tasks
has a list of tasks that belongs to this group (e.g.,<container_id>" group). Tasks can be added to a group by writing the
task ID to the "tasks" file (which will automatically remove them from
the previous group to which they belonged). New tasks created by
fork(2) and clone(2) are added to the same group as their parent.
The file
schemata
has a list of all the resources available to thisgroup. Each resource (L3 cache, memory bandwidth) has its own line and
format.
Memory bandwidth schema:
It has allocation values for memory bandwidth on each socket, which
contains L3 cache id and memory bandwidth percentage.
Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
The minimum bandwidth percentage value for each CPU model is predefined
and can be looked up through "info/MB/min_bandwidth". The bandwidth
granularity that is allocated is also dependent on the CPU model and
can be looked up at "info/MB/bandwidth_gran". The available bandwidth
control steps are: min_bw + N * bw_gran. Intermediate values are
rounded to the next control step available on the hardware.
For more information about Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
An example for runc:
Consider a two-socket machine with two L3 caches where the minimum
memory bandwidth of 10% with a memory bandwidth granularity of 10%.
Tasks inside the container may use a maximum memory bandwidth of 20%
on socket 0 and 70% on socket 1.
Signed-off-by: Xiaochen Shen xiaochen.shen@intel.com