libcontainer: intelrdt: add support for Intel RDT/MBA in runc #1632

Merged
6 commits merged on Oct 16, 2018


xiaochenshen
Contributor

xiaochenshen commented Oct 31, 2017

This PR is the runc part of #1596
The runtime-spec part: opencontainers/runtime-spec#932

Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature
of Intel Resource Director Technology (RDT) which is supported on some
Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate
throttling of memory bandwidth for software. A user controls the
resource by specifying a percentage of the maximum memory bandwidth.

Hardware details of Intel RDT/MBA can be found in section 17.18 of
the Intel Software Developer's Manual:
https://software.intel.com/en-us/articles/intel-sdm

In Linux 4.12 and newer kernels, Intel RDT/MBA is enabled by the kernel
config option CONFIG_INTEL_RDT. If the hardware supports it, the CPU
flags rdt_a and mba are set in /proc/cpuinfo.
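As an illustration of the cpuinfo check described above, here is a minimal Go sketch that looks for both flags in the contents of /proc/cpuinfo. The function name hasRdtMbaFlags is hypothetical and not part of runc; in a real runtime the input would come from os.ReadFile("/proc/cpuinfo").

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasRdtMbaFlags (hypothetical helper) reports whether the first "flags"
// line of a /proc/cpuinfo dump contains both rdt_a and mba.
func hasRdtMbaFlags(cpuinfo string) bool {
	scanner := bufio.NewScanner(strings.NewReader(cpuinfo))
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "flags") {
			continue
		}
		// A flags line looks like "flags\t\t: fpu vme ... rdt_a mba ...".
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		var rdtA, mba bool
		for _, f := range strings.Fields(parts[1]) {
			switch f {
			case "rdt_a":
				rdtA = true
			case "mba":
				mba = true
			}
		}
		// Assumption: all CPUs report the same flags, so one line is enough.
		return rdtA && mba
	}
	return false
}

func main() {
	sample := "processor\t: 0\nflags\t\t: fpu vme rdt_a cat_l3 mba\n"
	fmt.Println(hasRdtMbaFlags(sample))
}
```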

Intel RDT "resource control" filesystem hierarchy:

mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|   |   |-- cbm_mask
|   |   |-- min_cbm_bits
|   |   |-- num_closids
|   |-- MB
|       |-- bandwidth_gran
|       |-- delay_linear
|       |-- min_bandwidth
|       |-- num_closids
|-- ...
|-- schemata
|-- tasks
|-- <container_id>
    |-- ...
    |-- schemata
    |-- tasks

To support MBA in runc, we reuse the infrastructure and code base of
Intel RDT/CAT, which was implemented in #1279. We also make use of the
"tasks" and "schemata" files to apply memory bandwidth resource
constraints.

The "tasks" file lists the tasks that belong to this group (e.g., the
"<container_id>" group). Tasks can be added to a group by writing the
task ID to the "tasks" file (which automatically removes them from
the group they previously belonged to). New tasks created by
fork(2) and clone(2) are added to the same group as their parent.

The "schemata" file lists all the resources available to this group.
Each resource (L3 cache, memory bandwidth) has its own line and
format.

Memory bandwidth schema:
It contains the memory bandwidth allocation for each socket, given as
an L3 cache ID and a memory bandwidth percentage.
Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
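To make the format concrete, here is a minimal Go sketch that parses an MB line into a map from L3 cache ID to percentage and renders it back. The helper names are hypothetical; the parser only checks syntax and does not validate values against info/MB.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseMemBwSchema (hypothetical helper) parses "MB:0=20;1=70" into a map
// from L3 cache ID to memory bandwidth percentage.
func parseMemBwSchema(line string) (map[int]int, error) {
	line = strings.TrimSpace(line) // tolerate surrounding whitespace
	if !strings.HasPrefix(line, "MB:") {
		return nil, fmt.Errorf("not a memory bandwidth schema: %q", line)
	}
	result := make(map[int]int)
	for _, entry := range strings.Split(strings.TrimPrefix(line, "MB:"), ";") {
		kv := strings.SplitN(entry, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		id, err := strconv.Atoi(strings.TrimSpace(kv[0]))
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %v", entry, err)
		}
		bw, err := strconv.Atoi(strings.TrimSpace(kv[1]))
		if err != nil {
			return nil, fmt.Errorf("bad bandwidth in %q: %v", entry, err)
		}
		result[id] = bw
	}
	return result, nil
}

// formatMemBwSchema renders the map back into "MB:<id>=<bw>;..." with the
// cache IDs in ascending order.
func formatMemBwSchema(bw map[int]int) string {
	ids := make([]int, 0, len(bw))
	for id := range bw {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	parts := make([]string, len(ids))
	for i, id := range ids {
		parts[i] = fmt.Sprintf("%d=%d", id, bw[id])
	}
	return "MB:" + strings.Join(parts, ";")
}

func main() {
	m, err := parseMemBwSchema("MB:0=20;1=70")
	fmt.Println(m, err)
	fmt.Println(formatMemBwSchema(m))
}
```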

The minimum bandwidth percentage value for each CPU model is predefined
and can be looked up through "info/MB/min_bandwidth". The bandwidth
granularity that is allocated is also dependent on the CPU model and
can be looked up at "info/MB/bandwidth_gran". The available bandwidth
control steps are: min_bw + N * bw_gran. Intermediate values are
rounded to the next control step available on the hardware.
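The control-step arithmetic can be sketched as follows, assuming "rounded to the next control step" means rounding up, and assuming requests outside [min_bw, 100] are rejected rather than clamped. minBw and gran stand for the values read from info/MB/min_bandwidth and info/MB/bandwidth_gran; roundToControlStep is a hypothetical name. For example, with min_bw = 10 and bw_gran = 10, a request of 25% lands on the step 10 + 2 * 10 = 30%.

```go
package main

import "fmt"

// roundToControlStep (hypothetical helper) maps a requested percentage onto
// the control steps minBw + N*gran, rounding intermediate values up.
func roundToControlStep(req, minBw, gran int) (int, error) {
	if gran <= 0 {
		return 0, fmt.Errorf("invalid bandwidth granularity %d", gran)
	}
	if req < minBw || req > 100 {
		return 0, fmt.Errorf("bandwidth %d%% is outside [%d, 100]", req, minBw)
	}
	n := (req - minBw + gran - 1) / gran // ceiling of (req-minBw)/gran
	step := minBw + n*gran
	if step > 100 {
		step = 100 // assumption: never program more than 100%
	}
	return step, nil
}

func main() {
	// min_bandwidth=10 and bandwidth_gran=10, as in the runc example.
	for _, req := range []int{10, 20, 25, 95} {
		v, err := roundToControlStep(req, 10, 10)
		fmt.Println(req, "->", v, err)
	}
}
```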

For more information about the Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

An example for runc:
Consider a two-socket machine with two L3 caches, where the minimum
memory bandwidth is 10% and the memory bandwidth granularity is 10%.
Tasks inside the container may use a maximum memory bandwidth of 20%
on socket 0 and 70% on socket 1.

"linux": {
    "intelRdt": {
        "memBwSchema": "MB:0=20;1=70"
    }
}

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>

@xiaochenshen
Contributor Author

@crosbymichael @cyphar @mrunalp @hqhq
/cc @rjnagal @vmarmol @dqminh

Could you help review this PR at your convenience?
Thank you.

@xiaochenshen
Contributor Author

xiaochenshen commented Sep 5, 2018

@crosbymichael @cyphar @mrunalp @hqhq @rjnagal @vmarmol @dqminh

Hi Maintainers,
This PR has been pending for a long time. Could you help review it at your convenience? Thank you!
I have rebased this PR onto the current master branch.

This PR is the runc part of #1596
The runtime-spec part: opencontainers/runtime-spec#932

@xiaochenshen
Contributor Author

@crosbymichael @cyphar @mrunalp @hqhq @rjnagal @vmarmol @dqminh

I have rebased this PR onto the current master branch.
Could you help review it at your convenience? Thank you!

This PR is the runc part of proposal #1596
The runtime-spec part: opencontainers/runtime-spec#932

@crosbymichael
Member

crosbymichael commented Sep 17, 2018

LGTM

Approved with PullApprove

@xiaochenshen
Contributor Author

@crosbymichael
I really appreciate your kind code review.

@cyphar @mrunalp @hqhq @rjnagal @vmarmol @dqminh
Could you help review and comment at your convenience?
Thank you in advance!

@hqhq
Contributor

hqhq commented Oct 14, 2018

LGTM but needs rebase, @xiaochenshen

Approved with PullApprove

@xiaochenshen
Contributor Author

@hqhq

LGTM but needs rebase, @xiaochenshen

Thank you very much for helping with the code review!
I have rebased this PR onto the latest master branch. Thank you.
$ git log --pretty=oneline
61817b9 libcontainer: intelrdt: Add more check if sub-features are enabled
f82b45a libcontainer: intelrdt: add test cases for Intel RDT/MBA
8cbac68 libcontainer: intelrdt: add update command support for Intel RDT/MBA
e475370 libcontainer: intelrdt: add support for Intel RDT/MBA in runc
c17973f libcontainer: intelrdt: add Intel RDT/MBA docs in SPEC.md
45ff600 [Don't merge] vendor: specs-go: update runtime-spec for Intel RDT/MBA

@hqhq
Contributor

hqhq commented Oct 15, 2018

LGTM. Thanks for your work!

Approved with PullApprove

@xiaochenshen
Contributor Author

@crosbymichael @hqhq
/cc @cyphar @mrunalp @rjnagal @vmarmol @dqminh

Thank you for helping with the code review.

This PR depends on the runtime-spec change for the Intel RDT/MBA config (opencontainers/runtime-spec#932), but that change has not been vendored into the runc master branch yet.

Currently, this commit addresses the runtime-spec dependency temporarily: 45ff600 [Don't merge] vendor: specs-go: update runtime-spec for Intel RDT/MBA

Do you have any suggestions for getting the vendor dependency ready in runc? Should I update vendor.conf as follows?

- github.com/opencontainers/runtime-spec v1.0.0
+ github.com/opencontainers/runtime-spec 5684b8af48c1ac3b1451fa499724e30e3c20a294

Thank you for your help.

@crosbymichael
Member

crosbymichael commented Oct 15, 2018

LGTM

Approved with PullApprove

@crosbymichael
Member

You need to update the vendor in runc

Update runtime-spec to get the Intel RDT/MBA Linux configs, which will
be used in subsequent commits.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Double-check whether Intel RDT sub-features are available in the
"resource control" filesystem. In Linux 4.14 and newer kernels, Intel RDT
sub-features can be selectively enabled or disabled on the kernel command
line (e.g., rdt=!l3cat,mba).

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
@xiaochenshen
Contributor Author

@crosbymichael @hqhq
/cc @cyphar @mrunalp @rjnagal @vmarmol @dqminh

I have rebased this PR onto the latest master branch with two changes:

  1. Update vendor runtime-spec to support Intel RDT/MBA (commit bd90541)
  2. Address the conflict with the newly merged PR "Disable rootless mode except RootlessCgMgr when executed as the root in userns (fix Docker-in-LXD regression)" #1862 (commit 27560ac)

$ git log --pretty=oneline
d59b17d libcontainer: intelrdt: Add more check if sub-features are enabled
f097339 libcontainer: intelrdt: add test cases for Intel RDT/MBA
1ed597b libcontainer: intelrdt: add update command support for Intel RDT/MBA
27560ac libcontainer: intelrdt: add support for Intel RDT/MBA in runc
c1cece7 libcontainer: intelrdt: add Intel RDT/MBA docs in SPEC.md
bd90541 vendor: bump runtime-spec to 5684b8af48c1

@hqhq
Contributor

hqhq commented Oct 16, 2018

LGTM

Approved with PullApprove

@crosbymichael
Member

crosbymichael commented Oct 16, 2018

LGTM

Approved with PullApprove
