Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libcontainer: intelrdt: add update command support #1590

Conversation

xiaochenshen
Copy link
Contributor

@xiaochenshen xiaochenshen commented Sep 8, 2017

Add runc update command support for Intel RDT/CAT.

for example:
runc update --l3-cache-schema "L3:0=f;1=f" container-id

For more information about Intel Resource Director Technology (RDT)
and Cache Allocation Technology (CAT), please refer to #1279

Signed-off-by: Xiaochen Shen xiaochen.shen@intel.com

update.go Outdated
@@ -254,6 +259,18 @@ other options are ignored.
config.Cgroups.Resources.MemorySwap = *r.Memory.Swap
config.Cgroups.Resources.PidsLimit = r.Pids.Limit

if intelrdt.IsEnabled() && config.IntelRdt != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to just set the value and let the libcontainer rtd code validate and return an error if its not enabled? Right now, it would just fail silently and no one would know if its set or not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crosbymichael

I have made changes according to your suggestion to avoid silent failure.
There are several cases in error paths:
(1) IsEnabled == false && intelRdt is specified in config
(2) IsEnabled == true && intelRdt is not specified in config
(3) IsEnabled == true && intelRdt is specified in config && update value is invalid

(1): It is handled by validate.Validator(). runc create/run will fail. It never pass to update command.
(2): We'd better return an error that the update command is failed before container.Set(). Just passing through container.Set() can't handle this case:
(3): We just set the value, intelrdt.Set() will return an error that the update value is invalid.

@xiaochenshen xiaochenshen force-pushed the rdt-cat-support-update-command branch 2 times, most recently from 5c51b38 to 6056263 Compare September 9, 2017 00:45
@crosbymichael
Copy link
Member

crosbymichael commented Sep 11, 2017

You are making things wayyy more complicated than they need to be.

if val := context.String("l3-cache-schema"); val != "" {
    config.IntelRtd = &configs.IntelRdt{
        L3CacheSchema: val,
    }
}

or if you have to check if it already exists in the config:

if val := context.String("l3-cache-schema"); val != "" {
    if config.IntelRtd == nil {
        config.IntelRtd = &configs.IntelRtd{}
    }
    config.IntelRtd.L3CacheSchema = val
}

@xiaochenshen
Copy link
Contributor Author

@crosbymichael

Thank you. I have updated the code according to your suggestion.
And also added error return in front to avoid silent failure passing through container.Set().

update.go Outdated
@@ -254,6 +260,17 @@ other options are ignored.
config.Cgroups.Resources.MemorySwap = *r.Memory.Swap
config.Cgroups.Resources.PidsLimit = r.Pids.Limit

if val := context.String("l3-cache-schema"); val != "" {
// Avoid silent failure in container.Set()
if !intelrdt.IsEnabled() || config.IntelRdt == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would config.IntelRdt == nil be a case for an error?

Copy link
Contributor Author

@xiaochenshen xiaochenshen Sep 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crosbymichael

Why would config.IntelRdt == nil be a case for an error?

Sorry, this is not straightforward.

  1. If linux.intelRdt is not specified in config.json, always config.IntelRdt == nil (https://github.com/opencontainers/runc/blob/master/libcontainer/specconv/spec_linux.go#L253-L258)
  2. In update.go, config := container.Config() (https://github.com/opencontainers/runc/blob/master/update.go#L150), we get config.IntelRdt == nil, which implies linux.intelRdt is not specified in config (even though intelrdt.IsEnabled() == true).
  3. In my understanding, this is an error case in updateCommand because intelRdt is not managed in container processes management life cycle - no Apply(), Destroy() and etc.,
  4. If it passes to container.Set(), when linux.intelRdt is not specified in config.json, intelRdtManager == nil, IntelRdtManager.Set() will never be invoked to update config or to check the input (https://github.com/opencontainers/runc/blob/master/libcontainer/container_linux.go#L208-L216), and this is a silent failure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a bug to me. If someone wants to add an intelrdt limit after the fact (where they didn't specify any options in the original configuration) that should be possible -- is this a kernel bug or just a limitation of runc at the moment? We allow you to do that with all of the other cgroups from memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyphar

That sounds like a bug to me. If someone wants to add an intelrdt limit after the fact (where they didn't specify any options in the original configuration) that should be possible -- is this a kernel bug or just a limitation of runc at the moment? We allow you to do that with all of the other cgroups from memory.

This is what I intended to do for Intel RDT/CAT support in runc as a tradeoff. In my opinion, this is neither a kernel bug nor a runc limitation. Let me explain here:

Unlike cgroups' hierarchy, Intel RDT resource control filesystem in kernel supports only single level filesystem layout. (see details in section "Intel RDT "resource control" filesystem hierarchy" in #433 ). For Cache Allocation Technology (CAT) feature, it only supports limited number of groups. It is hardware limitation by nature. The limitation is indicated in /sys/fs/resctrl/info/L3/num_closids.

In runc case, we could only create limited number <container_id> directories under /sys/fs/resctrl via Apply(). So we need a tradeoff: we Apply() intelrdt constraint on a container only if hardware and kernel support Intel RDT feature and intelRdt is specified in original configuration. Otherwise, if hardware and kernel support the feature, we will run out of the limitation quickly when started a lot of containers instances, even though the user doesn't care about Intel RDT/CAT feature or doesn't want to add intelrdt constraint for some container instances.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiaochenshen Alright, that makes sense (as in why it's not that way at the moment) -- but that just sounds like we need code to create a new rdtgroup if we're doing an update? I think just using Apply might be enough, but I would have to check the code again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyphar

but that just sounds like we need code to create a new rdtgroup if we're doing an update? I think just using Apply might be enough, but I would have to check the code again.

I will have a look how to do it in update command. Thank you for the suggestion.

@crosbymichael
Copy link
Member

crosbymichael commented Sep 12, 2017

LGTM

Approved with PullApprove

update.go Outdated
}
if config.IntelRdt == nil {
config.IntelRdt = &configs.IntelRdt{}
}
Copy link
Contributor Author

@xiaochenshen xiaochenshen Sep 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crosbymichael
This if condition if config.IntelRdt == nil is always false, because error returns above when config.IntelRdt == nil. I will remove the if condition lines.

Then the code will look like:

        Action: func(context *cli.Context) error {
                ...

                if val := context.String("l3-cache-schema"); val != "" {
                        // Avoid silent failure in container.Set()
                        if !intelrdt.IsEnabled() || config.IntelRdt == nil {
                                return fmt.Errorf("l3 cache schema is not enabled or not specified in config, update value '%s' is discarded", val)
                        }
                        config.IntelRdt.L3CacheSchema = val
                }

                return container.Set(config)
        },

update.go Outdated
if val := context.String("l3-cache-schema"); val != "" {
// Avoid silent failure in container.Set()
if !intelrdt.IsEnabled() || config.IntelRdt == nil {
return fmt.Errorf("l3 cache schema is not enabled or not specified in config, update value '%s' is discarded", val)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error should be shorter, just remove the last bit , update value '%s' is discarded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyphar

Thank you for review. This sounds fine to me.

In current implementation:
Either Intel RDT is not enabled by hardware and kernel, or intelRdt is
not specified in original config, we don't init IntelRdtManager in the
container to handle intelrdt constraint. It is a tradeoff that Intel RDT
has hardware limitation to support only limited number of groups.

This patch makes a minor change to support update command:
Whether or not intelRdt is specified in config, we always init
IntelRdtManager in the container if Intel RDT is enabled. If intelRdt is
not specified in original config, we just don't Apply() to create
intelrdt group or attach tasks for this container.

In update command, we could re-enable through IntelRdtManager.Apply()
and then update intelrdt constraint.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Add runc update command support for Intel RDT/CAT.

for example:
runc update --l3-cache-schema "L3:0=f;1=f" <container-id>

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
@xiaochenshen
Copy link
Contributor Author

@crosbymichael @cyphar
I have submitted 2 new commits according to @cyphar 's suggestion.
Please help review. Thank you.

NOTE:
Without the first patch, either Intel RDT is not enabled by hardware and kernel, or intelRdt is not specified in original config, we don't init IntelRdtManager in the container to handle intelrdt constraint. It is a tradeoff that Intel RDT has hardware limitation to support only limited number of groups.

The first patch makes a minor change to support update command:
Whether or not intelRdt is specified in config, we always init IntelRdtManager in the container if Intel RDT is enabled. If intelRdt is not specified in original config, we just don't Apply() to create intelrdt group or attach tasks for this container.

In the second patch, in update command, we could re-enable through IntelRdtManager.Apply() and then update intelrdt constraint.

@xiaochenshen
Copy link
Contributor Author

Ping @crosbymichael @cyphar

Could you have a look at your convenience? Thank you.
Please find the explanation in "NOTE" in my previous comment for how it works in update command.

@mrunalp
Copy link
Contributor

mrunalp commented Oct 10, 2017

LGTM

Approved with PullApprove

1 similar comment
@crosbymichael
Copy link
Member

crosbymichael commented Oct 10, 2017

LGTM

Approved with PullApprove

@crosbymichael crosbymichael merged commit 4693fae into opencontainers:master Oct 10, 2017
xiaochenshen added a commit to xiaochenshen/runc that referenced this pull request Oct 17, 2017
This fixes a GetStats() issue introduced in opencontainers#1590:
If Intel RDT is enabled by hardware and kernel, but intelRdt is not
specified in original config, GetStats() will return error unexpectedly
because we haven't called Apply() to create intelrdt group or attach
tasks for this container. As a result, runc events command will have no
output.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants