libcontainer: add support for Intel RDT/CAT in runc
About Intel RDT/CAT feature:
Intel platforms with newer Xeon CPUs support Intel Resource Director Technology
(RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which
currently supports L3 cache resource allocation.

This feature provides a way for software to restrict cache allocation to a
defined 'subset' of the L3 cache, which may overlap with other 'subsets'.
The different subsets are identified by class of service (CLOS) and each CLOS
has a capacity bitmask (CBM).

More information about Intel RDT/CAT can be found in section 17.17 of the
Intel Software Developer Manual.

About Intel RDT/CAT kernel interface:
In the Linux kernel, the interface is defined and exposed via the "resource
control" filesystem, which is a "cgroup-like" interface.

Compared with cgroups, it has a similar process management lifecycle and
similar interfaces in a container. But unlike the cgroups hierarchy, it has a
single-level filesystem layout.

Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|       |-- cbm_mask
|       |-- num_closids
|-- cpus
|-- schemata
|-- tasks
|-- <container_id>
    |-- cpus
    |-- schemata
    |-- tasks

For runc, we can make use of `tasks` and `schemata` configuration for L3 cache
resource constraints.

The file `tasks` has a list of tasks that belong to this group (e.g., the
"<container_id>" group). Tasks can be added to a group by writing the task ID
to the `tasks` file (which will automatically remove them from the previous
group to which they belonged). New tasks created by fork(2) and clone(2) are
added to the same group as their parent. If a pid is not in any sub group, it
is in the root group.

The file `schemata` has the allocation bitmasks/values for the L3 cache on each
socket; each entry contains an L3 cache id and a capacity bitmask (CBM).
	Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0`
which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.
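
To make the schema format concrete, here is a minimal Go sketch that parses an
L3 schema line into a map from cache id to CBM. The function name
`parseL3Schema` is hypothetical and chosen for illustration; it is not part of
this commit, and it only handles the "L3:" line described above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseL3Schema parses a line such as "L3:0=ff;1=c0" into a map from
// L3 cache id to capacity bitmask (CBM). Hypothetical helper.
func parseL3Schema(line string) (map[int]uint64, error) {
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, "L3:") {
		return nil, fmt.Errorf("schema %q does not start with \"L3:\"", line)
	}
	out := make(map[int]uint64)
	for _, entry := range strings.Split(strings.TrimPrefix(line, "L3:"), ";") {
		parts := strings.SplitN(entry, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		id, err := strconv.Atoi(strings.TrimSpace(parts[0]))
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %v", entry, err)
		}
		// CBM values are written in hexadecimal without a 0x prefix.
		cbm, err := strconv.ParseUint(strings.TrimSpace(parts[1]), 16, 64)
		if err != nil {
			return nil, fmt.Errorf("bad CBM in %q: %v", entry, err)
		}
		out[id] = cbm
	}
	return out, nil
}

func main() {
	m, err := parseL3Schema("L3:0=ff;1=c0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("cache 0: %#x, cache 1: %#x\n", m[0], m[1])
}
```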

A valid L3 cache CBM is a *contiguous set of bits*, and the number of bits set
cannot exceed the maximum CBM length. The maximum CBM length varies among
supported Intel Xeon platforms. In the Intel RDT "resource control" filesystem
layout, the CBM in a group should be a subset of the CBM in root. The kernel
checks that the CBM is valid when it is written. E.g., 0xfffff in root
indicates that the maximum CBM length is 20 bits, which maps to the entire L3
cache capacity. Some valid CBM values to set in a group: 0xf, 0xf0, 0x3ff,
0x1f00, etc.
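
The two rules above (contiguous bits, subset of the root CBM) can be sketched
in Go as follows. The function `validCBM` is a hypothetical helper for
illustration only; the authoritative check is the one the kernel performs on
write, which may enforce further platform-specific constraints not modelled
here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// validCBM reports whether cbm is non-zero, is a subset of rootMask, and
// consists of a single run of contiguous set bits. Hypothetical helper.
func validCBM(cbm, rootMask uint64) bool {
	if cbm == 0 || cbm&^rootMask != 0 {
		return false
	}
	// Shift out trailing zeros. A contiguous run then has the form
	// 0b0...011...1, so run+1 is a power of two and run&(run+1) is zero.
	run := cbm >> uint(bits.TrailingZeros64(cbm))
	return run&(run+1) == 0
}

func main() {
	const root = 0xfffff // 20-bit root CBM from the example above
	for _, cbm := range []uint64{0xf, 0xf0, 0x3ff, 0x1f00, 0x5, 0x1fffff} {
		fmt.Printf("%#x valid: %v\n", cbm, validCBM(cbm, root))
	}
}
```

Here 0x5 is rejected because its bits are not contiguous, and 0x1fffff is
rejected because it sets a bit outside the 20-bit root mask.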

For more information about the Intel RDT/CAT kernel interface:
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/cache&id=f20e57892806ad244eaec7a7ae365e78fee53377

An example for runc:
There are two L3 caches in the two-socket machine, the default CBM is 0xfffff
and the max CBM length is 20 bits. This configuration assigns 4/5 of L3 cache
id 0 and the whole L3 cache id 1 for the container:

"linux": {
	"resources": {
		"intelRdt": {
			"l3CacheSchema": "L3:0=ffff0;1=fffff"
		}
	}
}
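
The "4/5" figure follows from counting set bits: 0xffff0 has 16 of the 20
available bits set. A small Go sketch of that arithmetic, using a hypothetical
helper `cacheShare` that is not part of this commit:

```go
package main

import (
	"fmt"
	"math/bits"
)

// cacheShare returns the fraction of the L3 cache covered by cbm, given the
// maximum CBM length in bits. Hypothetical helper for illustration.
func cacheShare(cbm uint64, maxBits int) float64 {
	return float64(bits.OnesCount64(cbm)) / float64(maxBits)
}

func main() {
	fmt.Printf("cache id 0: %.2f\n", cacheShare(0xffff0, 20)) // 16 of 20 bits
	fmt.Printf("cache id 1: %.2f\n", cacheShare(0xfffff, 20)) // 20 of 20 bits
}
```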

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
xiaochenshen committed Nov 22, 2016
1 parent 57568a1 commit 540606e
Showing 15 changed files with 726 additions and 66 deletions.
21 changes: 16 additions & 5 deletions events.go
@@ -24,11 +24,12 @@ type event struct {

// stats is the runc specific stats structure for stability when encoding and decoding stats.
type stats struct {
Cpu cpu `json:"cpu"`
Memory memory `json:"memory"`
Pids pids `json:"pids"`
Blkio blkio `json:"blkio"`
Hugetlb map[string]hugetlb `json:"hugetlb"`
Cpu cpu `json:"cpu"`
Memory memory `json:"memory"`
Pids pids `json:"pids"`
Blkio blkio `json:"blkio"`
Hugetlb map[string]hugetlb `json:"hugetlb"`
IntelRdt intelRdt `json:"intelRdt"`
}

type hugetlb struct {
@@ -95,6 +96,12 @@ type memory struct {
Raw map[string]uint64 `json:"raw,omitempty"`
}

type intelRdt struct {
// The read-only default "schemas" in root, for reference
L3CacheSchemaRoot string `json:"l3CacheSchemaRoot,omitempty"`
L3CacheSchema string `json:"l3CacheSchema,omitempty"`
}

var eventsCommand = cli.Command{
Name: "events",
Usage: "display container events such as OOM notifications, cpu, memory, and IO usage statistics",
@@ -223,6 +230,10 @@ func convertLibcontainerStats(ls *libcontainer.Stats) *stats {
for k, v := range cg.HugetlbStats {
s.Hugetlb[k] = convertHugtlb(v)
}

is := cg.IntelRdtStats
s.IntelRdt.L3CacheSchemaRoot = is.IntelRdtRootStats.L3CacheSchema
s.IntelRdt.L3CacheSchema = is.IntelRdtGroupStats.L3CacheSchema
return &s
}

3 changes: 3 additions & 0 deletions libcontainer/cgroups/cgroups.go
@@ -39,6 +39,9 @@ type Manager interface {

// Sets the cgroup as configured.
Set(container *configs.Config) error

// Get non-cgroup resource path
GetResourcePath() string
}

type NotFoundError struct {
102 changes: 78 additions & 24 deletions libcontainer/cgroups/fs/apply_raw.go
@@ -31,6 +31,7 @@ var (
&PerfEventGroup{},
&FreezerGroup{},
&NameGroup{GroupName: "name=systemd", Join: true},
// If Intel RDT is enabled, will append IntelRdtGroup later
}
HugePageSizes, _ = cgroups.GetHugePageSize()
)
@@ -62,9 +63,11 @@ type subsystem interface {
}

type Manager struct {
mu sync.Mutex
Cgroups *configs.Cgroup
Paths map[string]string
mu sync.Mutex
Cgroups *configs.Cgroup
Paths map[string]string
ContainerId string
ResourcePath string
}

// The absolute path to the root of the cgroup hierarchies.
@@ -94,10 +97,11 @@ func getCgroupRoot() (string, error) {
}

type cgroupData struct {
root string
innerPath string
config *configs.Cgroup
pid int
root string
innerPath string
config *configs.Cgroup
pid int
containerId string
}

func (m *Manager) Apply(pid int) (err error) {
@@ -109,7 +113,7 @@ func (m *Manager) Apply(pid int) (err error) {

var c = m.Cgroups

d, err := getCgroupData(m.Cgroups, pid)
d, err := getCgroupData(m.Cgroups, pid, m.ContainerId)
if err != nil {
return err
}
@@ -131,23 +135,38 @@ func (m *Manager) Apply(pid int) (err error) {
}

paths := make(map[string]string)

// If Intel RDT is enabled, append IntelRdtGroup to subsystems
if IsIntelRdtEnabled() && m.Cgroups.Resources.IntelRdtL3CacheSchema != "" {
subsystems = append(subsystems, &IntelRdtGroup{})
intelRdtPath, err := GetIntelRdtPath(m.ContainerId)
if err != nil {
return err
}
m.ResourcePath = intelRdtPath
}

for _, sys := range subsystems {
if err := sys.Apply(d); err != nil {
return err
}
// TODO: Apply should, ideally, be reentrant or be broken up into a separate
// create and join phase so that the cgroup hierarchy for a container can be
// created then join consists of writing the process pids to cgroup.procs
p, err := d.path(sys.Name())
if err != nil {
// The non-presence of the devices subsystem is
// considered fatal for security reasons.
if cgroups.IsNotFound(err) && sys.Name() != "devices" {
continue

// Intel RDT "resource control" filesystem is not in cgroup path
if sys.Name() != "intel_rdt" {
// TODO: Apply should, ideally, be reentrant or be broken up into a separate
// create and join phase so that the cgroup hierarchy for a container can be
// created then join consists of writing the process pids to cgroup.procs
p, err := d.path(sys.Name())
if err != nil {
// The non-presence of the devices subsystem is
// considered fatal for security reasons.
if cgroups.IsNotFound(err) && sys.Name() != "devices" {
continue
}
return err
}
return err
paths[sys.Name()] = p
}
paths[sys.Name()] = p
}
m.Paths = paths
return nil
@@ -163,6 +182,12 @@ func (m *Manager) Destroy() error {
return err
}
m.Paths = make(map[string]string)

// Intel RDT "resource control" filesystem
if m.ResourcePath != "" {
return os.RemoveAll(m.ResourcePath)
}
m.ResourcePath = ""
return nil
}

@@ -173,6 +198,13 @@ func (m *Manager) GetPaths() map[string]string {
return paths
}

func (m *Manager) GetResourcePath() string {
m.mu.Lock()
path := m.ResourcePath
m.mu.Unlock()
return path
}

func (m *Manager) GetStats() (*cgroups.Stats, error) {
m.mu.Lock()
defer m.mu.Unlock()
@@ -186,6 +218,24 @@ func (m *Manager) GetStats() (*cgroups.Stats, error) {
return nil, err
}
}

// Intel RDT "resource control" filesystem stats
if IsIntelRdtEnabled() && m.Cgroups.Resources.IntelRdtL3CacheSchema != "" {
intelRdtPath, err := GetIntelRdtPath(m.ContainerId)
if err != nil || !cgroups.PathExists(intelRdtPath) {
return nil, err
}
sys, err := subsystems.Get("intel_rdt")
if err == errSubsystemDoesNotExist {
// In case IntelRdtGroup is not appended to subsystems
subsystems = append(subsystems, &IntelRdtGroup{})
}
sys, _ = subsystems.Get("intel_rdt")
if err := sys.GetStats(intelRdtPath, stats); err != nil {
return nil, err
}
}

return stats, nil
}

@@ -199,6 +249,9 @@ func (m *Manager) Set(container *configs.Config) error {
paths := m.GetPaths()
for _, sys := range subsystems {
path := paths[sys.Name()]
if sys.Name() == "intel_rdt" {
path = m.GetResourcePath()
}
if err := sys.Set(path, container.Cgroups); err != nil {
return err
}
@@ -241,7 +294,7 @@ func (m *Manager) GetAllPids() ([]int, error) {
return cgroups.GetAllPids(paths["devices"])
}

func getCgroupData(c *configs.Cgroup, pid int) (*cgroupData, error) {
func getCgroupData(c *configs.Cgroup, pid int, containerId string) (*cgroupData, error) {
root, err := getCgroupRoot()
if err != nil {
return nil, err
@@ -262,10 +315,11 @@ func getCgroupData(c *configs.Cgroup, pid int) (*cgroupData, error) {
}

return &cgroupData{
root: root,
innerPath: innerPath,
config: c,
pid: pid,
root: root,
innerPath: innerPath,
config: c,
pid: pid,
containerId: containerId,
}, nil
}

16 changes: 8 additions & 8 deletions libcontainer/cgroups/fs/apply_raw_test.go
@@ -20,7 +20,7 @@ func TestInvalidCgroupPath(t *testing.T) {
Path: "../../../../../../../../../../some/path",
}

data, err := getCgroupData(config, 0)
data, err := getCgroupData(config, 0, "")
if err != nil {
t.Errorf("couldn't get cgroup data: %v", err)
}
@@ -51,7 +51,7 @@ func TestInvalidAbsoluteCgroupPath(t *testing.T) {
Path: "/../../../../../../../../../../some/path",
}

data, err := getCgroupData(config, 0)
data, err := getCgroupData(config, 0, "")
if err != nil {
t.Errorf("couldn't get cgroup data: %v", err)
}
@@ -84,7 +84,7 @@ func TestInvalidCgroupParent(t *testing.T) {
Name: "name",
}

data, err := getCgroupData(config, 0)
data, err := getCgroupData(config, 0, "")
if err != nil {
t.Errorf("couldn't get cgroup data: %v", err)
}
@@ -117,7 +117,7 @@ func TestInvalidAbsoluteCgroupParent(t *testing.T) {
Name: "name",
}

data, err := getCgroupData(config, 0)
data, err := getCgroupData(config, 0, "")
if err != nil {
t.Errorf("couldn't get cgroup data: %v", err)
}
@@ -150,7 +150,7 @@ func TestInvalidCgroupName(t *testing.T) {
Name: "../../../../../../../../../../some/path",
}

data, err := getCgroupData(config, 0)
data, err := getCgroupData(config, 0, "")
if err != nil {
t.Errorf("couldn't get cgroup data: %v", err)
}
@@ -184,7 +184,7 @@ func TestInvalidAbsoluteCgroupName(t *testing.T) {
Name: "/../../../../../../../../../../some/path",
}

data, err := getCgroupData(config, 0)
data, err := getCgroupData(config, 0, "")
if err != nil {
t.Errorf("couldn't get cgroup data: %v", err)
}
@@ -217,7 +217,7 @@ func TestInvalidCgroupNameAndParent(t *testing.T) {
Name: "../../../../../../../../../../some/path",
}

data, err := getCgroupData(config, 0)
data, err := getCgroupData(config, 0, "")
if err != nil {
t.Errorf("couldn't get cgroup data: %v", err)
}
@@ -250,7 +250,7 @@ func TestInvalidAbsoluteCgroupNameAndParent(t *testing.T) {
Name: "/../../../../../../../../../../some/path",
}

data, err := getCgroupData(config, 0)
data, err := getCgroupData(config, 0, "")
if err != nil {
t.Errorf("couldn't get cgroup data: %v", err)
}