
Add initial support for rsvd accounting hugetlb cgroup #2360

Closed

@odinuge (Contributor) commented Apr 28, 2020

The previous (non-rsvd) max/limit_in_bytes does not account for reserved
huge page memory. This makes it possible for a process to reserve all the
huge page memory without being able to allocate it (due to cgroup
restrictions).

In practice, a process can successfully mmap more huge page memory than
the cgroup settings allow, but when it actually uses the memory it gets a
SIGBUS and crashes. This is bad for applications that mmap at startup:
the mmap succeeds, but the program crashes as soon as it starts using
the memory. postgres, e.g., does this by default.

This also keeps writing to the old max/limit_in_bytes, so that
applications reading those files do not see a wrong value.

More info can be found here: https://lkml.org/lkml/2020/2/3/1153


Do we have to edit the runtime-spec in order to do this?

This will also fix patroni/patroni#1393 (see the postgres part above).

Signed-off-by: Odin Ugedal <odin@ugedal.com>
@odinuge force-pushed the hugetlb-reservation-accounting branch from d8fe1b1 to 5c84b1a on April 28, 2020 13:08
```go
func (s *HugetlbGroup) Set(path string, cgroup *configs.Cgroup) error {
	supportsReservationAccounting := s.HasReservationAccountingSupport(path)
```

Contributor Author

Not sure if this is the best way to check, or should we try to "cache" the value like we do with HugePageSizes?

```diff
 for _, pagesize := range hugePageSizes {
-	usage := strings.Join([]string{"hugetlb", pagesize, "current"}, ".")
+	filenamePrefix := strings.Join([]string{"hugetlb", pagesize}, ".")
```

Contributor

nit: maybe it would be better to have it as

filenamePrefix := "hugetlb." + pagesize

(for readability)

```go
		filenamePrefix += ".rsvd"
	}

	usage := fmt.Sprintf("%s.current", filenamePrefix)
	value, err := fscommon.GetCgroupParamUint(dirPath, usage)
	if err != nil {
		return errors.Wrapf(err, "failed to parse hugetlb.%s.current file", pagesize)
```

Contributor

The error message from GetCgroupParamUint already contains the file name, so you can return the error as-is; there is no need to wrap it.

Contributor

Also, the error now reports the wrong file name when supportsReservationAccounting is set.

Contributor

I think fixing this should be done in a separate, preceding patch.

```diff
 	value, err := fscommon.GetCgroupParamUint(dirPath, usage)
 	if err != nil {
 		return errors.Wrapf(err, "failed to parse hugetlb.%s.current file", pagesize)
 	}
 	hugetlbStats.Usage = value
 
-	fileName := strings.Join([]string{"hugetlb", pagesize, "events"}, ".")
+	fileName := fmt.Sprintf("%s.events", filenamePrefix)
```

Contributor

nit: using fileName := filenamePrefix + ".events" would be faster, but either way is fine.

```go
// is supported. This is supported from linux 5.7
// https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/hugetlb.html
func HasReservationAccountingSupport(dirPath string) bool {
	hugePageSizes, err := cgroups.GetHugePageSize()
```

Contributor

Yes, I think it makes sense to do this check once, using sync.Once.

Contributor

Or not... since different cgroups can have different controls, I guess.

```diff
@@ -65,6 +70,58 @@ func TestHugetlbSetHugetlb(t *testing.T) {
 	}
 }
 
+func TestHugetlbSetHugetlbWithReservedAccounting(t *testing.T) {
+	helper := NewCgroupTestUtil("hugetlb", t)
```

Contributor

Shouldn't this test be skipped if !HasReservationAccountingSupport()?

```go
	if len(HugePageSizes) == 0 {
		return false
	}
	_, err := fscommon.ReadFile(path, strings.Join([]string{"hugetlb", HugePageSizes[0], "rsvd", "limit_in_bytes"}, "."))
```

Contributor

Use cgroups.PathExists here.

```go
	if err != nil || len(hugePageSizes) == 0 {
		return false
	}
	_, err = fscommon.ReadFile(dirPath, strings.Join([]string{"hugetlb", hugePageSizes[0], "rsvd", "max"}, "."))
```

Contributor

Use cgroups.PathExists().

@kolyshkin (Contributor)

> Do we have to edit the runtime-spec in order to do this?

I'm afraid so. Reservation and use are two different properties, and we should not mix them together.
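One way to read "two different properties": a container config would carry the reservation limit as its own optional field rather than reusing the fault-time limit. The struct below is purely hypothetical (it is not an actual runtime-spec type); it only illustrates that each field maps to its own cgroup v1 file, and that an unset reservation limit writes nothing.

```go
package main

import (
	"fmt"
	"strconv"
)

// hugepageLimit is a hypothetical config type that keeps the fault-time
// limit and the reservation limit as separate properties.
type hugepageLimit struct {
	Pagesize  string
	Limit     uint64  // -> hugetlb.<size>.limit_in_bytes
	RsvdLimit *uint64 // -> hugetlb.<size>.rsvd.limit_in_bytes; nil means "not set"
}

// files returns the cgroup v1 file names and values this limit would write.
func (h hugepageLimit) files() map[string]string {
	prefix := "hugetlb." + h.Pagesize
	out := map[string]string{prefix + ".limit_in_bytes": strconv.FormatUint(h.Limit, 10)}
	if h.RsvdLimit != nil {
		out[prefix+".rsvd.limit_in_bytes"] = strconv.FormatUint(*h.RsvdLimit, 10)
	}
	return out
}

func main() {
	rsvd := uint64(1 << 30)
	fmt.Println(hugepageLimit{Pagesize: "2MB", Limit: 1 << 30}.files())
	fmt.Println(hugepageLimit{Pagesize: "2MB", Limit: 1 << 30, RsvdLimit: &rsvd}.files())
}
```

Using a pointer for the reservation limit distinguishes "not configured" from an explicit limit of zero.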

@kolyshkin (Contributor)

So, @odinuge, I think this should start with a PR to https://github.com/opencontainers/runtime-spec. Once that is merged, we can open a PR here (and most of the comments I left while reviewing this are still valid).

@kolyshkin (Contributor)

I'm working on reviving this PR now, once the spec is merged.

Successfully merging this pull request may close these issues.

Bus error during database system initialization
2 participants