Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid TLB configuration still creates cgroup, and youki process #328

Closed
YJDoc2 opened this issue Sep 24, 2021 · 4 comments · Fixed by #333
Closed

Invalid TLB configuration still creates cgroup, and youki process #328

YJDoc2 opened this issue Sep 24, 2021 · 4 comments · Fixed by #333
Labels
help wanted Extra attention is needed

Comments

@YJDoc2
Copy link
Collaborator

YJDoc2 commented Sep 24, 2021

@utam0k @Furisto @tsturzl please take a look

I was implementing HugeTLB test as part of #282, and for the invalid TLB version, as per https://github.com/opencontainers/runtime-tools/blob/master/validation/linux_cgroups_hugetlb/linux_cgroups_hugetlb.go#L62-L89 I found an issue :
When testing with runc, the runc process starts, founds out that the TLB config is invalid, and exists with message :

time="2021-09-24T18:47:07+05:30" level=warning msg="signal: killed"
time="2021-09-24T18:47:07+05:30" level=error msg="container_linux.go:349: 
starting container process caused "process_linux.go:449: container init caused "process_linux.go:415:  setting cgroup config for procHooks process caused 
"failed to write "314572800" to "/sys/fs/cgroup/hugetlb/dbac89a0-effa-7e71-8df4-c98facbca452/hugetlb.3MB.limit_in_bytes": 
open /sys/fs/cgroup/hugetlb/dbac89a0-effa-7e71-8df4-c98facbca452/hugetlb.3MB.limit_in_bytes: permission denied"

container_linux.go:349: starting container process caused "process_linux.go:449: 
container init caused \"process_linux.go:415: setting cgroup config for procHooks process caused 
"failed to write "314572800" to "/sys/fs/cgroup/hugetlb/dbac89a0-effa-7e71-8df4-c98facbca452/hugetlb.3MB.limit_in_bytes": 
open /sys/fs/cgroup/hugetlb/dbac89a0-effa-7e71-8df4-c98facbca452/hugetlb.3MB.limit_in_bytes: permission denied"

The runc does not create a cgroup, and trying to get the state using state command also give error that container does not exist.

On the other hand the youki process starts, does not create the container, but still creates the cgroup, and when checking the state, it shows that the container is in the 'stopped' state, when if fact is was not created/ should not have been created.

Another difference I noticed was that runc creates cgroup with container id as name in the root of respective cgroup, for eg :
for id 123abc, runc creates cgroup at path /sys/fs/cgroup/hugetlb/123abc
whereas youki creates a cgroup at path /sys/fs/cgroup/hugetlb/youki/123abc

I'm not sure if the exact path needs to be same across runtime implementations, but if the tools need to manually remove cgroups at end of testing, this will cause issues, as for each runtime, the path might be different.
Can someone verify if the path is same or not with some other runtime ?

@YJDoc2 YJDoc2 added bug help wanted Extra attention is needed labels Sep 24, 2021
@YJDoc2 YJDoc2 changed the title Invalid TLB configuration still creates cgroup, and youki prcess Invalid TLB configuration still creates cgroup, and youki process Sep 24, 2021
@Furisto
Copy link
Collaborator

Furisto commented Sep 24, 2021

Thanks for the report, I will take a look.

Another difference I noticed was that runc creates cgroup with container id as name in the root of respective cgroup, for eg :
for id 123abc, runc creates cgroup at path /sys/fs/cgroup/hugetlb/123abc
whereas youki creates a cgroup at path /sys/fs/cgroup/hugetlb/youki/123abc

If the path is absolute it has to be "cgroup mount point/cgroup_path", so that would not be correct. If it is relative, this can be added to a runtime determined location in the hierarchy but for us this should also be "cgroup mount point/cgroup_path".

@Furisto
Copy link
Collaborator

Furisto commented Sep 24, 2021

@YJDoc2 Are you providing a cgroup path? "Youki" should only be added if no cgroup path is provided. Runc deletes the cgroup if an error is encountered during container creation. I will adapt our code so that it behaves similarly.

@YJDoc2
Copy link
Collaborator Author

YJDoc2 commented Sep 25, 2021

@Furisto No, I am not providing any explicit path for cgroups. To be exact, the command that is run is

runtime_cmd --root /tmp/ID/runtime create ID --bundle /tmp/ID/bundle

where ID is replace by the container id, and rumtime_cmd is either runc or youki.

This exact command makes runc create a directory in /tmp with ID, and with correct config, a cgroups at path /sys/fs/cgroup/hugetlb/ID and keep a runc process running in background for valid configuration. For invalid tlb size, it gives an error and exit without creating cgroup. The state command run after this command also gives error as container does not exist.

For youki this command also creates a directory in /tmp with ID and keeps running a background process for correct config. But even with incorrect config, it still creates cgroups, and creates it at path /sys/fs/cgroup/hugetlb/youki/ID and keeps a youki process running in background, which belong to the ID cgroup. This makes deleting the cgroup not possible, as a process is still part of it, unless we manually kill the process.

Also running the state command after the create command shows container status as stopped, when in fact it should not have been created at all.

Hope this helps, let me know if you need any more information regarding this. The invalid TLB config I am trying is same as runtime-tools , of size 3MB.

@Furisto
Copy link
Collaborator

Furisto commented Sep 25, 2021

I think runc creates the cgroup but then deletes it on error. I kind of liked that containers were grouped under the youki directory, but I guess we should follow the lead of runc here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants