Fix race against systemd #1683
Merged
/cc @mrunalp
Looks ok, /cc @cyphar
You need to sign your commit @vikaschoudhary16
- T0: runc triggers systemd unit creation asynchronously, [here](https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/systemd/apply_systemd.go#L298).
- T1: runc then moves ahead and starts creating the cgroup paths (`.scope` directories), [here](https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/systemd/apply_systemd.go#L348). The kernel automatically and atomically creates the `.scope` directory, along with the `cgroup.procs` file and the other default files in it.
- T3: the systemd execution thread that was invoked at time `T0` is still in the process of creating the unit. systemd also tries to create the cgroup paths and deletes the `.scope` directory that runc created at time `T1`, [here](https://github.com/systemd/systemd/blob/v219/src/shared/cgroup-util.c#L1630) in the code.

Signed-off-by: vikaschoudhary16 <choudharyvikas16@gmail.com>
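The ordering problem in the timeline above, and the idea of waiting on a channel before touching the cgroup paths, can be sketched roughly as follows. This is a self-contained simulation and not runc's code: `fakeStartTransientUnit`, `createCgroupPaths`, `apply` and `eventLog` are hypothetical names, and the sleep only models systemd's asynchronous unit creation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// eventLog records the order in which the simulated steps happen.
type eventLog struct {
	mu     sync.Mutex
	events []string
}

func (l *eventLog) add(e string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.events = append(l.events, e)
}

func (l *eventLog) snapshot() []string {
	l.mu.Lock()
	defer l.mu.Unlock()
	return append([]string(nil), l.events...)
}

// fakeStartTransientUnit is a hypothetical stand-in for asking systemd to
// create a transient unit. It returns immediately (T0) and reports the job
// result on done once the simulated unit creation has finished.
func fakeStartTransientUnit(rec *eventLog, done chan<- string) {
	go func() {
		time.Sleep(20 * time.Millisecond) // systemd is still creating the unit
		rec.add("systemd: unit created")
		done <- "done"
	}()
}

// createCgroupPaths is a hypothetical stand-in for runc creating the
// .scope directories (T1).
func createCgroupPaths(rec *eventLog) {
	rec.add("runc: cgroup paths created")
}

// apply waits for the unit job to finish before creating the cgroup paths,
// so the two steps can no longer interleave.
func apply(rec *eventLog) string {
	done := make(chan string, 1)
	fakeStartTransientUnit(rec, done)
	result := <-done // block until the simulated systemd job has completed
	createCgroupPaths(rec)
	return result
}

func main() {
	rec := &eventLog{}
	fmt.Println(apply(rec), rec.snapshot())
}
```

Without the receive on `done`, `createCgroupPaths` would run while the simulated unit creation is still in flight, which is the window described between `T1` and `T3`.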
vikaschoudhary16 force-pushed the runc-systemd-race branch from 1e64325 to d5b4a3e on January 8, 2018 14:38
CI issues are addressed in #1682
vikaschoudhary16 pushed a commit to vikaschoudhary16/kubernetes that referenced this pull request on Jan 11, 2018:

This fixes a race condition in runc/systemd at container creation time
opencontainers/runc#1683

Signed-off-by: vikaschoudhary16 <vichoudh@redhat.com>
vikaschoudhary16 added a commit to vikaschoudhary16/kubernetes that referenced this pull request on Jan 12, 2018:

This fixes a race condition in runc/systemd at container creation time
opencontainers/runc#1683

Signed-off-by: vikaschoudhary16 <vichoudh@redhat.com>
openshift-merge-robot added a commit to openshift/origin that referenced this pull request on Jan 15, 2018:

Automatic merge from submit-queue (batch tested with PRs 18040, 18097, 18098, 18106, 18087).

UPSTREAM: opencontainers/runc: 1683: Fix race against systemd

Thanks to @vikaschoudhary16 for doing all the legwork upstream on this!

Fixes #16246

opencontainers/runc#1683
google/cadvisor#1861
kubernetes/kubernetes#58117

@derekwaynecarr
k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request on Jan 15, 2018:

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Bump runc to d5b4a3e

**What this PR does / why we need it**:
This fixes a race condition in runc/systemd at container creation time
opencontainers/runc#1683

**Special notes for your reviewer**:
Depends on google/cadvisor#1861. `pull-kubernetes-verify` is failing only because of this dependency.

**Release note**:
```release-note
None
```

/sig node
/cc @derekwaynecarr @sjenning @aveshagarwal @vishh @dchen1107 @dashpole @jeremyeder
filbranden added a commit to filbranden/runc that referenced this pull request on Mar 31, 2018:

The channel was introduced in opencontainers#1683 to work around a race condition. However, the check for error in StartTransientUnit ignores the error for an already existing unit, and in that case there will be no notification from DBus (so waiting on the channel will make it hang.)

Later PR opencontainers#1754 added a timeout, which worked around the issue, but we can fix this correctly by only waiting on the channel when there is no error. Fix the code to do so.

The timeout handling was kept, since there might be other cases where this situation occurs (https://bugzilla.redhat.com/show_bug.cgi?id=1548358 mentions calling this code from inside a container, it's unclear whether an existing container was in use or not, so not sure whether this would have fixed that bug as well.)
filbranden added a commit to filbranden/runc that referenced this pull request on Apr 9, 2018:

The channel was introduced in opencontainers#1683 to work around a race condition. However, the check for error in StartTransientUnit ignores the error for an already existing unit, and in that case there will be no notification from DBus (so waiting on the channel will make it hang.)

Later PR opencontainers#1754 added a timeout, which worked around the issue, but we can fix this correctly by only waiting on the channel when there is no error. Fix the code to do so.

The timeout handling was kept, since there might be other cases where this situation occurs (https://bugzilla.redhat.com/show_bug.cgi?id=1548358 mentions calling this code from inside a container, it's unclear whether an existing container was in use or not, so not sure whether this would have fixed that bug as well.)

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
filbranden added a commit to filbranden/runc that referenced this pull request on Apr 14, 2018:

So that, if a timeout happens and we decide to stop blocking on the operation, the writer will not block when they try to report the result of the operation.

This should address Issue opencontainers#1780 and it's a follow up for PR opencontainers#1683, PR opencontainers#1754 and PR opencontainers#1772.
filbranden added a commit to filbranden/runc that referenced this pull request on Apr 14, 2018:

So that, if a timeout happens and we decide to stop blocking on the operation, the writer will not block when they try to report the result of the operation.

This should address Issue opencontainers#1780 and it's a follow up for PR opencontainers#1683, PR opencontainers#1754 and PR opencontainers#1772.

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
mrunalp added a commit to projectatomic/runc that referenced this pull request on Jun 12, 2018:

opencontainers/runc#1683
opencontainers/runc#1754
opencontainers/runc#1772
opencontainers/runc#1781

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
@vikaschoudhary16 I still see this issue on runc v1.1.2, containerd 1.6.6, systemd 245
Fixes openshift/origin#16246