Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.13.1] fix init race #56

Merged

Conversation

kolyshkin
Copy link
Collaborator

@kolyshkin kolyshkin commented Sep 30, 2021

This cherry-picks commits from the following upstream PRs (modulo unrelated test fix):

to docker-1.13.1-rhel branch, in an attemp to fix https://bugzilla.redhat.com/show_bug.cgi?id=2000782

There was a single conflict while cherry-picking commit 2bea4c8, which is caused by missing commit 7335635 (possibly a few more). Resolved manually rather than adding more backported PRs.

Compile-tested only.

The fix needs to be vendored into docker-1.13.1 in extras-rhel-7.9.

wking and others added 6 commits October 4, 2021 09:40
So we can extract more than the start time with a single read.

Signed-off-by: W. Trevor King <wking@tremily.us>
(cherry picked from commit 439eaa3)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
And convert the various start-time properties from strings to uint64s.
This removes all internal consumers of the deprecated
GetProcessStartTime function.

Signed-off-by: W. Trevor King <wking@tremily.us>
(cherry picked from commit 75d98b2)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
And Stat_t.PID and Stat_t.Name while we're at it.  Then use the new
.State property in runType to distinguish between running and
zombie/dead processes, since kill(2) does not [1].  With this change
we no longer claim Running status for zombie/dead processes.

I've also removed the kill(2) call from runType.  It was originally
added in 13841ef (new-api: return the Running state only if the init
process is alive, 2014-12-23), but we've been accessing
/proc/[pid]/stat since 14e95b2 (Make state detection precise,
2016-07-05, #930), and with the /stat access the kill(2) check is
redundant.

I also don't see much point to the previously-separate
doesInitProcessExist, so I've inlined that logic in runType.

It would be nice to distinguish between "/proc/[pid]/stat doesn't
exist" and errors parsing its contents, but I've skipped that for the
moment.

The Running -> Stopped change in checkpoint_test.go is because the
post-checkpoint process is a zombie, and with this commit zombie
processes are Stopped (and no longer Running).

[1]: opencontainers/runc#1483 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
(cherry picked from commit 2bea4c8)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When starting a container with `runc start` or `runc run`, the stub
process (runc[2:INIT]) opens a fifo for writing. Its parent runc process
will open the same fifo for reading. In this way, they synchronize.

If the stub process exits at the wrong time, the parent runc process
will block forever.

This can happen when racing 2 runc operations against each other: `runc
run/start`, and `runc delete`. It could also happen for other reasons,
e.g. the kernel's OOM killer may select the stub process.

This commit resolves this race by racing the opening of the exec fifo
from the runc parent process against the stub process exiting. If the
stub process exits before we open the fifo, we return an error.

Another solution is to wait on the stub process. However, it seems it
would require more refactoring to avoid calling wait multiple times on
the same process, which is an error.

Signed-off-by: Craig Furman <cfurman@pivotal.io>
(cherry picked from commit 8d3e6c9)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Craig Furman <cfurman@pivotal.io>
(cherry picked from commit 5c0af14)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Jordan Liggitt <liggitt@google.com>
(cherry picked from commit 8541d9c)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin kolyshkin merged commit 4e46071 into projectatomic:docker-1.13.1-rhel Nov 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants