pidfd: Cannot restore multiple processes pointing to a common dead pidfd #2496
Comments
@bsach64 Thank you for working on this! Would it be possible to create a page describing how checkpoint/restore of pidfd works in https://criu.org/Category:Under_the_hood?
Sure @rst0git! I will do so in the near future!
bsach64
added a commit
to bsach64/criu
that referenced
this issue
Oct 28, 2024
This patch ensures that the process that creates the tmp process is the one that kills and waits for it when all pidfds have been opened. We do this by keeping track of the count of dead pidfds that each process has opened. When the count for the creator of the tmp process reaches 0, it waits for all other processes to open pidfds and then kills and waits for the tmp process.

Fixes: checkpoint-restore#2496
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
bsach64
added a commit
to bsach64/criu
that referenced
this issue
Nov 6, 2024
This patch ensures that the process that creates the tmp process is the one that kills and waits for it when all pidfds have been opened. We do this by keeping track of the count of dead pidfds that each process has opened. When the count for the creator of the tmp process reaches 0, it waits for all other processes to open pidfds and then kills and waits for the tmp process.

Fixes: checkpoint-restore#2496
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
bsach64
added a commit
to bsach64/criu
that referenced
this issue
Nov 7, 2024
Currently, the `waitpid()` call on the tmp process can be made by a process which is not its parent. This causes restore to fail. This patch instead selects one process to create the tmp process and open all the fds that point to it. These fds are sent to the correct process(es).

Fixes: checkpoint-restore#2496
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
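For readers following along, the approach described in this commit message can be sketched roughly as below. This is a standalone illustration and not CRIU's code: the names `send_fd`, `recv_fd`, `fd_ctl` and `restore_dead_pidfd_demo` are hypothetical, and passing the descriptor over a Unix socket with `SCM_RIGHTS` is an assumption about how the fds could be sent. It also assumes Linux 5.3 or newer with headers that define `SYS_pidfd_open` and `SYS_pidfd_send_signal`. One process forks the tmp process, opens the pidfd, sends it to a peer, and then kills and reaps the tmp process itself, so `waitpid()` is only ever called by the parent.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one descriptor over a Unix socket as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	union fd_ctl ctl = { 0 };
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
			      .msg_control = ctl.buf, .msg_controllen = sizeof(ctl.buf) };
	struct cmsghdr *c = CMSG_FIRSTHDR(&msg);

	c->cmsg_level = SOL_SOCKET;
	c->cmsg_type = SCM_RIGHTS;
	c->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(c), &fd, sizeof(int));
	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor sent by send_fd(); returns the new fd or -1. */
static int recv_fd(int sock)
{
	char byte;
	int fd;
	union fd_ctl ctl = { 0 };
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
			      .msg_control = ctl.buf, .msg_controllen = sizeof(ctl.buf) };
	struct cmsghdr *c;

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	c = CMSG_FIRSTHDR(&msg);
	if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(c), sizeof(int));
	return fd;
}

/* Returns 1 if the peer holds a pidfd to a dead process, 0 if not, -1 on error. */
int restore_dead_pidfd_demo(void)
{
	int sk[2], pidfd, status = 0, ok = 0;
	pid_t tmp, peer;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sk) < 0 || (peer = fork()) < 0)
		return -1;
	if (peer == 0) {
		char go;

		close(sk[0]);
		pidfd = recv_fd(sk[1]);
		/* Wait until the creator reports that the tmp process was reaped. */
		if (pidfd < 0 || read(sk[1], &go, 1) != 1)
			_exit(2);
		/* Signal 0 only probes; ESRCH means the target no longer exists. */
		if (syscall(SYS_pidfd_send_signal, pidfd, 0, NULL, 0) < 0 && errno == ESRCH)
			_exit(0);
		_exit(1);
	}
	close(sk[1]);
	/* The creator forks the tmp process, so it is the parent and may reap it. */
	tmp = fork();
	if (tmp == 0) {
		close(sk[0]);
		pause();
		_exit(0);
	}
	if (tmp > 0) {
		pidfd = syscall(SYS_pidfd_open, tmp, 0);
		if (pidfd >= 0) {
			ok = send_fd(sk[0], pidfd) == 0;
			close(pidfd);
		}
		kill(tmp, SIGKILL);
		ok = waitpid(tmp, NULL, 0) == tmp && ok;
		ok = ok && write(sk[0], "g", 1) == 1;
	}
	close(sk[0]); /* on failure the peer sees EOF and exits */
	waitpid(peer, &status, 0);
	return ok ? (WIFEXITED(status) && WEXITSTATUS(status) == 0) : -1;
}
```

Calling `restore_dead_pidfd_demo()` from a small `main()` is enough to try it. The peer never calls `waitpid()` on the tmp process; it only receives a descriptor that already points to it.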
bsach64
added a commit
to bsach64/criu
that referenced
this issue
Nov 8, 2024
Currently, the `waitpid()` call on the tmp process can be made by a process which is not its parent. This causes restore to fail. This patch instead selects one process to create the tmp process and open all the fds that point to it. These fds are sent to the correct process(es).

Fixes: checkpoint-restore#2496
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
bsach64
pushed a commit
to bsach64/criu
that referenced
this issue
Nov 8, 2024
Currently, the `waitpid()` call on the tmp process can be made by a process which is not its parent. This causes restore to fail. This patch instead selects one process to create the tmp process and open all the fds that point to it. These fds are sent to the correct process(es).

Fixes: checkpoint-restore#2496
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
mihalicyn
pushed a commit
to bsach64/criu
that referenced
this issue
Nov 12, 2024
Currently, the `waitpid()` call on the tmp process can be made by a process which is not its parent. This causes restore to fail. This patch instead selects one process to create the tmp process and open all the fds that point to it. These fds are sent to the correct process(es).

Fixes: checkpoint-restore#2496
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
avagin
added a commit
that referenced
this issue
Nov 12, 2024
Currently, the `waitpid()` call on the tmp process can be made by a process which is not its parent. This causes restore to fail. This patch instead selects one process to create the tmp process and open all the fds that point to it. These fds are sent to the correct process(es).

Fixes: #2496
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Bhavik Sachdev <b.sachdev1904@gmail.com>
Description
Please read the reproducer code: https://gist.github.com/bsach64/fcbdaec357fd7ed4212a86ac81b2e8bb
Relevant Restore Failure Logs
I think the issue stems from the fact that the process that creates the tmp process is not the one that waits for it.
A different process may kill this tmp process, but the one that creates it should be the one that waits for it to exit.
I will try to raise a PR with the fix over the next couple of weeks.
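The suspected cause above can be reproduced outside CRIU with a minimal sketch. The helper name `sibling_wait_errno` is hypothetical and the code is only an illustration of the `waitpid()` rule, not the reproducer linked above: a process that did not fork the tmp process gets `ECHILD` from `waitpid()`, while the real parent can kill and reap it without trouble.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo helper. Returns the errno seen by a sibling that calls
 * waitpid() on a process it did not fork, or -1 on a setup failure.
 */
int sibling_wait_errno(void)
{
	int pipefd[2], err = -1;
	pid_t tmp, sibling;

	if (pipe(pipefd) < 0)
		return -1;

	tmp = fork();
	if (tmp < 0)
		return -1;
	if (tmp == 0) {
		/* Stand-in for the tmp process: sleep until killed. */
		close(pipefd[0]);
		close(pipefd[1]);
		pause();
		_exit(0);
	}

	sibling = fork();
	if (sibling == 0) {
		int e = 0;

		/* tmp is a sibling here, not a child, so this call must fail. */
		if (waitpid(tmp, NULL, 0) < 0)
			e = errno;
		if (write(pipefd[1], &e, sizeof(e)) != sizeof(e))
			_exit(1);
		_exit(0);
	}
	close(pipefd[1]);
	if (sibling > 0) {
		if (read(pipefd[0], &err, sizeof(err)) != sizeof(err))
			err = -1;
		waitpid(sibling, NULL, 0);
	}
	close(pipefd[0]);

	/* Only the creator (the parent) can reap the tmp process. */
	kill(tmp, SIGKILL);
	if (waitpid(tmp, NULL, 0) != tmp)
		return -1;
	return err;
}
```

Any process may send the kill signal if it has permission, but the final `waitpid()` only succeeds in the process that forked the tmp process.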
CRIU logs and information:
CRIU full restore logs: