Make exec behaviour consistent with runc's exec #1252
Conversation
Rather than drilling down the RawFd from the tenant to init_process, this instead adds an enum in ContainerArgs, containing the RawFd:
```rust
pub enum ContainerType {
    InitContainer,
    TenantContainer {
        detached: bool,
        exec_notify_fd: RawFd,
    },
}
```
I love this idea ❤️
@YJDoc2 Thanks for your hard work 🙏 🙇
Hi, @Furisto. May I ask you to review it? I would like to review it with you both.
Thank you for your kind words 😊
Hey, really sorry, I have not found the time yet to look at this. I will take a stab at this tomorrow.
I only see this dealing with the exec/tenant case; is my understanding correct? You can also detach from an init container (see https://github.com/opencontainers/runc/blob/main/docs/terminals.md for the supported modes). I am concerned that we introduce code that makes it harder to support this case in the future. Can you think of a way to support both?
Hey, yes, this currently only deals with the exec/tenant case, primarily because the main intention of this PR was to pass the containerd integration tests. I do partially agree with you that it'd be better to unify the detach handling for the init and tenant cases, but I'm not sure why the current way might make it harder to handle the init container detach. Most of the changes in this PR apply only when the container is a tenant, and otherwise do not change the way other parts work. One change I would add is that instead of keeping detach in the TenantContainer enum variant, it should be extracted to the builder, as it will be required in the case of run / detached console as well. However, the read pipe is specific to the exec case, so that should remain in the enum. I can't see what part of the current code might make it harder to implement the console/terminal case you have noted, so can you please explain a bit more, so I can try changing the code accordingly? Thanks :)
Maybe I am overthinking it; I am trusting your judgement. Moving detach to the builder is the right call, as we will be needing it later. Let's do this and then we are good to go, I think.
af6c05d to bed057f
@utam0k Apologies for the delay, I have extracted the detach flag into the builder as suggested.
@YJDoc2 💯 Thanks for your great work!
Related: #531
NOTE: even though this does solve the issue, I'm not particularly happy with the way this is implemented. Comments and suggestions for improvements are most welcome.
This helps in passing the following containerd integration tests, which were previously failing:
🎉
Background
runc's exec has 4 distinct outcomes in terms of return value:

- Without `--detach` and with a valid command, it waits and exits with the command's exit code
- Without `--detach` and with an invalid command, it exits with an error and a 255 exit code
- With `--detach` and with a valid command, it exits immediately after launching the (tenant) process and exits with 0
- With `--detach` and with an invalid command, it exits with an error and an exit code of 255

The 4th case is subtle and was the main reason why youki was not originally passing containerd's tests.

On a side note, originally youki's exec did not respect the `--detach` flag, which was observed through the failure of TestContainerExec after #1018. This PR also fixes that.

Now, the reason the 4th case exits with an error and not 0 like the 3rd, even though both are given the detach flag, is that runc executes the exec command similarly to using process::spawn in Rust. Thus the exec occurs in two steps: the process is first spawned, which surfaces an exec failure immediately, and only afterwards is it (optionally) waited on.
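To make the contrast concrete, here is a minimal std-only Rust sketch (not runc or youki code) of the two-step spawn-then-wait model. The point is that the spawn step reports an exec failure immediately, which is exactly what lets the detached-but-invalid case exit non-zero:

```rust
use std::process::Command;

fn main() {
    // Step 1: spawn. An invalid command fails here, immediately, whether
    // or not the caller ever intends to wait on the child.
    let bad = Command::new("definitely-not-a-real-binary-xyz").spawn();
    assert!(bad.is_err());

    // Step 2: wait. For a valid command this step is optional; skipping
    // it is effectively what --detach does, and by then an exec failure
    // has already been ruled out by step 1.
    let mut child = Command::new("true").spawn().expect("spawn `true`");
    let status = child.wait().expect("wait on child");
    assert!(status.success());
}
```

Incidentally, Rust's std itself reports exec failures from the forked child back to the parent's spawn call using a similar close-on-exec pipe mechanism on Unix.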
Unlike it, youki uses fork-exec, which makes it difficult to do a similar two-step exec as runc, because unlike process::spawn we do not really have a "handle" on the fork-exec'ed process; at best we have a parent-child waitpid relationship.
This PR
This PR solves the issue by making sure the exec invocation done by the youki binary gets the status of the exec, AND adds conditionality for waiting.
The issue with the implementation is that the only way I could make it work was to open a pipe with the O_CLOEXEC flag, and in the intermediate's fork for the init process, treat start_init_process returning at all as an error (the exec call should replace the process image and start the actual exec command, thus never returning). If an error occurs, the error message is written to the pipe; if exec succeeds, the pipe is closed by the OS. The tenant builder, after notifying start, blocks by `read`-ing on this pipe: if the read returns 0, indicating all write ends are closed, the exec call has succeeded; otherwise, the buffer passed to the read contains the error message written by the init process.
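The signaling protocol over the CLOEXEC'd pipe can be sketched in plain std Rust. This is not youki's code: a socketpair and threads stand in for the pipe and the forked processes, but the read-until-EOF convention is the same (reading 0 bytes means every write end was closed without anything being written, i.e. the exec succeeded):

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;
use std::thread;

/// Blocks until all write ends are closed. Returns None if nothing was
/// written (modeling a successful exec, where O_CLOEXEC closes the fd),
/// or the error message the "init process" wrote before exiting.
fn wait_for_exec_result(mut reader: UnixStream) -> Option<String> {
    let mut buf = String::new();
    reader.read_to_string(&mut buf).expect("read from pipe");
    if buf.is_empty() { None } else { Some(buf) }
}

fn main() {
    // Failure path: the child writes the exec error before exiting.
    let (reader, mut writer) = UnixStream::pair().expect("socketpair");
    thread::spawn(move || {
        writer.write_all(b"exec failed: ENOENT").expect("write error");
    });
    assert_eq!(
        wait_for_exec_result(reader).as_deref(),
        Some("exec failed: ENOENT")
    );

    // Success path: exec replaces the process image, so the CLOEXEC fd is
    // closed without a write; modeled here by simply dropping the writer.
    let (reader, writer) = UnixStream::pair().expect("socketpair");
    thread::spawn(move || drop(writer));
    assert_eq!(wait_for_exec_result(reader), None);
}
```

Note that the reader only unblocks once *every* duplicated write end is gone, which is why any copy of the fd left open in another process keeps the read hanging.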
For this I had to drill down the RawFd from the tenant builder to the init process, and as the final `.build` uses the same implementation for the init process and the exec process, I had to introduce an enum to indicate the type AND pass the fd. The problem is that I don't feel this is very intuitive, and we have to make sure that the extra duplicated write-end fds are closed; for example, the intermediate process, after forking from the main process, contains a duplicated write-end fd. If it is not closed, the read end blocks until the intermediate process has also exited. This took me a day to debug, and it can be tricky to maintain.

Also, the exec-failed case currently always exits with 255, even though we might know the exact error code, as I couldn't find a way to downcast the anyhow error to a specific type. The buffer size is also hard-coded to 1024, so if the error message is longer, it will be cut off.
Personally I would have preferred a solution like the notify-socket or the main/intermediate sender-receiver channels, but I'm not sure if those support O_CLOEXEC and would work the same way. Another issue preventing the use of the main/intermediate channels is that they are created in the main_process, which exists inside the `.build` call; however, the exec call is done only after notify_start, which is after the `.build`. This means that any other solution will need some kind of parameter either being hard-coded or drilled down in a similar way to how the RawFd currently is.

Finally, one more thing to note: not only should the youki cli invocation not block/hang for exec if the detach flag is given, but for some reason the containerd tests also monitor the intermediate process(?), which before this waited unconditionally on the init process. This PR thus also drills down the detach status, and both the cli and the intermediate wait conditionally. I'm not sure why the test blocks on the exit of the intermediate process; I need to check whether this is because we are leaking its pid somehow by incorrectly reporting it instead of the init process's. I have manually verified that the pid written to the pid file is that of the actual init process in the case of create-start.
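As a rough sketch (the helper name is hypothetical, not youki's actual API), the conditional waiting in both the cli and the intermediate process boils down to a match on the ContainerType enum this PR introduces:

```rust
use std::os::unix::io::RawFd;

// Mirrors the enum added in this PR (carried in ContainerArgs).
pub enum ContainerType {
    InitContainer,
    TenantContainer {
        detached: bool,
        exec_notify_fd: RawFd,
    },
}

/// Hypothetical helper: decide whether the caller should block on the
/// launched process. Init containers are always waited on here; a tenant
/// is waited on only when not detached.
fn should_wait(container_type: &ContainerType) -> bool {
    match container_type {
        ContainerType::InitContainer => true,
        ContainerType::TenantContainer { detached, .. } => !detached,
    }
}

fn main() {
    assert!(should_wait(&ContainerType::InitContainer));
    assert!(!should_wait(&ContainerType::TenantContainer {
        detached: true,
        exec_notify_fd: -1,
    }));
    assert!(should_wait(&ContainerType::TenantContainer {
        detached: false,
        exec_notify_fd: -1,
    }));
}
```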
As mentioned at the start, I'm not sure if this is the best approach; there might be a better way to do this. Comments and suggestions welcome. Thanks :)
cc: @utam0k @Furisto @yihuaf