Make exec behaviour consistent with runc's exec #1252

YJDoc2 · 2022-10-07T18:04:41Z

related : #531

NOTE : even though this does solve the issue, I'm not particularly happy with the way this is implemented. Comments and suggestions for improvements are most welcome.

This helps in passing following containerd's integration tests which were previously failing :

TestContainerExec
TestContaienrExecNoBinaryExists
TestContainerExecLargeTTY (not sure if it was passing before and then broke, or was not passing from start, this PR will pass it)
TestContainerLargeExecArgs
TestContainerAttachProcess
TestContainerMetrics
🎉

Background

runc's exec has 4 distinct outcomes in terms of return value:

when run without --detach and with a valid commannd, it waits and exits with the command's exit code
when run without --detach and with an invalid command, it exits with and error, 255 exit code
when run with --detach and with a valid command, it exits immediately after launching the process (tenant) and exits with a 0
when run with --detach and with an invalid command, it exits with and error and exit code of 255

The 4rth case is subtle and made the main difference of why youki was not originally passing containerd's tests.

On a sidde note, originally youki's exec did not respect the --detach flag and that was observed by the failing of TestContaienrExec after #1018 . This PR also fixes that.

Now, the reason behind the 4rth case exiting with error and not 0 like the 3rd, even though both are given detach flags, is runc executes the exec command similar to using process::spawn in Rust. Thus the exec occurs in two steps :

First the exec process is spawned, and if there is any error in spawning the process, such as invalid command, invalid operation or OOM errors, it is checked at this stage, and immediately returned.
If there was no error in spawning the command, then after doing some other work, runc checks if detach flag is given, if given immediately exits with 0, else waits for the exec'ed process to exit , and exits with its exit code.

Unlike it, youki uses fork-exec, which makes it difficult to do similar two-step exec as runc, because unlike process::spawn we do not really have a "handle" on the fork-exec'ed process, at best we have parent-child waitpid relationship.

This PR

This PR solves the issue by making sure the exec invocation done by youki binary gets the status of exec, AND add conditionality for waiting.

The issue with the implementation is that the only way I could make it work was to open a pipe with O_CLOEXEC flag, and in the intermediate's fork for init process, if the start_init_process returns, treat it as error (as the exec call should change process image and start the actual exec command, thus never returning). If the error occurs write the error message to the pipe. If exec succeeds, then the pipe is closed by OS. The tenant builder, after notifying start blocks by read-ing on this pipe, and if the read returns 0, indicating all write ends are closed, then the exec call has succeeded, otherwise, the buffer given to error contains the error message written by the init_process.
For this I had to drill down the RawFd from the tenant builder to the init process, and as the final .build uses the same implementation for init process and exec process, I had to introduce an enum to indicate the type AND pass the Fd. The problem is I don't feel this is much intuitive, and we have to make sure that the extra duplicated write-end fds are closed, for eg the intermediate process, after forking from main process contains the duplicated write-end fd. If not closed, the read end blocks until the intermediate process has also exited. This took me a day to debug, and can be tricky to maintain.

Also, currently the exec-failed case always exits with 255, even though we might know the exact error code, as I couldn't find a way to downcast the anyhow error to specific type. The buffer size is also hard-coded to 1024, so in case the error message is longer, it will be cut -off.

Personally I would have preferred a solution like notify-socket or main/intermediate sender-receiver channels, but not sure if that supports O_CLOEXEC, and would work the same. Another issue preventing using main/intermediate channels is that they are created in the main_process which exists inside the .build call, however, the exec call is done only after notify_start is done, which is after the .build . This means that any other solution that is used will need some kind of parameter either being hard-coded or drilled down in a similar way RawFd is currently done.

Finally, one more thing to note is not only the youki cli invocation should not block/hand for exec if the detach flag is given, for some reason the containerd tests also monitor the intermediate process(?) which before this waited unconditionally on the init process. This PR thus also drills down the detach status, and both the cli and intermediate wait conditionally. I'm not sure why the test blocks on the exit of intermediate process , need to check if this is because we are leaking the pid of it somehow by incorrectly giving it instead of the init process. I have manually verified that the pid written to pid file is of the actual init process in case of create-start.

As mentioned in the start, not sure if this is the best approach, might be some better way to do this, comments and suggestion welcome. Thanks :)

cc: @utam0k @Furisto @yihuaf

Rather than drilling down RawFd from tenant to init_process, this instead adds an enum in ContainerArgs, containing RawFd

utam0k · 2022-10-08T05:27:04Z

crates/libcontainer/src/process/args.rs

+pub enum ContainerType {
+    InitContainer,
+    TenantContainer {
+        detached: bool,
+        exec_notify_fd: RawFd,
+    },
+}


I love this idea ❤️

crates/libcontainer/src/container/tenant_builder.rs

utam0k · 2022-10-08T07:49:20Z

@YJDoc2 Thanks for your hard work 🙏 🙇

utam0k · 2022-10-08T07:49:48Z

Hi, @Furisto. May I ask you to review it? I would like to review it with you both.

YJDoc2 · 2022-10-08T15:24:56Z

@YJDoc2 Thanks for your hard work pray bow

Thank you for your kind words 😊

Furisto · 2022-10-10T22:02:12Z

Hey, really sorry, I have not found the time yet to look at this. I will take a stab at this tomorrow.

Furisto · 2022-10-12T08:25:14Z

I only see this dealing with the exec/tenant case, is my understanding correct? You can also detach from an init container (see https://github.com/opencontainers/runc/blob/main/docs/terminals.md for the supported modes). I am concerned that we introduce code that makes it harder to support this case in the future. Can you think of a way to support both?

YJDoc2 · 2022-10-12T09:22:50Z

Hey, yes this currently only deals with exec/tenant case, primarily because the main intention to do this PR was to pass the containerd integration tests.

I do partially agree with you that it'd be better to unify the detach handling for init and tenant cases, but I'm not sure why the current way might make it harder to handle the init container detach. Most of the changes in this PR are in the case of when the container is tenant only, and otherwise do not change the way other parts work.

One of the changes I would add is that instead of keeping detach in TenantContainer enum, extract it to the builder, as it will be required in case of run / detached console as well. However, the readpipe is specific to the exec case, so that should remain in the enum.

I can't seem to think about what part of current code might make it harder to implement the console/terminal case which you have noted, so can you please explain a bit more on it , so I can try changing the code accordingly?

Thanks :)

Furisto · 2022-10-14T20:54:47Z

Maybe I am overthinking it, I am trusting your judgement. Moving detach to the builder is the right call as we will be needing it later. Lets do this and then we are good to go I think.

YJDoc2 · 2022-10-20T14:37:23Z

@utam0k Apologies for the delay, I have extracted the detach in the base builder as discussed with Furisto, as well as have implemented looping read for exec error message, so even longer error messages can be received. Please take a look. Thanks!

utam0k

@YJDoc2 💯 Thanks for your great work!

YJDoc2 added 6 commits October 3, 2022 12:00

Fix behaviour on exec failure

98926ad

Fix exec command for detached

911381c

Close the write end from intermediate process as well

b48a379

Fix formatting

603475b

Make handling of exec containers slightly more idiomatic

c6b8b81

Rather than drilling down RawFd from tenant to init_process, this instead adds an enum in ContainerArgs, containing RawFd

Actually set value of detach from command args

f5a84bd

utam0k reviewed Oct 8, 2022

View reviewed changes

crates/libcontainer/src/container/tenant_builder.rs Outdated Show resolved Hide resolved

utam0k requested a review from Furisto October 9, 2022 01:13

utam0k mentioned this pull request Oct 9, 2022

Support for containerd #531

Closed

Extract detached into main builder struct, better error message handling

bed057f

YJDoc2 force-pushed the fix-exec-return-code branch from af6c05d to bed057f Compare October 20, 2022 14:25

utam0k approved these changes Oct 21, 2022

View reviewed changes

utam0k merged commit bcdfd59 into youki-dev:main Oct 21, 2022

YJDoc2 deleted the fix-exec-return-code branch November 8, 2022 06:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make exec behaviour consistent with runc's exec #1252

Make exec behaviour consistent with runc's exec #1252

YJDoc2 commented Oct 7, 2022 •

edited

Loading

utam0k Oct 8, 2022

utam0k commented Oct 8, 2022

utam0k commented Oct 8, 2022

YJDoc2 commented Oct 8, 2022

Furisto commented Oct 10, 2022

Furisto commented Oct 12, 2022

YJDoc2 commented Oct 12, 2022

Furisto commented Oct 14, 2022

YJDoc2 commented Oct 20, 2022

utam0k left a comment

Make exec behaviour consistent with runc's exec #1252

Make exec behaviour consistent with runc's exec #1252

Conversation

YJDoc2 commented Oct 7, 2022 • edited Loading

Background

This PR

utam0k Oct 8, 2022

Choose a reason for hiding this comment

utam0k commented Oct 8, 2022

utam0k commented Oct 8, 2022

YJDoc2 commented Oct 8, 2022

Furisto commented Oct 10, 2022

Furisto commented Oct 12, 2022

YJDoc2 commented Oct 12, 2022

Furisto commented Oct 14, 2022

YJDoc2 commented Oct 20, 2022

utam0k left a comment

Choose a reason for hiding this comment

YJDoc2 commented Oct 7, 2022 •

edited

Loading