Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make exec behaviour consistent with runc's exec #1252

Merged
merged 7 commits into from
Oct 21, 2022

Conversation

YJDoc2
Copy link
Collaborator

@YJDoc2 YJDoc2 commented Oct 7, 2022

related : #531

NOTE : even though this does solve the issue, I'm not particularly happy with the way this is implemented. Comments and suggestions for improvements are most welcome.

This helps in passing following containerd's integration tests which were previously failing :

  • TestContainerExec
  • TestContaienrExecNoBinaryExists
  • TestContainerExecLargeTTY (not sure if it was passing before and then broke, or was not passing from start, this PR will pass it)
  • TestContainerLargeExecArgs
  • TestContainerAttachProcess
  • TestContainerMetrics
    🎉

Background

runc's exec has 4 distinct outcomes in terms of return value:

  • when run without --detach and with a valid commannd, it waits and exits with the command's exit code
  • when run without --detach and with an invalid command, it exits with and error, 255 exit code
  • when run with --detach and with a valid command, it exits immediately after launching the process (tenant) and exits with a 0
  • when run with --detach and with an invalid command, it exits with and error and exit code of 255

The 4rth case is subtle and made the main difference of why youki was not originally passing containerd's tests.

On a sidde note, originally youki's exec did not respect the --detach flag and that was observed by the failing of TestContaienrExec after #1018 . This PR also fixes that.

Now, the reason behind the 4rth case exiting with error and not 0 like the 3rd, even though both are given detach flags, is runc executes the exec command similar to using process::spawn in Rust. Thus the exec occurs in two steps :

  • First the exec process is spawned, and if there is any error in spawning the process, such as invalid command, invalid operation or OOM errors, it is checked at this stage, and immediately returned.
  • If there was no error in spawning the command, then after doing some other work, runc checks if detach flag is given, if given immediately exits with 0, else waits for the exec'ed process to exit , and exits with its exit code.

Unlike it, youki uses fork-exec, which makes it difficult to do similar two-step exec as runc, because unlike process::spawn we do not really have a "handle" on the fork-exec'ed process, at best we have parent-child waitpid relationship.

This PR

This PR solves the issue by making sure the exec invocation done by youki binary gets the status of exec, AND add conditionality for waiting.

The issue with the implementation is that the only way I could make it work was to open a pipe with O_CLOEXEC flag, and in the intermediate's fork for init process, if the start_init_process returns, treat it as error (as the exec call should change process image and start the actual exec command, thus never returning). If the error occurs write the error message to the pipe. If exec succeeds, then the pipe is closed by OS. The tenant builder, after notifying start blocks by read-ing on this pipe, and if the read returns 0, indicating all write ends are closed, then the exec call has succeeded, otherwise, the buffer given to error contains the error message written by the init_process.
For this I had to drill down the RawFd from the tenant builder to the init process, and as the final .build uses the same implementation for init process and exec process, I had to introduce an enum to indicate the type AND pass the Fd. The problem is I don't feel this is much intuitive, and we have to make sure that the extra duplicated write-end fds are closed, for eg the intermediate process, after forking from main process contains the duplicated write-end fd. If not closed, the read end blocks until the intermediate process has also exited. This took me a day to debug, and can be tricky to maintain.

Also, currently the exec-failed case always exits with 255, even though we might know the exact error code, as I couldn't find a way to downcast the anyhow error to specific type. The buffer size is also hard-coded to 1024, so in case the error message is longer, it will be cut -off.

Personally I would have preferred a solution like notify-socket or main/intermediate sender-receiver channels, but not sure if that supports O_CLOEXEC, and would work the same. Another issue preventing using main/intermediate channels is that they are created in the main_process which exists inside the .build call, however, the exec call is done only after notify_start is done, which is after the .build . This means that any other solution that is used will need some kind of parameter either being hard-coded or drilled down in a similar way RawFd is currently done.

Finally, one more thing to note is not only the youki cli invocation should not block/hand for exec if the detach flag is given, for some reason the containerd tests also monitor the intermediate process(?) which before this waited unconditionally on the init process. This PR thus also drills down the detach status, and both the cli and intermediate wait conditionally. I'm not sure why the test blocks on the exit of intermediate process , need to check if this is because we are leaking the pid of it somehow by incorrectly giving it instead of the init process. I have manually verified that the pid written to pid file is of the actual init process in case of create-start.


As mentioned in the start, not sure if this is the best approach, might be some better way to do this, comments and suggestion welcome. Thanks :)

cc: @utam0k @Furisto @yihuaf

Comment on lines 10 to 16
pub enum ContainerType {
InitContainer,
TenantContainer {
detached: bool,
exec_notify_fd: RawFd,
},
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this idea ❤️

@utam0k
Copy link
Member

utam0k commented Oct 8, 2022

@YJDoc2 Thanks for your hard work 🙏 🙇

@utam0k
Copy link
Member

utam0k commented Oct 8, 2022

Hi, @Furisto. May I ask you to review it? I would like to review it with you both.

@YJDoc2
Copy link
Collaborator Author

YJDoc2 commented Oct 8, 2022

@YJDoc2 Thanks for your hard work pray bow

Thank you for your kind words 😊

@utam0k utam0k requested a review from Furisto October 9, 2022 01:13
@utam0k utam0k mentioned this pull request Oct 9, 2022
@Furisto
Copy link
Collaborator

Furisto commented Oct 10, 2022

Hey, really sorry, I have not found the time yet to look at this. I will take a stab at this tomorrow.

@Furisto
Copy link
Collaborator

Furisto commented Oct 12, 2022

I only see this dealing with the exec/tenant case, is my understanding correct? You can also detach from an init container (see https://github.com/opencontainers/runc/blob/main/docs/terminals.md for the supported modes). I am concerned that we introduce code that makes it harder to support this case in the future. Can you think of a way to support both?

@YJDoc2
Copy link
Collaborator Author

YJDoc2 commented Oct 12, 2022

Hey, yes this currently only deals with exec/tenant case, primarily because the main intention to do this PR was to pass the containerd integration tests.

I do partially agree with you that it'd be better to unify the detach handling for init and tenant cases, but I'm not sure why the current way might make it harder to handle the init container detach. Most of the changes in this PR are in the case of when the container is tenant only, and otherwise do not change the way other parts work.

One of the changes I would add is that instead of keeping detach in TenantContainer enum, extract it to the builder, as it will be required in case of run / detached console as well. However, the readpipe is specific to the exec case, so that should remain in the enum.

I can't seem to think about what part of current code might make it harder to implement the console/terminal case which you have noted, so can you please explain a bit more on it , so I can try changing the code accordingly?

Thanks :)

@Furisto
Copy link
Collaborator

Furisto commented Oct 14, 2022

Maybe I am overthinking it, I am trusting your judgement. Moving detach to the builder is the right call as we will be needing it later. Lets do this and then we are good to go I think.

@YJDoc2
Copy link
Collaborator Author

YJDoc2 commented Oct 20, 2022

@utam0k Apologies for the delay, I have extracted the detach in the base builder as discussed with Furisto, as well as have implemented looping read for exec error message, so even longer error messages can be received. Please take a look. Thanks!

Copy link
Member

@utam0k utam0k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YJDoc2 💯 Thanks for your great work!

@utam0k utam0k merged commit bcdfd59 into youki-dev:main Oct 21, 2022
@YJDoc2 YJDoc2 deleted the fix-exec-return-code branch November 8, 2022 06:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants