Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

youki exec did not join the correct pid namespace #185

Closed
yihuaf opened this issue Aug 5, 2021 · 14 comments · Fixed by #217
Closed

youki exec did not join the correct pid namespace #185

yihuaf opened this issue Aug 5, 2021 · 14 comments · Fixed by #217
Assignees

Comments

@yihuaf
Copy link
Collaborator

yihuaf commented Aug 5, 2021

This will lead youki exec to fail at the moment.

[DEBUG src/container/builder_impl.rs:83] 2021-08-05T08:28:56.915335338+02:00 failed to run container_init: Failed to clean up extra fds

Caused by:
    0: Failed to obtain opened fds
    1: /proc/self/fd is not the actual procfs
    2: No such file or directory (os error 2)

Edit: this is just the side effect, where /proc/self is not set correctly. The root cause is that the youki exec process didn't enter the correct pid namespace. The setns for pid namespace require an additional fork/clone to enter into the right namepsace.

@yihuaf yihuaf self-assigned this Aug 5, 2021
@yihuaf
Copy link
Collaborator Author

yihuaf commented Aug 5, 2021

Looks like we didn't set the /proc correctly when doing youki exec. As a result, there is no /proc/self for the process that executes the youki exec. I may have to spend some time trying to go through the prepre_rootfs code.

@yihuaf yihuaf closed this as completed Aug 5, 2021
@yihuaf yihuaf reopened this Aug 5, 2021
@yihuaf yihuaf changed the title youki exec is error out when cleanning up the extra fds. youki exec did not join the correct pid namespace Aug 6, 2021
@yihuaf
Copy link
Collaborator Author

yihuaf commented Aug 6, 2021

@utam0k @Furisto @tsturzl The root cause of this is that the youki exec process didn't join the correct pid namespaces. While clone allows us to create new user and pid namespaces by creating only 1 process, the setns for pid namespace has the same limitation. When calling setns for pid namespace, only the future children will join the pid namespace, instead of the calling process itself.

I am not sure what is the right approach to resolve this yet. Perhaps we need to revert the change and clone/fork two processes after all.

@tsturzl
Copy link
Collaborator

tsturzl commented Aug 6, 2021

I will take a look and see if I have any ideas, otherwise perhaps the double fork is the right solution for now. I'm not entirely sure what runc and crun are doing, and I'd like to look deeper at what they're doing. I think doing what runc is doing is a pretty reliable baseline for us to work off of, because at worst we are only as efficient as what's currently the standard.

@yihuaf
Copy link
Collaborator Author

yihuaf commented Aug 6, 2021

I will take a look and see if I have any ideas, otherwise perhaps the double fork is the right solution for now. I'm not entirely sure what runc and crun are doing, and I'd like to look deeper at what they're doing. I think doing what runc is doing is a pretty reliable baseline for us to work off of, because at worst we are only as efficient as what's currently the standard.

Thanks. I have a feeling that we may have to revert to creating two processes but want to take a bit of time to think it over. As far as I know, crun uses double fork similar to what we used to do. runc internally does two clone calls to solve the user and pid namespace issue in a similar way. runc actually does one more fork because of the limitation of golang.

@tsturzl
Copy link
Collaborator

tsturzl commented Aug 10, 2021

I'm a bit naive here so forgive any silly questions. Can we not know the pid namespace for the fork until the fork has already happened? It seems like the namespace is derived from the Spec which I assume is known upfront since it seems like the initial process is just passing the loaded spec as is. So those namespaces presumably should be known before the fork::clone ever happens. Could we not just apply setns on the first process? Am I missing something here? Presumably the parent process doesn't enter this namespace itself even if it itself calls setns so it's not actually in that namespace. Maybe there is an apparent caveat that I'm not seeing here. I haven't delved too deep into this yet. It would be great to not spawn a new process if we don't need to, but if it comes down to that it's definitely not a deal breaker.

@yihuaf
Copy link
Collaborator Author

yihuaf commented Aug 10, 2021 via email

@utam0k
Copy link
Member

utam0k commented Aug 12, 2021

@Furisto
Copy link
Collaborator

Furisto commented Aug 12, 2021

I haven’t get a chance to try yet, but another way to work around this would be to make Youki main process join the user namespace and pid namespace first, fork the init process, then handle the rest like we do today. Since the youki main process will return and the process will go away, theoretically it should be OK

I don't think this will work because for rootless containers the main process has to write the uid and gid mappings for the init process and it cannot do that if it is in the same user namespace.

@yihuaf
Copy link
Collaborator Author

yihuaf commented Aug 13, 2021

I haven’t get a chance to try yet, but another way to work around this would be to make Youki main process join the user namespace and pid namespace first, fork the init process, then handle the rest like we do today. Since the youki main process will return and the process will go away, theoretically it should be OK

I don't think this will work because for rootless containers the main process has to write the uid and gid mappings for the init process and it cannot do that if it is in the same user namespace.

Yes, I agree. Double fork it is. I will put together the PR over the weekend.

@utam0k
Copy link
Member

utam0k commented Aug 13, 2021

It's good to have our original integration test to prevent a recurrence. @YJDoc2 is now working on this area. I'd like to create a future where we can check automatically.
#56

@yihuaf
Copy link
Collaborator Author

yihuaf commented Aug 13, 2021

It's good to have our original integration test to prevent a recurrence. @YJDoc2 is now working on this area. I'd like to create a future where we can check automatically.
#56

Agreed. This should have been caught much earlier with the integration test.

@utam0k utam0k added the enhancement New feature or request label Aug 21, 2021
@utam0k
Copy link
Member

utam0k commented Aug 21, 2021

@yihuaf I believe the goal of this issue is to pass the below integration test cases. Is that correct? When I organized integration tests, I found that this case is failed.
https://github.com/containers/youki/blob/9acc899239ee0b9d58ea1a81b69884d13ba561fe/integration_test.sh#L37-L40
https://github.com/opencontainers/runtime-tools/blob/master/validation/linux_ns_path/linux_ns_path.go

Please carefully that these test cases use unshare command, so using the current youki breaks your environment.

@yihuaf
Copy link
Collaborator Author

yihuaf commented Aug 22, 2021

@yihuaf I believe the goal of this issue is to pass the below integration test cases. Is that correct? When I organized integration tests, I found that this case is failed.
https://github.com/containers/youki/blob/9acc899239ee0b9d58ea1a81b69884d13ba561fe/integration_test.sh#L37-L40

https://github.com/opencontainers/runtime-tools/blob/master/validation/linux_ns_path/linux_ns_path.go
Please carefully that these test cases use unshare command, so using the current youki breaks your environment.

The goal of this is to fix not entering into pid namespace correctly for exec. The fix requires us to go back to the double fork approach. However, passing these integration test is also important. I can follow up these tests after the PR if you'd like me to.

@yihuaf yihuaf added bug and removed enhancement New feature or request labels Aug 23, 2021
@yihuaf
Copy link
Collaborator Author

yihuaf commented Aug 23, 2021

This is actually a bug, not an enhancement. In fact, this is a regression because it broke exec to enter into the right pid namespace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants