-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nsenter: major cleanups #950
Conversation
* List of netlink message types sent to us as part of bootstrapping the init. | ||
* These constants are defined in libcontainer/message_linux.go. | ||
*/ | ||
#define INIT_MSG 62000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: some alignment issues here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. I set my editor to a tabsize of 4 by accident.
For all your formatting needs... |
exit(1); | ||
} | ||
/* Sync with parent. */ | ||
if (read(syncpipe[0], &s, sizeof(s)) != sizeof(s) || s != SYNC_VAL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add back the extras ()
for easier reading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it isn't easier to read with more parenthesis.
@crosbymichael Cheers, I've added that to my |
} | ||
|
||
static int clone_parent(jmp_buf *env, int flags) __attribute__((noinline)); | ||
static int clone_parent(jmp_buf *env, int flags) | ||
static int clone_parent(jmp_buf * env, int flags) __attribute__ ((noinline)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space between *
and env
:/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like GNU indent doesn't know what the kernel style is. Fixed.
I like the new Although, the commit message will need to be changed before merging :) |
Of course, it's still a wip because I still don't like bits of how the id mapping management are done. There's also some more code that I want to pull out of the rootless containers patchset to put it here. :P |
} | ||
len = write(fd, data, data_len); | ||
if (len != data_len) | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we close fd before return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm currently thinking about that. In all of the error cases, we are going to exit so I don't think it really matters that we close all of the file descriptors. I'm currently trying to figure out how alloca
interacts with clone
(hint: the answer is "not nicely") but I might also end up just adding a memory leak because we're going to execve
anyway (which clears all of the allocated memory from the old process anyway).
I might end up implementing it as a bunch of gotos
with error strings but that would mean that we lose the benefit of exit(__COUNTER__ + 1)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise you could return -(len != data_len)
that way close(fd)
will always be called
Alright. I think I've included all of the refactorings and improvements that I wanted to. That was not as much fun as it looked. PTAL. Please make sure you test it on your local machine, as there's been quite a bit of oddness with our testbed recently. |
|
||
s = SYNC_USERMAP_ACK; | ||
if (write(syncfd, &s, sizeof(s)) != sizeof(s)) { | ||
kill(child, SIGKILL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
waitpid()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean? We didn't waitpid
before, and I don't think we can because the processes are cloned with CLONE_PARENT | SIGCHLD
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, overlooked the CLONE_PARENT
, sorry!
/ping @opencontainers/runc-maintainers PTAL |
From this week's meeting, @crosbymichael suggested that we include a refactoring of the communication methods between the bootstrap process and the init. This would involve presumably figuring out how we should deal with both the netlink magic as well as the other bits and pieces. Also, this is probably going to be the base for the console handling rewrite. So there's that. 😉 |
|
||
/* Retrieve data. */ | ||
size = NLMSG_PAYLOAD(&hdr, 0); | ||
current = data = malloc(size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to check that malloc doesn't return NULL here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix'd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NULL check still missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cyphar I don't see a NULL check here..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dammit, I must've accidentally put the fix in a different patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There, actually fixed this time.
@cyphar , @mrunalp , I tested this PR on Fodora 23, and I also tested this PR on RHEL7, and the same failure happened as described in #915. However, I do not think the failure is due to this PR. |
If we're not using user namespaces, they're all equivalent. If we are using user namespaces, they are not equivalent. The first one changes us to have The second one does a similar thing, but for a new mapping that we're creating (it's the same sort of argument). And the third one is required to In fact, I think the fourth one isn't necessary. |
@opencontainers/runc-maintainers :: Okay, @dqminh and I had a discussion about the mutual exclusivity of Would everyone be fine if we enforce this in |
config->consolefd = open(current, O_RDWR); | ||
if (config->consolefd < 0) | ||
bail("failed to open console %s", current); | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not introduced in this PR, I have a question about consolePath, is there any specific reasons that we have to handle consolePath in nsexec.c
? It's only used by exec process, why can't we handle it in go code like we do for init process?
If possible, I think we should do so to make c code as simple as possible. @crosbymichael
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hqhq It's because we setns
inside the C code, and we can't be sure of where the slave path will be inside the container's mount namespace (or even if there will be a path to it) we need to open it before we join the container's namespaces. This isn't a problem for when we create a container, because nothing changes after an unshare
.
All of this will be reworked as part of the --console
rewrite in #814. But first we need to merge some of this code that will lay the groundwork for that other stuff. :P
@cyphar I think you can split this PR to speed up the process, the clean up part is a big change but can be merged rapidly, it'll also make other people easier to review and give other maintainers more confidence to merge :) |
@hqhq Fair point. :P I will break up this PR into three parts:
This PR blew up quite quickly. Hopefully the commit will rebase nicely. :P |
LGTM, ping @mrunalp @crosbymichael @avagin @dqminh |
child = clone(child_func, ca.stack_ptr, CLONE_PARENT | SIGCHLD | flags, &ca); | ||
|
||
/* | ||
* On old kernels, CLONE_PARENT didn't work with CLONE_NEWPID, so we have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we note an example of the last known kernel that exhibits this problem so maybe eventually we can remove this if possible ?
If we want to support those kernels, it might also make sense to setup CI for it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the commit that fixed this: torvalds/linux@1f7f4dd. And this is the commit which caused the issue: torvalds/linux@40a0d32.
The only mainline kernel which this affected is 3.12, but I can't really comment on kernels that this change might've been backported to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix'd.
Alright. Time for the final round of LGTMs. /cc @opencontainers/runc-maintainers |
Removed a lot of clutter, improved the style of the code, removed unnecessary complexity. In addition, made errors unique by making bail() exit with a unique error code. Most of this code comes from the current state of the rootless containers branch. Signed-off-by: Aleksa Sarai <asarai@suse.de>
This just moves everything to one function so we don't have to pass a bunch of things to functions when there's no real benefit. It also makes the API nicer. Signed-off-by: Aleksa Sarai <asarai@suse.de>
@mrunalp Fixed the NULL check. /cc @opencontainers/runc-maintainers |
1 similar comment
🎉 Now for the other PRs. :D |
Just to make the code much easier to read, also removing redundant code. I also added the functionality that
setns(2)
will fail if we passed the wrong path to the bootstrap process (to avoid bugs in runC's namespace sending being non-fatal).In addition, I switch
pr_perror
tobail
, where the process will exit with a distinct error code. Currently we aren't handlingsyncT
properly within the bootstrap process. But this is a good stopgap.There are also lots of other fixes that make the code more robust, easier to read and much nicer to maintain. Some of this code comes from the rootless containers PR #774.
TODO
:nsenter
and with bootstrap process. (not relevant here, there's only one pipe for bootstrap and one for the inter-process stuff).There are other cleanup parts in #975 and #976 and #977.
Signed-off-by: Aleksa Sarai asarai@suse.de