Support idmap mounts for volumes #3717
@AkihiroSuda can we add this to the 1.2 milestone? 🙏
We have something working now. We will polish and push it next week. Tests and similar pieces are still missing.
Test failures seem unrelated, for example:
or go 1.19 with -race failing due to criu:
Please rebase, then the CI should be green.
We've had some issues with packages in the containerd repository (which was related to mirrors configured on the workers that only have a subset of architectures). I gave CI a kick (hoping it was just a fluke) 🤞
LGTM, thanks so much!
Two small suggestions. We already have 3 LGTMs, so I think we can move forward now.
This commit adds support for idmap mounts as specified in the runtime-spec.

We open the idmap source paths and call mount_setattr() in runc PARENT, as we need privileges in the init userns for that, and then send the fds to the child process. For this fd passing we use the same mechanism used in other parts of the code, the _LIBCONTAINER_ env vars.

The mount is finished (unix.MoveMount) from Go code, inside the userns, so we reuse all the prepareBindMount() security checks and the remount logic for some flags too.

This commit only supports idmap mounts when userns are used AND the mappings are the same specified for the userns mapping. This limitation is to simplify the initial implementation, as all our users so far only need this, and we can avoid sending over netlink the mappings, creating a userns with this custom mapping, etc. Future PRs will remove this limitation.

Co-authored-by: Francis Laniel <flaniel@linux.microsoft.com>
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
@lifubang thanks, fixed that too. Feel free to merge now! 🎉
Thanks everyone working on this.
Ah, this was merged while I was reviewing it. I guess I'll send a fix-up PR then... (I had 6-7 other comments but GitHub won't let me post them because the "diff has changed"...)
if m.idmapFD == -1 {
	return fmt.Errorf("error creating mount %+v: idmapFD is invalid, should point to a valid fd", m)
}
if err := unix.MoveMount(m.idmapFD, "", -1, dest, unix.MOVE_MOUNT_F_EMPTY_PATH); err != nil {
- if err := unix.MoveMount(m.idmapFD, "", -1, dest, unix.MOVE_MOUNT_F_EMPTY_PATH); err != nil {
+ if err := unix.MoveMount(m.idmapFD, "", unix.AT_FDCWD, dest, unix.MOVE_MOUNT_F_EMPTY_PATH); err != nil {
static inline int sys_mount_setattr(int dfd, const char *path, unsigned int flags, struct mount_attr *attr, size_t size)
{
	return syscall(__NR_mount_setattr, dfd, path, flags, attr, size);
}

static inline int sys_open_tree(int dfd, const char *filename, unsigned int flags)
{
	return syscall(__NR_open_tree, dfd, filename, flags);
}
nit: We don't usually call these wrappers sys_foo, we just call them foo, but I guess it doesn't really matter.
	continue;
}

int fd_tree = sys_open_tree(-EBADF, idmap_src,
- int fd_tree = sys_open_tree(-EBADF, idmap_src,
+ int fd_tree = sys_open_tree(AT_FDCWD, idmap_src,
We are enforcing that the path is an absolute directory path. If we use AT_FDCWD, a relative path will work (if the validation is skipped for some reason), while with -EBADF, IIRC, it will fail if the path is relative.
Why is this suggestion better?
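The difference discussed in this thread can be observed with plain openat(2). The sketch below assumes open_tree() resolves its dfd and path arguments the same way openat() does, and it runs only on Linux. The helper names (openDir, isBadFd) and the two constants are made up for this illustration and are not part of runc.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// atFDCWD mirrors the Linux AT_FDCWD value (-100). It is defined here
// because the standard syscall package does not export it.
const atFDCWD = -100

// badDirfd mirrors passing -EBADF as the directory fd, as the reviewed C code does.
const badDirfd = -int(syscall.EBADF)

// openDir opens path as a directory relative to dirfd and closes it again.
// It only reports whether the lookup succeeded.
func openDir(dirfd int, path string) error {
	fd, err := syscall.Openat(dirfd, path, syscall.O_RDONLY|syscall.O_DIRECTORY|syscall.O_CLOEXEC, 0)
	if err != nil {
		return err
	}
	return syscall.Close(fd)
}

// isBadFd reports whether err is EBADF.
func isBadFd(err error) bool {
	return errors.Is(err, syscall.EBADF)
}

func main() {
	// Absolute path: dirfd is ignored, so an invalid dirfd is harmless.
	fmt.Println("invalid dirfd, absolute path:", openDir(badDirfd, "/"))
	// Relative path: an invalid dirfd makes the lookup fail.
	fmt.Println("invalid dirfd, relative path:", openDir(badDirfd, "."))
	// Relative path with AT_FDCWD: resolved against the current directory.
	fmt.Println("AT_FDCWD, relative path:", openDir(atFDCWD, "."))
}
```

Under that assumption, an invalid dirfd acts as a second line of defence: a relative path that slipped past validation fails instead of being resolved against the current working directory.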
@cyphar Sure, feel free to open a PR and tag me if that works for you. Otherwise, let me know what suggested changes you had in mind.
Sorry to interrupt your code review. Maybe you can set this PR's label to
By the way, I think all the maintainers should have a message group in an instant messaging tool, if convenient. Then we can announce important things in that group.
Do you have an account in
I have created an account: Lifubang. Can you bring me in
Done, but it is not very active currently.
This was a warning already and it was requested to make this an error while we will add validation of idmap mounts: opencontainers/runc#3717 (comment) I've also tested a k8s cluster and the config.json generated by containerd didn't use any relative paths. I tested one pod, so it was definitely not an extensive test. Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
This PR implements support for this runtime-spec change that added idmap mounts support: opencontainers/runtime-spec#1143
We open the idmap source paths and call mount_setattr() in runc PARENT, as we need privileges in the init userns for that, and then send the fds to the child process. For this fd passing we use the same mechanism used in other parts of the code, the _LIBCONTAINER_ env vars.
The mount is finished (unix.MoveMount) from Go code, inside the userns, so we reuse all the prepareBindMount() security checks and the remount logic for some flags too.
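As a rough illustration of passing fd numbers through an environment variable, here is a minimal sketch. It assumes the list is JSON-encoded with one entry per mount and -1 meaning "no fd"; the variable name _LIBCONTAINER_EXAMPLE_FDS and both helper functions are hypothetical and are not taken from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// exampleFdsEnv is a hypothetical variable name used only in this sketch.
const exampleFdsEnv = "_LIBCONTAINER_EXAMPLE_FDS"

// encodeFds serializes one fd number per mount; -1 marks "no fd for this mount".
func encodeFds(fds []int) (string, error) {
	b, err := json.Marshal(fds)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodeFds parses the value and checks that it has exactly one entry per mount.
func decodeFds(value string, numMounts int) ([]int, error) {
	var fds []int
	if err := json.Unmarshal([]byte(value), &fds); err != nil {
		return nil, fmt.Errorf("parsing %s: %w", exampleFdsEnv, err)
	}
	if len(fds) != numMounts {
		return nil, fmt.Errorf("%s has %d entries, expected %d", exampleFdsEnv, len(fds), numMounts)
	}
	return fds, nil
}

func main() {
	// Parent side: publish the fd numbers (here only the second mount has one).
	value, err := encodeFds([]int{-1, 7, -1})
	if err != nil {
		panic(err)
	}
	if err := os.Setenv(exampleFdsEnv, value); err != nil {
		panic(err)
	}

	// Child side: read the numbers back and validate the count.
	fds, err := decodeFds(os.Getenv(exampleFdsEnv), 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(fds)
}
```

The numbers are only meaningful if the fds themselves are inherited by the child process; the environment variable carries the indices, not the open files.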
This PR only supports idmap mounts when userns are used AND the mappings
are the same specified for the userns mapping. This limitation is to
simplify the initial implementation, as all our users so far only need
this, and we can avoid sending over netlink the mappings, creating a
userns with this custom mapping, etc. Future PRs will remove this
limitation.
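The limitation above could be checked along these lines. This is a sketch under stated assumptions, not the validator code from this PR: the IDMap type, its field names, and checkIdmapMount are hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// IDMap is a hypothetical stand-in for one ID mapping entry
// (a container ID range mapped to a host ID range).
type IDMap struct {
	ContainerID int64
	HostID      int64
	Size        int64
}

// sameMappings reports whether two mapping lists are identical, entry by entry.
func sameMappings(a, b []IDMap) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// checkIdmapMount rejects an idmap mount unless a user namespace is in use
// and the mount's mappings equal the user namespace mappings.
func checkIdmapMount(usernsEnabled bool, usernsUIDs, usernsGIDs, mountUIDs, mountGIDs []IDMap) error {
	if len(mountUIDs) == 0 && len(mountGIDs) == 0 {
		return nil // not an idmap mount, nothing to check
	}
	if !usernsEnabled {
		return errors.New("idmap mounts require a user namespace")
	}
	if !sameMappings(usernsUIDs, mountUIDs) || !sameMappings(usernsGIDs, mountGIDs) {
		return errors.New("idmap mount mappings must match the user namespace mappings")
	}
	return nil
}

func main() {
	userns := []IDMap{{ContainerID: 0, HostID: 100000, Size: 65536}}
	other := []IDMap{{ContainerID: 0, HostID: 200000, Size: 65536}}

	fmt.Println(checkIdmapMount(true, userns, userns, userns, userns))
	fmt.Println(checkIdmapMount(true, userns, userns, other, other))
	fmt.Println(checkIdmapMount(false, nil, nil, userns, userns))
}
```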
As the idmap case is quite similar to the existing mount sources case we
open with O_PATH, some simple refactors are done to share more code and
to group the slices of fds in Go code. To that end, we created the
mountFds struct and added all the slices of fds there.
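One way to picture the grouping is a struct holding one slice per kind of fd, indexed by mount. The sketch below is only an illustration: the field names, the mountEntry type, and the helpers are assumptions and may not match the struct introduced by this PR.

```go
package main

import "fmt"

// mountFds groups the per-mount fd slices. Index i belongs to mount i and
// -1 means "no fd for this mount". Field names are assumptions for this sketch.
type mountFds struct {
	sourceFds []int // fds for mount sources opened with O_PATH
	idmapFds  []int // fds for idmapped mount trees
}

// mountEntry pairs one mount destination with its optional fds (hypothetical type).
type mountEntry struct {
	dest    string
	srcFD   int
	idmapFD int
}

// fdAt returns fds[i], or -1 when the slice is shorter than i+1.
func fdAt(fds []int, i int) int {
	if i < len(fds) {
		return fds[i]
	}
	return -1
}

// buildEntries attaches the fds stored in the struct to each destination.
func buildEntries(dests []string, fds mountFds) []mountEntry {
	entries := make([]mountEntry, 0, len(dests))
	for i, d := range dests {
		entries = append(entries, mountEntry{
			dest:    d,
			srcFD:   fdAt(fds.sourceFds, i),
			idmapFD: fdAt(fds.idmapFds, i),
		})
	}
	return entries
}

func main() {
	fds := mountFds{
		sourceFds: []int{5, -1},
		idmapFds:  []int{-1, 8},
	}
	for _, e := range buildEntries([]string{"/data", "/cache"}, fds) {
		fmt.Printf("%s: srcFD=%d idmapFD=%d\n", e.dest, e.srcFD, e.idmapFD)
	}
}
```

Passing a single struct keeps function signatures stable when another kind of fd slice is added later.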
This replaces PR #3429, as that PR tries to create the mount once we are already inside the user namespace. AFAIK, that will never work, and therefore this PR takes a completely different approach to the idmap mounts.
This PR is co-authored-by:
Francis Laniel <flaniel@linux.microsoft.com>
cc @eiffel-fl
Question
One open question: I added most validations in libcontainer/configs/validate/validator.go, but I couldn't help noticing that other parts of the code do some other validations. All the examples that I tried hit the validations, but am I missing validations in some other part of the code that doesn't use this, maybe when runc is called differently? Or should adding them there be enough?
TODO
Changelog entry
Closes: #3429
Closes: #3020