Failure to run user namespaced container #3770
Comments
/cc @rata |
Thanks! I can't repro with that, though :( The issue really seems like the same symptom that PR #3511 fixed, but that is in 1.1.4 and you are running with 1.1.4. So maybe something with a similar symptom is still lurking there. What I've tried so far, without managing to repro this, is:
So, I can't really repro with that config. It would be great if you
These are the things that come to mind that might help us debug this. But it would be great if @kolyshkin can have a look. |
Thanks for the detailed steps @rata! I'll run through them and report back. |
The container runtime is containerd 1.7.0. I still can't get things to run on an Ubuntu node.
Here is the error:
Then I cd into the directory with config.json
Then cloned runc git repo and ran bats
|
@vinayakankugoyal things should run on Ubuntu, it is probably some config or binary missing on your side. The error you pasted is from containerd, that is not what we want. The runc output you pasted is not useful either, see that it says:
That file doesn't exist anymore and you are not seeing the error you saw before. You will need to repro this when the file exists, or copy it and adjust the config.json (those two bind mounts, the shm and the resolv.conf). I'm not sure the bats output is useful either; it seems to throw an error due to some bats variable not working. Maybe it is something with your bats installation? Also, when you have the time, please go through all the things that I asked and answer all of them :) |
OK, I was able to get a repro by pointing the config.json to another running container's sandbox ID. Now I get the same error as I was getting from kubelet on Ubuntu.
|
I noticed the following sysctls:
If I remove "net.core.somaxconn": "1024", from config.json it works. Here is what I see when I remove that sysctl:
|
Containers without user namespaces also have that sysctl in their config.json, and those are coming up fine. |
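The experiment described above (dropping one sysctl from config.json and retrying) can be sketched as a small helper. This is only an illustration, assuming the sysctl lives under `linux.sysctl` as in the OCI runtime spec; the function name `drop_sysctl` and the second sysctl in the sample are hypothetical and not taken from the thread.

```python
import json


def drop_sysctl(config: dict, name: str) -> bool:
    """Remove one sysctl from an OCI runtime config dict.

    Returns True if the key was present and has been removed.
    """
    sysctls = config.get("linux", {}).get("sysctl", {})
    return sysctls.pop(name, None) is not None


# Hypothetical config.json fragment; only net.core.somaxconn comes from the thread.
config = json.loads(
    '{"linux": {"sysctl": {"net.core.somaxconn": "1024",'
    ' "net.ipv4.ip_local_port_range": "1024 65535"}}}'
)
removed = drop_sysctl(config, "net.core.somaxconn")
print(removed, sorted(config["linux"]["sysctl"]))
```

On a real bundle you would load config.json with `json.load`, call the helper, and write the result back before running `runc run` again.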
@vinayakankugoyal Hmm, that is not the same error, is it? I mean, the one you mentioned here: That is not the same error you reported in the original issue description. Am I missing something? Please try to create a repro for the issue in the original issue. As I mentioned, I tried several scenarios but couldn't repro, but it seems you have a setup where you can repro, so let's see how it is that you hit it. |
I am not able to reproduce on the Ubuntu-based nodes the same issue that I am seeing on the COS-based nodes. On COS I am seeing:
On Ubuntu nodes I am seeing:
However, I made the same change on the COS node and now I get the original error. I also ran strace, and here is the output filtered to anything that had resolv.conf in it.
This is how the directory permissions are set up:
|
@vinayakankugoyal if you can repro with runc run, can you then do all the things I asked for instead of just these? The full strace uploaded to some site would be nice; the grep doesn't seem to show useful information. Also, though, you might not have reproduced the issue. The strace output you posted does not have any mount calls, and we do know it is failing in the mount call (technically it could be failing in the prior checks, but that seems unlikely as that mount works without userns). |
Your repro is too confusing, but if your kernel is v5.15 and you set a sysctl in a new user namespace, maybe you ran into this problem: net: Don't export sysctls to unprivileged users
This code is in v5.15. |
@rata - here is the strace output. |
@vinayakankugoyal cool. Can you paste the other things I asked here? Sorry I'm repeating this in a loop, but you continue to answer with partial information and not comment at all on doing the other things I asked. If you planned to do it later, please let me know, so I don't ask you repeatedly. |
@vinayakankugoyal So, let's go one report at a time. For Ubuntu 22.04: I can't reproduce. I've installed a VM with Ubuntu 22.04 and ran apt dist-upgrade to have the latest versions. This is what I did; I chose it because I'm very used to the development setup, no other special reason.
This will start a k8s cluster and you should see in your terminal running containerd some activity (it will try to create the coredns pod). In this setup, I've applied the pod you sent me via slack:
You can download the kubectl binary and, as the k8s start script printed, you can The pod was created just fine and with userns:
Note that in this Ubuntu version there is a bug (it seems to be fixed in the latest Debian, and will probably come in Ubuntu 23.04) where hitting ctrl-c doesn't work to stop the k8s cluster. If you are using a VM via ssh, you can kill the processes by running:
Due to that, you might need to run this after killing all the k8s processes but before starting it again:
If you do this, does it work for you? Can you try to find what the difference is between this and the Ubuntu node you are using? And care to share how it is installed (all components, the OS, the container runtime, CNI, runc, etc.)?
Regarding the bats error you pasted here:
This is because you installed an old version of bats. If you install the latest from source, it will not throw that error: https://bats-core.readthedocs.io/en/stable/installation.html#any-os-installing-bats-from-source
The tests run fine in Ubuntu 22.04 with a new version of bats (expected, as that is tested in the CI too IIRC).
Regarding the config.json you sent me on slack: config-slack.txt. This is a config.json that was created on COS when you hit the issue in the k8s cluster. With that config.json I could repro the issue in Ubuntu 22.04. But it seems like a red herring.
First, I deleted the somax sysctl line; that isn't added on Ubuntu when I started my k8s cluster here (I guess COS has some specific config to add those?).
When the mount is pointing to the host /etc/resolv.conf, in Ubuntu 22.04 (at least in this default config on an Azure VM) that file is a symlink. If you copy the file (cp /etc/resolv.conf .) and then point to this new file, the container starts fine. It also starts fine if you keep the host /etc/resolv.conf and change the
Do you have a config.json generated in your k8s setup when it fails with Ubuntu? Also, can you try in COS if adding those options to the mount makes it work?
I'm out of ideas on how I can reproduce this. Besides the missing things that you will send when you can, also:
Regarding COS, whenever you send the other info we'll have more insight into what might be happening. My gut feeling now is that it might be configured to use some options in the mount that don't work with userns, although I'm not sure how they manage to add those options in the config.json. But let me know if in your setup with Ubuntu this works fine (is this a GKE cluster?). |
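Since the comment above notes that the host /etc/resolv.conf can be a symlink, one way to spot that in a config.json is to compare each bind mount's source with its resolved path. The following is a minimal sketch, assuming the `mounts` array layout of the OCI runtime spec (`destination`, `type`, `source`, `options`); the helper name `symlinked_bind_sources` is hypothetical.

```python
import os


def symlinked_bind_sources(config: dict) -> list:
    """List bind mounts whose source path contains a symlink.

    Returns tuples of (destination, source, resolved_source).
    """
    found = []
    for mount in config.get("mounts", []):
        options = mount.get("options") or []
        is_bind = (
            mount.get("type") == "bind"
            or "bind" in options
            or "rbind" in options
        )
        if not is_bind:
            continue
        source = mount.get("source", "")
        resolved = os.path.realpath(source)
        # A difference means the file itself or a parent directory is a symlink.
        if resolved != os.path.abspath(source):
            found.append((mount.get("destination"), source, resolved))
    return found
```

Run against the parsed config.json on the node itself, since the result depends on the host filesystem. An empty list only says that no symlinks are involved; it says nothing about mount flags.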
Thanks for all the details!
Ubuntu
Like I mentioned in chat, Ubuntu works if I remove the somaxconn sysctl. It seems like in GKE kubelet was adding that to all pods, and I turned that "feature" off. Now the pod is able to come up just fine. I'll have to follow up on whether
COS
For COS when I update the
|
You are welcome. But please, please, PLEASE understand that I'm spending a lot of time on this, and in part that is due to the terse bug reports and replies you post here, which just don't say enough. With Ubuntu I can spin up a server and spend a lot of time (even though having to spend a lot of time has a significant impact on my day-to-day tasks), but with COS, which is either closed-source or only runs on Google Cloud, this is of course way more difficult.
For example, if you had mentioned that in Ubuntu the sysctl was the only issue, that it is a feature you have on GKE and that you can turn it off, and that when you do that everything works fine, that would have saved me several HOURS spent trying it, writing the elaborate post I did here with clear step-by-step instructions, etc.
Another thing that will help is if you say exactly how you run something to produce some output. If it is a k8s cluster, then any details needed in the setup, etc.
Talking more in general, please assume others don't know anything beyond what you write. So explaining exactly what you did is critical, and maybe also ask yourself some questions before submitting, like: is this enough for someone on another laptop to reproduce what I have here, or is something missing, or can it be interpreted in some other way? Am I being as clear as I can with what I write?
I'd really need you to start answering the questions I ask; if you can't answer some now and plan to answer them later, please do say so. And follow up on what you said you will do later (so far you said you will follow up on some, but you didn't, and I don't know if you consider they are not relevant anymore or you will do it later; it seems weird, as for some things you do spend time but for others you don't, and I don't understand).
Also, I think if you don't know why something fixes something for you, then don't open PRs doing that change. We are debugging: we first need to understand what is happening (and we will find ways where things work while debugging, maybe more than one), and only then can we propose a fix (if any is really needed). If we try something for debugging that seems to help with something, and you don't really understand why it helps or whether it fixes the problem, then opening a PR for that is not the flow that I expect. Let's debug and understand first. We can open PRs later.
This is not at all what I understood from what you said in the chat, but great that it works! I'll need you once again to be more verbose here. How do you disable the "feature" in GKE? This bug report can be useful for others only if you share this. Also, is that enabled by default on GKE? Is it part of some kubernetes upstream project? Or how is GKE adding that?
Cool, but please do follow up on this. Regarding COS: when running from k8s, is the filesystem that the resolv.conf is mounted from mounted with those flags? If not that, is the file a symlink, or is there a symlink in any of the path components to the resolv.conf? Also, you didn't mention anything at all about this, but did that containerd patch make the flow from k8s work? I mean: create a pod from k8s, using the patched containerd, and the pod is started with userns. Regarding the last issue you pasted about permission denied, can you check whether it is fixed with runc from git with this PR applied? #3753 |
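One way to answer the question about the flags of the filesystem backing resolv.conf is to look the path up in /proc/self/mountinfo. The sketch below is only illustrative: the helper name `mount_options_for` and the sample text are made up, the flags in the sample (nosuid, nodev, noexec) are an assumption about which flags are meant, and octal escapes in mount points (such as `\040` for a space) are not decoded.

```python
def mount_options_for(path: str, mountinfo_text: str) -> list:
    """Return the per-mount options of the mount that contains `path`.

    `mountinfo_text` is the content of /proc/self/mountinfo, where field 5
    is the mount point and field 6 holds the mount options.
    """
    best_point, best_opts = "", []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue
        mount_point, options = fields[4], fields[5].split(",")
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on ties the later line wins,
            # since later mounts are stacked on top of earlier ones.
            if len(mount_point) >= len(best_point):
                best_point, best_opts = mount_point, options
    return best_opts


# Made-up sample, not output from any node in this thread.
SAMPLE = """\
22 1 8:1 / / rw,relatime - ext4 /dev/sda1 rw
30 22 8:8 / /var/lib rw,nosuid,nodev,noexec,relatime - ext4 /dev/sda8 rw
"""
print(mount_options_for("/var/lib/containerd/sandbox/resolv.conf", SAMPLE))
```

On a node, pass `open("/proc/self/mountinfo").read()` and the source path of the bind mount from config.json.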
@rata - I appreciate your help on this, but please allow me to clarify. I have mentioned both in chat and in the report now that Ubuntu seems to be working once I remove the somaxconn sysctl. I had posted that Ubuntu works if I remove the sysctl a week ago. I did not know myself why those sysctls were being added in GKE; I only learnt about that feature recently and am still following up on how it can be turned off. AFAIK there isn't a way to turn it off in GKE other than messing with the kubelet config, which is what I did, but I am following up with the GKE team to understand more. I initially repro'd this issue on COS and only switched the repro to Ubuntu because in comment you mentioned:
Again, sorry for the confusion about this and the wasted hours. But if you are not clear on some details in chat, please don't hesitate to ask. You have been extremely gracious with your time and I don't want you to waste it because of miscommunication. Let me try to give as many details as I can about my setup now:
Ubuntu
I created a GKE cluster using the following command:
Notice that the cluster above only has 1 node. After the cluster came up I ssh'd into the node and did the following:
Ubuntu Version
Install
|
@rata - I think I got this to work on COS. Here is what I did:
As we can see, the container process is running as UID 3306815488. |
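A large host UID like the one above is what a user-namespace ID mapping produces. As a rough sketch, assuming OCI-style `uidMappings` entries (`containerID`, `hostID`, `size`): the mapping values below are hypothetical, chosen only so that container UID 0 lands on the UID quoted above, and are not taken from the actual config.json.

```python
def to_host_id(container_id: int, mappings: list):
    """Translate a container UID/GID to the host ID, or None if unmapped."""
    for m in mappings:
        start, size = m["containerID"], m["size"]
        if start <= container_id < start + size:
            return m["hostID"] + (container_id - start)
    return None


# Hypothetical mapping: 65536 IDs starting at host UID 3306815488.
uid_mappings = [{"containerID": 0, "hostID": 3306815488, "size": 65536}]
print(to_host_id(0, uid_mappings))
```

With such a mapping, root inside the container shows up on the host as an unprivileged high UID, which is the point of running the pod with userns.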
Sure, but that is manually running runc, not starting a k8s pod. My understanding was that runc was running, but that the k8s pod was not working on Ubuntu either, for some reason other than the sysctl. Miscommunication, that is all :)
Thanks, this report really helps A LOT.
COS
Ohh, great to know what you are doing. Then, can you paste the I'm curious to understand what GKE is doing here. My guess is that the kubelet allows that unsafe sysctl to be used, and that a mutating webhook adds those sysctls to the pod or something like that. But if it is not safe on one node, I'm unsure how the hook realizes that... Maybe something completely different is happening? To verify this, can you:
....
Right, but as But what we are really interested in is the output of ....
I'm still curious about why you found this error when running manually. Is it something obvious (like no execute bit on the binary) or something like that, maybe due to some cp option missing? There must be some difference between when those flags are added by containerd (which seems to work) and when we add them manually here...
Regarding cgroupsv1/v2: agree. It was relevant to know if you were using cgroups v1, as you could trigger some bugs with that, but if you are not using it, no need to try it out.
Regarding the PR: my point was that we didn't know if this helps in any way with any real use case (we do know now). If we want to open the PR for consistency, we should mention that. If we need these to make a real-world OS work, we need to mention it. Until we know which case it is, we can't really open the PR and state the reasons so that it can be properly reviewed (reviewing the change thinking it is needed to fix COS is not the same as reviewing it thinking the author considers it nice to have).
Great! Then what caused the permission denied error before, have you figured it out?
Regarding somax sysctl
Do you want to investigate further what we can do and follow up on that? I'll check what crun does, just in case, too.
Regarding possible remount on runc
Do you want to open an issue here in runc and ask about remounting with those flags, even if they are not specified? Crun (another OCI-compatible runtime) is doing that: https://github.com/containers/crun/blob/main/src/libcrun/linux.c#L919-L946. We might want to do this in runc to keep compatibility with crun, maybe not. I think opening an issue to discuss with maintainers makes sense. If there is agreement on going down that route, and you want to implement it, that would be great! :) |
Hope I'm not adding too much entropy to this discussion; this issue piqued my interest, and following @vinayakankugoyal's steps I also managed to repro it on COS.
|
@lrascao That is a commit on containerd, right? It seems to be the one I wrote "cri: Support pods with user namespaces". Thanks for the effort. But it doesn't add any information really: before that commit user namespaces are not used, so all the user namespaces messages are ignored by containerd, so a regular pod is created. And with that commit, of course, the container with userns is created and due to the special mount options of COS, that fails in that environment. Thanks anyways :) |
I opened a discussion thread in runc regarding remounting bind mounts if they fail with the right options. #3801. |
@vinayakankugoyal friendly ping? I'll be busy with Kubecon next week, but wanted to re-bump this |
Description
Unable to run user-namespaced container.
My setup is
containerd v1.7.0 (which supports user namespaces)
runc version 1.1.4
Here are the commandline args being passed to runc by containerd
Here is the config.json
Steps to reproduce the issue
With containerd 1.7.0 and runc 1.1.4 installed, run the following:
create a container with the following config.json
Describe the results you received and expected
I get the following error:
expected:
No error. Non user namespace containers are able to run.
What version of runc are you using?
runc version 1.1.4
commit: v1.1.4-0-g5fd4c4d1
spec: 1.0.2-dev
go: go1.17.10
libseccomp: 2.5.4
Host OS information
NAME="Container-Optimized OS"
ID=cos
PRETTY_NAME="Container-Optimized OS from Google"
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
GOOGLE_METRICS_PRODUCT_ID=26
KERNEL_COMMIT_ID=44456f0e9d2cd7a9616fb0d05bc4020237839a5a
GOOGLE_CRASH_ID=Lakitu
VERSION=101
VERSION_ID=101
BUILD_ID=17162.40.56
Host kernel information
Linux 5.15.65+ #1 SMP Sat Jan 21 10:12:05 UTC 2023 x86_64 Intel(R) Xeon(R) CPU @ 2.20GHz GenuineIntel GNU/Linux