Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Incompatibility] Change in actions/runner-images breaks devcontainers if image has user with UID 1001 or group with GID 999 #723

Closed
rradczewski opened this issue Aug 30, 2023 · 6 comments

Comments

@rradczewski
Copy link

This is not a bug, but a recently introduced incompatibility between the provided devcontainer images and using them on github actions. I'm filing it hoping it will help others identify the problem they're experiencing. Feel free to close this issue.

I'm using devcontainer images to provide turnkey project boilerplates for coding exercises in rradczewski/kata-bootstraps. Recently, builds started to fail with permission errors (see https://github.com/rradczewski/kata-bootstraps/actions/runs/6025462340).

The issue seems to arise from ghcr.io/devcontainers/features/node:1 creating a system group that will be assigned GID 999, which, after a recent update to the GHA image ubuntu-latest, will coincidentally be the GID the GHA user is running with (uid=1001(runner) gid=999(docker) groups=999(docker),4(adm),101(systemd-journal) as per https://github.com/rradczewski/kata-bootstraps-reproducer/actions/runs/6025545201/job/16346481450#step:2:5).

This means the updateUID script will no longer update the remoteUser to match the host user and thus will lead to the permission errors.

Affected images

At least all images that include ghcr.io/devcontainers/features/node:1 in their devcontainer.json will most likely be affected. The javascript-node image further creates another npm group which faces the same issue.

Reproducer

A minimal reproducer using busybox is available in https://github.com/rradczewski/kata-bootstraps-reproducer/tree/1037fdcbe0c7db88d765646a6842acd222b4af70 - see this run for a good example: https://github.com/rradczewski/kata-bootstraps-reproducer/actions/runs/6025545201/job/16346488288

Other issues

Workaround

As per microsoft/vscode-remote-release#7284 (specifically this comment) a workaround is to delete the groups using GID 999 in the Dockerfile that is used to create the container, or to specifically avoid the GID 999 when creating groups.

@samruddhikhandale
Copy link
Member

Hi 👋

Thanks for the details information, I am sure people running into similar issues would benefit from it! ✨

This means the updateUID script will no longer update the remoteUser to match the host user and thus will lead to the permission errors.

Dev container Features and Images repo uses the mentioned ^ images within GitHub Actions (Examples - smoke test - workflow, and sample action-run). I wonder why don't we see such errors 🤔

This means the updateUID script will no longer update the remoteUser to match the host user and thus will lead to the permission errors.

"remoteUser": "root",

@rradczewski Does things work if you comment this ^ line? https://github.com/rradczewski/kata-bootstraps/blob/main/dotnet/.devcontainer.json#L4

@samruddhikhandale
Copy link
Member

samruddhikhandale commented Aug 31, 2023

Ah, we are seeing such issues in this repo as well. See https://github.com/devcontainers/images/actions/runs/6014530431/job/16314428455

From actions/runner-images#8160 (comment)

issue seems to be in conflicting groups on host (uid=1001(runner) gid=998(docker) groups=998(docker)) and container (uid=1000(node) gid=1000(node) groups=1000(node),998(nvm),999(npm)). With microsoft/vscode-remote-release#7284 (comment) suggestion in devcontainers, I was able to finish builds after additions to Dockerfile:
on ubuntu20 runner - RUN delgroup nvm
on ubuntu22 runner - RUN delgroup npm

I believe we need to update the uids and gids used by the dev container images which are conflicting with action runners (Unless the action-runners team reverts these updates, I'll have an async chat with them)
// cc @Chuxel @bamurtaugh

@rradczewski
Copy link
Author

rradczewski commented Aug 31, 2023

Dev container Features and Images repo uses the mentioned ^ images within GitHub Actions (Examples - smoke test - workflow, and sample action-run). I wonder why don't we see such errors 🤔

The log for that run is rather confusing to me. I would expect the updateUID script to run, thus to find a docker build with --build-arg NEW_UID=$XYZ, but I'm not seeing that. I suspect your test for the ruby image passes because it's never actually creating any files in the mounted workspace. From a quick glance, I don't see any bundle install etc. happening.

"remoteUser": "root",

@rradczewski Does things work if you comment this ^ line? rradczewski/kata-bootstraps@main/dotnet/.devcontainer.json#L4

I temporarily run all devcontainers with root, circumventing the issue. This is not a longterm solution because people running this locally will have all their node_modules/ etc. files created as root:root, which I think is not acceptable. I'm working on a fix on this branch: https://github.com/rradczewski/kata-bootstraps/tree/fix/remoteuser

Ah, we are seeing such issues in this repo as well. See devcontainers/images/actions/runs/6014530431/job/16314428455

All images with a group with GID=999 existing should be affected, as this prevents the updateUID script from mapping the requested remoteUser (usually UID=1000) to the host user (in GHA right now that's UID=1001). updateUID will not do anything if the GID already exists (https://github.com/devcontainers/cli/blob/7866e193a49a31ab542c6d3bbe34cc99bcf58dc9/scripts/updateUID.Dockerfile#L21-L23), so even if the remoteUser is in the correct group, they will only get readonly access to the workspace.


In the reproducer repository (https://github.com/rradczewski/kata-bootstraps-reproducer/), I use a container with the problematic users and gids existing. One with the remoteUser set to node, (implicitly, see the Dockerfile and https://github.com/rradczewski/kata-bootstraps-reproducer/blob/main/user_busybox_node_user/.devcontainer.json) and one where it is set to root (https://github.com/rradczewski/kata-bootstraps-reproducer/blob/main/root_busybox_node_user/.devcontainer.json).

I believe we need to update the uids and gids used by the dev container images which are conflicting with action runners (Unless the action-runners team reverts these updates, I'll have an async chat with them)

A tricky situation, I agree. A policy for devcontainers/images to avoid creating groups and users might be more appropriate, or to allocate only UIDs and GIDs that are less likely to cause conflicts (e.g. UIDs > 1100, and system GIDs < 900)? The ultimate solution would be for ID-mapping or user namespaces to be more widely adopted and easier to use (without having to reconfigure dockerd or run a different container runtime), but that doesn't seem to be coming anytime soon.

@samruddhikhandale
Copy link
Member

Quick update: Investigated this issue with the dev containers team, few findings -

@rradczewski
Copy link
Author

The dev container CLI should have automatic UID/GID syncing, unless the UID or GIDs were already in use

Yepp, that's what updateRemoteUserUID should've been doing.

From
actions/runner-images#8157, Docker's group was 999, and they then update the user's default group to that rather than adding the docker group to the user

I think I understood this differently, as in, the docker group used to be 123, which didn't exist in any of the devcontainer/images, so creating it as part of the auto sync wasn't an issue. (See this comment). Now when that changed to be 999, anything that would just create a system group like the features/node feature would create a group with that GID and cause the automatic UID/GID syncing to stop working because that GID already exists in the devcontainer/image.

However, we could update our Features to use system groups with lower IDs to try to avoid this in the future. We do this pretty frequently to deal with priv issues

I agree this is the way to go for the time being and appreciate the effort, otherwise it's gonna keep popping up. This would probably also be necessary for images that create extra groups like javascript-node.

Thanks a lot for the quick response!

@samruddhikhandale
Copy link
Member

samruddhikhandale commented Sep 11, 2023

Newer ubuntu (action runner) images were released last week with actions/runner-images#8201 changes. This has fixed the ID problems we have been seeing.

Also, on the CLI side, we have devcontainers/cli#635 opened which fixes microsoft/vscode-remote-release#7284

Feel free to reopen if the issue resurfaces.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants