New BuildX Instance fails to execute, with cgroup errors. #5363

Closed
donovat opened this issue Aug 18, 2023 · 12 comments
Assignees
Labels
kind/bug Something isn't working priority/1 Work should be fixed for next release triage/confirmed Issue has been reproduced by dev team
Milestone

Comments

@donovat

donovat commented Aug 18, 2023

Actual Behavior

A newly created buildx instance fails with cgroup errors when trying to perform a build.

Steps to Reproduce

docker buildx create  --name amd64builder
docker buildx use amd64builder
docker buildx build  -t testbuild:00 --platform linux/amd64 --load .
[+] Building 80.0s (6/6) FINISHED                                                                                         docker-container:amd64builder
 => [internal] booting buildkit                                                                                                                   29.6s
 => => pulling image moby/buildkit:buildx-stable-1                                                                                                29.0s
 => => creating container buildx_buildkit_amd64builder0                                                                                            0.5s
 => [internal] load build definition from Dockerfile                                                                                               0.0s
 => => transferring dockerfile: 163B                                                                                                               0.0s
 => [internal] load metadata for registry.access.redhat.com/ubi8/python-39:1-57                                                                    5.7s
 => [internal] load .dockerignore                                                                                                                  0.0s
 => => transferring context: 2B                                                                                                                    0.0s
 => [1/2] FROM registry.access.redhat.com/ubi8/python-39:1-57@sha256:a72c25a5601c3f3c97ab056d3b9919370b659b94712e953f8c1654400ba7bada             44.2s
 => => resolve registry.access.redhat.com/ubi8/python-39:1-57@sha256:a72c25a5601c3f3c97ab056d3b9919370b659b94712e953f8c1654400ba7bada              0.0s
=> ERROR [2/2] RUN  python3 -m pip install  tornado                                                                                               0.3s
------                                                                                                                                                  
 > [2/2] RUN  python3 -m pip install  tornado:                                                                                                          
0.110 runc run failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/6), flags: 0xf, data: openrc: invalid argument
------
Dockerfile:3
--------------------
   1 |     FROM  registry.access.redhat.com/ubi8/python-39:1-57
   2 |     
   3 | >>> RUN  python3 -m pip install  tornado
   4 |     
   5 |     CMD echo "Running on $(uname -m)"
--------------------
ERROR: failed to solve: process "/bin/sh -c python3 -m pip install  tornado" did not complete successfully: exit code: 1

Result

See Error above.

Expected Behavior

If we switch back to the default buildx instance, the same build succeeds:

docker buildx use default
docker buildx build  -t testbuild:00 --platform linux/amd64 --load .
[+] Building 6.5s (6/6) FINISHED                                                                                                         docker:default
 => [internal] load build definition from Dockerfile                                                                                               0.0s
 => => transferring dockerfile: 201B                                                                                                               0.0s
 => [internal] load .dockerignore                                                                                                                  0.0s
 => => transferring context: 2B                                                                                                                    0.0s
 => [internal] load metadata for registry.access.redhat.com/ubi8/python-39:1-57                                                                    0.4s
 => CACHED [1/2] FROM registry.access.redhat.com/ubi8/python-39:1-57@sha256:a72c25a5601c3f3c97ab056d3b9919370b659b94712e953f8c1654400ba7bada       0.0s
 => [2/2] RUN  python3 -m pip install  tornado                                                                                                     6.1s
 => exporting to image                                                                                                                             0.0s 
 => => exporting layers                                                                                                                            0.0s 
 => => writing image sha256:47e55eeb47ed62a067c532030e7bb4138bf5422ea2f83356217de84d171060e8                                                       0.0s 
 => => naming to docker.io/library/testbuild:00                                                                                                    0.0s 

Additional Information

docker info  
Client:
 Version:    24.0.2-rd
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.0
    Path:     /Users/timd/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.19.0
    Path:     /Users/timd/.docker/cli-plugins/docker-compose

Server:
 Containers: 20
  Running: 11
  Paused: 0
  Stopped: 9
 Images: 25
 Server Version: 23.0.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1fbd70374134b891f97ce19c70b6e50c7b9f4e0d
 runc version: 860f061b76bb4fc671f0f9e900f7d80ff93d4eb7
 init version: 
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 6.1.30-0-virt
 Operating System: Alpine Linux v3.18
 OSType: linux
 Architecture: aarch64
 CPUs: 2
 Total Memory: 5.781GiB
 Name: lima-rancher-desktop
 ID: TDQE:24VE:6ERY:DELH:I73M:4YOH:SS6Z:6QPE:X2JN:LUYS:HKNT:62ZI
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 
$ docker buildx inspect default
Name:          default
Driver:        docker
Last Activity: 2023-08-18 09:04:57 +0000 UTC

Nodes:
Name:      default
Endpoint:  default
Status:    running
Buildkit:  v0.10.6+d52b2d5
Platforms: linux/arm64, linux/amd64, linux/amd64/v2
timd@Timothys-MacBook-Pro /tmp % docker buildx ls
NAME/NODE         DRIVER/ENDPOINT             STATUS  BUILDKIT        PLATFORMS
amd64builder      docker-container                                    
  amd64builder0   unix:///var/run/docker.sock running v0.12.1         linux/arm64, linux/amd64, linux/amd64/v2
default *         docker                                              
  default         default                     running v0.10.6+d52b2d5 linux/arm64, linux/amd64, linux/amd64/v2
rancher-desktop   docker                                              
  rancher-desktop rancher-desktop             running v0.10.6+d52b2d5 linux/arm64, linux/amd64, linux/amd64/v2

docker buildx inspect amd64builder 
Name:          amd64builder
Driver:        docker-container
Last Activity: 2023-08-18 09:01:09 +0000 UTC

Nodes:
Name:      amd64builder0
Endpoint:  unix:///var/run/docker.sock
Status:    running
Buildkit:  v0.12.1
Platforms: linux/arm64, linux/amd64, linux/amd64/v2
Labels:
 org.mobyproject.buildkit.worker.executor:         oci
 org.mobyproject.buildkit.worker.hostname:         a36f2b60e432
 org.mobyproject.buildkit.worker.network:          host
 org.mobyproject.buildkit.worker.oci.process-mode: sandbox
 org.mobyproject.buildkit.worker.selinux.enabled:  false
 org.mobyproject.buildkit.worker.snapshotter:      overlayfs
GC Policy rule#0:
 All:           false
 Filters:       type==source.local,type==exec.cachemount,type==source.git.checkout
 Keep Duration: 48h0m0s
 Keep Bytes:    488.3MiB
GC Policy rule#1:
 All:           false
 Keep Duration: 1440h0m0s
 Keep Bytes:    9.313GiB
GC Policy rule#2:
 All:        false
 Keep Bytes: 9.313GiB
GC Policy rule#3:
 All:        true
 Keep Bytes: 9.313GiB

Rancher Desktop Version

24.0.2-rd

Rancher Desktop K8s Version

1.9.1

Which container engine are you using?

moby (docker cli)

What operating system are you using?

macOS

Operating System / Build Version

Apple M1 Max - 64GB - macOS - 13.4.1 Ventura

What CPU architecture are you using?

arm64 (Apple Silicon)

Linux only: what package format did you use to install Rancher Desktop?

AppImage

@donovat donovat added the kind/bug Something isn't working label Aug 18, 2023
@rak-phillip rak-phillip added triage/next-candidate Discuss if it should be moved to "Next" milestone triage/confirmed Issue has been reproduced by dev team labels Aug 18, 2023
@rak-phillip
Contributor

@donovat thanks for raising this issue and helping to make Rancher Desktop better.

I can confirm that we can reproduce this on Linux (openSUSE Tumbleweed) with the provided steps and the following Dockerfile:

FROM  registry.access.redhat.com/ubi8/python-39:1-57
RUN  python3 -m pip install  tornado
CMD echo "Running on $(uname -m)"

@rak-phillip
Contributor

These look to be related:

#5092 (comment)
#4141

@mook-as
Contributor

mook-as commented Aug 18, 2023

runc run failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/6), flags: 0xf, data: openrc: invalid argument

I believe I was discussing something similar with @Nino-K: something in the last buildkit bump broke things. However, that was buildkit 0.12.0, whereas this is using 0.10.6, so it's a different (but probably related) issue.

@donovat
Author

donovat commented Aug 18, 2023

Note that the buildx instance that works, default, is on Buildkit: v0.10.6+d52b2d5,
while the instance that fails, amd64builder0, is on Buildkit: v0.12.1,
so this could be related to the bump noted above.
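
To compare builders at a glance, the node rows of `docker buildx ls` can be reduced to node name and BuildKit version. This is only a sketch: the helper name `buildkit_versions` is made up for illustration, and it assumes the column layout shown in the output above (buildx v0.11), which other buildx versions may not share.

```shell
#!/bin/sh
# Print "<node> <buildkit version>" for every node row of `docker buildx ls`.
# Assumption: node rows are indented and carry NAME, ENDPOINT, STATUS and
# BUILDKIT in fields 1-4, as in the listing posted in this issue.
buildkit_versions() {
  awk '/^[[:space:]]+[^[:space:]]/ && NF >= 4 { print $1, $4 }'
}

# Usage (requires docker on the host):
#   docker buildx ls | buildkit_versions
```

Builder header rows and the column header are not indented, so they are skipped; only the nodes that actually run BuildKit are printed.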

@haljoh

haljoh commented Aug 22, 2023

Hi,
I am facing a similar problem:

I am running rancher-desktop on a Mac M1 and I am trying to build a compose file that includes:

platforms:
  - linux/amd64
  - linux/arm64/v8

When running docker compose build --push I get the following error:

=> ERROR [sqldb linux/amd64  2/14] RUN mkdir -p /usr/config                                                                                                                          0.1s
 => CANCELED [sqldb linux/arm64  2/14] RUN mkdir -p /usr/config                                                                                                                       0.0s
------
 > [sqldb linux/amd64  2/14] RUN mkdir -p /usr/config:
0.137 runc run failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/6), flags: 0xf, data: openrc: invalid argument
------
failed to solve: process "/dev/.buildkit_qemu_emulator /bin/sh -c mkdir -p /usr/config" did not complete successfully: exit code: 1

I have tried a factory reset, but that did not resolve the problem. I have also verified that Rancher Desktop shows no errors on the diagnostics page.

If I switch to Docker Desktop it works fine and I can build without any problems. In Docker Desktop I can see that buildkit v0.12.1 is being used, which works fine.

@jandubois jandubois added this to the 1.11 milestone Aug 23, 2023
@jandubois jandubois removed the triage/next-candidate Discuss if it should be moved to "Next" milestone label Aug 23, 2023
@jandubois jandubois self-assigned this Aug 23, 2023
@jandubois jandubois modified the milestones: 1.11, 1.10 Aug 23, 2023
Pehesi97 added a commit to nearform/initium-platform that referenced this issue Aug 24, 2023
@ngearhart

From this issue, the following is a workaround:

docker buildx create \
              --name fixed_builder \
              --driver-opt 'image=moby/buildkit:v0.12.1-rootless' \
              --bootstrap --use

@jandubois
Member

I believe this has been fixed by #5400.

@donovat
Author

donovat commented Aug 29, 2023

Hi @jandubois - when will the fix be in the main branch?
I can see that the workaround above is OK, but until the fix ships in an update, builds will continue to fail.

@jandubois
Member

when will the fix be in the main branch?

The fix is in the main branch already. It will be in the 1.10 release.

Here is a maybe better workaround, so you don't need to modify your buildx commandline:

rdctl shell sudo sed -E -i 's/#(rc_cgroup_mode).*/\1="unified"/' /etc/rc.conf

You need to reboot Rancher Desktop after running the command above. It will remain active until you do a factory-reset.
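
To see what that substitution does before editing the file in place, the same expression can be run against text on stdin. This is a sketch: the function name `preview_cgroup_mode_change` is hypothetical, and the sample input line is an assumption about how a stock OpenRC rc.conf ships the setting (commented out).

```shell
#!/bin/sh
# Apply the substitution from the workaround to stdin instead of editing
# /etc/rc.conf in place, so the effect can be inspected first.
preview_cgroup_mode_change() {
  sed -E 's/#(rc_cgroup_mode).*/\1="unified"/'
}

# Assumed sample line; the real file may use a different default value.
printf '%s\n' '#rc_cgroup_mode="hybrid"' | preview_cgroup_mode_change
```

The pattern only matches a line where `#` is immediately followed by `rc_cgroup_mode`, so a setting that has already been uncommented is left untouched, and running the command a second time changes nothing.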

@jandubois
Member

jandubois commented Sep 1, 2023

Here is a maybe better workaround, so you don't need to modify your buildx commandline:

rdctl shell sudo sed -E -i 's/#(rc_cgroup_mode).*/\1="unified"/' /etc/rc.conf

We have discovered a number of regressions associated with this change, including:

  • it breaks k3d
  • it breaks the rancher/rancher image (running Rancher Manager inside a container)
  • it breaks compatibility with k3s versions older than v1.20.0

This fix was also not sufficient to fix kind.

For that reason we are reverting Rancher Desktop 1.10 back to cgroup v1 and will try to find a better fix.

If none of the regressions above matter to you, then this still allows you to switch Rancher Desktop to cgroup v2 locally. Just remember that a factory reset will undo the change.
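
To confirm which cgroup mode is active after the reboot, the filesystem type mounted at /sys/fs/cgroup is a common indicator. This is a rough sketch: the helper name `cgroup_mode` is made up, and it assumes the VM's `stat` supports `-f -c`. On the Docker side, `docker info --format '{{.CgroupVersion}}'` reports the same field shown as "Cgroup Version" in the `docker info` output earlier in this issue.

```shell
#!/bin/sh
# Map the filesystem type mounted at /sys/fs/cgroup to a cgroup mode label.
# cgroup2fs means the unified (v2) hierarchy; tmpfs is what a v1 or hybrid
# layout typically shows, because the controllers are mounted beneath it.
cgroup_mode() {
  case "$1" in
    cgroup2fs) echo "v2 (unified)" ;;
    tmpfs)     echo "v1 or hybrid" ;;
    *)         echo "unknown ($1)" ;;
  esac
}

# Usage inside the VM (for example via `rdctl shell`):
#   cgroup_mode "$(stat -fc %T /sys/fs/cgroup)"
```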

@jandubois jandubois reopened this Sep 1, 2023
@jandubois jandubois modified the milestones: 1.10, 1.11 Sep 1, 2023
jandubois added a commit to jandubois/rancher-desktop that referenced this issue Sep 5, 2023
Signed-off-by: Jan Dubois <jan.dubois@suse.com>
ericpromislow added a commit that referenced this issue Sep 5, 2023
jandubois added a commit to jandubois/rancher-desktop that referenced this issue Sep 5, 2023
This is just a workaround until rancher-sandbox#5363 has been fixed. For that
reason the hack has been duplicated into each file instead of
abstracting it away in the helpers.

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
@jandubois jandubois added the priority/1 Work should be fixed for next release label Sep 6, 2023
@marcindulak

I'm running Rancher Desktop version 1.10.0, with the dockerd (moby) container engine, on an up-to-date M1 with macOS Ventura 13.6, and I'm still getting the error mentioned in the original post.

runc run failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/6), flags: 0xf, data: openrc: invalid argument

I'm using an older "node:14.17.0-alpine3.10" image, and the error is thrown at "RUN apk --no-cache add git==2.25.5-r0 bash=5.0.0-r0"

@mook-as
Contributor

mook-as commented Oct 16, 2023

This should be fixed as of #5647 because we hacked up the cgroups service from OpenRC to not create the problematic cgroup. (We have also pushed a fix upstream to buildkit#4308, but that doesn't appear to have been released yet.)

Closing this and moving it to the Verify column for that reason; this should be available in the next release of Rancher Desktop.
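
One way to check whether the problematic cgroup is still present is to list the named cgroup v1 hierarchies visible to a process. This is a sketch under an assumption: that the OpenRC cgroup is a named v1 hierarchy called `openrc`, as the `data: openrc` part of the mount error suggests. The helper name `named_hierarchies` is made up for illustration.

```shell
#!/bin/sh
# List named cgroup v1 hierarchies from /proc/<pid>/cgroup-style input.
# Each line has the form "id:controllers:path"; a named hierarchy without
# controllers shows up as "name=<something>" in the second field.
named_hierarchies() {
  awk -F: '$2 ~ /^name=/ { sub(/^name=/, "", $2); print $2 }'
}

# Usage inside the VM (for example via `rdctl shell`):
#   named_hierarchies < /proc/self/cgroup
```

If `openrc` no longer appears in that list on a fixed release, the hierarchy that runc failed to mount is gone.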

@mook-as mook-as closed this as completed Oct 16, 2023