Qemu 5.2 regression causes arm64 detection flakiness (Reverted) #542

Closed
sdwr98 opened this issue Feb 12, 2021 · 26 comments

@sdwr98

sdwr98 commented Feb 12, 2021

I'm using buildx to do multiplatform builds in our CI environment and recently we started getting some odd errors. I'm able to reproduce this locally with this simple Dockerfile:

FROM amazonlinux
RUN yum -y update

I set up my multiplatform build this way:

docker buildx create --platform linux/amd64,linux/arm64 --use

Then I run the build command. When I run it for multiple platforms, I get an error:

[vagrant@vagrant temp]$ docker buildx build  --pull --no-cache --platform linux/amd64,linux/arm64 -t motus/testbuild .
WARN[0000] No output specified for docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
[+] Building 0.8s (8/8) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                                                                                                 0.0s
 => => transferring dockerfile: 31B                                                                                                                                                                                                                  0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                    0.0s
 => => transferring context: 2B                                                                                                                                                                                                                      0.0s
 => [linux/arm64 internal] load metadata for docker.io/library/amazonlinux:latest                                                                                                                                                                    0.1s
 => [linux/amd64 internal] load metadata for docker.io/library/amazonlinux:latest                                                                                                                                                                    0.1s
 => CACHED [linux/amd64 1/2] FROM docker.io/library/amazonlinux@sha256:ed6a24ee79bb52f2308a20fb20e48b73cfa0b65e89d8a84b6eb68738d4a16152                                                                                                              0.0s
 => => resolve docker.io/library/amazonlinux@sha256:ed6a24ee79bb52f2308a20fb20e48b73cfa0b65e89d8a84b6eb68738d4a16152                                                                                                                                 0.0s
 => CACHED [linux/arm64 1/2] FROM docker.io/library/amazonlinux@sha256:ed6a24ee79bb52f2308a20fb20e48b73cfa0b65e89d8a84b6eb68738d4a16152                                                                                                              0.0s
 => => resolve docker.io/library/amazonlinux@sha256:ed6a24ee79bb52f2308a20fb20e48b73cfa0b65e89d8a84b6eb68738d4a16152                                                                                                                                 0.0s
 => CANCELED [linux/amd64 2/2] RUN yum -y update                                                                                                                                                                                                     0.6s
 => ERROR [linux/arm64 2/2] RUN yum -y update                                                                                                                                                                                                        0.5s
------
 > [linux/arm64 2/2] RUN yum -y update:
#8 0.370 /usr/bin/python: can't open file 'yum': [Errno 2] No such file or directory
------
Dockerfile:3
--------------------
   1 |     FROM amazonlinux
   2 |
   3 | >>> RUN yum -y update
   4 |
   5 |
--------------------
error: failed to solve: rpc error: code = Unknown desc = executor failed running [/dev/.buildkit_qemu_emulator /bin/sh -c yum -y update]: exit code: 2

but when I run it for just arm64 it works:

[vagrant@vagrant temp]$ docker buildx build  --pull --no-cache --platform linux/arm64 -t motus/testbuild .
WARN[0000] No output specified for docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
[+] Building 12.6s (5/5) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                                                                                                 0.0s
 => => transferring dockerfile: 31B                                                                                                                                                                                                                  0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                                    0.0s
 => => transferring context: 2B                                                                                                                                                                                                                      0.0s
 => [internal] load metadata for docker.io/library/amazonlinux:latest                                                                                                                                                                                0.3s
 => CACHED [1/2] FROM docker.io/library/amazonlinux@sha256:ed6a24ee79bb52f2308a20fb20e48b73cfa0b65e89d8a84b6eb68738d4a16152                                                                                                                          0.0s
 => => resolve docker.io/library/amazonlinux@sha256:ed6a24ee79bb52f2308a20fb20e48b73cfa0b65e89d8a84b6eb68738d4a16152                                                                                                                                 0.0s
 => [2/2] RUN yum -y update                                                                                                                                                                                                                         12.2s

Any thoughts on this? I have the qemu aarch64 emulator installed.

Here's my docker info:

[vagrant@vagrant temp]$ docker system info
Client:
 Debug Mode: false
 Plugins:
  buildx: Build with BuildKit (Docker Inc., v0.5.1)

Server:
 Containers: 63
  Running: 1
  Paused: 0
  Stopped: 62
 Images: 148
 Server Version: 19.03.13-ce
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: c623d1b36f09f8ef6536a057bd658b3aa8632828
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0 (expected: fec3683)
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.14.214-160.339.amzn2.x86_64
 Operating System: Amazon Linux 2
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 22.98GiB
 Name: vagrant
 ID: MDVE:KCFM:HGLC:XUYO:OXRX:G4SG:Y74M:7GAM:QE3H:K364:YFYL:SH3X
 Docker Root Dir: /vagrant-storage/docker
 Debug Mode: false
 Username: srankin
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
@tonistiigi
Member

The fact that it is trying to use /dev/.buildkit_qemu_emulator is a sign that the emulator is not installed properly for use by the containers. See https://github.com/tonistiigi/binfmt/ for the recommended install method.

No idea why it works on a single-platform run though, if it is the same configuration.

If you can't figure it out, post the output of buildx inspect, buildx ls, and tonistiigi/binfmt.

@sdwr98
Author

sdwr98 commented Feb 13, 2021

We do have QEMU installed via this method:

docker run --privileged --rm tonistiigi/binfmt --install all

Here's the other info you requested:

[vagrant@vagrant ~]$ docker buildx ls
NAME/NODE         DRIVER/ENDPOINT             STATUS  PLATFORMS
elegant_rhodes *  docker-container
  elegant_rhodes0 unix:///var/run/docker.sock running linux/amd64*, linux/arm64*, linux/386, linux/arm/v7, linux/arm/v6, linux/riscv64, linux/ppc64le, linux/s390x
default           docker
  default         default                     running linux/amd64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
[vagrant@vagrant ~]$ docker buildx inspect elegant_rhodes
Name:   elegant_rhodes
Driver: docker-container

Nodes:
Name:      elegant_rhodes0
Endpoint:  unix:///var/run/docker.sock
Status:    running
Platforms: linux/amd64*, linux/arm64*, linux/386, linux/arm/v7, linux/arm/v6, linux/riscv64, linux/ppc64le, linux/s390x
[vagrant@vagrant ~]$ docker run --privileged --rm tonistiigi/binfmt
{
  "supported": [
    "linux/amd64",
    "linux/ppc64le",
    "linux/s390x",
    "linux/386",
    "linux/arm/v7",
    "linux/arm/v6"
  ],
  "emulators": [
    "qemu-aarch64",
    "qemu-arm",
    "qemu-i386",
    "qemu-mips64",
    "qemu-mips64el",
    "qemu-ppc64le",
    "qemu-riscv64",
    "qemu-s390x"
  ]
}

@tonistiigi
Member

If you look at the latest output, you have an emulator called qemu-aarch64 installed but no support for running linux/arm64. So I assume it was installed with some other method and is not being picked up.

Run

docker run -it --rm --privileged tonistiigi/binfmt --uninstall qemu-aarch64
docker run -it --rm --privileged tonistiigi/binfmt --install arm64

and see if it changes.
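
The mismatch described here (an emulator is registered, but its platform is missing from "supported") can be checked mechanically after each install. A minimal sketch, assuming the multi-line JSON layout shown in this thread; platform_supported is a hypothetical helper name, not part of tonistiigi/binfmt:

```shell
#!/bin/sh
# Hypothetical helper: succeed if PLATFORM appears in the "supported" array
# of tonistiigi/binfmt's JSON output, read from stdin.
# Assumes the pretty-printed layout shown in this thread (one entry per line).
platform_supported() {
    # Keep only the lines from "supported" up to its closing bracket,
    # then look for the exact quoted platform string.
    sed -n '/"supported"/,/\]/p' | grep -qF -- "\"$1\""
}

# Usage (requires Docker, so left commented out):
# docker run --rm --privileged tonistiigi/binfmt | platform_supported linux/arm64 \
#     && echo "arm64 supported" || echo "arm64 NOT supported"
```

Matching on the quoted string keeps "linux/arm64" from being confused with entries such as "linux/arm/v7", and restricting the search to the "supported" array ignores the "emulators" list.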

@jlesage

jlesage commented Feb 13, 2021

I'm also seeing the same issue with my GitHub workflow. The docker/setup-qemu-action action is used to set up QEMU and docker/setup-buildx-action to set up buildx.

As you mentioned, qemu-aarch64 is installed but there is no support for running linux/arm64:

2021-02-13T22:39:50.7920712Z 💎 Installing QEMU static binaries...
2021-02-13T22:39:50.7974478Z [command]/usr/bin/docker run --rm --privileged tonistiigi/binfmt:latest --install arm,arm64,ppc64le,mips64,s390x
2021-02-13T22:39:50.8341994Z Unable to find image 'tonistiigi/binfmt:latest' locally
2021-02-13T22:39:51.6194008Z latest: Pulling from tonistiigi/binfmt
2021-02-13T22:39:51.8126259Z 41ac47e7c2ad: Pulling fs layer
2021-02-13T22:39:51.8126947Z bad951d4a57c: Pulling fs layer
2021-02-13T22:39:52.1739534Z bad951d4a57c: Verifying Checksum
2021-02-13T22:39:52.1740972Z bad951d4a57c: Download complete
2021-02-13T22:39:52.2839528Z 41ac47e7c2ad: Verifying Checksum
2021-02-13T22:39:52.2840145Z 41ac47e7c2ad: Download complete
2021-02-13T22:39:52.6107772Z 41ac47e7c2ad: Pull complete
2021-02-13T22:39:52.7352258Z bad951d4a57c: Pull complete
2021-02-13T22:39:52.7462173Z Digest: sha256:aeeef8b7f7fa80ed9316ec1cc5793072e322ee69590b684ea4195445851f1edf
2021-02-13T22:39:52.7486217Z Status: Downloaded newer image for tonistiigi/binfmt:latest
2021-02-13T22:39:53.5885624Z 2021/02/13 22:39:53 installing: arm OK
2021-02-13T22:39:53.5886192Z 2021/02/13 22:39:53 installing: arm64 OK
2021-02-13T22:39:53.5886636Z 2021/02/13 22:39:53 installing: ppc64le OK
2021-02-13T22:39:53.5887167Z 2021/02/13 22:39:53 installing: mips64 OK
2021-02-13T22:39:53.5887676Z 2021/02/13 22:39:53 installing: s390x OK
2021-02-13T22:39:53.9516310Z {
2021-02-13T22:39:53.9516817Z   "supported": [
2021-02-13T22:39:53.9517243Z     "linux/amd64",
2021-02-13T22:39:53.9517692Z     "linux/ppc64le",
2021-02-13T22:39:53.9518107Z     "linux/s390x",
2021-02-13T22:39:53.9518568Z     "linux/386",
2021-02-13T22:39:53.9518949Z     "linux/mips64",
2021-02-13T22:39:53.9519358Z     "linux/arm/v7",
2021-02-13T22:39:53.9519743Z     "linux/arm/v6"
2021-02-13T22:39:53.9520100Z   ],
2021-02-13T22:39:53.9520572Z   "emulators": [
2021-02-13T22:39:53.9520932Z     "cli",
2021-02-13T22:39:53.9522010Z     "llvm-10-runtime.binfmt",
2021-02-13T22:39:53.9522647Z     "llvm-8-runtime.binfmt",
2021-02-13T22:39:53.9523252Z     "llvm-9-runtime.binfmt",
2021-02-13T22:39:53.9523675Z     "python2.7",
2021-02-13T22:39:53.9524001Z     "python3.8",
2021-02-13T22:39:53.9524464Z     "qemu-aarch64",
2021-02-13T22:39:53.9524906Z     "qemu-arm",
2021-02-13T22:39:53.9525358Z     "qemu-mips64",
2021-02-13T22:39:53.9525827Z     "qemu-ppc64le",
2021-02-13T22:39:53.9526290Z     "qemu-s390x"
2021-02-13T22:39:53.9526581Z   ]
2021-02-13T22:39:53.9526825Z }
2021-02-13T22:39:54.0862753Z 🛒 Extracting available platforms...
2021-02-13T22:39:54.7548198Z linux/amd64,linux/arm64,linux/ppc64le,linux/386,linux/mips64,linux/arm/v7,linux/arm/v6

Here is the error that occurred during build of the images:

2021-02-13T22:41:20.7716630Z #50 [linux/arm64 stage-3  7/10] RUN     /usr/bin/add-pkg         tzdata
2021-02-13T22:41:20.7717738Z #50 sha256:75d1344563df9ebcb12e099a46eaf49f6e05430c63642a7eb2f893d165108ba7
2021-02-13T22:41:20.7718732Z #50 22.05 Fetched 295 kB in 1s (350 kB/s)
2021-02-13T22:41:20.8872590Z #50 22.12 Error while loading /usr/sbin/dpkg-split: No such file or directory
2021-02-13T22:41:20.8873922Z #50 22.12 Error while loading /usr/sbin/dpkg-deb: No such file or directory
2021-02-13T22:41:20.8875214Z #50 22.13 dpkg: error processing archive /var/cache/apt/archives/tzdata_2021a-0ubuntu0.20.04_all.deb (--unpack):
2021-02-13T22:41:20.8876463Z #50 22.13  dpkg-deb --control subprocess returned error exit status 1
2021-02-13T22:41:20.8877335Z #50 22.14 Errors were encountered while processing:
2021-02-13T22:41:20.8878380Z #50 22.14  /var/cache/apt/archives/tzdata_2021a-0ubuntu0.20.04_all.deb
2021-02-13T22:41:21.0278608Z #50 22.23 E: Sub-process /usr/bin/dpkg returned an error code (1)
2021-02-13T22:41:21.0279850Z #50 22.27 /bin/sh: 0: Can't open which
2021-02-13T22:41:21.4204700Z #50 ERROR: executor failed running [/dev/.buildkit_qemu_emulator /bin/sh -c /usr/bin/add-pkg         tzdata]: exit code: 100

If you want to view the workflow: https://github.com/jlesage/docker-baseimage/runs/1895262162?check_suite_focus=true

@tonistiigi
Member

@jlesage Can you run docker run --rm --privileged tonistiigi/binfmt:latest before trying to install emulators? I'd like to see what is already in the system. Another possibility is that the other emulators are colliding with the arm64 binfmt mask, so the real emulator does not get called. There was a new release of the binfmt image this week that updated to qemu 5.2 and added mips. I checked the mask for mips and it looks OK to me, and the image seems to work well for me. If you want another test point, you can try removing mips from the --install list and see if it makes any difference.
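
To check for a colliding mask by hand, the registered entries can be read straight from binfmt_misc. A minimal sketch, assuming the kernel's usual entry layout (lines such as interpreter, flags:, offset, magic, mask) and the qemu-aarch64 entry name that appears in the outputs in this thread; binfmt_field and BINFMT_DIR are hypothetical names introduced for illustration:

```shell
#!/bin/sh
# Directory where the kernel exposes registered binfmt_misc handlers.
# Overridable so the helper can be pointed at a copy of the files.
BINFMT_DIR="${BINFMT_DIR:-/proc/sys/fs/binfmt_misc}"

# Hypothetical helper: print one field (interpreter, magic, mask, ...) of a
# registered entry, e.g. `binfmt_field qemu-aarch64 mask`.
binfmt_field() {
    entry="$BINFMT_DIR/$1"
    [ -r "$entry" ] || { echo "no such entry: $1" >&2; return 1; }
    # Each line is "<name> <value>"; the flags line is written as "flags:".
    awk -v f="$2" '$1 == f || $1 == (f ":") { print $2 }' "$entry"
}

# Usage: compare the magic/mask pairs of two entries suspected of overlapping.
# binfmt_field qemu-aarch64 magic
# binfmt_field qemu-aarch64 mask
```

Two handlers collide when a binary's header matches the magic of both under their respective masks, so printing the pairs side by side is usually enough to spot an overlap.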

@jlesage

jlesage commented Feb 14, 2021

I ran the requested tests locally on my machine, since I seem to have the same issue. I first cleared all emulators from the system. So, from a clean state:

$ ls /proc/sys/fs/binfmt_misc/
jar  python2.7  python3.5  python3.6  register  status
$ docker run --rm --privileged tonistiigi/binfmt:latest
{
  "supported": [
    "linux/amd64",
    "linux/386"
  ],
  "emulators": [
    "jar",
    "python2.7",
    "python3.5",
    "python3.6"
  ]
}

Then, installing all emulators:

$ docker run --rm --privileged tonistiigi/binfmt:latest --install all
2021/02/14 03:27:34 installing: mips64el OK
2021/02/14 03:27:34 installing: mips64 OK
2021/02/14 03:27:34 installing: riscv64 OK
2021/02/14 03:27:34 installing: i386 OK
2021/02/14 03:27:34 installing: arm OK
2021/02/14 03:27:34 installing: s390x OK
2021/02/14 03:27:34 installing: ppc64le OK
2021/02/14 03:27:34 installing: arm64 OK
{
  "supported": [
    "linux/amd64",
    "linux/386",
    "linux/mips64le",
    "linux/arm/v7",
    "linux/arm/v6"
  ],
  "emulators": [
    "jar",
    "python2.7",
    "python3.5",
    "python3.6",
    "qemu-aarch64",
    "qemu-arm",
    "qemu-i386",
    "qemu-mips64",
    "qemu-mips64el",
    "qemu-ppc64le",
    "qemu-riscv64",
    "qemu-s390x"
  ]
}

I also tried to remove mips64 from the emulators to install (again, starting from a fresh state):

$ docker run --rm --privileged tonistiigi/binfmt:latest --install arm,arm64,ppc64le,s390x
2021/02/14 03:28:58 installing: arm OK
2021/02/14 03:28:58 installing: arm64 OK
2021/02/14 03:28:58 installing: ppc64le OK
2021/02/14 03:28:58 installing: s390x OK
{
  "supported": [
    "linux/amd64",
    "linux/ppc64le",
    "linux/386",
    "linux/arm/v7",
    "linux/arm/v6"
  ],
  "emulators": [
    "jar",
    "python2.7",
    "python3.5",
    "python3.6",
    "qemu-aarch64",
    "qemu-arm",
    "qemu-ppc64le",
    "qemu-s390x"
  ]
}

Installing only arm64 is not better:

$ docker run --rm --privileged tonistiigi/binfmt:latest --install arm64
2021/02/14 03:33:44 installing: arm64 OK
{
  "supported": [
    "linux/amd64",
    "linux/386"
  ],
  "emulators": [
    "jar",
    "python2.7",
    "python3.5",
    "python3.6",
    "qemu-aarch64"
  ]
}

@jlesage

jlesage commented Feb 14, 2021

On the GitHub workflow, before installing the QEMU emulators:

2021-02-14T03:46:25.1011499Z Unable to find image 'tonistiigi/binfmt:latest' locally
2021-02-14T03:46:25.2815492Z latest: Pulling from tonistiigi/binfmt
2021-02-14T03:46:25.3493708Z 41ac47e7c2ad: Pulling fs layer
2021-02-14T03:46:25.3494416Z bad951d4a57c: Pulling fs layer
2021-02-14T03:46:25.4257776Z bad951d4a57c: Verifying Checksum
2021-02-14T03:46:25.4259469Z bad951d4a57c: Download complete
2021-02-14T03:46:25.4993738Z 41ac47e7c2ad: Verifying Checksum
2021-02-14T03:46:25.4998207Z 41ac47e7c2ad: Download complete
2021-02-14T03:46:25.8354546Z 41ac47e7c2ad: Pull complete
2021-02-14T03:46:25.9541848Z bad951d4a57c: Pull complete
2021-02-14T03:46:25.9597979Z Digest: sha256:aeeef8b7f7fa80ed9316ec1cc5793072e322ee69590b684ea4195445851f1edf
2021-02-14T03:46:25.9616686Z Status: Downloaded newer image for tonistiigi/binfmt:latest
2021-02-14T03:46:27.8037888Z {
2021-02-14T03:46:27.8042752Z   "supported": [
2021-02-14T03:46:27.8043470Z     "linux/amd64",
2021-02-14T03:46:27.8044203Z     "linux/386"
2021-02-14T03:46:27.8044826Z   ],
2021-02-14T03:46:27.8045453Z   "emulators": [
2021-02-14T03:46:27.8046053Z     "cli",
2021-02-14T03:46:27.8047224Z     "llvm-10-runtime.binfmt",
2021-02-14T03:46:27.8048271Z     "llvm-8-runtime.binfmt",
2021-02-14T03:46:27.8049161Z     "llvm-9-runtime.binfmt",
2021-02-14T03:46:27.8049842Z     "python2.7",
2021-02-14T03:46:27.8050372Z     "python3.8"
2021-02-14T03:46:27.8050908Z   ]
2021-02-14T03:46:27.8051329Z }

@jlesage

jlesage commented Feb 14, 2021

In the following run: https://github.com/jlesage/docker-baseimage/runs/1896128490?check_suite_focus=true
arm64 is now shown as supported:

2021-02-14T03:46:30.7959331Z {
2021-02-14T03:46:30.7959830Z   "supported": [
2021-02-14T03:46:30.7960243Z     "linux/amd64",
2021-02-14T03:46:30.7960651Z     "linux/arm64",
2021-02-14T03:46:30.7961061Z     "linux/ppc64le",
2021-02-14T03:46:30.7961492Z     "linux/s390x",
2021-02-14T03:46:30.7961860Z     "linux/386",
2021-02-14T03:46:30.7962260Z     "linux/mips64",
2021-02-14T03:46:30.7962676Z     "linux/arm/v7",
2021-02-14T03:46:30.7963073Z     "linux/arm/v6"
2021-02-14T03:46:30.7963426Z   ],
2021-02-14T03:46:30.7963803Z   "emulators": [
2021-02-14T03:46:30.7964180Z     "cli",
2021-02-14T03:46:30.7965269Z     "llvm-10-runtime.binfmt",
2021-02-14T03:46:30.7966032Z     "llvm-8-runtime.binfmt",
2021-02-14T03:46:30.7966795Z     "llvm-9-runtime.binfmt",
2021-02-14T03:46:30.7967302Z     "python2.7",
2021-02-14T03:46:30.7967704Z     "python3.8",
2021-02-14T03:46:30.7968278Z     "qemu-aarch64",
2021-02-14T03:46:30.7968826Z     "qemu-arm",
2021-02-14T03:46:30.7969388Z     "qemu-mips64",
2021-02-14T03:46:30.7969972Z     "qemu-ppc64le",
2021-02-14T03:46:30.7970554Z     "qemu-s390x"
2021-02-14T03:46:30.7970912Z   ]
2021-02-14T03:46:30.7971233Z }

But the build fails with the same error:

2021-02-14T03:47:56.1183272Z #50 [linux/arm64 stage-3  7/10] RUN     /usr/bin/add-pkg         tzdata
2021-02-14T03:47:56.1184640Z #50 sha256:75d1344563df9ebcb12e099a46eaf49f6e05430c63642a7eb2f893d165108ba7
2021-02-14T03:47:56.1185874Z #50 17.49 Reading package lists...
2021-02-14T03:47:56.8687433Z #50 21.40 Building dependency tree...
2021-02-14T03:47:56.8688084Z #50 22.05 Reading state information...
2021-02-14T03:47:57.4693448Z #50 22.65 The following NEW packages will be installed:
2021-02-14T03:47:57.4694159Z #50 22.65   tzdata
2021-02-14T03:47:57.7696955Z #50 22.98 0 upgraded, 1 newly installed, 0 to remove and 5 not upgraded.
2021-02-14T03:47:57.7697696Z #50 22.98 Need to get 295 kB of archives.
2021-02-14T03:47:57.7698401Z #50 22.98 After this operation, 4033 kB of additional disk space will be used.
2021-02-14T03:47:57.7700215Z #50 22.98 Get:1 http://ports.ubuntu.com/ubuntu-ports focal-updates/main arm64 tzdata all 2021a-0ubuntu0.20.04 [295 kB]
2021-02-14T03:47:58.9108549Z #50 24.07 debconf: delaying package configuration, since apt-utils is not installed
2021-02-14T03:47:58.9109363Z #50 24.19 Fetched 295 kB in 1s (460 kB/s)
2021-02-14T03:47:59.0609674Z #50 24.26 Error while loading /usr/sbin/dpkg-split: No such file or directory
2021-02-14T03:47:59.0610775Z #50 24.27 Error while loading /usr/sbin/dpkg-deb: No such file or directory
2021-02-14T03:47:59.0612022Z #50 24.27 dpkg: error processing archive /var/cache/apt/archives/tzdata_2021a-0ubuntu0.20.04_all.deb (--unpack):
2021-02-14T03:47:59.0613479Z #50 24.27  dpkg-deb --control subprocess returned error exit status 1
2021-02-14T03:47:59.0614297Z #50 24.28 Errors were encountered while processing:
2021-02-14T03:47:59.0615256Z #50 24.28  /var/cache/apt/archives/tzdata_2021a-0ubuntu0.20.04_all.deb
2021-02-14T03:47:59.2110209Z #50 24.39 E: Sub-process /usr/bin/dpkg returned an error code (1)
2021-02-14T03:47:59.2111113Z #50 24.42 /bin/sh: 0: Can't open which
2021-02-14T03:47:59.5867028Z #50 ERROR: executor failed running [/dev/.buildkit_qemu_emulator /bin/sh -c /usr/bin/add-pkg         tzdata]: exit code: 100

@tonistiigi
Member

I pushed tonistiigi/binfmt:qemu-v5.1.0 and tonistiigi/binfmt:qemu-v5.0.1 for testing.

Also, what output do you get from docker run --rm arm64v8/alpine uname -a?

@tonistiigi
Member

Btw, the /dev/.buildkit_qemu_emulator problem itself should be fixed in the moby/buildkit:master image (even without binfmt) by moby/buildkit#1953, but if there is a regression in external emulators, we need to track that down as well.

@jlesage

jlesage commented Feb 14, 2021

Strangely, arm64 does eventually appear as supported, but it takes some time. Here you can see that it took almost a minute:

$ date && docker run --rm --privileged tonistiigi/binfmt:latest --install arm64
Sat Feb 13 22:39:13 EST 2021
2021/02/14 03:39:15 installing: arm64 OK
{
  "supported": [
    "linux/amd64",
    "linux/arm64",
    "linux/386"
  ],
  "emulators": [
    "jar",
    "python2.7",
    "python3.5",
    "python3.6",
    "qemu-aarch64"
  ]
}

$ date && docker run --rm --privileged tonistiigi/binfmt:latest
Sat Feb 13 22:39:59 EST 2021
{
  "supported": [
    "linux/amd64",
    "linux/386"
  ],
  "emulators": [
    "jar",
    "python2.7",
    "python3.5",
    "python3.6",
    "qemu-aarch64"
  ]
}

$ date && docker run --rm --privileged tonistiigi/binfmt:latest
Sat Feb 13 22:40:12 EST 2021
{
  "supported": [
    "linux/amd64",
    "linux/arm64",
    "linux/386"
  ],
  "emulators": [
    "jar",
    "python2.7",
    "python3.5",
    "python3.6",
    "qemu-aarch64"
  ]
}

In all cases, docker run --rm arm64v8/alpine uname -a returns:

WARNING: The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64) and no specific platform was requested
Linux d2843748a282 4.15.0-129-generic #132-Ubuntu SMP Thu Dec 10 14:02:26 UTC 2020 aarch64 Linux

@jlesage

jlesage commented Feb 14, 2021

tonistiigi/binfmt:qemu-v5.1.0 seems to have the same behaviour:

$ docker run --rm --privileged tonistiigi/binfmt:qemu-v5.1.0 --install arm64
2021/02/14 04:28:46 installing: arm64 OK
{
  "supported": [
    "linux/amd64",
    "linux/386"
  ],
  "emulators": [
    "jar",
    "python2.7",
    "python3.5",
    "python3.6",
    "qemu-aarch64"
  ]
}

However, tonistiigi/binfmt:qemu-v5.0.1 looks better, since arm64 is shown as supported right after install:

$ docker run --rm --privileged tonistiigi/binfmt:qemu-v5.0.1 --install arm64
2021/02/14 04:29:47 installing: arm64 OK
{
  "supported": [
    "linux/amd64",
    "linux/arm64",
    "linux/386"
  ],
  "emulators": [
    "jar",
    "python2.7",
    "python3.5",
    "python3.6",
    "qemu-aarch64"
  ]
}

@jlesage

jlesage commented Feb 14, 2021

Using tonistiigi/binfmt:qemu-v5.0.1 in the workflow looks better. However, one job failed with the following error. Maybe it's not related to the current issue?

2021-02-14T04:36:37.8879847Z #23 [linux/amd64 logmonitor 4/4] RUN make -C /tmp/logmonitor
2021-02-14T04:36:37.8891304Z #23 sha256:2277452567b751793ecf34a694dd5049bd8adb766b566f50f7665134d3d6dfdb
2021-02-14T04:36:37.8894837Z #23 ERROR: executor failed running [/bin/sh -c make -C /tmp/logmonitor]: flightcontrol: exceeded retry timeout

https://github.com/jlesage/docker-baseimage/runs/1896254442?check_suite_focus=true

@sdwr98
Author

sdwr98 commented Feb 14, 2021

Is it normal for multiple runs of tonistiigi/binfmt to return different results each time? I wonder if that has something to do with the intermittent nature of my build failing.

[vagrant@vagrant temp]$ date && docker run --rm --privileged tonistiigi/binfmt:qemu-v5.0.1
Sun Feb 14 15:07:03 UTC 2021
{
  "supported": [
    "linux/amd64",
    "linux/386",
    "linux/mips64le",
    "linux/mips64",
    "linux/arm/v7",
    "linux/arm/v6"
  ],
  "emulators": [
    "qemu-aarch64",
    "qemu-arm",
    "qemu-i386",
    "qemu-mips64",
    "qemu-mips64el",
    "qemu-ppc64le",
    "qemu-riscv64",
    "qemu-s390x"
  ]
}
[vagrant@vagrant temp]$ date && docker run --rm --privileged tonistiigi/binfmt:qemu-v5.0.1
Sun Feb 14 15:07:06 UTC 2021
{
  "supported": [
    "linux/amd64",
    "linux/riscv64",
    "linux/386",
    "linux/mips64",
    "linux/arm/v7",
    "linux/arm/v6"
  ],
  "emulators": [
    "qemu-aarch64",
    "qemu-arm",
    "qemu-i386",
    "qemu-mips64",
    "qemu-mips64el",
    "qemu-ppc64le",
    "qemu-riscv64",
    "qemu-s390x"
  ]
}
[vagrant@vagrant temp]$ date && docker run --rm --privileged tonistiigi/binfmt:qemu-v5.0.1
Sun Feb 14 15:07:09 UTC 2021
{
  "supported": [
    "linux/amd64",
    "linux/arm64",
    "linux/ppc64le",
    "linux/s390x",
    "linux/386",
    "linux/mips64le",
    "linux/mips64",
    "linux/arm/v7",
    "linux/arm/v6"
  ],
  "emulators": [
    "qemu-aarch64",
    "qemu-arm",
    "qemu-i386",
    "qemu-mips64",
    "qemu-mips64el",
    "qemu-ppc64le",
    "qemu-riscv64",
    "qemu-s390x"
  ]
}

@tonistiigi
Member

Is it normal for multiple runs of tonistiigi/binfmt to return different results each time?

No. When this happens, does docker run --rm arm64v8/alpine uname -a work consistently? This should show whether the flakiness is in running the emulator or in the "supported" check.
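
One way to separate the two cases is to repeat a command a fixed number of times and count the failures. A minimal sketch; count_failures is a hypothetical helper, and the Docker command in the usage comment is the one quoted in this thread:

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print how many runs failed.
count_failures() {
    runs=$1; shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Usage (requires Docker, so left commented out):
# Is running an arm64 binary itself flaky?
# count_failures 20 docker run --rm arm64v8/alpine uname -a
```

A non-zero count would point at the emulator itself; a count of zero alongside a changing "supported" list would point at the detection check.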

Please post the full system information where you see this: kernel, distro, etc. Is there a way I can access such a machine? I know some reports seem to be from GitHub Actions (not sure about the kernel etc. there either), but if it is flaky, it is hard to debug/bisect there.

It looks like @jlesage is reporting that 5.0.1 is OK while @sdwr98 (sometimes) has an issue with that image as well. @sdwr98 Can you confirm this also worked for you before this week, when we made a new release? Or did you just start using it?

@tonistiigi
Member

I seem to be able to reproduce the flakiness in a GitHub Codespaces environment.

@sdwr98
Author

sdwr98 commented Feb 14, 2021

When this happens, does docker run --rm arm64v8/alpine uname -a work consistently? This should show whether the flakiness is in running the emulator or in the "supported" check.

Yes, this consistently reports Linux 2bcacd630fd7 4.14.214-160.339.amzn2.x86_64 #1 SMP Sun Jan 10 05:53:05 UTC 2021 aarch64 Linux

Please post the full system information where you see this: kernel, distro, etc. Is there a way I can access such a machine? I know some reports seem to be from GitHub Actions (not sure about the kernel etc. there either), but if it is flaky, it is hard to debug/bisect there.

I see this on my local vagrant environment and in our AWS EC2 build agent.

Vagrant (on an Intel macOS host):

Linux vagrant 4.14.214-160.339.amzn2.x86_64 #1 SMP Sun Jan 10 05:53:05 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

EC2 build agent:

Linux ip-10-1-6-33.tools.us-west-2.motushost.com 4.14.209-160.339.amzn2.x86_64 #1 SMP Wed Dec 16 22:44:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Both are Amazon Linux 2. Notably, this does not happen on either my Intel or Apple Silicon Macs.

It looks like @jlesage is reporting that 5.0.1 is OK while @sdwr98 (sometimes) has an issue with that image as well. @sdwr98 Can you confirm this also worked for you before this week, when we made a new release? Or did you just start using it?

This was working fine up until roughly Thursday of last week.

@tonistiigi
Member

This looks really bizarre.

@sdwr98 Can you confirm that if you run docker run --rm --privileged tonistiigi/binfmt:qemu-v5.0.1 --uninstall qemu-aarch64 && docker run --rm --privileged tonistiigi/binfmt:qemu-v5.0.1 --install arm64, then the output of docker run --rm --privileged tonistiigi/binfmt:qemu-v5.0.1 is not flaky for you anymore?

@sdwr98
Author

sdwr98 commented Feb 14, 2021

I can confirm that it is not flaky after running those steps

@tonistiigi
Member

I have reverted tonistiigi/binfmt:latest back to qemu-v5.0.1 until this gets sorted. As far as I can see, v5.0.1 does not have any issues, but you need to uninstall and reinstall the emulators. To test the 5.2 version, tonistiigi/binfmt:qemu-v5.2.0 can be used.
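
The uninstall-and-reinstall step can be wrapped in a small function. A minimal sketch using the commands already quoted in this thread; reinstall_arm64 and the DOCKER override variable are hypothetical names introduced for illustration:

```shell
#!/bin/sh
# Hypothetical helper: remove the existing qemu-aarch64 registration and
# reinstall arm64 from a specific tonistiigi/binfmt tag.
# DOCKER can be overridden (e.g. DOCKER=echo) to preview the commands.
reinstall_arm64() {
    tag=${1:-qemu-v5.0.1}
    image="tonistiigi/binfmt:$tag"
    ${DOCKER:-docker} run --rm --privileged "$image" --uninstall qemu-aarch64 &&
        ${DOCKER:-docker} run --rm --privileged "$image" --install arm64
}

# Usage:
# DOCKER=echo reinstall_arm64      # print the commands only
# reinstall_arm64 qemu-v5.0.1      # run them for real
```

Pinning the tag explicitly, rather than relying on latest, keeps the result reproducible if the latest tag is moved again.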

I traced this issue to changes in qemu between versions v5.0.1 and v5.1.0. The problem is that running the test binary https://github.com/moby/buildkit/blob/master/util/archutil/fixtures/exit.arm64.s sometimes fails with "Segmentation fault (core dumped)", so arm64 support is not detected. The issue seems to be arm64-specific and only seems to affect this binary (although there was a report of a possible, completely unrelated regression in v5.2). Looking at the regression points, I think it may be related to how the test binary is invoked with chroot rather than to the binary itself. If I just invoke the same binary in the shell, I don't see the issue.
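
Because detection depends on the exit status of a tiny test binary, a crash inside the emulator looks the same as a missing emulator. A minimal sketch of that idea in shell; this is not BuildKit's actual code, classify_probe is a hypothetical name, and it assumes the convention of common shells such as bash and dash, where death by signal N is reported as status 128 + N:

```shell
#!/bin/sh
# Hypothetical helper: run a probe command and classify the result.
classify_probe() {
    "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "supported"
    elif [ "$status" -gt 128 ]; then
        # Death by signal N is reported as 128 + N (SIGSEGV is 11 -> 139).
        echo "crashed (signal $((status - 128)))"
    else
        echo "unsupported (exit $status)"
    fi
}

# Usage: classify_probe ./exit-test-binary   # path is a placeholder
```

Distinguishing a crash from an ordinary non-zero exit makes an intermittent segfault visible instead of silently dropping the platform from the list.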

v5.0.1 seems clear of any issues. The first regression point is qemu/qemu@ee94743:

ee94743034bfb443cf246eda4971bdc15d8ee066 is the first bad commit
commit ee94743034bfb443cf246eda4971bdc15d8ee066
Author: Alex Bennée <alex.bennee@linaro.org>
Date:   Wed May 13 18:51:28 2020 +0100

    linux-user: completely re-write init_guest_space

    First we ensure all guest space initialisation logic comes through
    probe_guest_base once we understand the nature of the binary we are
    loading. The convoluted init_guest_space routine is removed and
    replaced with a number of pgb_* helpers which are called depending on
    what requirements we have when loading the binary.

    We first try to do what is requested by the host. Failing that we try
    and satisfy the guest requested base address. If all those options
    fail we fall back to finding a space in the memory map using our
    recently written read_self_maps() helper.

    There are some additional complications we try and take into account
    when looking for holes in the address space. We try not to go directly
    after the system brk() space so there is space for a little growth. We
    also don't want to have to use negative offsets which would result in
    slightly less efficient code on x86 when it's unable to use the
    segment offset register.

    Less mind-binding gotos and hopefully clearer logic throughout.

    Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    Acked-by: Laurent Vivier <laurent@vivier.eu>
    Message-Id: <20200513175134.19619-5-alex.bennee@linaro.org>

 linux-user/elfload.c  | 503 +++++++++++++++++++++++++-------------------------
 linux-user/flatload.c |   6 +
 linux-user/main.c     |  23 +--
 linux-user/qemu.h     |  31 ++--
 4 files changed, 277 insertions(+), 286 deletions(-)

After this, the test binary fails with "Unable to allocate 0x10000bc bytes of virtual address space". This happens 100% of the time (not flaky) and seems to be explained in the message of the second regression point, qemu/qemu@ad592e3:

ad592e37dfccf730378a44c5fa79acb603a7678d is the first bad commit
commit ad592e37dfccf730378a44c5fa79acb603a7678d
Author: Alex Bennée <alex.bennee@linaro.org>
Date:   Fri Jun 5 16:49:26 2020 +0100

    linux-user: provide fallback pgd_find_hole for bare chroots
    
    When running QEMU out of a chroot environment we may not have access
    to /proc/self/maps. As there is no other "official" way to introspect
    our memory map we need to fall back to the original technique of
    repeatedly trying to mmap an address range until we find one that
    works.
    
    Fortunately it's not quite as ugly as the original code given we
    already re-factored the complications of dealing with the
    ARM_COMMPAGE. We do make an attempt to skip over brk() which is about
    the only concrete piece of information we have about the address map
    at this moment.
    
    Fixes: ee9474303
    Reported-by: Peter Maydell <peter.maydell@linaro.org>
    Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    Message-Id: <20200605154929.26910-12-alex.bennee@linaro.org>

 linux-user/elfload.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
 

... after which flakiness is introduced: the binary sometimes succeeds and sometimes fails with a segmentation fault.

The current master branch has the same issue as v5.2.0.

@stsquad Could you please take a look at this?

tonistiigi added a commit to tonistiigi/binfmt that referenced this issue Feb 16, 2021
regression docker/buildx#542

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
tonistiigi changed the title from "Intermittent buildx failures when multiplatform" to "Qemu 5.2 regression causes arm64 detection flakiness" Feb 26, 2021
hwdsl2 added a commit to hwdsl2/docker-ipsec-vpn-server that referenced this issue Mar 10, 2021
- Pin version for multiarch/qemu-user-static due to QEMU 5.2 regression
  Ref: docker/buildx#542
arkodg added a commit to arkodg/envoy that referenced this issue Mar 10, 2021
Hit this multiarch build issue on a separate PR
envoyproxy#15204 (comment)

Fixes envoyproxy#14971

Most likely relates to docker/buildx#542

https://github.com/docker/buildx#building-multi-platform-images
recommends running `docker run --privileged --rm tonistiigi/binfmt
--install all` before setting up `buildx`

Signed-off-by: Arko Dasgupta <arko@tetrate.io>
krunalhinguu pushed a commit to krunalhinguu/ingress-nginx that referenced this issue Dec 29, 2023
chiukapoor pushed a commit to chiukapoor/ingress-nginx that referenced this issue Feb 20, 2024
vardhaman22 pushed a commit to vardhaman22/ingress-nginx that referenced this issue May 13, 2024
vardhaman22 pushed a commit to vardhaman22/ingress-nginx that referenced this issue Jun 6, 2024
krunalhinguu pushed a commit to krunalhinguu/ingress-nginx that referenced this issue Aug 14, 2024
Fix flaky arm64 emulator issue docker/buildx#542

Use Go 1.17.2 and fix golint installation

fix drone build failure

`go install ginkgo` followed by `which ginkgo` is newly added in the upstream repo.
This is making the drone build fail for arm64 arch. The ginkgo library
gets installed under $GOPATH/bin/linux_arm64 dir. This is unlike amd64 images
which typically install go libraries under $GOPATH/bin. Since the previously mentioned
dir is not in PATH, the command `which ginkgo` fails. I've added this location to PATH
to fix the build failure. See upstream PRs linked below for more info:
kubernetes#8566
kubernetes#8569

use go 1.21.5 | go mod tidy

Signed-off-by: Chirayu Kapoor <chirayu.kapoor@suse.com>
krunalhinguu pushed a commit to krunalhinguu/ingress-nginx that referenced this issue Aug 20, 2024
krunalhinguu pushed a commit to krunalhinguu/ingress-nginx that referenced this issue Aug 20, 2024
Add apk-tools first

Now, the apk-tools package is installed first. apk-tools needs
to finish installing before busybox can successfully install
on arm. Previously, apk-tools would start installing first, but
busybox would frequently start installing before the apk-tools
installation finished in drone. Specifically, the trigger
script for busybox would fail while building the arm image in
drone.

Revert "Add apk-tools first"

This reverts commit 3179cfd.
chiukapoor pushed a commit to chiukapoor/ingress-nginx that referenced this issue Sep 5, 2024
* replace upstream's workflow with Rancher's workflow files and add FOSSA

* cherry pick commits: f327746, b8966a6
refactor rancher build process due to upstream changes and update docker version and buildx version

* fix drone build failure

`go install ginkgo` followed by `which ginkgo` is newly added in the upstream repo.
This is making the drone build fail for arm64 arch. The ginkgo library
gets installed under $GOPATH/bin/linux_arm64 dir. This is unlike amd64 images
which typically install go libraries under $GOPATH/bin. Since the previously mentioned
dir is not in PATH, the command `which ginkgo` fails. I've added this location to PATH
to fix the build failure. See upstream PRs linked below for more info:
kubernetes#8566
kubernetes#8569

* Fix flaky arm64 emulator issue docker/buildx#542

Signed-off-by: Chirayu Kapoor <chirayu.kapoor@suse.com>
chiukapoor pushed a commit to chiukapoor/ingress-nginx that referenced this issue Sep 5, 2024
chiukapoor pushed a commit to chiukapoor/ingress-nginx that referenced this issue Sep 6, 2024
github-actions bot pushed a commit to chiukapoor/ingress-nginx that referenced this issue Sep 25, 2024
chiukapoor pushed a commit to chiukapoor/ingress-nginx that referenced this issue Oct 14, 2024
chiukapoor pushed a commit to chiukapoor/ingress-nginx that referenced this issue Oct 16, 2024