Qemu 5.2 regression causes arm64 detection flakiness (Reverted) #542
Comments
That it is trying to use No idea why it works on single run though, if it is the same configuration. If you can't figure it out post output of |
We do have QEMU installed via this method:
Here's the other info you requested:
|
If you look at the latest output then you have emulator called Run
and see if it changes. |
I'm also seeing the same issue with my GitHub workflow. The As you mentioned,
Here is the error that occurred during build of the images:
If you want to view the workflow: https://github.com/jlesage/docker-baseimage/runs/1895262162?check_suite_focus=true |
@jlesage can you run |
I did the requested tests locally on my machine, since I seem to have the same issue. I first clear all emulators from the system. So from a clean state:
Then, installing all emulators:
I also tried to remove mips64 from the emulators to install (again, starting from a fresh state):
Installing only
|
On Github workflow, before install QEMU emulators:
|
In the following run: https://github.com/jlesage/docker-baseimage/runs/1896128490?check_suite_focus=true
But build fails with same error:
|
I pushed Also, what output do you get from |
Btw the |
Strange thing,
In all cases,
|
However,
|
Using
https://github.com/jlesage/docker-baseimage/runs/1896254442?check_suite_focus=true |
Is it normal for multiple runs of
|
No. When this happens does Please post the full system information where you see this: kernel, distro, etc. Is there a way I can access such a machine? I know some reports seem to be in GitHub Actions (not sure about the kernel etc. there either), but if it is flaky, it is hard to debug/bisect there. It looks like @jlesage is reporting that 5.0.1 is OK while @sdwr98 (sometimes) has the issue with that image as well. @sdwr98 Can you confirm this also worked for you before this week, when we made a new release? Or did you just start using it? |
I seem to be able to reproduce the flakiness in a GitHub Codespaces environment. |
Yes, this consistently reports
I see this in my local Vagrant environment and on our AWS EC2 build agent. Vagrant (on an Intel macOS host)
EC2 build agent:
Both are amazonlinux2. Notably, this does not happen on either my Intel or Apple Silicon Macs.
This was working fine up until roughly Thursday of last week. |
This looks really bizarre. @sdwr98 Can you confirm that if you run |
I can confirm that it is not flaky after running those steps |
I have reverted I traced this issue to the changes in qemu between versions v5.0.1 and v5.1.0. The issue is that running the test binary https://github.com/moby/buildkit/blob/master/util/archutil/fixtures/exit.arm64.s sometimes fails with "Segmentation fault (core dumped)". Therefore arm64 support is not detected. The issue seems to be arm64 specific and only seems to affect this binary (although there was a report of a possibly completely unrelated regression in v5.2). Looking at the regression points, I think it may be related more to how the test binary is invoked with chroot than to the binary itself. If I just invoke the same binary in the shell I don't see the issue.
after this the test binary fails with
... after which flakiness is introduced and the binary sometimes succeeds and sometimes fails with a segmentation fault. The current master branch has the same issue as v5.2.0. @stsquad Could you please take a look at this? |
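To put a number on "sometimes succeeds and sometimes errors", one option is to run the probe many times and tally the exit codes. The following is only a minimal sketch, not how buildkit performs its detection: the function names `tally_exit_codes` and `is_flaky` are made up for illustration, and the probe command is whatever you pass in (for example a locally built copy of the test binary, which is an assumption here).

```python
import subprocess
import sys
from collections import Counter


def tally_exit_codes(cmd, attempts=50):
    """Run `cmd` `attempts` times and count how often each exit code occurs.

    On POSIX, subprocess reports a child killed by a signal as a negative
    return code (for example -11 for SIGSEGV).
    """
    counts = Counter()
    for _ in range(attempts):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        counts[result.returncode] += 1
    return counts


def is_flaky(counts):
    """A probe is flaky when it produced more than one distinct exit code."""
    return len(counts) > 1


if __name__ == "__main__":
    # Hypothetical usage: pass the probe command as arguments, e.g.
    #   python tally.py ./exit.arm64
    # where ./exit.arm64 stands for a locally built copy of the test binary.
    # With no arguments, a trivial Python process is used as a stand-in.
    command = sys.argv[1:] or [sys.executable, "-c", "pass"]
    counts = tally_exit_codes(command)
    for code, n in sorted(counts.items()):
        print(f"exit code {code}: {n} run(s)")
    print("flaky" if is_flaky(counts) else "consistent")
```

Note that, per the comment above, invoking the binary directly from a shell did not show the problem, so a tally like this is only meaningful if the probe is started the same way the detection code starts it (under chroot).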
I changed the way API proxy builds image in previous PR. I added a flag to turn on buildx to do cross build. I forgot to toggle that tag for build in our image pipeline. This PR is prone to the following bug in QEMU: docker/buildx#542 I have seen it happening. However, after running 22 times it happened only once. Find below example of good runs. Tested on multiple runs:
https://dev.azure.com/msazure/One/_build/results?buildId=39414284&view=results
https://dev.azure.com/msazure/One/_build/results?buildId=39414287&view=logs&j=5e897c14-3122-5b03-9993-10a307d9da6f&t=5e897c14-3122-5b03-9993-10a307d9da6f
https://dev.azure.com/msazure/One/_build/results?buildId=39414297&view=results
https://dev.azure.com/msazure/One/_build/results?buildId=39414299&view=results
https://dev.azure.com/msazure/One/_build/results?buildId=39413233&view=results
https://dev.azure.com/msazure/One/_build/results?buildId=39413213&view=results
https://dev.azure.com/msazure/One/_build/results?buildId=39412803&view=results
https://dev.azure.com/msazure/One/_build/results?buildId=39412923&view=results
https://dev.azure.com/msazure/One/_build/results?buildId=39413217&view=results
https://dev.azure.com/msazure/One/_build/results?buildId=39412501&view=results
regression docker/buildx#542 Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
- Pin version for multiarch/qemu-user-static due to QEMU 5.2 regression Ref: docker/buildx#542
Hit this multiarch build issue on a separate PR envoyproxy#15204 (comment) Fixes envoyproxy#14971 Most likely relates to docker/buildx#542 https://github.com/docker/buildx#building-multi-platform-images recommends running `docker run --privileged --rm tonistiigi/binfmt --install all` before setting up `buildx` Signed-off-by: Arko Dasgupta <arko@tetrate.io>
Fix flaky arm64 emulator issue docker/buildx#542 Use Go 1.17.2 and fix golint installation fix drone build failure `go install ginkgo` followed by `which ginkgo` is newly added in the upstream repo. This is making the drone build fail for arm64 arch. The ginkgo library gets installed under $GOPATH/bin/linux_arm64 dir. This is unlike amd64 images which typically install go libraries under $GOPATH/bin. Since the previously mentioned dir is not in PATH, the command `which ginkgo` fails. I've added this location to PATH to fix the build failure. See upstream PRs linked below for more info: kubernetes#8566 kubernetes#8569 use go 1.21.5 | go mod tidy Signed-off-by: Chirayu Kapoor <chirayu.kapoor@suse.com>
Add apk-tools first Now, the apk-tools package is installed first. Apk-tools needs to finish installing before busybox can successfully install on arm. Previously, apk-tools would start installing first, but busybox would frequently start installing before the apk-tools installation finished in drone. Specifically, the trigger script for busybox would fail while building the arm image in drone. Revert "Add apk-tools first" This reverts commit 3179cfd.
* replace upstream's workflow with Rancher's workflow files and add FOSSA * cherry pick commits: f327746, b8966a6 refactor rancher build process due to upstream changes and update docker version and buildx version * fix drone build failure `go install ginkgo` followed by `which ginkgo` is newly added in the upstream repo. This is making the drone build fail for arm64 arch. The ginkgo library gets installed under $GOPATH/bin/linux_arm64 dir. This is unlike amd64 images which typically install go libraries under $GOPATH/bin. Since the previously mentioned dir is not in PATH, the command `which ginkgo` fails. I've added this location to PATH to fix the build failure. See upstream PRs linked below for more info: kubernetes#8566 kubernetes#8569 * Fix flaky arm64 emulator issue docker/buildx#542 Signed-off-by: Chirayu Kapoor <chirayu.kapoor@suse.com>
I'm using buildx to do multiplatform builds in our CI environment and recently we started getting some odd errors. I'm able to reproduce this locally with this simple Dockerfile:
I set up my multiplatform build this way:
Then run the build command. When I run it in multi-platform, I get an error:
but when I run it for just arm64 it works:
Any thoughts on this? I have the qemu aarch64 emulator installed.
Here's my docker info: