"write: connection reset by peer" pushing to 3rd party registry #2713
Comments
We upgraded to v0.10.0 and the same issue was solved on our end. v0.10.0 was just released 4 hours ago, good luck man.
Thanks @shaoye but I've just tried v0.10.0 and it's no better (I had already tried the 0.10 release candidates, and even a build of master, anyway): https://github.com/brightbox/container-registry-write-test/runs/5496288170
I am having the same issue. We push frequently to multiple Docker repos, and these intermittent failures are not good for us. As a workaround, I have changed my build-and-push action so it does not use buildkit.
Our workaround has been to export the image from the buildkit environment to the docker daemon (using
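Whatever the elided details, that kind of workaround - loading the buildkit result into the local docker daemon and pushing with the classic docker client (the `load: true` option mentioned in a commit further down the thread) - might look roughly like this sketch; the image reference and registry are placeholders:

```bash
# Build with buildkit via buildx, but load the result into the local docker
# daemon instead of pushing directly from the buildkit environment.
# (cr.example.com/myorg/myimage is a placeholder image reference.)
docker buildx build --load -t cr.example.com/myorg/myimage:latest .

# Push with the classic docker client, which did not hit the connection resets.
docker push cr.example.com/myorg/myimage:latest
```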
@johnl We use the containerd project for the direct pushes. If you can reproduce with its tools directly and provide reproducible steps, it would be good to report it there. It also doesn't hurt to report it to the registry. I'm not familiar with this one, but I've seen many cases where a registry is implemented without following the spec, and is instead just tested against whatever requests a specific version of the docker binary seems to make.
@tonistiigi thanks for the details - I couldn't confirm quite how the pushes were done. I'll try to reproduce with containerd directly. Luckily in this case my team operates the registry, and we assumed a problem there first. It's a fairly standard docker-distribution deployment, so it should be widely used! The proxy was a TCP haproxy; we switched to nginx as a test to get more detail about the problem, and it reported that the client connection was at fault:
I'll try to reproduce with an isolated containerd though, thanks.
@tonistiigi I've modified my action to save the image from buildkit to containerd (1.4.12+azure-2) and push from there (using ctr images push), and the push completes successfully with no errors: https://github.com/brightbox/container-registry-write-test/runs/5570728801
I've tested with containerd v1.6.1 too, to match the latest buildkit release. I've also confirmed that pushing from within buildkit is still broken. And to be clear, I can't actually reproduce this outside of github actions, even using buildkit. So, this is a weird one.
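For reference, a rough sketch of that kind of buildkit-to-containerd flow (not necessarily the exact commands used here): export the build result as a tarball with buildctl, import it into containerd, and push with ctr. The image name, registry, and credentials are placeholders.

```bash
# Export the buildkit result as a docker-archive tarball instead of pushing it
buildctl build --frontend dockerfile.v0 \
  --local context=. --local dockerfile=. \
  --output type=docker,name=cr.example.com/myorg/write-test:latest,dest=image.tar

# Import the tarball into containerd, then push from containerd with ctr
ctr images import image.tar
ctr images push --user "$REGISTRY_USER:$REGISTRY_PASS" \
  cr.example.com/myorg/write-test:latest
```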
To more closely reproduce the buildkit environment, I tried running containerd (1.6.1) inside a container (still in a github action) and pushed from there, but that worked fine too. https://github.com/brightbox/container-registry-write-test/runs/5577590298
I've captured some packet traces at the registry end: one push session from buildkit which failed, and another from containerd which succeeded. With the failed push from buildkit, the client side just suddenly sends an RST packet, mid-stream, on most of the TCP connections. One example:
And what's weirder, a couple of new TCP connections later in the session re-use some source ports too, which isn't easily explained. I kind of feel this is a github actions execution environment problem - a stateful firewall or a NAT gateway. The mystery is why buildkit triggers this while containerd doesn't. I see that the http1 library doesn't support any GODEBUG options, so it looks like I'll have to add some debug messages directly to buildkit to get any more info :/
I can no longer reproduce this problem, even with the same versions of buildkit. The registry software and configuration haven't changed in that time either. I can only assume the github actions execution environment has changed. My gut feeling is still that buildkit was/is behaving in some unusual way (given the weird TCP traces) which was tickling some bug in the github network, but I don't think we'll ever be sure now!
As mentioned on moby/buildkit#2713 (comment), try to enable `load: true` to solve the pushing issues. Removed the random sleeps.
Details: moby/buildkit#2713 (comment) Signed-off-by: Andrei Jiroh Halili <ajhalili2006@gmail.com>
I'm using buildkit from within github actions and get repeated "write: connection reset by peer" errors when pushing to a 3rd party registry (cr.brightbox.com). They usually retry successfully, but some of the time they will fail the build. The internal docker builder's pushes always work fine in the exact same environment.
The registry server (fairly standard and up to date Docker Distribution Registry) shows this as the client closing the connection:
The same image can be built and pushed to the same registry (in the same github action run!) successfully using the internal docker build engine, every time - it never has a problem.
I've tried various versions of buildkit, old and new (including master) with no improvement.
I admittedly cannot reproduce this outside of github actions - the same build and push using "docker buildx" (with the docker-container driver) locally (on a much slower network) with the docker-ce packages on Ubuntu Jammy works fine. So I'm not 100% certain this is a buildkit bug, but I've seen other similar problems fixed in buildkit (mostly related to GCR) so I thought I'd report it.
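For anyone trying the same local reproduction, it looks roughly like this (a sketch; the builder name and image reference are placeholders):

```bash
# Create a buildx builder backed by the docker-container driver, as used in CI
docker buildx create --name bk-test --driver docker-container --use

# Build and push straight from buildkit - the step that fails on github
# actions but works fine locally
docker buildx build --push -t cr.example.com/myorg/write-test:latest .
```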
I can reproduce this reliably on github with a simple docker image containing six layers with 16M of random data.
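A Dockerfile along these lines would produce that kind of image (a sketch, assuming `dd` from /dev/urandom so the layers stay around 16M when compressed and pushed):

```bash
# Generate a Dockerfile whose six layers each hold ~16M of incompressible data
cat > Dockerfile <<'EOF'
FROM busybox
RUN dd if=/dev/urandom of=/layer1 bs=1M count=16
RUN dd if=/dev/urandom of=/layer2 bs=1M count=16
RUN dd if=/dev/urandom of=/layer3 bs=1M count=16
RUN dd if=/dev/urandom of=/layer4 bs=1M count=16
RUN dd if=/dev/urandom of=/layer5 bs=1M count=16
RUN dd if=/dev/urandom of=/layer6 bs=1M count=16
EOF
```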
This github action run shows an internal build and push succeeding, then a buildkit build and push failing (though it eventually succeeds due to retries):
https://github.com/brightbox/container-registry-write-test/runs/5433639972