-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Buildkitd logs "failed to kill process in container id <cid>: buildkit-runc did not terminate successfully: container not running" for hours #4483
Comments
👋 @coryb Can you take a look? I've seen some of these logs myself but not in a loop as it is described here. I've also not been able to understand what exact case causes this. Even if there is some weird condition that stops from killing container cleanly, going to loop when the process is already gone should be avoided.
This might be related as it does change the parent-child structure for subprocess. |
Looking at this for loop buildkit/executor/runcexecutor/executor.go Lines 615 to 637 in 438f472
We know that
I don't understand exactly the scenario that leads to this, but I believe this could explain the behaviour. Maybe before going into the select on 620 we could inspect the error and I can also try to capture a goroutine stack dump if that would help. |
I think this makes sense. Looks like the actual error is coming from runc directly (via We do need to make sure that the container is actually cleaned up and this isn't some issue where |
Thank you for confirming! This hasn't been a huge problem just yet, but I will be doing more work with our buildkit runners soon. I may come back to this and attempt a fix if it's getting in the way of the work I am doing. |
Confirmed still occurring in We are seeing this behavior occur when the In addition to generating massive amounts of error messages, the buildkitd server can no longer terminate gracefully immediately. Instead it waits a few minutes before finally exiting with this message:
|
Interestingly, it does resolve itself eventually, sometimes in as short as 30 seconds but other times continues for many minutes. It does not appear to have any functional impact. The buildkitd daemon is still able to accept new builds even when this is occurring. |
Looked into it more but still quite confusing what case this is. Pretty clear it comes from this loop/sleep https://github.com/moby/buildkit/blob/v0.14.1/executor/runcexecutor/executor.go#L618-L633 , but this is only possible if There is no stacktrace but possible candidates where the goroutine is stuck is That leaves Then last remaining part is actual runc bug where So, although we could fix the log issue with some detection of specific error, it is not clear if the |
While the specifics of the bug are over my head, I am happy to build and deploy a debug build if that might help capture more specific information such as a stack trace. |
Any additional insights or work on this @fullykubed? |
@thompson-shaun I am afraid I am not well-versed enough in the internals to drive this forward independently. However, if anyone wants help catching more runtime information (@tonistiigi), I have a setup where I can reliably reproduce the issue. |
@fullykubed Can you put together some repro steps for us? If not, then I guess one option is that we will send you a special version of buildkit with some extra logging that you could run. Or maybe there is a way you can give us access to the setup that reproduces this. |
@tonistiigi Unfortunately, the setup I have is going to be somewhat cumbersome to deploy as BuildKit is running in an EKS cluster and builds are submitting over an SSH tunnel. That all said, I am having to give anyone working on this issue direct access to a cluster where this issue occurs via a kubeconfig file. Alternatively, if you have an image / binary with the extra logging, I am happy to deploy it to the cluster, run the reproduction steps and post the logs here. |
Any chance these nodes are using a containerd image store @fullykubed ? |
@thompson-shaun They are. |
@fullykubed I think this is caused by In buildx config would look like |
@tonistiigi Just want to clarify something: I believe this problem occurs specifically when of buildkit is started with We are running our buildkit instances as follows: rootlesskit \
buildkitd \
--addr tcp://0.0.0.0:1234 \
--addr unix:///run/user/1000/buildkit/buildkitd.sock \
--oci-worker-no-process-sandbox \
--oci-worker-gc-keepstorage 50000 Using these settings, I am able to replicate the problem reliably by terminating the build in the middle of executing a RUN stanza. That said, I did execute the privileged version of your buildkitd with the time="2024-09-13T13:10:57Z" level=warning msg="TLS is not enabled for tcp://0.0.0.0:1234. enabling mutual TLS authentication is highly recommended"
time="2024-09-13T13:10:57Z" level=info msg="auto snapshotter: using overlayfs"
time="2024-09-13T13:10:57Z" level=warning msg="using host network as the default"
time="2024-09-13T13:10:57Z" level=info msg="found worker \"jj1z7z0hdc13p8lplson6fghe\", labels=map[org.mobyproject.buildkit.worker.executor:oci org.mobyproject.buildkit.worker.hostname:buildkit-amd64-0 org.mobyproject.buildkit.worker.network:host org.mobyproject.buildkit.worker.oci.process-mode:sandbox org.mobyproject.buildkit.worker.selinux.enabled:false org.mobyproject.buildkit.worker.snapshotter:overlayfs], platforms=[linux/amd64 linux/amd64/v2 linux/amd64/v3 linux/amd64/v4 linux/386]"
time="2024-09-13T13:10:57Z" level=warning msg="skipping containerd worker, as \"/run/containerd/containerd.sock\" does not exist"
time="2024-09-13T13:10:57Z" level=info msg="found 1 workers, default=\"jj1z7z0hdc13p8lplson6fghe\""
time="2024-09-13T13:10:57Z" level=warning msg="currently, only the default worker can be used."
time="2024-09-13T13:10:57Z" level=info msg="running server on [::]:1234"
time="2024-09-13T13:10:57Z" level=info msg="running server on /run/buildkit/buildkitd.sock"
time="2024-09-13T13:12:18Z" level=debug msg="session started" spanID=b14bdfaf38cbd8c0 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg="resolve exporter image with map[name:891377197483.dkr.ecr.us-east-2.amazonaws.com/891377197483.dkr.ecr.us-east-2.amazonaws.com/bastion:test-amd64 push:true source-date-epoch:315532800]" spanID=e219f48b8502ce4c traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg="new ref for local: sby9wiv04wqz05txzuwo6q4sx" span="[internal] load build definition from ./Dockerfile" spanID=2062be68a5e7685d traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg="diffcopy took: 188.928225ms" span="[internal] load build definition from ./Dockerfile" spanID=4fc338e951ef3334 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg="saved sby9wiv04wqz05txzuwo6q4sx as dockerfile:dockerfile:" span="[internal] load build definition from ./Dockerfile" spanID=2062be68a5e7685d traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg="checked for cached auth handler namespace" cached=false key="docker.io/library/ubuntu::pull" name=docker.io/library/ubuntu scope=pull
time="2024-09-13T13:12:18Z" level=debug msg=resolving span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg="do request" request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=buildkit/v0.0.0+unknown request.method=HEAD span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg="fetch response received" response.header.content-length=157 response.header.content-type=application/json response.header.date="Fri, 13 Sep 2024 13:12:18 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.docker-ratelimit-source=18.218.214.155 response.header.strict-transport-security="max-age=31536000" response.header.www-authenticate="Bearer realm=\"https://auth.docker.io/token\",service=\"registry.docker.io\",scope=\"repository:library/ubuntu:pull\"" response.status="401 Unauthorized" span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg=Unauthorized header="Bearer realm=\"https://auth.docker.io/token\",service=\"registry.docker.io\",scope=\"repository:library/ubuntu:pull\"" span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg="do request" request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=buildkit/v0.0.0+unknown request.method=HEAD span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg="fetch response received" response.header.content-length=1338 response.header.content-type=application/vnd.oci.image.index.v1+json response.header.date="Fri, 13 Sep 2024 13:12:18 GMT" response.header.docker-content-digest="sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee" response.header.docker-distribution-api-version=registry/2.0 response.header.docker-ratelimit-source=18.218.214.155 response.header.etag="\"sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee\"" response.header.ratelimit-limit="100;w=21600" response.header.ratelimit-remaining="91;w=21600" response.header.strict-transport-security="max-age=31536000" response.status="200 OK" span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg=resolved desc.digest="sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee" span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg=fetch span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg="do request" request.header.accept="application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=buildkit/v0.0.0+unknown request.method=GET span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg="fetch response received" response.header.content-length=1338 response.header.content-type=application/vnd.oci.image.index.v1+json response.header.date="Fri, 13 Sep 2024 13:12:18 GMT" response.header.docker-content-digest="sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee" response.header.docker-distribution-api-version=registry/2.0 response.header.docker-ratelimit-source=18.218.214.155 response.header.etag="\"sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee\"" response.header.ratelimit-limit="100;w=21600" response.header.ratelimit-remaining="91;w=21600" response.header.strict-transport-security="max-age=31536000" response.status="200 OK" span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg=fetch span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg="do request" request.header.accept="application/vnd.oci.image.manifest.v1+json, */*" request.header.user-agent=buildkit/v0.0.0+unknown request.method=GET span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg="fetch response received" response.header.content-length=424 response.header.content-type=application/vnd.oci.image.manifest.v1+json response.header.date="Fri, 13 Sep 2024 13:12:18 GMT" response.header.docker-content-digest="sha256:d35dfc2fe3ef66bcc085ca00d3152b482e6cafb23cdda1864154caf3b19094ba" response.header.docker-distribution-api-version=registry/2.0 response.header.docker-ratelimit-source=18.218.214.155 response.header.etag="\"sha256:d35dfc2fe3ef66bcc085ca00d3152b482e6cafb23cdda1864154caf3b19094ba\"" response.header.ratelimit-limit="100;w=21600" response.header.ratelimit-remaining="91;w=21600" response.header.strict-transport-security="max-age=31536000" response.status="200 OK" span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg=fetch span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:18Z" level=debug msg="do request" request.header.accept="application/vnd.oci.image.config.v1+json, */*" request.header.user-agent=buildkit/v0.0.0+unknown request.method=GET span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:19Z" level=debug msg="fetch response received" response.header.accept-ranges=bytes response.header.age=2376105 response.header.cache-control="public, max-age=14400" response.header.cf-cache-status=HIT response.header.cf-ray=8c28635f2f6160a2-ORD response.header.connection=keep-alive response.header.content-length=2297 response.header.content-type=application/octet-stream response.header.date="Fri, 13 Sep 2024 13:12:19 GMT" response.header.etag="\"6816f90355932e7c37011fcb928f42e4\"" response.header.expires="Fri, 13 Sep 2024 17:12:19 GMT" response.header.last-modified="Thu, 01 Aug 2024 15:42:13 GMT" response.header.server=cloudflare response.header.vary=Accept-Encoding response.header.x-amz-id-2="CyCA968dl9flKEKFhfRvjXSObaIn7giCGSWk9P8e6JoJQLJqXBrT90wcDTwXBEFowg+cnatNFzw=" response.header.x-amz-request-id=SDRWHHRM29A58W9T response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-version-id=VvqeYTa4p0U7_iTriRNw8HPg6qGp9Cm3 response.status="200 OK" span="resolving docker.io/library/ubuntu:latest" spanID=f1f20a9c1605a2d1 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:19Z" level=debug msg="new ref for local: kz9ts5enkk4o5fxfvdttxwdl8" span="[internal] load .dockerignore" spanID=1f23a1266a26f856 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:19Z" level=debug msg="diffcopy took: 163.731047ms" span="[internal] load .dockerignore" spanID=961b6b22ea246206 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:19Z" level=debug msg="saved kz9ts5enkk4o5fxfvdttxwdl8 as context:context-.dockerignore:" span="[internal] load .dockerignore" spanID=1f23a1266a26f856 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:19Z" level=debug msg="checked for cached auth handler namespace" cached=true key="docker.io/library/ubuntu::pull" name=docker.io/library/ubuntu scope=pull
time="2024-09-13T13:12:19Z" level=debug msg=fetch digest="sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee" mediatype=application/vnd.oci.image.index.v1+json size=1338 spanID=e219f48b8502ce4c traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:19Z" level=debug msg=fetch digest="sha256:d35dfc2fe3ef66bcc085ca00d3152b482e6cafb23cdda1864154caf3b19094ba" mediatype=application/vnd.oci.image.manifest.v1+json size=424 spanID=e219f48b8502ce4c traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:19Z" level=debug msg=fetch digest="sha256:edbfe74c41f8a3501ce542e137cf28ea04dd03e6df8c9d66519b6ad761c2598a" mediatype=application/vnd.oci.image.config.v1+json size=2297 spanID=e219f48b8502ce4c traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:19Z" level=debug msg="checked for cached auth handler namespace" cached=true key="docker.io/library/ubuntu::pull" name=docker.io/library/ubuntu scope=pull
time="2024-09-13T13:12:19Z" level=debug msg="checked for cached auth handler namespace" cached=true key="docker.io/library/ubuntu::pull" name=docker.io/library/ubuntu scope=pull
time="2024-09-13T13:12:19Z" level=debug msg=fetch digest="sha256:31e907dcc94a592a57796786399eb004dcbba714389fa615f5efa05a91316356" mediatype=application/vnd.oci.image.layer.v1.tar+gzip size=29706298 spanID=e219f48b8502ce4c traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:19Z" level=debug msg="do request" digest="sha256:31e907dcc94a592a57796786399eb004dcbba714389fa615f5efa05a91316356" mediatype=application/vnd.oci.image.layer.v1.tar+gzip request.header.accept="application/vnd.oci.image.layer.v1.tar+gzip, */*" request.header.user-agent=containerd/1.7.21+unknown request.method=GET size=29706298 spanID=e219f48b8502ce4c traceID=63dec637cc25194ba06b3077cfafd98b url="https://registry-1.docker.io/v2/library/ubuntu/blobs/sha256:31e907dcc94a592a57796786399eb004dcbba714389fa615f5efa05a91316356"
time="2024-09-13T13:12:19Z" level=debug msg="fetch response received" digest="sha256:31e907dcc94a592a57796786399eb004dcbba714389fa615f5efa05a91316356" mediatype=application/vnd.oci.image.layer.v1.tar+gzip response.header.accept-ranges=bytes response.header.age=1255003 response.header.cache-control="public, max-age=14400" response.header.cf-cache-status=HIT response.header.cf-ray=8c28636179f960a2-ORD response.header.connection=keep-alive response.header.content-length=29706298 response.header.content-type=application/octet-stream response.header.date="Fri, 13 Sep 2024 13:12:19 GMT" response.header.etag="\"3203ab8dce387ad2c8e28418d6b06b72\"" response.header.expires="Fri, 13 Sep 2024 17:12:19 GMT" response.header.last-modified="Thu, 01 Aug 2024 15:42:11 GMT" response.header.server=cloudflare response.header.vary=Accept-Encoding response.header.x-amz-id-2=yD7+QiTLVu88NGtqU+4FkUs205LxNjOoBcIvRRsuttxs54tVQsHrhFkd22U2OOxErVSzr4XwWPjyEnpvUcfiXFuziqpECi9L response.header.x-amz-request-id=PJZCYC80RPHP331S response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-version-id=B353OJcwFGgvWU6lrbjXkTsaIBod.fTn response.status="200 OK" size=29706298 spanID=e219f48b8502ce4c traceID=63dec637cc25194ba06b3077cfafd98b url="https://registry-1.docker.io/v2/library/ubuntu/blobs/sha256:31e907dcc94a592a57796786399eb004dcbba714389fa615f5efa05a91316356"
time="2024-09-13T13:12:19Z" level=debug msg="using pigz for decompression"
time="2024-09-13T13:12:20Z" level=debug msg="diff applied" d=542.22265ms digest="sha256:31e907dcc94a592a57796786399eb004dcbba714389fa615f5efa05a91316356" media=application/vnd.oci.image.layer.v1.tar+gzip size=29706298 spanID=e219f48b8502ce4c traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:20Z" level=debug msg="> creating n5enwgexkbkkpi6pkx05c1bcj [/bin/sh -c sleep 60]" span="[2/2] RUN sleep 60" spanID=1e4973b167872fa3 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:20Z" level=debug msg="starting runc process monitor" span="[2/2] RUN sleep 60" spanID=1e4973b167872fa3 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:20Z" level=debug msg="runc run n5enwgexkbkkpi6pkx05c1bcj /var/lib/buildkit/runc-overlayfs/executor/n5enwgexkbkkpi6pkx05c1bcj" span="[2/2] RUN sleep 60" spanID=1e4973b167872fa3 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:20Z" level=debug msg="handleSignals done: <nil>" span="[2/2] RUN sleep 60" spanID=1e4973b167872fa3 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:20Z" level=debug msg="waitForStart done: <nil>" span="[2/2] RUN sleep 60" spanID=1e4973b167872fa3 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:26Z" level=debug msg="sending sigkill to process in container n5enwgexkbkkpi6pkx05c1bcj"
time="2024-09-13T13:12:26Z" level=debug msg="sending sigkill via runc process n5enwgexkbkkpi6pkx05c1bcj"
time="2024-09-13T13:12:26Z" level=debug msg="session finished: <nil>" spanID=b14bdfaf38cbd8c0 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:26Z" level=error msg="completed to kill process in container id n5enwgexkbkkpi6pkx05c1bcj: <nil>"
time="2024-09-13T13:12:26Z" level=debug msg="killed process" span="[2/2] RUN sleep 60" spanID=1e4973b167872fa3 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:26Z" level=debug msg="runc run n5enwgexkbkkpi6pkx05c1bcj /var/lib/buildkit/runc-overlayfs/executor/n5enwgexkbkkpi6pkx05c1bcj took 6.128745182s, err: buildkit-runc did not terminate successfully: exit status 137" span="[2/2] RUN sleep 60" spanID=1e4973b167872fa3 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:26Z" level=debug msg="call done: buildkit-runc did not terminate successfully: exit status 137" span="[2/2] RUN sleep 60" spanID=1e4973b167872fa3 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:26Z" level=debug msg="releasing runc process monitor"
time="2024-09-13T13:12:26Z" level=debug msg="runc process monitor ended (2)" span="[2/2] RUN sleep 60" spanID=1e4973b167872fa3 traceID=63dec637cc25194ba06b3077cfafd98b
time="2024-09-13T13:12:26Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Canceled desc = context canceled" spanID=e219f48b8502ce4c traceID=63dec637cc25194ba06b3077cfafd98b
Canceled: context canceled
1 290af9a buildkitd --addr tcp://0.0.0.0:1234 --addr unix:///run/buildkit/buildkitd.sock --debug
main.unaryInterceptor
/src/cmd/buildkitd/main.go:713
google.golang.org/grpc.NewServer.chainUnaryServerInterceptors.chainUnaryInterceptors.func1
/src/vendor/google.golang.org/grpc/server.go:1194
github.com/moby/buildkit/api/services/control._Control_Solve_Handler
/src/api/services/control/control.pb.go:2493
google.golang.org/grpc.(*Server).processUnaryRPC
/src/vendor/google.golang.org/grpc/server.go:1383
google.golang.org/grpc.(*Server).handleStream
/src/vendor/google.golang.org/grpc/server.go:1794
google.golang.org/grpc.(*Server).serveStreams.func2.1
/src/vendor/google.golang.org/grpc/server.go:1027
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1695
time="2024-09-13T13:12:27Z" level=debug msg="remove snapshot" key=mf73lxktlj8rb5izchj6ctejl snapshotter=overlayfs
time="2024-09-13T13:12:27Z" level=debug msg="schedule snapshotter cleanup" snapshotter=overlayfs
time="2024-09-13T13:12:27Z" level=debug msg="removed snapshot" key=buildkit/5/mf73lxktlj8rb5izchj6ctejl snapshotter=overlayfs
time="2024-09-13T13:12:57Z" level=debug msg="session started" spanID=42917d6c8450b569 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg="resolve exporter image with map[name:891377197483.dkr.ecr.us-east-2.amazonaws.com/891377197483.dkr.ecr.us-east-2.amazonaws.com/bastion:test-amd64 push:true source-date-epoch:315532800]" spanID=eba934ca8a5cdb84 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg="reusing ref for local: sby9wiv04wqz05txzuwo6q4sx" span="[internal] load build definition from ./Dockerfile" spanID=2b948af4ef9ae248 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg="diffcopy took: 224.957113ms" span="[internal] load build definition from ./Dockerfile" spanID=26fa7d91985ff8a2 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg="checked for cached auth handler namespace" cached=true key="docker.io/library/ubuntu::pull" name=docker.io/library/ubuntu scope=pull
time="2024-09-13T13:12:57Z" level=debug msg=resolving span="resolving docker.io/library/ubuntu:latest" spanID=73c526f172fe364e traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg="do request" request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=buildkit/v0.0.0+unknown request.method=HEAD span="resolving docker.io/library/ubuntu:latest" spanID=73c526f172fe364e traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg="fetch response received" response.header.content-length=1338 response.header.content-type=application/vnd.oci.image.index.v1+json response.header.date="Fri, 13 Sep 2024 13:12:57 GMT" response.header.docker-content-digest="sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee" response.header.docker-distribution-api-version=registry/2.0 response.header.docker-ratelimit-source=18.218.214.155 response.header.etag="\"sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee\"" response.header.ratelimit-limit="100;w=21600" response.header.ratelimit-remaining="90;w=21600" response.header.strict-transport-security="max-age=31536000" response.status="200 OK" span="resolving docker.io/library/ubuntu:latest" spanID=73c526f172fe364e traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg=resolved desc.digest="sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee" span="resolving docker.io/library/ubuntu:latest" spanID=73c526f172fe364e traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg=fetch span="resolving docker.io/library/ubuntu:latest" spanID=73c526f172fe364e traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg=fetch span="resolving docker.io/library/ubuntu:latest" spanID=73c526f172fe364e traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg=fetch span="resolving docker.io/library/ubuntu:latest" spanID=73c526f172fe364e traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg="reusing ref for local: kz9ts5enkk4o5fxfvdttxwdl8" span="[internal] load .dockerignore" spanID=2d508480f7a08036 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg="diffcopy took: 130.91932ms" span="[internal] load .dockerignore" spanID=fe90dbb59f88a906 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg="checked for cached auth handler namespace" cached=true key="docker.io/library/ubuntu::pull" name=docker.io/library/ubuntu scope=pull
time="2024-09-13T13:12:57Z" level=debug msg=fetch digest="sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee" mediatype=application/vnd.oci.image.index.v1+json size=1338 spanID=eba934ca8a5cdb84 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg=fetch digest="sha256:d35dfc2fe3ef66bcc085ca00d3152b482e6cafb23cdda1864154caf3b19094ba" mediatype=application/vnd.oci.image.manifest.v1+json size=424 spanID=eba934ca8a5cdb84 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg=fetch digest="sha256:edbfe74c41f8a3501ce542e137cf28ea04dd03e6df8c9d66519b6ad761c2598a" mediatype=application/vnd.oci.image.config.v1+json size=2297 spanID=eba934ca8a5cdb84 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:57Z" level=debug msg="load cache for [1/2] FROM docker.io/library/ubuntu:latest@sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee with jj1z7z0hdc13p8lplson6fghe::qrnfdkn16u528xf9k12jomhqh"
time="2024-09-13T13:12:58Z" level=debug msg="> creating oys6n2rrp1dwgclvv5b9pf5wy [/bin/sh -c sleep 75]" span="[2/2] RUN sleep 75" spanID=deba44c1fff943a2 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:58Z" level=debug msg="starting runc process monitor" span="[2/2] RUN sleep 75" spanID=deba44c1fff943a2 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:58Z" level=debug msg="runc run oys6n2rrp1dwgclvv5b9pf5wy /var/lib/buildkit/runc-overlayfs/executor/oys6n2rrp1dwgclvv5b9pf5wy" span="[2/2] RUN sleep 75" spanID=deba44c1fff943a2 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:58Z" level=debug msg="handleSignals done: <nil>" span="[2/2] RUN sleep 75" spanID=deba44c1fff943a2 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:12:58Z" level=debug msg="waitForStart done: <nil>" span="[2/2] RUN sleep 75" spanID=deba44c1fff943a2 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:13:14Z" level=debug msg="sending sigkill to process in container oys6n2rrp1dwgclvv5b9pf5wy"
time="2024-09-13T13:13:14Z" level=debug msg="sending sigkill via runc process oys6n2rrp1dwgclvv5b9pf5wy"
time="2024-09-13T13:13:14Z" level=debug msg="session finished: <nil>" spanID=42917d6c8450b569 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:13:14Z" level=error msg="completed to kill process in container id oys6n2rrp1dwgclvv5b9pf5wy: <nil>"
time="2024-09-13T13:13:14Z" level=debug msg="killed process" span="[2/2] RUN sleep 75" spanID=deba44c1fff943a2 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:13:14Z" level=debug msg="runc run oys6n2rrp1dwgclvv5b9pf5wy /var/lib/buildkit/runc-overlayfs/executor/oys6n2rrp1dwgclvv5b9pf5wy took 16.805789068s, err: buildkit-runc did not terminate successfully: exit status 137" span="[2/2] RUN sleep 75" spanID=deba44c1fff943a2 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:13:14Z" level=debug msg="call done: buildkit-runc did not terminate successfully: exit status 137" span="[2/2] RUN sleep 75" spanID=deba44c1fff943a2 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:13:14Z" level=debug msg="releasing runc process monitor"
time="2024-09-13T13:13:14Z" level=debug msg="runc process monitor ended (2)" span="[2/2] RUN sleep 75" spanID=deba44c1fff943a2 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
time="2024-09-13T13:13:14Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Canceled desc = context canceled" spanID=eba934ca8a5cdb84 traceID=cbb3c2fc5e2e5a1d9755795642a7387c
Canceled: context canceled
1 290af9a buildkitd --addr tcp://0.0.0.0:1234 --addr unix:///run/buildkit/buildkitd.sock --debug
main.unaryInterceptor
/src/cmd/buildkitd/main.go:713
google.golang.org/grpc.NewServer.chainUnaryServerInterceptors.chainUnaryInterceptors.func1
/src/vendor/google.golang.org/grpc/server.go:1194
github.com/moby/buildkit/api/services/control._Control_Solve_Handler
/src/api/services/control/control.pb.go:2493
google.golang.org/grpc.(*Server).processUnaryRPC
/src/vendor/google.golang.org/grpc/server.go:1383
google.golang.org/grpc.(*Server).handleStream
/src/vendor/google.golang.org/grpc/server.go:1794
google.golang.org/grpc.(*Server).serveStreams.func2.1
/src/vendor/google.golang.org/grpc/server.go:1027
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1695
time="2024-09-13T13:13:27Z" level=debug msg="remove snapshot" key=x5psvprsidvd8n3vwqkiiab73 snapshotter=overlayfs
time="2024-09-13T13:13:27Z" level=debug msg="schedule snapshotter cleanup" snapshotter=overlayfs
time="2024-09-13T13:13:27Z" level=debug msg="removed snapshot" key=buildkit/6/x5psvprsidvd8n3vwqkiiab73 snapshotter=overlayfs |
@fullykubed Pushed When I added sleep for |
@tonistiigi thanks a lot for providing this fix, today we suddenly had this issue with all our pipelines failing - are there any plans to merge this fix and release it? |
Any update on a fix for this? For me, it seems like this happens if there is a build error during emulation with QEMU. |
Hello! 👋
Problem
We are seeing the buildkitd logs full of the following lines:
The container id is always the same. These lines are repeated for many hours, sometimes days. The frequency is about 1000/minute (~15/s). We are seeing the buildkitd healthcheck fail, and it's not clear if the cause is these errors or if the errors are simply making it more difficult to determine the underlying cause.
A pod restart clears out the problem for a bit, but it generally happens again within a few hours.
Seems like the logs come from:
buildkit/executor/runcexecutor/executor.go
Lines 526 to 529 in f84cfe3
Before the log spam starts I don't see anything particularlly interesting in the logs
Details
Version:
buildkit:v0.12.3-rootless
(we also saw this problem with 0.11.0 before upgrade)We run 4 buildkitd runner with
--oci-worker-no-process-sandbox
. All of these seem to manifest this behaviour at one time or another (not necessarily at the same time).Any ideas about how to fix this? Thank you!
The text was updated successfully, but these errors were encountered: