Improved debugging support #1472
Regarding the "debugger image", my colleague @slushie did some interesting work on sharing a mount namespace (partial containers) with an image that has debugging tools: https://github.com/slushie/cdbg. In that repository, there's a prototype of … This may be useful to debug scratch images or minimal images that may lack basic tools like a shell binary.
/cc
@coryb Now that …
I am working on #1714 now; I'm guessing it will be a week or more before I have something viable for that. I have not really looked into the changes required for this yet. I think @hinshun has some ideas and is generally more familiar with this than I am. I will sync up with him and maybe twist his arm to help out 😄 I think we can try to break down what is remaining for this and come up with some estimates.
Interactive shells being the only option is going to leave much to be desired when building in CI pipelines. I often use Docker in CI pipelines where the build command has no terminal to drop into or is a direct API call; having the only option be "run interactive" is not in line with current automated build best practices. Please consider an option to allow sideband inspection of BuildKit layers, similar to how the legacy …
I've just upgraded Docker for Mac, which uses BuildKit as its default engine. I'm not feeling very comfortable with the suggested …
Just wanted to follow up; changing the backend while building works for me:
I agree. I guess for now I will run … so that I can get the image IDs in the output again.
Is there any solution in this space yet (that doesn't involve …)? I can't find any example of active work to resolve this issue; I might step in and help out if there's nothing in the pipeline.
I don't know what you mean by the nsenter solution, but that is not recommended. What you can do is create a named target at the position in the Dockerfile you want to debug, build that target with …
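The named-target workaround can be automated. Below is a minimal Go sketch (not an existing tool) that rewrites a Dockerfile so everything up to a chosen line becomes its own stage, here given the hypothetical name `debug-split`, while the rest of the original stage continues `FROM debug-split` under its original alias, so later `COPY --from=` references keep working. It assumes one instruction per line (no `\` continuations or heredocs) and that the name `debug-split` is not already used. Because a new `FROM` starts a new `ARG` scope, any `ARG` declared inside the split stage would need to be re-declared after the split; `ENV`, `WORKDIR`, and `USER` carry over through the image config.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitStage is the hypothetical name given to the partial stage.
const splitStage = "debug-split"

// splitAt returns a copy of the Dockerfile lines in which everything up to and
// including line n (1-indexed) becomes its own stage named splitStage. The
// remaining instructions continue in a new stage built FROM splitStage, which
// keeps the original alias (if any) so later COPY --from references still work.
func splitAt(lines []string, n int) ([]string, error) {
	if n < 1 || n > len(lines) {
		return nil, fmt.Errorf("line %d out of range", n)
	}
	// Walk backwards to the FROM that starts the stage containing line n.
	fromIdx := -1
	for i := n - 1; i >= 0; i-- {
		fields := strings.Fields(lines[i])
		if len(fields) > 0 && strings.EqualFold(fields[0], "FROM") {
			fromIdx = i
			break
		}
	}
	if fromIdx < 0 {
		return nil, errors.New("no FROM instruction before the split line")
	}
	// Strip an existing "AS <alias>" so it can be moved to the continuation stage.
	fields := strings.Fields(lines[fromIdx])
	alias := ""
	if len(fields) >= 4 && strings.EqualFold(fields[len(fields)-2], "AS") {
		alias = fields[len(fields)-1]
		fields = fields[:len(fields)-2]
	}
	out := make([]string, 0, len(lines)+1)
	out = append(out, lines[:fromIdx]...)
	out = append(out, strings.Join(fields, " ")+" AS "+splitStage)
	out = append(out, lines[fromIdx+1:n]...)
	cont := "FROM " + splitStage
	if alias != "" {
		cont += " AS " + alias
	}
	out = append(out, cont)
	out = append(out, lines[n:]...)
	return out, nil
}

func main() {
	df := []string{
		"FROM golang:1.21 AS build",
		"WORKDIR /src",
		"COPY . .",
		"RUN go build -o /out/app .",
		"FROM alpine",
		"COPY --from=build /out/app /app",
	}
	// Split right after "COPY . ." to inspect the tree before the build step runs.
	out, err := splitAt(df, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(out, "\n"))
}
```

The rewritten file could then be built and entered with, for example, `docker build -f Dockerfile.debug --target debug-split -t debug .` followed by `docker run --rm -it debug sh` (assuming the image contains a shell). Rewriting a copy rather than the original keeps the split out of version control.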
Just chiming in with a user perspective: after being put in a new environment where BuildKit appears to be the default, this is a decidedly worse experience than in the past. Clearly the layers are being cached. I'd guess the simplest solution with a "backward compatible user experience" might be to just automatically export the last cached layer to the image store, and display its hash, whenever there is an error in …
@tonistiigi …
The …
IIRC, when using Compose, … Edit: https://github.com/compose-spec/compose-spec/blob/master/build.md#target
The proposed option mentioned in #1053, where you can specify that it should create the image even on failure, would be very helpful. It would even help if you could just extend the --output option with a flag so that it also outputs on failure.
This would be fantastic. It's the only thing holding me back from moving over to BuildKit full time!
Just want to say that it is VERY painful not to be able to interactively debug intermediate images...
After switching to BuildKit recently because of the secret-mount option, I've just spent about half an hour trying to figure out what magical command I need to show the images in the BuildKit cache, the apparent answer being "it's not possible". I find it hard to believe that this issue still persists...
You can add a multi-stage split anywhere in your Dockerfile and use …
A temporary workaround is docker-compose, which (as of writing, v1.29.2) still doesn't use BuildKit when you do …
This can give you a look at the point after a successfully completed stage:
But unlike with DOCKER_BUILDKIT=0, I don't think there's a way to see the hash for each layer created in the image, so you can't just jump in right before the error and test at the moment of failure. Highly unfortunate, and a big deal if you ask me!
|
FYI: I've recently implemented an experimental interactive debugger for Dockerfiles: buildg https://github.com/ktock/buildg. Also, in buildx, discussion is ongoing about interactive debugger support and its UI/UX: docker/buildx#1104
|
It's been quite some time since there's been movement here. Can we get an update on this?
I fully support the idea of getting the hashes of each layer back. Maybe a good compromise would be to at least display the hash of the layer a failing command was run in?
Hashes of each layer would help so much.
Still using …
Because it's not simply "give the hashes": those hashes (i.e., what you see in the legacy builder) do not exist until the export stage of the build. Generating them by exporting each layer into an image as it's built would be a non-trivial operation that makes BuildKit slower for everyone, and it would require redesigning the BuildKit build process to know about and use the chosen image exporter much earlier in the build than it does now. As mentioned earlier, the solution for your actual problem (debugging failed builds in … Given that the BuildKit work to implement debugging was completed almost a year ago (…
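To illustrate why such identifiers cost something to produce: in the OCI image format, a layer's uncompressed digest (its "diff ID") is the SHA-256 of the layer's tar archive, so computing one means serializing every file of the snapshot into a tar stream and hashing all of it. The Go sketch below shows that computation using only the standard library, with a hypothetical in-memory map standing in for a snapshot. Real layer tars also carry directories, ownership, timestamps, and whiteout entries, which this omits.

```go
package main

import (
	"archive/tar"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// diffID serializes files into an uncompressed tar stream and returns the
// "sha256:<hex>" digest of that stream. files maps a path inside the layer
// to its contents.
func diffID(files map[string][]byte) (string, error) {
	h := sha256.New()
	tw := tar.NewWriter(h) // hash the stream as it is written, without buffering it

	// Sort paths so the same files always produce the same tar bytes.
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	for _, p := range paths {
		data := files[p]
		hdr := &tar.Header{Name: p, Mode: 0o644, Size: int64(len(data)), Typeflag: tar.TypeReg}
		if err := tw.WriteHeader(hdr); err != nil {
			return "", err
		}
		if _, err := tw.Write(data); err != nil {
			return "", err
		}
	}
	// Close writes the tar end-of-archive marker, which is part of the hashed bytes.
	if err := tw.Close(); err != nil {
		return "", err
	}
	return "sha256:" + hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	id, err := diffID(map[string][]byte{"app/hello.txt": []byte("hello\n")})
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

Doing this after every step means reading and hashing each step's full filesystem diff, which is the per-step cost BuildKit avoids by only exporting at the end.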
I just want the hash of the last layer built prior to the failure. I don’t need the hash of every layer exported.
That's what #1472 (comment) does now, by making the "last layer" the final layer, so BuildKit can export an image, since that's all it knows how to do. Anything more would only be workable when BuildKit is being used with Docker directly (and knows it), and buildx exists to contain those cases. What other use do you have for intermediate image generation and hash output that isn't hand-implementing docker/buildx#1104 and isn't trying to build #1472 (comment) directly into BuildKit instead of buildx?
My use case is actually to access the test report files after a failed unit test step. At the moment we use a separate target that has the unit test as its last step, with " || echo failed" at the end so that it always succeeds and we have an image to extract the test report from. But that requires building the Dockerfile twice in each build and specially tuning all the Dockerfiles to support this. So access from an automated script to the build state and files after a failed build would be very useful.
Okay, so that's a use case that isn't supported by the legacy builder either; AFAIR, it never created an image out of a failed step. I hope you'll be pleased to know that PR8 of docker/buildx#1104 is implementing both "Execute in container at start of failed step" (similar to the legacy builder's "write down the layer ID and docker run it") and "Execute in container after failed step" (new! and the default) in the monitor via the proposed … Based on this work, it would probably also be possible to implement in buildx something that can actually export an image from either the start or end of a failed step, since (I think) BuildKit now sends enough information on failure for buildx to request an image export of the container state, and buildx has enough information to tell BuildKit where to send such an image. I don't immediately see an open feature request in buildx for that, and I suspect it wouldn't be worked on until docker/buildx#1104 is completed (since the work heavily overlaps). It's also possible that I'm wrong and the infrastructure that supports docker/buildx#1104 is not sufficient to support buildx exporting either or both of the before and after images of a failed build step. So yeah, I suggest you open a feature request for your use case on buildx and see what the buildx maintainers think. (I'm not a buildx maintainer; I'm not super familiar with that codebase, and I have no particularly strong prediction on what they'll think of it. I hope they like it; it seems useful to me for, e.g., tests-run-during-container-build workflows.)
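The two entry points described above (a container at the start of the failed step versus one after it) amount to keeping two snapshots of the step's filesystem. The Go sketch below models that with a hypothetical in-memory `snapshot` type; it is not buildx's implementation, only an illustration of the idea. The user experiments in a working copy, and either state can be restored at any time, which also matches the "restore back to the initial state and start again" behavior proposed in the issue description.

```go
package main

import "fmt"

// snapshot is a hypothetical, simplified stand-in for a mounted filesystem:
// a map from path to file contents.
type snapshot map[string]string

// clone returns an independent copy so experiments never touch the originals.
func (s snapshot) clone() snapshot {
	c := make(snapshot, len(s))
	for k, v := range s {
		c[k] = v
	}
	return c
}

// debugSession holds the two states a failed step leaves behind and a
// working copy that the user's commands modify.
type debugSession struct {
	before  snapshot // state at the start of the failed step
	after   snapshot // state the failed command left behind
	working snapshot // copy the user is currently experimenting in
}

// newDebugSession starts in the "after failed step" state.
func newDebugSession(before, after snapshot) *debugSession {
	return &debugSession{before: before, after: after, working: after.clone()}
}

// restore discards any experiments and resets the working copy to the
// initial state (useInitial == true) or to the failed state.
func (d *debugSession) restore(useInitial bool) {
	if useInitial {
		d.working = d.before.clone()
	} else {
		d.working = d.after.clone()
	}
}

func main() {
	before := snapshot{"/etc/app.conf": "port=80"}
	after := snapshot{"/etc/app.conf": "port=80", "/tmp/partial.o": "garbage"}
	s := newDebugSession(before, after)

	s.working["/etc/app.conf"] = "port=8080" // an experiment that goes wrong
	s.restore(true)                          // back to the start of the step
	fmt.Println(len(s.working), s.working["/etc/app.conf"]) // prints: 1 port=80
}
```

Cloning on every restore keeps both recorded states immutable, so a user can bounce between them as often as needed; a real implementation would use copy-on-write snapshots rather than copying files.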
True, legacy didn't support that either. I was just throwing it out there as a use case, and I am indeed pleased to know that information about PR8, thank you ^^
So my use case is to use docker build to run all the package restore, build, and test steps (including coverage, static code analysis, static security analysis, etc.) and finally put the built artifact in a lightweight image. The only issue is that I'd need to access the test output from the intermediate layer to push it to the CI system, and that is possible with buildkit = 0 but, as far as this discussion goes, not possible with BuildKit. Now, I'm all for performance, but I'd love it if it were possible to label and publish an intermediate layer manually for this specific case. Otherwise I need multiple Dockerfiles, like a barbarian.
You can use #1472 (comment) instead of multiple Dockerfiles. Or you can PR a change that adds an option to stop at a specific Dockerfile line.
I don’t know how you do test coverage and test results, but I’d like to have the output every run, not just when tests break.
If your case is that you want to build multiple things (stages) and push their results to different locations, not only your final build result, then you can look into …
There are some new (experimental for now) debug options in the new buildx release candidate: https://github.com/docker/buildx/releases/tag/v0.11.0-rc1
I finally needed to use the experimental debug invoke, and I really like how it works! I hope it gets added to the …
So, considering all the experimental features, is there now a way to run a command (typically a shell) inside a build container? With the normal builder, I can run … Does BuildKit support this in any way, other than running an SSH reverse tunnel from inside the container in a RUN build step? It would be nice for it to support this before the normal builder is removed.
@shapirus Does https://github.com/docker/buildx/blob/v0.11.2/docs/guides/debugging.md do what you want? The BuildKit-side requirements (low-level bits) are implemented; the buildx side is being built out, was shipped experimentally in buildx 0.11 and hence Docker Desktop 4.22.0, and is looking for feedback at docker/buildx#1104. I'd suggest trying buildx 0.12.0-rc1 if you're interested in this feature, as the command line was changed and the relevant docs are now at https://github.com/docker/buildx/blob/v0.12.0-rc1/docs/guides/debugging.md. That way any feedback you give is relative to the current state of development.
@tonistiigi does it make sense to close this issue, now that we're tracking things in docker/buildx#1104 and with the area/debug tag on buildx?
Yes, from what I read there, it should solve it, as far as practical use cases are concerned. Thanks for the hint.
addresses #1053
addresses #1470
An issue with the current build environment is that we often assume everyone can write a perfect Dockerfile from scratch without any mistakes. In the real world, there is a lot of trial and error in writing a complex Dockerfile. Users get errors, need to understand what is causing them, and react accordingly.
In the legacy builder, one of the methods for dealing with this situation was to use `--rm=false` or look up the image ID of the last image layer from the build output and start a `docker run` session with it to understand what was wrong. BuildKit does not create intermediate images, nor does it make the containers it runs visible in `docker run` (both for very good reasons). Therefore this is even more complicated now and usually requires the user to set `--target` to do a partial build and then debug its output.

To improve this, we shouldn't try to bring back `--rm=false`, which makes all builds significantly slower and makes it impossible to manage storage for the build cache. Instead, we could provide a better solution with a new `--debugger` flag.

Using `--debugger` on a build will, should that build fail, take the user into a debugger shell similar to the interactive `docker run` experience. There the user can see the error and use control commands to debug the actual cause.

If the error happened on a `RUN` command (`execop` in LLB), the user can use the shell to rerun the command and keep tweaking it. This will happen in an environment identical to the one where `execop` runs; for example, this means access to secrets, SSH, cache mounts, etc. They can also inspect the environment variables and files in the system that might be causing the issue. Using control commands, a user can switch between the broken state left behind by the failed command and the initial base state for that command. So if they try many possible fixes but end up in a bad state, they can just restore the initial state and start again.

If the error happened on a copy (or another file operation like rm), they can run `ls` and similar tools to find out why the file path is not correct.

For implementation, this depends on #749 for support to run processes on build mounts directly without going through the solver. We would first start by modifying the `Executor` and `ExecOp` so that, instead of releasing the mounts after an error, they return them together with the error. I believe the typed errors support from #1454 can be reused for this. The mounts should be returned up to the client `Solve` method, which can then decide to call `llb.Exec` with these mounts. If mounts are left unhandled, they are released with the gateway API release.

Once debugging has completed and the user has made changes to the source files, it is easy to trigger a restart of the build with exactly the same settings. This is also useful if you think you might be hitting a temporary error. If the retry doesn't fix it, the user is brought back to the debugger.
It might make sense to introduce the concept of a "debugger image" that is used as the basis of the debugging environment. This would avoid hardcoding logic in an opinionated area.
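One possible reading of the "debugger image" idea, similar in spirit to the cdbg prototype linked in the comments, is that the debugger image supplies the root filesystem and tools, while the failed step's root filesystem and its other mounts are attached under a separate prefix. The Go sketch below only computes such a mount layout; the `MountSpec` type, the `debugLayout` function, and the `/target` prefix are hypothetical and not part of BuildKit.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// MountSpec is a hypothetical description of one mount in the debug container.
type MountSpec struct {
	Source string // which snapshot or image provides the files
	Dest   string // where it appears inside the debug container
}

// debugLayout puts the debugger image at "/" (so its shell and tools are
// available) and re-roots the failed step's rootfs and extra mounts under
// prefix, keeping their original relative paths.
func debugLayout(debuggerImage, stepRoot string, stepMounts map[string]string, prefix string) []MountSpec {
	specs := []MountSpec{
		{Source: debuggerImage, Dest: "/"},
		{Source: stepRoot, Dest: prefix},
	}
	dests := make([]string, 0, len(stepMounts))
	for dest := range stepMounts {
		dests = append(dests, dest)
	}
	// A path always sorts before any path it is a prefix of, so parent
	// mounts are listed before the mounts nested inside them.
	sort.Strings(dests)
	for _, dest := range dests {
		specs = append(specs, MountSpec{Source: stepMounts[dest], Dest: path.Join(prefix, dest)})
	}
	return specs
}

func main() {
	layout := debugLayout("docker.io/library/busybox", "snapshot:failed-step",
		map[string]string{"/root/.cache": "cache-mount", "/run/secrets/token": "secret"}, "/target")
	for _, m := range layout {
		fmt.Printf("%-26s <- %s\n", m.Dest, m.Source)
	}
}
```

The opposite arrangement (the step's rootfs at `/` with the debugger tools mounted under a side directory) would be equally plausible; keeping the choice in an image rather than hardcoding it is what the proposal suggests.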
Later this could be extended with a step-based debugger, and source mapping support could be used to make source code changes directly in the editor or to track dependencies in the build graph.
@hinshun