
bug: buildkit does not consider ONBUILD COPY --from #816

Closed
astorath opened this issue Feb 6, 2019 · 45 comments · Fixed by #5357
Labels
area/dockerfile · area/feature-parity (Feature parity with classic builder) · help wanted

Comments

@astorath

astorath commented Feb 6, 2019

Hi, we make heavy use of ONBUILD option in our environment.

Our typical Dockerfile looks like this:

FROM my.company.com/ci/dotnet:v1-build as build
FROM my.company.com/ci/dotnet:v1-runtime

This works fine in the legacy docker build, but turning on the BUILDKIT option activates optimizations that make docker build these stages in parallel. The problem is that the stages are dependent:

  1. my.company.com/ci/dotnet:v1-build:

    FROM base:v1
    WORKDIR /src
    ONBUILD COPY . /src
    ONBUILD RUN make clean all
  2. my.company.com/ci/dotnet:v1-runtime:

    FROM runtime:v1
    WORKDIR /app
    ONBUILD COPY --from=build /src/target /app

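Under standard ONBUILD semantics, the two-line downstream Dockerfile effectively expands to something like the following once the triggers fire (a sketch, not literal builder output), which is why the second stage depends on the first:

```dockerfile
# Sketch: the two FROM lines above, after the ONBUILD triggers
# from both base images fire.
FROM my.company.com/ci/dotnet:v1-build AS build
COPY . /src            # trigger from v1-build
RUN make clean all     # trigger from v1-build

FROM my.company.com/ci/dotnet:v1-runtime
COPY --from=build /src/target /app   # trigger from v1-runtime
```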
Running DOCKER_BUILDKIT=1 docker build . results in:

DOCKER_BUILDKIT=1 docker build .
[+] Building 0.8s (8/8) FINISHED
...
 => ERROR [stage-2 3/1] COPY --from=build /src/target/ /app 0.0s
------
 > [stage-2 3/1] COPY --from=build /src/target/ /app:
------
 not found: not found

My questions are:

  1. Can I turn off these optimizations to run builds sequentially?
  2. Is there a way to mark these stages as dependent explicitly?
@tonistiigi
Member

This is a very weird (and inventive) way to use COPY --from.

  • Can I turn off these optimizations to run builds sequentially?
  • Is there a way to mark these stages as dependent explicitly?

No, we don't want to add any special behavior for this. Either we consider this a bug and just fix the image resolution, or we document that this is invalid usage (and make it error with a proper message).

A problem with implementing this is that we can't start processing all stages in parallel like we do now, because we only know about the dependent images after we have pulled the config of the base. Still doable, it just adds complexity. Once we have determined the correct dependencies of the stages, it would build in the regular concurrent manner.

@AkihiroSuda @tiborvass @ijc @thaJeztah get your votes in

@AkihiroSuda
Member

IIRC ONBUILD was deprecated and unrecommended?

@thaJeztah
Member

thaJeztah commented Feb 7, 2019

It's not deprecated, but I think the official images stopped creating onbuild variants, due to the behaviour being confusing.

So, IIUC, in buildkit the problem is that the stages are executed in parallel, thus (e.g.)

FROM foo AS ubuntu

FROM something:onbuild AS final

Where something:onbuild has

ONBUILD COPY --from=ubuntu /foo /foo

Could either result in the COPY copying from ubuntu:latest or from the first build stage (depending on which is evaluated first)?

@astorath
Author

astorath commented Feb 7, 2019

@tonistiigi

This is a very weird (and inventive) way to use COPY --from.

Well, this is not my invention: https://engineering.busbud.com/2017/05/21/going-further-docker-multi-stage-builds/ is one of the first Google search results. So I don't think I'm alone here.

@thaJeztah

It's not deprecated, but I think the official images stopped creating onbuild variants, due to the behaviour being confusing.

Maybe this is confusing for official images, but for multistage internal images designed for that purpose - ONBUILD is a revelation...

Further down the linked article is a nice solution to the problem: if we could use something like:

ONBUILD FROM runtime:v1

we wouldn't need this strange hack.

@tonistiigi
Member

tonistiigi commented Feb 7, 2019

Quote from https://engineering.busbud.com/2017/05/21/going-further-docker-multi-stage-builds/:

I’m not sure if it’s a bug, a feature, or a undefined behavior

As the author of the multi-stage PR I can confirm I was not clever enough to see it as a possible feature.

@benvan

benvan commented Oct 16, 2019

Just chiming in here - we do the same as @astorath at our company. It's been a godsend for enabling extremely efficient / modular builds and dockerfiles

For context: We discovered this issue after trying to use (new) build secrets (with buildkit)

@DavG

DavG commented Feb 4, 2020

Same problem here.

I cannot reference another stage, or even another image, with ONBUILD COPY --from :(

@sirlatrom

Same problem here.

I cannot reference another stage, or even another image, with ONBUILD COPY --from :(

You can if you add a dummy COPY --from=<previous layer> /dummy /dummy (assuming `/dummy` exists in the source image). That will add an explicit dependency. It's a workaround, even if inconvenient.

@DavG

DavG commented Feb 4, 2020

I know this trick to force some intermediate stages to build in "weird cases", but I don't see how it is related to the problem here.

@DavG

DavG commented Feb 4, 2020

It does not even work with an external image.

Minimal example to reproduce:

template/Dockerfile

FROM php:7.4.2-fpm-buster
ONBUILD COPY --from=composer:1.9.2 /usr/bin/composer /usr/bin/composer

project/Dockerfile

FROM bug_template

build.sh

DOCKER_BUILDKIT=1 docker build ./template --tag bug_template
DOCKER_BUILDKIT=1 docker build ./project

@sirlatrom maybe I didn't get your trick? Do you see a solution here?

@sirlatrom

The "trick" is to add a step in the bug image that copies a known-to-exist file from the composer image. That could also be done by creating a prior stage (from the composer image) in the bug image that adds such a dummy file so you can control which it is.

@DavG

DavG commented Feb 4, 2020

OK, let's try it:

template/Dockerfile

FROM php:7.4.2-fpm-buster
COPY --from=composer:1.9.2 /etc/passwd /etc/passwd_composer
ONBUILD COPY --from=composer:1.9.2 /usr/bin/composer /usr/bin/composer

Same error.

@sirlatrom

I meant the new copy instruction should be placed in the last, most downstream image

@DavG

DavG commented Feb 4, 2020

Could you provide a working fix based on my minimal example?

@sirlatrom

Could you provide a working fix based on my minimal example?

Note that the use of image names instead of stage names in COPY --from is not documented, so you should probably use stage aliases, as shown.

./template/Dockerfile:

FROM php:7.4.2-fpm-buster
ONBUILD COPY --from=template /usr/bin/composer /usr/bin/composer

./project/Dockerfile:

FROM composer:1.9.2 AS template
RUN ["touch", "/tmp/dummy"]

FROM bug_template
COPY --from=template /tmp/dummy /tmp/dummy

./build.sh:

DOCKER_BUILDKIT=1 docker build ./template --tag bug_template
DOCKER_BUILDKIT=1 docker build ./project

@tuananh170489

Same error @astorath

@EricHripko

Facing the same issue as what @astorath described - we have ONBUILD build and ONBUILD runtime stages to streamline our Dockerfiles. Lack of this feature in BuildKit is a big roadblock to adoption for us 🙈

@EricHripko

Hey folks 👋 Docker Desktop 2.4.0.0 enables BuildKit by default now, which will potentially break any Dockerfiles dependent on this functionality. I'm happy to work with maintainers to get this addressed if capacity is an issue here (will possibly need some pointers on where to start though). It looks like @tonistiigi's suggestion could be a way forward:

  • BuildKit starts by pulling all the external FROMs
  • BuildKit checks for additional stage dependencies in ONBUILD instructions
  • BuildKit performs stage builds in parallel where possible (as before, but now accounting for --from)
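The steps above can be sketched as a two-phase resolution (a purely illustrative Python sketch, not BuildKit code; real BuildKit would also need to distinguish stage names from plain image references): first collect the extra dependencies that ONBUILD COPY --from introduces, then order the stages so no stage starts before the stages it copies from.

```python
import re
from graphlib import TopologicalSorter


def onbuild_deps(onbuild_triggers):
    """Collect names referenced by ONBUILD COPY --from=<name> triggers."""
    deps = set()
    for line in onbuild_triggers:
        m = re.search(r"COPY\s+--from=(\S+)", line, re.IGNORECASE)
        if m:
            deps.add(m.group(1))
    return deps


def build_order(stages, base_onbuild):
    """Return a stage build order.

    stages: {stage_name: set of explicitly referenced stages}
    base_onbuild: {stage_name: ONBUILD trigger lines pulled from that
                   stage's base image config}
    """
    # Merge explicit dependencies with the ones hidden in ONBUILD triggers.
    graph = {name: set(deps) for name, deps in stages.items()}
    for name in graph:
        graph[name] |= onbuild_deps(base_onbuild.get(name, []))
    # Topological order: dependencies come before their dependents.
    return list(TopologicalSorter(graph).static_order())


# The case from this issue: "stage-2"'s base image carries an
# ONBUILD COPY --from=build trigger, so "build" must finish first.
order = build_order(
    stages={"build": set(), "stage-2": set()},
    base_onbuild={"stage-2": ["ONBUILD COPY --from=build /src/target /app"]},
)
assert order.index("build") < order.index("stage-2")
```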

@bruth

bruth commented Nov 15, 2020

I ran into this problem and stumbled upon this issue. What is needed to move this fix forward?

@softworm

Is there any update on a solution for this? I've just discovered this limitation while building out some reusable images for a large project. I have multi-stage images which declare instructions such as ONBUILD COPY --from=other-image /foo /bar and also use the BuildKit secrets mechanism in ONBUILD instructions: ONBUILD RUN --mount=type=secret,id=my-secret,uid=101 source /run/secrets/my-secret && install-dependencies. The instructions are not being run in downstream image builds, which is a serious limitation. If this worked, it would allow me to reduce the amount of boilerplate and have a set of reusable layers for application images across a large project.

EDIT: As I wrote the above I realized that I only use BuildKit for the secrets mechanism. If I can find an alternative I may drop BuildKit due to this limitation.

Here is the switch:
companieshouse/ch.gov.uk#192

Just set the environment variable DOCKER_BUILDKIT=0 before docker build.

@boonware

@softworm This does not solve the problem. Turning off BuildKit means you cannot use the secrets mechanism, which is the only decent approach for passing secrets into a build.

@c-ameron

So this is a bug, right?
I have a build image

ONBUILD COPY Gemfile* /application/
ONBUILD RUN bundle install

and then a multi-stage build; however, if I change the contents of the Gemfile, it won't pick it up:

FROM build-image:foo as builder

FROM base-image

I can confirm that with BuildKit off, the build works and picks up the changes.


@gautaz

gautaz commented May 4, 2022

Just for the sake of adding another usage example of this (now missing) feature:
https://github.com/r2d2bzh/docker-build-nodejs

This project started internally 3 years ago.
At the time, it was designed this way after reading the article pointed to by @astorath in a previous comment, and it also closely relates to the situation described by @EricHripko.

Many of us are still not using buildkit.
This means that, for now, we have to support both the pre-buildkit and post-buildkit environments.
The only way to do that ATM is to provide two different build methods, one for non-buildkit users and another for buildkit users.

What is also clearly odd for someone using the docker compose plugin is that these errors start popping up between versions 2.3.3 and 2.3.4 of the plugin.
It took me at least an hour to relate this behavior to this issue. That is OK, but perhaps a warning on this particular subject in the Dockerfile reference (or somewhere else) would help spot or avoid the issue more easily.
I can help document this, provided the proper guidance.

@tonistiigi
Member

I've marked this with the "help wanted" label, but if you are planning to take this on, please do verify the design here first (also look at #816 (comment)).

@gautaz

gautaz commented May 7, 2022

Thanks @tonistiigi for the input 👍.

Just looking back at your latest comment, it feels like a link is missing:

please do verify the design here first

Did you intend to point to some particular documentation on the "here first" or did you just intend to point to the buildkit documentation?

I have also read your proposals to resolve the issue again; the second option (support by the dockerfile frontend for async LLB graph generation) in fact seems more appealing.
Is there already some work started on this particular subject?
I guess the help wanted tag means there is no such thing but just to be sure...

@imsamurai

Hi, so is there going to be a fix, or are you just breaking backward compatibility by literally saying "you were always doing it the wrong way, our way is better! Just refactor all your Dockerfiles and CI/CD"? )

@chloe-zen

Hi, so is there going to be a fix, or are you just breaking backward compatibility by literally saying "you were always doing it the wrong way, our way is better! Just refactor all your Dockerfiles and CI/CD"? )

This ^

(I would have simply left a "this" emoji reaction but there is none.)

@vdboor

vdboor commented Jan 23, 2023

For anyone who would love to work on this issue: I've included steps to reproduce it in moby/moby#42845 and kept them as simple as possible.

@filip-aipl

filip-aipl commented Mar 8, 2023

Any update on that? Would love to use it.
Or maybe there is some workaround for that?

The 3 alternatives listed above are not the best options.

@Nearoo

Nearoo commented Apr 6, 2023

Ditto, when is this getting fixed?

@ericqt

ericqt commented Aug 29, 2023

Seeing as this issue is related to concurrent image builds, I saw that we can create a builder with a config file that sets max-parallelism to 0.
I followed this page to create a builder instance: https://docs.docker.com/engine/reference/commandline/buildx_create/
and supplied it with a config file using the --config parameter.
My config file looks like this:

debug = true

[worker.oci]
  enabled = true
  rootless = false
  max-parallelism = 0

but it seems like it's not working, since the build failure still persists.

Another thing I've noticed is if we take this simple example from here: #816 (comment)

and I changed the order of the imports from:

FROM test/base as base-image
FROM test/runtime

RUN cat /hello && echo "final"

to

FROM test/runtime
FROM test/base as base-image

RUN cat /hello && echo "final"

The build now passes. Seems like the builds start from the bottom?


Also, this issue is blocking my org, as we rely on ONBUILD ARG to determine how to build our images when downstream processes consume our base images.

@chloe-zen

@ericqt how about max-parallelism = 1?

@ericqt

ericqt commented Sep 13, 2023

@chloe-zen I tried that as well, and no go :(

@re-mscho

re-mscho commented Nov 3, 2023

Any update? As we now see:

DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0
            environment-variable.

This will kill all our builds.

@laoshancun

Any update? Same issue with multi-stage ONBUILD COPY.

@drewhemm

I ran into this issue and decided to think about it as a dependency management issue rather than a bug. Consider this code:

Base image Dockerfile, tagged as base:latest:

FROM alpine:latest
WORKDIR /foo
ONBUILD COPY bar.txt . 
ONBUILD RUN cat bar.txt

Application Dockerfile:

# Stage 1 
FROM base:latest AS me

# Stage 2
FROM alpine:latest
COPY --from=me /foo/bar.txt /hello/bar.txt
RUN cat /hello/bar.txt

This example will likely fail with BuildKit, since the COPY instruction in application build stage 2 might run before the ONBUILD triggers execute when stage 1 is built, resulting in a 'not found' error. The solution is really quite simple and does not require a huge refactor:

Base image Dockerfile, built as base:latest:

FROM alpine:latest
WORKDIR /foo

Application Dockerfile:

# Stage 1 
FROM base:latest AS me
COPY bar.txt . 
RUN cat bar.txt

# Stage 2
FROM alpine:latest
COPY --from=me /foo/bar.txt /hello/bar.txt
RUN cat /hello/bar.txt

The secret is to ensure that any instructions in application stage one that are depended on by stage two are defined in the application Dockerfile rather than the base image Dockerfile. Any ONBUILD instructions in the base image Dockerfile must not be dependencies of later build stages. This means we can still use ONBUILD COPY and ONBUILD RUN to do things like install/update packages, copy files, etc., i.e. things that are needed at runtime.

This is obviously a very cut-down example and not very useful on its own, but in the real-world project I am currently working on, the method was identical and means I am able to achieve the same original intent with minor changes to the code. Yes, it means the COPY and RUN instructions are no longer centralised in the base image repo and now need to be replicated across n application repos, but for two lines of code per Dockerfile, I can live with it.

@nightlark

nightlark commented Mar 29, 2024

I just encountered this bug when trying to set up a multi-stage build with an early "builder" stage that uses a FROM image with ONBUILD instructions to create an installer, and then a final stage that uses a FROM image with ONBUILD instructions to copy the installer from the builder and do the steps needed to install it. Working around it would involve quite a few duplicated instructions repeated across Dockerfiles in 50+ repositories, unlike the previous commenter, who only has two fairly short lines that get duplicated.

From the docs (https://docs.docker.com/reference/dockerfile/#onbuild), "The trigger will be executed in the context of the downstream build, as if it had been inserted immediately after the FROM instruction in the downstream Dockerfile" implies that ONBUILD COPY, executed as part of the image used for the final stage's FROM, should behave identically to placing a COPY instruction immediately after the final stage's FROM in my Dockerfile -- but as many have pointed out here, it does not.
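Under that reading of the docs, the trigger baked into a base image should be interchangeable with a hand-written COPY (a sketch with hypothetical image and stage names):

```dockerfile
# Hypothetical base image mybase:onbuild, built from:
#   FROM alpine
#   ONBUILD COPY --from=builder /out /out

FROM alpine AS builder
RUN mkdir /out && echo hi > /out/file

# Per the docs, the ONBUILD trigger fires here exactly as if
#   COPY --from=builder /out /out
# were written immediately after this FROM -- yet under BuildKit the
# reference to the "builder" stage may fail to resolve.
FROM mybase:onbuild
```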

@tonistiigi
Member

This has been implemented in #5357. You can test with #syntax=tonistiigi/buildkit:pr5357 at the top of the Dockerfile.
