Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dockerfile: Allow files to be excluded from cache on COPY and ADD commands #4561

Merged

Conversation

leandrosansilva
Copy link
Contributor

@leandrosansilva leandrosansilva commented Jan 16, 2024

note to the reviewers: this PR is essentially the re-creation of the closed PR #4440 , as I could not find a way to reopen it.

This PR introduces new options in the Dockerfile frontend, namely --exclude on COPY and ADD commands.

It essentially exposes the existing ExcludePatterns from the LLB to the frontend.

This PR affects only the cache does not prevent file which are excluded by the ADD or COPY commands to be added to the context.

I have manually tested the changes and they work as expected, but I have not yet added integration tests.

TODO:

  • Implement integration tests
  • Prevent files of being added to the context Edit: not done. I've found it easier to implement this in a further PR, as this seems to be an optimization. Please @tonistiigi let me know if you advice if you'd advice otherwise.

More information in the related issue #4439.

@leandrosansilva
Copy link
Contributor Author

Hi @tonistiigi I'm moving the discussion of the issue #4439 to this PR, if you don't mind. I can move this comment back to the issue if you find more suitable.

Previously on #4439 (comment)

AND from stages defined. Is it correct?

and from stages in use. Dockerfile frontend will already determine if a stage is reachable for a given build.

In this case, a file which is excluded (via --exclude) from all but one build stage must be added to the context.

Yes, there are cases where we can't be super precise about it. I think if there are multiple active copies, one has exclude and another one is using the excluded files then we should still transfer with one context and ignore the excludes (let the excludes happen via copy). But for simple cases where there is only one copy or all excludes are the same it should be possible to do the filter on transfer. I'm not sure how far we should go in determining if exclude from one COPY can collide with a path from another COPY. Just handling the most basic cases is probably fine initially. If common pattern appears later that is not handled by this case it can be improved later.

I confess this is yet no clear to me.

On a concrete example, let's suppose I have the following file tree on my repository:

dir1/example.excluded
dir1/file-001.png
dir1/file-002.txt
dir1/file-003.jpeg
dir2/example.excluded
dir2/file-001.pdf
dir2/file-002.jpeg
dir2/file-003.png
Dockerfile

Where Dockerfile has this as content:

FROM scratch as builder1
COPY --exclude=**/*.png --exclude=**/*.jpeg --exclude=**/*.excluded ./ /all-except-images

FROM scratch as builder2
COPY --exclude=**/*.pdf --exclude=**/*.txt --exclude=**/*.excluded ./ /all-except-text

FROM scratch
COPY --from=builder1 / /builder1
COPY --from=builder2 / /builder2

The final image should have the following files:

builder1/all-except-images/Dockerfile
builder1/all-except-images/dir1/file-002.txt
builder1/all-except-images/dir2/file-001.pdf
builder2/all-except-text/Dockerfile
builder2/all-except-text/dir1/file-001.png
builder2/all-except-text/dir1/file-003.jpeg
builder2/all-except-text/dir2/file-002.jpeg
builder2/all-except-text/dir2/file-003.png

So far so good. This is already working in the current PR.

A problem (1) I have now is that the files *.excluded are being copied to the context, even though they are not being used by any layer, as they are being excluded via --excluded.

I confess I could not figure out yet how to get them not to be copied to the context. Would you be able to roughly point out what part of the code is responsible for copying stuff to the context?

And, upon finding it, what approach would recommend to take here? In the example above, it's clear for me as a human that *.excluded should not go to the context as it's being excluded by all the stages, but I still struggle to get an algorithm for it.

If I understood properly, the cache mechanism, which uses the llb.CopyInfo.ExcludePatterns, is executed on the context created previously. In order to get the context to be aware of the exclude patterns, it would need to be aware of the llb, which I am not sure whether that's the case. I'd appreciate any insights here.

The other problem (2) I'd like to have some insights is on what kind of integration tests I need to validate this change. Although I could manually run the example above with the current PR, there seems to be in the repository no integration test that spawns from parsing the Dockerfile until an image is generated. Or I at least could not find any.

The test I foresee would consist on reading a Dockerfile and a directory, build an image and in the end ensure that the files in the image match the expectation (I could for instance the example I gave above). Still, it seems not to fit anywhere in the repository, meaning I'd need to create it.

Any insights into this problem as well? I am afraid I'm over-complicating things :-(

@tonistiigi
Copy link
Member

I confess I could not figure out yet how to get them not to be copied to the context. Would you be able to roughly point out what part of the code is responsible for copying stuff to the context?

The exludepatterns that are controlling the transfer of local files are in https://github.com/moby/buildkit/blob/master/frontend/dockerui/config.go#L453 and https://github.com/moby/buildkit/blob/master/frontend/dockerui/namedcontext.go#L291

, it's clear for me as a human that *.excluded should not go to the context as it's being excluded by all the stages

I think if could find the --exclude that are in use by all COPY rules (especially for the case where there is only one COPY) and ignore them on transfer that would be ok initially.

More complicated cases would be when the source path is not . . I think if the paths do not collide then I assume all excludes could be applied on transfer as well (but they need to be rebased to be in relation to source path).

It is fine by me if we start with some basic rules in here and can extend the coverage in follow-ups.

The other problem (2) I'd like to have some insights is on what kind of integration tests

Dockerfile integration tests are in https://github.com/moby/buildkit/blob/master/frontend/dockerfile/dockerfile_test.go and other test files in same package. For example of test that also checks what files were transferred see testNamedFilteredContext .

@leandrosansilva leandrosansilva force-pushed the feature/exclude-pattern-on-dockerfile branch from 2567f9f to 36863a0 Compare January 24, 2024 17:06
@leandrosansilva leandrosansilva marked this pull request as ready for review January 25, 2024 12:01
@leandrosansilva leandrosansilva force-pushed the feature/exclude-pattern-on-dockerfile branch from 36863a0 to 2f6740f Compare January 29, 2024 16:14
@tonistiigi
Copy link
Member

@leandrosansilva Is this ready?

@leandrosansilva
Copy link
Contributor Author

leandrosansilva commented Feb 7, 2024 via email

@leandrosansilva leandrosansilva force-pushed the feature/exclude-pattern-on-dockerfile branch from 2f6740f to 5c405e2 Compare February 8, 2024 10:06
@leandrosansilva
Copy link
Contributor Author

hi @tonistiigi the PR is now ready for review.

@tonistiigi tonistiigi added this to the v0.13.0 milestone Feb 9, 2024
Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks mostly fine but should initially go to labs channel for v0.13 where it can be further tested. For that it needs to be enabled only with a build tag and that build tag added to https://github.com/moby/buildkit/blob/master/frontend/dockerfile/release/labs/tags . If flag is used without build tag then it can show in error that labs channel is needed.

This can be follow-up (but it may need to be documented) but looks like when ADD does extraction from a tarball then the ExcludePatterns are not currently passed through. https://github.com/moby/buildkit/blob/master/solver/llbsolver/file/unpack.go#L38 . The library itself seems to support ExcludePatterns.

@@ -1127,6 +1137,8 @@ func dispatchCopy(d *dispatchState, cfg copyConfig) error {
copyOpt = append(copyOpt, llb.WithUser(cfg.chown))
}

copyOpt = append(copyOpt, &excludeOnCopy{cfg.excludePatterns})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can add llb.WithExcludePatterns for this

@@ -7281,3 +7284,151 @@ func fixedWriteCloser(wc io.WriteCloser) filesync.FileOutputFunc {
return wc, nil
}
}

// See #4439
func testExcludedFilesOnCopy(t *testing.T, sb integration.Sandbox) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a couple of more cases (or additional test):

  • with a git source (see testAddGit)
  • testing that adding a file that does not match does not invalidate cache (see testCopyWildcardCache)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx for the review. I am traveling at the moment and will resume working on this PR next week. I've moved the PR to draft state until then.

It exposes the `ExcludePatterns` to the Dockerfile frontend, adding
`--exclude=<pattern>` option in the COPY and ADD commands, which will
cause filepaths matching such patterns not to be copied.

`--exclude` can be used multiple times.

References moby#4439

Signed-off-by: Leandro Santiago <leandrosansilva@gmail.com>
References moby#4439

Signed-off-by: Leandro Santiago <leandrosansilva@gmail.com>
This affects the --exclude option in the COPY and ADD commands on
Dockerfiles.

References moby#4439

Signed-off-by: Leandro Santiago <leandrosansilva@gmail.com>
Signed-off-by: Leandro Santiago <leandrosansilva@gmail.com>
Signed-off-by: Leandro Santiago <leandrosansilva@gmail.com>
Signed-off-by: Leandro Santiago <leandrosansilva@gmail.com>
@leandrosansilva leandrosansilva marked this pull request as draft February 19, 2024 17:07
@leandrosansilva leandrosansilva force-pushed the feature/exclude-pattern-on-dockerfile branch from 5c405e2 to 0befd8f Compare February 19, 2024 17:08
@@ -138,7 +138,7 @@ target "lint" {
matrix = {
buildtags = [
{ name = "default", tags = "", target = "golangci-lint" },
{ name = "labs", tags = "dfrunsecurity dfparents", target = "golangci-lint" },
{ name = "labs", tags = "dfrunsecurity dfparents dfexcludepatterns", target = "golangci-lint" },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just linter. https://github.com/moby/buildkit/tree/master/frontend/dockerfile/release/labs needs to be updated. Your code and new test does not run atm in the CI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

carried this

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@tonistiigi tonistiigi marked this pull request as ready for review February 23, 2024 19:24
> **Note**
>
> The `--exclude` option can be specified multiple times and cause files matching its patterns not to be copied,
> even if the files paths match the pattern specified in `<src>`. This feature requires the build tag `dfexcludepatterns`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This feature requires using labs channel"

cc @dvdksn to double check this before the release. Maybe we need to separate it more. I also see that --parents is not mentioned for COPY that should be similar.

@tonistiigi tonistiigi merged commit c8cf831 into moby:master Feb 24, 2024
66 of 67 checks passed
@leandrosansilva
Copy link
Contributor Author

thank you @tonistiigi for taking over and finishing this PR! :-)

I will hopefuly resume this week working on the follow up changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants