-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCI Images used as cache SHOULD not have empty layers list #4331
Conversation
Just to note - the I'm actually not convinced that we should take this, unless it's clear that this would actually fix aws/containers-roadmap#876. I thought the underlying issue with this was putting layers directly under an image index, see: #2251. That said, maybe @tonistiigi and @neersighted will have more context here. |
I have been meaning to move BuildKit to use artifacts for provenance and for registry cache images; I would like to lean into that as an alternative to this PR, though I have not started any work myself. I'm happy to provide input/guidance if you want to take that on, @TBBle |
I started work on something like this here: #3610, I'm happy for someone to take over that, though I might also get to it eventually. I think we could probably update that to set the config type to the empty descriptor, which would hopefully work around the Quay/Gitlab issue. We could then set I think if that works on all registries, we could definitely do that change. I'm not sure if you're also suggesting switching to the referrers API instead of doing index-bundling - while I'd be happy to have that as an opt-in control for users who want that, I don't think the registry support is out there (including dockerhub, IIRC) to justify changing that default. |
I already talked through this with @tonistiigi pretty extensively, but the gist of it is that I want to use the empty config.mediaType, a custom artifactType, and keep the layers as they are. This will significantly reduce the burden to detect/properly handle attestations in runtimes. With regard to the spec, I understand the reluctance, but it is final as it is going to get and is only blocked on hammering out some final points of contention in distribution-spec. I've been participating in the OCI weekly meetings for over a year now, and I'm confident that we can start building on top of these new primitives (and indeed, getting away from 'fake images' will make all existing conformant registries/most real implementations accept the content). With regard to index-bundling, my proposal is to do both, or to keep it as is. Artifacts != referrers, and while it would be nice to eliminate the digest label and do discovery via the referrers API, I consider that second-order work/not as important as getting away from "fake images" in the immediate term. |
Is there a way to generate image manifest with |
We added the "empty" blob to the spec because Quay, GCR, ECR, and Artifactory all rejected manifests without any layers field, and GCR and ECR rejected a layers field with an empty array: opencontainers/image-spec#1025 |
Indeed, I phrased this PR around the idea that rc5 gets shipped as 1.1.0, or at least no relevant changes happen between the two. I wouldn't want this merged before the 1.1.0 spec release, more-careful testing, and also someone actually verifying that this change actually solves the observed issue with ECR.
The remote registry cache works on ECR with
I probably have a knowledge gap here, because I'm not sure what this means, in terms of changes that would resolve this issue. My understanding is that with #3610, if that is along the lines you were thinking here, appears to be bringing attestation manifests into line with how
Thank you for that, it looks like this issue affects GCR too. And possibly Quay.io and Artifactory, since if I remember correctly, Go will serialise an empty array to JSON as As soon as I saw the guideline for "portability", I suspected that this was the reason for that part of the spec.
This DockerFile will do it, as seen in the ECR roadmap issue. FROM ubuntu:22.04 AS xxx
ARG CI_PIPELINE_ID=not_set
ENV BUILD_BY_CI_PIPELINE_ID=$CI_PIPELINE_ID
#
# RUN echo dummy >f
CMD echo $CI_PIPELINE_ID and enabling the (CLI per aws/containers-roadmap#876 (comment))
My understanding is that the use-case is that these are dummy images used in their CI pipeline, and so the command-line isn't aware that these images have degenerate caches, so they can't simply skip The test I added for this case uses st := llb.Scratch()
def, err := st.Marshal(sb.Context())
require.NoError(t, err) as seen in BuildKit skipping pushing the cache manifest when it has no Layers (i.e. ignoring cache-to) also makes sense to me as a resolution to this issue, and could also be done for image index caches, when their Manifests contain only a cache-config. It'd be easy to adapt the existing test too. Would that be something that should happen in the other exporters too? i.e. would BuildKit not export degenerate caches, or would it just be the Registry Exporter doing that. Or even just the image-manifest exporter? I wonder if that would surprise any users who have built tooling around the expectation that after I also suspect (but have not tested) that we could trigger the same issue with a simple image push (no cache) to ECR or GCR with this Dockerfile: FROM scratch
ARG CI_PIPELINE_ID=not_set
ENV BUILD_BY_CI_PIPELINE_ID=$CI_PIPELINE_ID
CMD echo $CI_PIPELINE_ID since the I haven't tried to resolve that here, as (a) that isn't a new issue, so mitigations may already exist; (b) using this fix in that case has impact outside BuildKit. Also, not pushing the image isn't an option for this case. Since the image-spec requirement is only |
Yes, there are no layers in this case and no config as well so I think we can skip it. Metadata commands in Dockerfile that modify image config are not tracked by build cache.
All exporters that use the JSON manifests(this code is shared). Eg. the output above is from the local exporter.
Cache import from missing ref does not cause a build to fail, so I don't think this is a problem. This also looks like an exceptional case as such a combination of flags does not bring any value to the user. We probably should still print a message in export progress though about "skipping cache export for empty result". |
Upstream changed the definition of the action in docker/docs#17427 to no longer expect the `repo` parameter. Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
This brings in new string constants ImageIndexFile and ImagesBlobsDir. Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
Per image-spec 1.1.0, OCI Images that would otherwise have an empty layers list SHOULD instead have a single "Empty JSON" known-value layer listed. This guideline does not apply to Image Indexes, and also does not appear in the Docker Manifest/Manifest List documentation, so those cases are unchanged. WIP notes: The empty layer is not handled in the cache importer. Not sure if we need to, tests pass without any such changes. Do we actually need to ensure the blob exists on the registry? Probably; the tests pass because the blob already exists as the default cache config object, but that's 100% crimes. Only applied to remote registry cache. Not sure if the other caches should do something similar. Are any of them also outputting in an OCI image format? Relatedly, should we do something like this for normal OCI images? I don't know how robust the ecosystem is against unexpected layers in an OCI Image. Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
e72ae54
to
5bb3f9d
Compare
Not that it matters, but my two cents is that had I considered this for #2251 I would have tried to skip the manifest push like @tonistiigi said. Thanks for chasing this up @TBBle . I was more concerned with the fact that customers to our registry doesn't actually see the errors coming from our registry that use the OCI response body schema that would mention that the image being pushed doesn't have layers, containerd/containerd#8969 which you're already aware of :) |
The current format isn't really an artifact (manifest), it's an index with blobs and config referenced as top-level objects (often called the cache manifest), e.g. neersighted/bk-cache:im-oci:
I'd prefer to think about aligning this (opt-in) to the
Yes, this is along the lines I am thinking, though I have some differences in how I'd like to see the format represented. |
Oh, shoot, it turns out this entire time I was using BuildKit 0.11 and did not realize it! Here's the correct manifest (neersighted/bk-cache:im-oci2):
Yeah, there's a couple things I'd like to tweak here (e.g. set the artifactType), but this is otherwise what I would have suggested/expected. |
Okay, #4336 has pretty clear consensus as the way to approach this issue for BuildKit remote registry caches. Clearly no one's hit a similar problem for pushing So closing this one out. Thank you all for the feedback. |
Opening this PR as draft for discussion and visibility. The first three commits are not actually related, and could be pulled out as a separate PR (Edit: Now #4334), all the interesting stuff is in the final commit, 5bb3f9d at the time of writing.
I noticed while researching background for aws/containers-roadmap#876 that the OCI Image Format Specification v1.1.0-rc5 contains the following note in the Guildelines for Artifact Usage:
And a similar note appears in the Image Manifest
layers
property.I threw together this PR in the hope that this might let aws/containers-roadmap#876 finally be resolved. I don't know that it actually helps there; I haven't tested this with anything except the CI test-suite.
Reproducing the WIP notes from the commit message
The empty layer is not handled in the cache importer. Not sure if we need to, tests pass without any such changes.
Do we actually need to ensure the blob exists on the registry? Probably; the tests pass because the blob already exists as the default cache config object, but that's 100% crimes.Edit: The descriptor inlines the data value, so assuming that's handled correctly, this isn't an issue. Not crimes.Only applied to remote registry cache. Not sure if the other caches should do something similar. Are any of them also outputting in an OCI image format?
Relatedly, should we do something like this for normal OCI images? I don't know how robust the ecosystem is against unexpected layers in an OCI Image.