Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support estargz compression type #2246

Merged
merged 1 commit into from
Aug 25, 2021
Merged

Conversation

ktock
Copy link
Collaborator

@ktock ktock commented Jul 13, 2021

Currently, BuildKit supports lazy pulling of eStargz but not supports creating eStargz images. It would be great if it also supports the creation (there was also a discussion in slack recently). This commit adds support for compression=estargz to the cache as one of the compression variants.

buildctl build ... \
  --output type=image,name=docker.io/username/image,push=true,compression=estargz,oci-mediatypes=true

This commit also contains updates to docs to describe this usage.

@tonistiigi
Copy link
Member

Do I get correctly that this creates 2 blobs. One from differ and then tries to convert it. This should be avoided. PTAL containerd/containerd#4263 (especially the last section about adding a workaround if there is no full support).

@ktock
Copy link
Collaborator Author

ktock commented Jul 13, 2021

@tonistiigi Thank you for the comment.

Do I get correctly that this creates 2 blobs. One from differ and then tries to convert it. This should be avoided.

Yes, this creates one blob from differ and converts it. This might slower the overall time of estargz conversion but why I choose this approach is because it allows us to have this feature without having an additional (forked) implementation of diff.Comparer. Could you tell me why we should avoid this approach? --force-compression already does something similar. (I might have overlooked something)

PTAL containerd/containerd#4263 (especially the last section about adding a workaround if there is no full support).

Is there example/existing implementation of add a callback option to differ where I can pass the WriteCloser for compression. mentioned in containerd/containerd#4263 ?

@tonistiigi
Copy link
Member

--force-compression already does something similar. (I might have overlooked something)

My understanding was that this only happened if the blob with a different compression already existed from the previous build/pull. In that case, the behavior is expected. If that is not the only case we should check that there isn't a performance or storage regression on the default path.

Is there example/existing implementation of add a callback option to differ where I can pass the WriteCloser for compression.

No that would need to be an update in the differ method in containerd library. When I looked at it last time containerd only supported uncompressed/gzip. This is needed for zstd as well, this is the case that I created the issue for.

@ktock
Copy link
Collaborator Author

ktock commented Jul 13, 2021

@tonistiigi

My understanding was that this only happened if the blob with a different compression already existed from the previous build/pull. In that case, the behavior is expected.

Yes, the conversion by --force-compression happens only when the blob from the previous build/pull has the compression type different from the specified one.

If that is not the only case we should check that there isn't a performance or storage regression on the default path.

This patch creates 2 blobs only when the compression is estargz and doesn't affect the default path of gzip and uncompressed. Do you think that should be avoided even for estargz? If so I think we need to have a new diff.Comparer implementation that supports estargz. WDYT?

@tonistiigi
Copy link
Member

Yes, the conversion by --force-compression happens only when the blob from the previous build/pull has the compression type different from the specified one.

Thanks for confirming.

Do you think that should be avoided even for estargz? If so I think we need to have a new diff. Comparer implementation that supports estargz. WDYT?

Yes, it would be preferred to avoid creating 2 blobs. It takes more storage and slows things down without the user expecting it. We have this issue anyway when we want to add more compression methods like zstd. As containerd differ only supports gzip I think the simplest way is to extend it with a callback method that implements the compression that it will call instead of the builtin gzip routine. Eventually, they probably want to extend their stream processors capability but I think that is a bigger change(maybe you disagree?). FYI @fuweid

@ktock ktock marked this pull request as draft July 14, 2021 13:17
cache/blobs.go Outdated
Comment on lines 131 to 137
descr, err = sr.cm.Differ.Compare(ctx, lower, upper,
diff.WithMediaType(mediaType),
diff.WithReference(sr.ID()),
diff.WithCompression(compressor),
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tonistiigi Added this API to the fork of containerd (ktock/containerd@11b4622). Now this patch doesn't create 2 blobs. I'll work on upstreaming this API.

@ktock ktock force-pushed the esgzcompression branch 7 times, most recently from 4f79c89 to de8a342 Compare July 20, 2021 01:49
go.mod Outdated
@@ -14,6 +14,7 @@ require (
github.com/containerd/go-cni v1.0.2
github.com/containerd/go-runc v1.0.0
github.com/containerd/stargz-snapshotter v0.6.4
github.com/containerd/stargz-snapshotter/estargz v0.6.4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump up to 0.7.0, and update Dockerfile as well

@ktock
Copy link
Collaborator Author

ktock commented Jul 27, 2021

#2246 (comment) has been merged in containerd. Updated this patch not to depend on the fork.
Also bumped up stargz snapshotter to v0.7.0

cache/blobs.go Outdated
switch compressionType {
case compression.Uncompressed:
mediaType = ocispec.MediaTypeImageLayer
case compression.Gzip:
mediaType = ocispec.MediaTypeImageLayerGzip
compressor = func(dest io.Writer, requiredMediaType string) (io.WriteCloser, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to set custom compressor for plain gzip?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this not to set the custom compressor.

@@ -136,3 +159,162 @@ func convertMediaTypeToGzip(mt string) string {
}
return mt
}

func eStargzLayerConvertFunc(ctx context.Context, cs content.Store, desc ocispec.Descriptor) (*ocispec.Descriptor, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you move these stargz-only helper functions to a separate stargz file

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Moved this into a new file.

cache/remote.go Outdated
@@ -102,30 +103,38 @@ func (sr *immutableRef) GetRemote(ctx context.Context, createIfNeeded bool, comp
existingRepos = append(existingRepos, repo)
}
desc.Annotations[dslKey] = strings.Join(existingRepos, ",")
if dh, ok := sr.descHandlers[desc.Digest]; ok {
desc.Annotations = mergeEStargzAnnotations(dh.SnapshotLabels, desc.Annotations)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note. It would be better if this annotations load/save wouldn't need startgz-only exceptions everywhere and would be more generic. But I don't have a specific proposal how to improve this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the note. Fixed the patch to make it more generic. Now *immutableRef.addCompressionBlob() saves estargz-related annotations as well so we don't need estargz-only expections.

@@ -161,6 +166,9 @@ func (e *imageExporter) Resolve(ctx context.Context, opt map[string]string) (exp
i.meta[k] = []byte(v)
}
}
if esgz && !i.ociTypes {
logrus.Warn("estargz compression should be used with oci-mediatypes for enabling layer verification")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't want to make this an error? Or always imply ocitypes for stargz, at least if not set?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this to always imply ocitypes with a warning log.

@sipsma
Copy link
Collaborator

sipsma commented Aug 16, 2021

Haven't taken a look in depth yet, but one note, if #2273 gets merged before this PR please update the new TestGetRemote in manager_test.go to also create this new estargz mediatype (right now it covers gzip and uncompressed) so we can make sure there aren't any regressions in parallel blob creation.

@ktock
Copy link
Collaborator Author

ktock commented Aug 19, 2021

@sipsma Thank you for the note about the tests. Added estargz compression to TestGetRemote.

@@ -146,13 +146,32 @@ eStargz is an extended format of stargz by [Stargz Snapshotter](https://github.c
It comes with [additional features](https://github.com/containerd/stargz-snapshotter/blob/master/docs/stargz-estargz.md#estargz-archive-format) including chunk verification and prefetch for avoiding the overhead of on-demand fetching.
For more details about lazy pull with stargz/eStargz images, please refer to the docs on these repositories.

### Getting stargz/eStargz formatted images
## Obtaining stargz/eStargz images
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/obtain/create/ perhaps

@@ -161,6 +166,10 @@ func (e *imageExporter) Resolve(ctx context.Context, opt map[string]string) (exp
i.meta[k] = []byte(v)
}
}
if esgz && !i.ociTypes {
logrus.Warn("forcefully turning on oci-mediatype mode for estargz")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably s/forcefully/forcibly/

@AkihiroSuda
Copy link
Member

Sorry needs rebase

@ktock
Copy link
Collaborator Author

ktock commented Aug 24, 2021

Rebased. Also added tests to check the converted descriptor is valid and fixed estargz generation code to pass the tests.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
@AkihiroSuda
Copy link
Member

3 LGTMs, merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants