Stream processors not supported by differ #4263

tonistiigi · 2020-05-17T19:51:31Z

Although stream processors are defined in the diff package, the differ does not support them; only implementations of the Apply() function do.

The differ still uses a hardcoded compression step that supports only uncompressed and gzip output. Judging from the docs and the design intention, it seems logical for stream processors to be supported on both sides.

Looking at the code, I don't see many issues with supporting the external binary logic via TOML configuration. The Go API for stream processors, however, seems to be designed with only the Apply use case in mind.

The first issue I see is that the return media type is only accessible after the processor has been initialized (and might already have read data). That means that when looking up a processor, only the accepted media type can be used. The return media type should also be available in the handler, before initialization. All the implementations already use it this way: they currently just pass the static media type from the handler level down to the processor implementation and store it there. Because of this, the current processor lookup is also buggy in cases where multiple processors would chain together.
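To illustrate why a statically known return media type matters for chaining, here is a minimal sketch. It is not containerd's API: `staticHandler` and `findChain` are hypothetical names, and the `+encrypted` media type is only an example string. With both media types available before initialization, a chain can be resolved without creating any processor or reading any data.

```go
package main

import "fmt"

// staticHandler is a hypothetical handler description (not containerd's type):
// both media types are known before any processor is initialized.
type staticHandler struct {
	ID      string
	Accepts string
	Returns string
}

// findChain does a breadth-first search over media types and returns the
// shortest sequence of handlers converting `from` into `to`.
// The boolean is false when no chain exists.
func findChain(handlers []staticHandler, from, to string) ([]staticHandler, bool) {
	type node struct {
		mediaType string
		path      []staticHandler
	}
	seen := map[string]bool{from: true}
	queue := []node{{mediaType: from}}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur.mediaType == to {
			return cur.path, true
		}
		for _, h := range handlers {
			if h.Accepts != cur.mediaType || seen[h.Returns] {
				continue
			}
			seen[h.Returns] = true
			// Copy the path so sibling branches do not share a backing array.
			path := append(append([]staticHandler{}, cur.path...), h)
			queue = append(queue, node{mediaType: h.Returns, path: path})
		}
	}
	return nil, false
}

func main() {
	const (
		tar       = "application/vnd.oci.image.layer.v1.tar"
		tarGzip   = "application/vnd.oci.image.layer.v1.tar+gzip"
		encrypted = "application/vnd.oci.image.layer.v1.tar+gzip+encrypted" // example only
	)
	handlers := []staticHandler{
		{ID: "gunzip", Accepts: tarGzip, Returns: tar},
		{ID: "decrypt", Accepts: encrypted, Returns: tarGzip},
	}
	chain, ok := findChain(handlers, encrypted, tar)
	fmt.Println("found:", ok)
	for _, h := range chain {
		fmt.Println(h.ID, ":", h.Accepts, "->", h.Returns)
	}
}
```

If the return type were only known after initialization, the lookup above would have to initialize each candidate (possibly consuming input) just to discover where it leads.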

The other, much more complicated issue is that the Go API is based solely on ReadCloser, while differ implementations are usually based on writers. One can, of course, use an extra goroutine and io.Pipe to turn a ReadCloser into a WriteCloser, but that would be quite inefficient. For example, in the default gzip case, gzip.NewWriter would be turned into a ReadCloser with io.Pipe, and then, when the differ writes to archive.WriteDiff, that ReadCloser would be turned back into a Writer with another io.Pipe. One solution would be to allow both reader-based and writer-based implementations as processor providers, along with both types of processor chains. That should ensure that the default cases don't use io.Pipe at all and that complex cases use at most one. Another option would be to define a reader/writer type at the registration/config level. This looks quite ugly as an API but would work, because in reality we only have processors that convert either to or from the uncompressed layer mediatype. Given that, a cleaner API would have been to define compressors/decompressors for layers instead of trying to define a generic conversion between mediatypes.
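As a rough sketch of what a writer-based chain could look like (all names here, `writerProcessor`, `chainWriters` and `roundTrip`, are hypothetical and not part of containerd), each stage wraps the next stage's writer directly, so the default gzip case needs no io.Pipe and no extra goroutine:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// writerProcessor is a hypothetical writer-based processor: it wraps the
// destination and returns the WriteCloser the differ should write into.
type writerProcessor func(dest io.Writer) (io.WriteCloser, error)

// gzipProcessor is the default case: a plain gzip.Writer around dest.
func gzipProcessor(dest io.Writer) (io.WriteCloser, error) {
	return gzip.NewWriter(dest), nil
}

// chainCloser writes into the first stage and closes every stage in data-flow
// order, so each stage flushes into the next before that one is closed.
type chainCloser struct {
	io.Writer
	closers []io.Closer
}

func (c *chainCloser) Close() error {
	var first error
	for _, cl := range c.closers {
		if err := cl.Close(); err != nil && first == nil {
			first = err
		}
	}
	return first
}

// chainWriters composes processors listed in data-flow order; the last
// processor writes into dest.
func chainWriters(dest io.Writer, procs ...writerProcessor) (io.WriteCloser, error) {
	var closers []io.Closer
	w := dest
	for i := len(procs) - 1; i >= 0; i-- {
		wc, err := procs[i](w)
		if err != nil {
			for _, cl := range closers {
				cl.Close() // best-effort cleanup of stages already created
			}
			return nil, err
		}
		closers = append([]io.Closer{wc}, closers...)
		w = wc
	}
	return &chainCloser{Writer: w, closers: closers}, nil
}

// roundTrip compresses s through the chain and decompresses it again.
func roundTrip(s string) (string, error) {
	var buf bytes.Buffer
	wc, err := chainWriters(&buf, gzipProcessor)
	if err != nil {
		return "", err
	}
	if _, err := io.WriteString(wc, s); err != nil {
		return "", err
	}
	if err := wc.Close(); err != nil {
		return "", err
	}
	zr, err := gzip.NewReader(&buf)
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return string(out), err
}

func main() {
	out, err := roundTrip("layer contents")
	fmt.Println(out, err)
}
```

A reader-based processor would still need one io.Pipe to join such a chain, which matches the "at most one" bound for complex cases.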

If there is no plan to support stream processors in the differ, that would be fine for my use case as well. As I mostly just use the libraries, I could add a callback option to the differ through which I can pass the WriteCloser for compression.
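A callback option of this kind could be sketched with functional options. Everything below is hypothetical and only stands in for the differ (`compressorFunc`, `withCompressor`, `writeDiff` and `runDiff` are made-up names); it only shows the shape of the idea, with the caller supplying the WriteCloser and the differ writing the raw diff through it.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compressorFunc is a hypothetical callback: given the destination and the
// target media type, it returns the WriteCloser the differ writes through.
type compressorFunc func(dest io.Writer, mediaType string) (io.WriteCloser, error)

type diffConfig struct {
	mediaType  string
	compressor compressorFunc
}

type diffOpt func(*diffConfig)

// withCompressor sets the output media type and the compression callback.
func withCompressor(mediaType string, f compressorFunc) diffOpt {
	return func(c *diffConfig) {
		c.mediaType = mediaType
		c.compressor = f
	}
}

// writeDiff stands in for the differ: it writes payload to dest, going
// through the caller-supplied compressor when one is configured.
func writeDiff(dest io.Writer, payload []byte, opts ...diffOpt) error {
	cfg := diffConfig{mediaType: "application/vnd.oci.image.layer.v1.tar"}
	for _, o := range opts {
		o(&cfg)
	}
	if cfg.compressor == nil {
		_, err := dest.Write(payload)
		return err
	}
	wc, err := cfg.compressor(dest, cfg.mediaType)
	if err != nil {
		return err
	}
	if _, err := wc.Write(payload); err != nil {
		wc.Close()
		return err
	}
	// Close flushes the compressor's buffered data into dest.
	return wc.Close()
}

// gzipCompressor is an example callback; a caller could pass any compressor here.
func gzipCompressor(dest io.Writer, mediaType string) (io.WriteCloser, error) {
	return gzip.NewWriter(dest), nil
}

// runDiff is a small helper that collects the differ output in memory.
func runDiff(payload string, opts ...diffOpt) ([]byte, error) {
	var buf bytes.Buffer
	err := writeDiff(&buf, []byte(payload), opts...)
	return buf.Bytes(), err
}

func main() {
	out, err := runDiff("layer contents",
		withCompressor("application/vnd.oci.image.layer.v1.tar+gzip", gzipCompressor))
	fmt.Println(len(out) > 0, err)
}
```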

@fuweid @cpuguy83


fuweid · 2020-05-18T14:46:37Z

SGTM

tonistiigi added the kind/feature label May 17, 2020

fuweid added this to the 1.5 milestone May 18, 2020

tonistiigi mentioned this issue Sep 5, 2020

Add support for zstd compressed layers moby/moby#40820

Closed

dmcgowan modified the milestones: 1.5, 1.6 Apr 20, 2021

tonistiigi mentioned this issue Jul 13, 2021

Support estargz compression type moby/buildkit#2246

Merged

ktock mentioned this issue Jul 14, 2021

Support custom compressor for walking differ #5735

Merged

dmcgowan modified the milestones: 1.6, 1.7 Feb 17, 2022

dmcgowan removed this from the 1.7 milestone Mar 2, 2023

dosubot bot added the Stale label Jul 19, 2024
