Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compression: add support for the zstd algorithm #639

Merged
merged 5 commits into from
Aug 21, 2019

Conversation

giuseppe
Copy link
Member

@giuseppe giuseppe commented Jun 7, 2019

zstd is a compression algorithm that has a very fast decoder, while
providing also good compression ratios. The fast decoder makes it
suitable for container images, as decompressing the tarballs is a very
expensive operation. This is a first step at supporting zstd as we We
don't yet generate zstd layers.

In my testing, copying the Fedora image from a local dir: repository,
the wall clock time passed from ~8s with gzip compression to ~4.5s
with zstd.

Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

@giuseppe
Copy link
Member Author

giuseppe commented Jun 7, 2019

PR for c/storage: containers/storage#363

@giuseppe giuseppe changed the title compression: add support for the zstd algorithm [WIP] compression: add support for the zstd algorithm Jun 7, 2019
@giuseppe
Copy link
Member Author

@mtrmac fixed now, it is using the 0.8 format.

How could we start generating zstd tarballs? Should copyBlobFromStream be extended to allow changing compression type?

@mtrmac
Copy link
Collaborator

mtrmac commented Jun 11, 2019

@mtrmac fixed now, it is using the 0.8 format.

ACK. Tests fail on Windows, though.

How could we start generating zstd tarballs? Should copyBlobFromStream be extended to allow changing compression type?

Yes, an option in types.SystemContext allowing the user to choose the compression format (if compression is used) would work for now; I suppose defaulting to gzip for interoperability.

@giuseppe
Copy link
Member Author

Skopeo WIP PR here: containers/skopeo#671

copy/copy.go Show resolved Hide resolved
copy/copy.go Show resolved Hide resolved
copy/copy.go Outdated Show resolved Hide resolved
copy/copy.go Outdated Show resolved Hide resolved
copy/copy.go Outdated Show resolved Hide resolved
copy/copy.go Outdated Show resolved Hide resolved
copy/copy.go Outdated Show resolved Hide resolved
"gzip": GzipCompressor,
"bzip2": Bzip2Compressor,
"xz": XzCompressor,
"zstd": ZstdCompressor,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the callers need to deal with the strings, please turn them into a typed Go enum.

… and then DecompressorFunc and CompressorFunc can be removed, and become methods on that typed enum. (Well, DecompressorFunc needs to continue to exist so that we don’t break the API.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you fine with the last version where I changed the function to look like:

func GetCompressor(dest io.Writer, name string, level int) (io.WriteCloser, error)

so now we don't expose CompressorFunc at all. Do you still want to have a Go enum for the compression algorithms?

Copy link
Collaborator

@mtrmac mtrmac Jul 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please

  • let’s not have an untyped
    desiredCompressionFormat = "gzip"
    when it could use a compression.Gzip constant
  • A named Go type would prevent accidentally/unnecessarily mixing raw strings and format values.
  • Depending on what the type exactly ends up being, making GetCompressor a method on the format type might eliminate the
    c, found := compressors[name]
    if !found {
    failure path. (If not, no harm done.)
  • OTOH, that might force us to add a compression.GetFormatFromString(), to set types.SystemContext.CompressionFormat with the typed value — but that would be a good thing, because invalid values would have to be recognized early at the CLI level, instead of deep inside the compressGoroutine.

copy/copy.go Outdated Show resolved Hide resolved
pkg/compression/compression.go Outdated Show resolved Hide resolved
@giuseppe giuseppe force-pushed the zstd branch 6 times, most recently from 00db5ed to 38a18ac Compare June 12, 2019 11:16
@giuseppe giuseppe changed the title [WIP] compression: add support for the zstd algorithm compression: add support for the zstd algorithm Jun 13, 2019
@giuseppe
Copy link
Member Author

rebased ⬆️

@giuseppe
Copy link
Member Author

@vrothberg @rhatdan PTAL

@vrothberg
Copy link
Member

@giuseppe needs another rebase (will review now)

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, just a few nits. @giuseppe, could we add some tests to make sure we don't regress? pkg/compress has some tests in place that could be extended.

copy/copy.go Outdated Show resolved Hide resolved
vendor.conf Outdated Show resolved Hide resolved
types/types.go Outdated Show resolved Hide resolved
@giuseppe giuseppe force-pushed the zstd branch 3 times, most recently from bfe8784 to 16b9f39 Compare June 20, 2019 09:25
@rhatdan
Copy link
Member

rhatdan commented Jun 20, 2019

If we add support for zstd? Would an image created with this algorythm, still be able to be used on Docker? Older versions of CRI-O, Buildah, Podman?

@vrothberg
Copy link
Member

@giuseppe, could you add tests? Besides that, LGTM.

@giuseppe
Copy link
Member Author

If we add support for zstd? Would an image created with this algorythm, still be able to be used on Docker? Older versions of CRI-O, Buildah, Podman?

no, unfortunately it won't be compatible with other versions of the tools that have no support for zstd.

@giuseppe, could you add tests? Besides that, LGTM.

I've added some new tests to copy/copy_test.go. Would you like me to test something in particular?

@rhatdan
Copy link
Member

rhatdan commented Jun 20, 2019

What error will docker report if it tries to use an image created with this compression?

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@giuseppe, please also open a PR for Skopeo to avoid the CI being red for too long.

@giuseppe
Copy link
Member Author

giuseppe commented Aug 1, 2019

@giuseppe, please also open a PR for Skopeo to avoid the CI being red for too long.

I've opened it here: containers/skopeo#671

I'll revendor containers/image and drop the WIP once this is merged

@rhatdan
Copy link
Member

rhatdan commented Aug 1, 2019

@mtrmac Can we merge this? Once this is merged, should we open a PR for Moby to support the new compression algorithm, so we can maintain compatibility.
I would love to have some numbers on this?

Does this increase speed of builds? Pulls? Pushes?

@vrothberg
Copy link
Member

I agree with @rhatdan. This screams for a blog! @giuseppe, are you interested in writing one? Once #563 gets merged (after this one), we can fully support zstd.

@giuseppe
Copy link
Member Author

giuseppe commented Aug 2, 2019

so these are some numbers I've collected:

#!/bin/sh
sudo rm -rf /tmp/nginx
mkdir /tmp/nginx

for i in gzip zstd; do
./skopeo copy --dest-compress --dest-compress-format $i --dest-compress-level=20 docker://docker.io/nginx dir:/tmp/nginx/$i
done > /dev/null

du --si -s /tmp/nginx/*

for i in gzip zstd; do
./skopeo copy --dest-compress-format $i dir:/tmp/nginx/$i docker://quay.io/giuseppe/nginx:$i
done > /dev/null

printf "\npull...\n"

for i in gzip zstd; do
printf "\n%s\n" $i
time sudo unshare -m ./skopeo copy docker://quay.io/giuseppe/nginx:$i "containers-storage:[overlay@/tmp/nginx/storage/$i+/tmp/nginx/storage/run]localhost/nginx:$i" > /dev/null
done

I've got:

51M	/tmp/nginx/gzip
44M	/tmp/nginx/zstd

pull...

gzip

real	0m6.314s
user	0m5.486s
sys	0m0.913s

zstd

real	0m5.254s
user	0m3.893s
sys	0m0.772s

@rhatdan
Copy link
Member

rhatdan commented Aug 3, 2019

Not bad at all.

@rhatdan
Copy link
Member

rhatdan commented Aug 10, 2019

@giuseppe Should tests be passing now?

@giuseppe
Copy link
Member Author

The Skopeo failures are expected as the dependency is not there yet

@vrothberg
Copy link
Member

@giuseppe, can you open a PR at Skopeo using this branch of c/image? This way, we can make sure to not introduce a regression. Once the PR here in c/image is merged, you can update the PR at Skopeo.

@giuseppe
Copy link
Member Author

@giuseppe, can you open a PR at Skopeo using this branch of c/image? This way, we can make sure to not introduce a regression. Once the PR here in c/image is merged, you can update the PR at Skopeo.

The PR for Skopeo is here: containers/skopeo#671

@vrothberg
Copy link
Member

vrothberg commented Aug 12, 2019

I can't see if containers/skopeo#671 passes as Travis builds don't seem to be triggered due to the merge conflicts.

Other than that, LGTM. @mtrmac, @rhatdan, @mrunalp PTAL

@giuseppe
Copy link
Member Author

I can't see if containers/skopeo#671 passes as Travis builds don't seem to be triggered due to the merge conflicts.

I've rebased it now

@rhatdan
Copy link
Member

rhatdan commented Aug 12, 2019

@giuseppe Still failing?
@mtrmac PTAL

zstd is a compression algorithm that has a very fast decoder, while
providing also good compression ratios.  The fast decoder makes it
suitable for container images, as decompressing the tarballs is a very
expensive operation.  This is a first step at supporting zstd as we We
don't yet generate zstd layers.

In my testing, copying the Fedora image from a local dir: repository,
the wall clock time passed from ~8s with gzip compression to ~4.5s
with zstd.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
add the possibility to choose what compression format must be used and
the compression level to use.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
the default size is too small and it has a negative impact on the
compression results.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Copy link
Member Author

@rhatdan tests for the Skopeo PR are passing.

If everything is fine, please let's move this forward

@rhatdan
Copy link
Member

rhatdan commented Aug 21, 2019

LGTM

@rhatdan rhatdan merged commit e003ccf into containers:master Aug 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants