Replace gzip with zstd for speed + size benefits #7950

Merged on Sep 1, 2023

@jeffwidman jeffwidman commented Sep 1, 2023

Replace gzip with zstd for both speed + size benefits.

I added zstd to the base image rather than the python-specific image because:

  1. It's only 1,695 KB.
  2. We'll likely use it in other places down the road, such as unpacking deps for jobs.
  3. The Python image would have required additional steps like apt-get update.
  4. Originally I was afraid that installing zstd would take more time than the compression speedup saves, and that the cost would apply to every image. But looking at the raw CI logs and grepping for zst shows that the install takes < 0.5s.

This results in faster compression + space savings:

root@ca684869d4af:/usr/local/.pyenv/versions# time tar -acf 3.11.5.tar.gz 3.11.5

real	0m5.564s
user	0m5.458s
sys	0m0.626s
root@ca684869d4af:/usr/local/.pyenv/versions# time tar -acf 3.11.5.tar.zst 3.11.5

real	0m0.850s
user	0m1.003s
sys	0m0.354s
root@ca684869d4af:/usr/local/.pyenv/versions# ls -lah
-rw-r--r-- 1 root       root        32M Sep  1 17:19 3.11.5.tar.gz
-rw-r--r-- 1 root       root        30M Sep  1 17:20 3.11.5.tar.zst

Decompression is faster too:

root@ca684869d4af:/usr/local/.pyenv/versions# ls -lah
-rw-r--r-- 1 root       root        32M Sep  1 17:38 3.11.5.tar.gz
-rw-r--r-- 1 root       root        30M Sep  1 17:36 3.11.5.tar.zst
root@ca684869d4af:/usr/local/.pyenv/versions# time tar -axf $PYENV_ROOT/versions/3.11.5.tar.gz -C $PYENV_ROOT/versions

real	0m1.113s
user	0m0.986s
sys	0m0.811s
root@ca684869d4af:/usr/local/.pyenv/versions# rm -rf 3.11.5
root@ca684869d4af:/usr/local/.pyenv/versions# time tar -axf $PYENV_ROOT/versions/3.11.5.tar.zst -C $PYENV_ROOT/versions

real	0m0.774s
user	0m0.501s
sys	0m0.695s
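
For a rough sense of scale, the ratios implied by the timings above can be worked out with a small helper. This is only a sketch: the `ratio` function is a hypothetical name introduced here, the inputs are the wall-clock (`real`) values copied from the runs above, and a single run per format is not a rigorous benchmark.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not part of the PR): divide two numbers and
# print the result rounded to one decimal place.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f", a / b }'
}

# Wall-clock ("real") seconds copied from the runs above.
echo "compression speedup:   $(ratio 5.564 0.850)x"
echo "decompression speedup: $(ratio 1.113 0.774)x"

# ls -lah rounds sizes to whole megabytes, so this ratio is approximate.
echo "gzip size / zstd size: $(ratio 32 30)"
```

By these numbers compression is about 6.5x faster and decompression about 1.4x faster, while the archive shrinks only slightly, so most of the win is in compression time.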

The -a flag to tar tells it to autoselect the compression format based on the file extension, so the commands above switch between gzip and zstd depending on whether the archive name ends in .tar.gz or .tar.zst.
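
A minimal sketch of that suffix-based selection, assuming GNU tar with `gzip` installed and, optionally, `zstd`. The scratch directory and the `sample` file names are made up for illustration and are not from this PR.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory and sample data, not from the PR.
workdir="$(mktemp -d)"
mkdir "$workdir/sample"
seq 1 20000 > "$workdir/sample/numbers.txt"
cd "$workdir"

# -a picks the compressor from the archive suffix: .gz selects gzip.
tar -acf sample.tar.gz sample

# .zst selects zstd; only attempted when the zstd binary is installed.
if command -v zstd >/dev/null 2>&1; then
  tar -acf sample.tar.zst sample
fi

# Extract the gzip archive and confirm the round trip is lossless.
mkdir out
tar -xf sample.tar.gz -C out
cmp sample/numbers.txt out/sample/numbers.txt

ls -l sample.tar.*
```

Because the format comes from the file name, switching compressors is a one-word change to the archive suffix rather than a change to the tar flags.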

@jeffwidman jeffwidman requested a review from a team as a code owner September 1, 2023 17:55
jeffwidman commented Sep 1, 2023

I've been meaning to experiment with different compression formats for a while, so I took a few minutes this morning to play around with it for fun.

@Nishnha this is likely a better solution to the pigz problem in the runner. Given how much time is spent in compression, I suspect it'd result in noticeable speed + compression size improvements in that deploy pipeline.

@jeffwidman jeffwidman marked this pull request as draft September 1, 2023 18:05
@jeffwidman jeffwidman marked this pull request as ready for review September 1, 2023 18:07
@jakecoffman jakecoffman merged commit 853aada into dependabot:main Sep 1, 2023
96 checks passed
@jeffwidman jeffwidman deleted the replace-gzip-with-zstd branch September 1, 2023 21:07
brettfo pushed a commit to brettfo/dependabot-core that referenced this pull request Oct 11, 2023