Slow decompression speed of the FUSE driver #117

Closed
ariel-miculas opened this issue Oct 6, 2023 · 3 comments

@ariel-miculas
Collaborator

I took a 658M root filesystem and built three images from it: one squashfs image and two puzzlefs images, one compressed and one uncompressed:

puzzlefs build -c ../test-puzzlefs/real_rootfs/barehost/rootfs/ /tmp/puzzlefs-image barehost
puzzlefs build ../test-puzzlefs/real_rootfs/barehost/rootfs/ /tmp/puzzlefs-image-uncompressed barehost

I then mounted all three images (two puzzlefs images and a squashfs image):

$ puzzlefs mount /tmp/puzzlefs-image-uncompressed barehost /tmp/puzzle-uncompressed
$ puzzlefs mount /tmp/puzzlefs-image barehost /tmp/puzzle-compressed
$ squashfuse_ll ~/work/cisco/test-puzzlefs/barehost.sqhs /tmp/squash

$ mount
...
/dev/fuse on /tmp/puzzle-uncompressed type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
/dev/fuse on /tmp/puzzle-compressed type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
squashfuse_ll on /tmp/squash type fuse.squashfuse_ll (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)

Reading every file with fd:

$ time fd -tf . /tmp/squash -x cat > /dev/null
fd -tf . /tmp/squash -x cat > /dev/null  11.07s user 4.74s system 433% cpu 3.645 total

$ time fd -tf . /tmp/puzzle-uncompressed -x cat > /dev/null
fd -tf . /tmp/puzzle-uncompressed -x cat > /dev/null  10.77s user 3.52s system 405% cpu 3.525 total

$ time fd -tf . /tmp/puzzle-compressed -x cat > /dev/null
fd -tf . /tmp/puzzle-compressed -x cat > /dev/null  9.50s user 2.70s system 84% cpu 14.419 total

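The compressed puzzlefs image takes about four times as long to read (14.4s versus 3.5s for the uncompressed image) and keeps the CPU only 84% busy. This could be because the same blob is decompressed multiple times instead of the decompressed data being cached (squashfuse does readahead).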
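
A minimal sketch of the caching idea, not puzzlefs code: it uses Python's `zlib` as a stand-in for zstd, and the `BlobCache` class, the `load_compressed` callback, and the `blob0` store are hypothetical names made up for illustration. It assumes several file ranges live in one compressed blob, so that keeping the decompressed bytes around turns repeated decompression into a lookup.

```python
import zlib
from collections import OrderedDict


class BlobCache:
    """Tiny LRU cache mapping a blob id to its decompressed bytes."""

    def __init__(self, load_compressed, max_entries=8):
        self._load = load_compressed   # hypothetical: blob id -> compressed bytes
        self._max = max_entries
        self._cache = OrderedDict()
        self.decompressions = 0        # counts real decompress calls

    def get(self, blob_id):
        if blob_id in self._cache:
            self._cache.move_to_end(blob_id)   # mark as most recently used
            return self._cache[blob_id]
        data = zlib.decompress(self._load(blob_id))
        self.decompressions += 1
        self._cache[blob_id] = data
        if len(self._cache) > self._max:
            self._cache.popitem(last=False)    # evict least recently used
        return data

    def read(self, blob_id, offset, length):
        """Serve the byte range [offset, offset + length) of a blob."""
        return self.get(blob_id)[offset:offset + length]


# Two "files" packed into one compressed blob.
store = {"blob0": zlib.compress(b"A" * 1000 + b"B" * 1000)}
cache = BlobCache(store.__getitem__)
first = cache.read("blob0", 0, 1000)
second = cache.read("blob0", 1000, 1000)
print(cache.decompressions)
```

Both reads are served from a single decompression. The trade-off is memory: the cache holds whole decompressed blobs, so `max_entries` bounds how much is kept.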

@ariel-miculas ariel-miculas self-assigned this Nov 1, 2023
@ariel-miculas
Collaborator Author

It would be worth implementing zstd seekable compression. That way we wouldn't have to decompress the entire blob to serve one file from it; we could decompress only the blocks needed for that file.
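
The idea can be sketched as follows. This is an illustration only: it uses `zlib` in place of zstd and a plain Python list as the seek table, so it is not the zstd seekable format and not what puzzlefs does internally. `compress_seekable`, `read_range`, and `FRAME_SIZE` are hypothetical names. The data is split into fixed-size frames that are compressed independently, and a read decompresses only the frames that overlap the requested range.

```python
import bisect
import zlib

FRAME_SIZE = 4096  # uncompressed bytes per frame; illustrative value


def compress_seekable(data, frame_size=FRAME_SIZE):
    """Compress data as independent frames; return (frames, starts)."""
    frames = [zlib.compress(data[i:i + frame_size])
              for i in range(0, len(data), frame_size)]
    # starts[k] is the uncompressed offset at which frame k begins.
    starts = list(range(0, len(data), frame_size))
    return frames, starts


def read_range(frames, starts, offset, length):
    """Return (data[offset:offset + length], number of frames decompressed)."""
    if length <= 0 or not frames:
        return b"", 0
    first = bisect.bisect_right(starts, offset) - 1  # frame containing offset
    end = offset + length
    out = bytearray()
    touched = 0
    k = first
    while k < len(frames) and starts[k] < end:
        out += zlib.decompress(frames[k])
        touched += 1
        k += 1
    skip = offset - starts[first]  # bytes to drop from the first frame
    return bytes(out[skip:skip + length]), touched


data = bytes(range(256)) * 320          # 80 KiB of sample data
frames, starts = compress_seekable(data)
chunk, touched = read_range(frames, starts, 10000, 5000)
print(len(frames), touched)
```

Here a 5000-byte read touches 2 of the 20 frames instead of decompressing all 80 KiB.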

@ariel-miculas
Collaborator Author

ariel-miculas commented Nov 16, 2023

Results with seekable zstd:

$ time fd -tf . /tmp/squash -x cat > /dev/null
fd -tf . /tmp/squash -x cat > /dev/null  8.04s user 2.72s system 449% cpu 2.393 total

$ time fd -tf . /tmp/erofs -x cat > /dev/null
fd -tf . /tmp/erofs -x cat > /dev/null  8.16s user 2.62s system 465% cpu 2.316 total

$ time fd -tf . /tmp/puzzle-uncompressed -x cat > /dev/null
fd -tf . /tmp/puzzle-uncompressed -x cat > /dev/null  7.88s user 2.43s system 398% cpu 2.590 total

$ time fd -tf . /tmp/puzzle-compressed -x cat > /dev/null
fd -tf . /tmp/puzzle-compressed -x cat > /dev/null  7.77s user 2.37s system 222% cpu 4.560 total

@ariel-miculas
Collaborator Author

Comparison between squashfs, erofs, uncompressed puzzlefs, compressed puzzlefs, and compressed puzzlefs with zstd seekable support at different compression frame sizes

Setup

I'm using an image called barehost, which is an Ubuntu distribution:

$ du -sh ~/work/cisco/test-puzzlefs/real_rootfs/barehost/rootfs
658M    /home/amiculas/work/cisco/test-puzzlefs/real_rootfs/barehost/rootfs

Building the images:

# squashfs
mksquashfs real_rootfs/barehost/rootfs barehost.sqhs
# erofs
~/work/erofs-utils/mkfs/mkfs.erofs ~/work/cisco/test-puzzlefs/barehost.erofs ~/work/cisco/test-puzzlefs/real_rootfs/barehost/rootfs
# uncompressed puzzlefs
target/release/puzzlefs build ../test-puzzlefs/real_rootfs/barehost/rootfs/ /tmp/puzzlefs-image-uncompressed barehost
# unseekable compressed puzzlefs
./master-puzzlefs build -c ../test-puzzlefs/real_rootfs/barehost/rootfs /tmp/puzzlefs-unseekable-image barehost
# seekable compressed puzzlefs
target/release/puzzlefs build -c ../test-puzzlefs/real_rootfs/barehost/rootfs /tmp/puzzlefs-seekable-image barehost

Mounting the images:

# squashfs
squashfuse_ll ~/work/cisco/test-puzzlefs/barehost.sqhs /tmp/squash
# erofs
~/work/erofs-utils/fuse/erofsfuse ~/work/cisco/test-puzzlefs/barehost.erofs /tmp/erofs
# uncompressed puzzlefs
target/release/puzzlefs mount /tmp/puzzlefs-image-uncompressed barehost /tmp/puzzle-uncompressed
# unseekable compressed puzzlefs
./master-puzzlefs mount /tmp/puzzlefs-unseekable-image barehost /tmp/puzzle-unseekable
# seekable compressed puzzlefs
target/release/puzzlefs mount /tmp/puzzlefs-seekable-image barehost /tmp/puzzle-seekable

Mounts:

erofsfuse on /tmp/erofs type fuse.erofsfuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
squashfuse_ll on /tmp/squash type fuse.squashfuse_ll (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
/dev/fuse on /tmp/puzzle-uncompressed type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
/dev/fuse on /tmp/puzzle-unseekable type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
/dev/fuse on /tmp/puzzle-seekable type fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)

Results

Squashfs

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/squash -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/squash -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     11.105 s ±  0.223 s    [User: 6.798 s, System: 1.737 s]
  Range (min … max):   10.607 s … 11.410 s    10 runs

Erofs

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/erofs -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/erofs -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     10.133 s ±  0.065 s    [User: 6.612 s, System: 1.572 s]
  Range (min … max):    9.971 s … 10.231 s    10 runs

uncompressed puzzlefs

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/puzzle-uncompressed -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/puzzle-uncompressed -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):      9.934 s ±  0.071 s    [User: 6.581 s, System: 1.613 s]
  Range (min … max):    9.850 s … 10.038 s    10 runs

unseekable compressed puzzlefs

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/puzzle-unseekable -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/puzzle-unseekable -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     21.396 s ±  0.414 s    [User: 6.771 s, System: 1.715 s]
  Range (min … max):   20.615 s … 21.639 s    10 runs

seekable compressed puzzlefs (1KB frame size)

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     12.475 s ±  0.067 s    [User: 6.733 s, System: 1.700 s]
  Range (min … max):   12.410 s … 12.589 s    10 runs

seekable compressed puzzlefs (2KB frame size)

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     12.056 s ±  0.083 s    [User: 6.700 s, System: 1.671 s]
  Range (min … max):   11.941 s … 12.169 s    10 runs

seekable compressed puzzlefs (4KB frame size)

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     11.784 s ±  0.046 s    [User: 6.692 s, System: 1.681 s]
  Range (min … max):   11.678 s … 11.825 s    10 runs

seekable compressed puzzlefs (8KB frame size)

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     11.657 s ±  0.038 s    [User: 6.676 s, System: 1.664 s]
  Range (min … max):   11.616 s … 11.722 s    10 runs

seekable compressed puzzlefs (16KB frame size)

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches'  "find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null"
Benchmark 1: find /tmp/puzzle-seekable -type f -exec cat {} \; > /dev/null
  Time (mean ± σ):     11.662 s ±  0.076 s    [User: 6.691 s, System: 1.668 s]
  Range (min … max):   11.533 s … 11.818 s    10 runs

Conclusion

4KB seems to be a good choice for the zstd frame size, considering the above
results and keeping in mind that the average chunk size produced by FastCDC
with our current parameters is 80KB. The 8KB and 16KB frame sizes are only
about 0.1s faster (11.66s versus 11.78s). Seekable compression reduces the mean
time to read the entire image from ~21.4s to ~11.8s, which is close to
squashfuse (11.1s). This comparison disregards any parallel operations on the
filesystem.

The compressed image grows from 259MB without seekable support to 289MB with
seekable support; the source root filesystem is 658MB.

$ du -sh /tmp/puzzlefs-unseekable-image
259M    /tmp/puzzlefs-unseekable-image
$ du -sh /tmp/puzzlefs-seekable-image
289M    /tmp/puzzlefs-seekable-image
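
For reference, the figures reported in this issue work out as follows. The numbers are copied from the `du` and hyperfine output above; only the variable names are made up here.

```python
# Sizes in MB, as reported by du above.
rootfs_mb = 658
unseekable_mb = 259
seekable_mb = 289

size_overhead = (seekable_mb - unseekable_mb) / unseekable_mb
ratio_unseekable = rootfs_mb / unseekable_mb
ratio_seekable = rootfs_mb / seekable_mb

# Mean wall-clock times in seconds, as reported by hyperfine above.
unseekable_s = 21.396
seekable_4k_s = 11.784
squashfuse_s = 11.105

speedup = unseekable_s / seekable_4k_s
gap_to_squashfuse = (seekable_4k_s - squashfuse_s) / squashfuse_s

print(f"size overhead of seekable image: {size_overhead:.1%}")
print(f"compression ratio: {ratio_unseekable:.2f} -> {ratio_seekable:.2f}")
print(f"speedup over unseekable: {speedup:.2f}x")
print(f"slower than squashfuse by: {gap_to_squashfuse:.1%}")
```

In short, the seekable image is roughly 12% larger, reads about 1.8x faster than the unseekable one, and is about 6% slower than squashfuse.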

Besides the per-frame overhead of the seekable format, the compression ratio probably goes down because each frame is compressed individually.
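
The effect of frame size on compression ratio can be observed with a small experiment. This is a sketch under stated assumptions: it uses `zlib` rather than zstd and synthetic, reproducible text rather than a real root filesystem, so the absolute numbers say nothing about puzzlefs images. `framed_size` is a hypothetical helper. It only shows the general trend that independently compressed frames lose shared context and pay a fixed per-frame cost.

```python
import random
import zlib


def framed_size(data, frame_size):
    """Total compressed size when each frame is compressed independently."""
    return sum(len(zlib.compress(data[i:i + frame_size]))
               for i in range(0, len(data), frame_size))


rng = random.Random(0)  # fixed seed so the synthetic input is reproducible
vocab = [f"word{i}" for i in range(200)]
data = " ".join(rng.choice(vocab) for _ in range(40000)).encode()

whole = len(zlib.compress(data))  # single stream, no frames
sizes = {fs: framed_size(data, fs) for fs in (1024, 4096, 16384)}
for fs, size in sizes.items():
    print(f"{fs:>6} B frames: {size / whole:.2f}x the single-stream size")
```

Smaller frames give finer-grained random access but a larger total size, which is the trade-off behind picking a frame size.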

ariel-miculas added a commit to ariel-miculas/puzzlefs that referenced this issue Dec 14, 2023
Fixes project-machine#117

Signed-off-by: Ariel Miculas <amiculas@cisco.com>
ariel-miculas added a commit to ariel-miculas/puzzlefs that referenced this issue Dec 14, 2023
Fixes project-machine#117

Signed-off-by: Ariel Miculas <amiculas@cisco.com>
@hallyn hallyn closed this as completed in 9df93f1 Dec 14, 2023