Please support seekable zstd with jump table #266

Open
joshtriplett opened this issue Apr 10, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@joshtriplett
Contributor

There's a seekable zstd format, which adds a jump table to be able to start decompressing from the start of any chunk marked at compression time. I'd love to have support for producing this format (with a way to mark the start of a chunk) and consuming this format (parsing the jump table and starting a decoder from the start of a chunk).
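For readers unfamiliar with the jump table, here is a minimal sketch of how a consumer might read it. It assumes the layout described in the seekable format notes shipped in zstd's `contrib` directory, as I understand them: the table is a skippable frame at the end of the file, holding one little-endian (compressed size, decompressed size, optional checksum) entry per frame, followed by a 9-byte footer. The names `SeekEntry`, `parse_seek_table` and `frame_for_offset` are hypothetical and not part of any existing crate.

```rust
/// Magic of the skippable frame that carries the seek table (assumed from the format notes).
const SKIPPABLE_MAGIC: u32 = 0x184D_2A5E;
/// Magic that ends the seek table footer (assumed from the format notes).
const SEEKABLE_MAGIC: u32 = 0x8F92_EAB1;

/// One jump-table entry: where a frame starts in both streams.
#[derive(Debug, PartialEq)]
pub struct SeekEntry {
    pub compressed_offset: u64,
    pub decompressed_offset: u64,
    pub compressed_size: u32,
    pub decompressed_size: u32,
}

/// Reads a little-endian u32 at `at`, or returns None if out of bounds.
fn read_u32_le(buf: &[u8], at: usize) -> Option<u32> {
    let b = buf.get(at..at.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parses the seek table found at the end of `file`.
pub fn parse_seek_table(file: &[u8]) -> Option<Vec<SeekEntry>> {
    // Footer: number of frames (4), descriptor (1), seekable magic (4).
    let footer = file.len().checked_sub(9)?;
    if read_u32_le(file, footer + 5)? != SEEKABLE_MAGIC {
        return None;
    }
    let num_frames = read_u32_le(file, footer)? as usize;
    // Bit 7 of the descriptor says whether each entry carries a checksum.
    let entry_size = if file[footer + 4] & 0x80 != 0 { 12 } else { 8 };

    // Whole table: skippable header (8) + entries + footer (9).
    let table_len = num_frames.checked_mul(entry_size)?.checked_add(8 + 9)?;
    let table_start = file.len().checked_sub(table_len)?;
    if read_u32_le(file, table_start)? != SKIPPABLE_MAGIC {
        return None;
    }
    if read_u32_le(file, table_start + 4)? as usize != table_len - 8 {
        return None;
    }

    // Sizes are stored per frame; offsets are the running sums.
    let mut entries = Vec::with_capacity(num_frames);
    let (mut c_off, mut d_off) = (0u64, 0u64);
    for i in 0..num_frames {
        let at = table_start + 8 + i * entry_size;
        let c = read_u32_le(file, at)?;
        let d = read_u32_le(file, at + 4)?;
        entries.push(SeekEntry {
            compressed_offset: c_off,
            decompressed_offset: d_off,
            compressed_size: c,
            decompressed_size: d,
        });
        c_off += u64::from(c);
        d_off += u64::from(d);
    }
    Some(entries)
}

/// Index of the frame that contains decompressed position `pos`.
pub fn frame_for_offset(entries: &[SeekEntry], pos: u64) -> Option<usize> {
    entries
        .iter()
        .position(|e| pos < e.decompressed_offset + u64::from(e.decompressed_size))
}
```

A reader would then seek the underlying file to `compressed_offset` of the chosen entry and start a fresh decoder there, discarding `pos - decompressed_offset` bytes of output.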

@NobodyXu
Collaborator

Does the zstd crate support it?

If it does, I would happily review and merge such a PR, along with a new interface for it.

@robjtede added the enhancement label Apr 10, 2024
@joshtriplett
Contributor Author

@NobodyXu I don't think it does yet, no. There's a separate zstd-seekable crate with support, but ideally the support would get merged into zstd-rs.

@NobodyXu
Collaborator

blocked on gyscos/zstd-rs#272

@cgwalters

One thing that isn't fully clear to me: over in #271 (comment) I wrote a test program that compares async_compression against the existing zstd crate, and only async_compression fails to decompress archives with skippable frames. My guess is that skippable frames are easier to handle in a direct copy model than in what async_compression has to do, where the decompressor writes to partial buffers and needs more state tracking.
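To illustrate the kind of state tracking meant here, below is a minimal sketch of skipping one skippable frame when input arrives in arbitrary chunks. It assumes the usual skippable frame layout (4-byte magic, 4-byte little-endian payload length, then the payload) and does not reflect how async_compression or the zstd crate are actually implemented. `SkipState` is a hypothetical name, and the sketch deliberately does not validate the magic number.

```rust
/// Incremental skipper for one skippable frame (hypothetical helper).
#[derive(Debug, PartialEq)]
pub enum SkipState {
    /// Still collecting the 8-byte header (magic + payload length).
    Header { buf: [u8; 8], filled: usize },
    /// Header parsed; this many payload bytes are still to be discarded.
    Payload { remaining: u32 },
    /// The whole frame has been consumed.
    Done,
}

impl SkipState {
    pub fn new() -> Self {
        SkipState::Header { buf: [0; 8], filled: 0 }
    }

    /// Consumes as much of `chunk` as belongs to the frame and returns the
    /// number of bytes used. Call again with the next chunk until `Done`.
    pub fn feed(&mut self, chunk: &[u8]) -> usize {
        let mut used = 0;
        loop {
            match self {
                SkipState::Header { buf, filled } => {
                    let n = (8 - *filled).min(chunk.len() - used);
                    buf[*filled..*filled + n].copy_from_slice(&chunk[used..used + n]);
                    *filled += n;
                    used += n;
                    if *filled < 8 {
                        // Header is split across chunks; wait for more input.
                        return used;
                    }
                    // Bytes 4..8 hold the payload length; the magic in 0..4 is ignored here.
                    let remaining = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]);
                    *self = SkipState::Payload { remaining };
                }
                SkipState::Payload { remaining } => {
                    let n = (*remaining as usize).min(chunk.len() - used);
                    *remaining -= n as u32;
                    used += n;
                    if *remaining > 0 {
                        // Payload continues in the next chunk.
                        return used;
                    }
                    *self = SkipState::Done;
                }
                SkipState::Done => return used,
            }
        }
    }
}
```

With the whole input in one buffer, none of this state is needed: the reader can just jump ahead by 8 plus the payload length. The extra bookkeeping only appears once the header or payload can straddle buffer boundaries.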

Tangentially related, it looks like the zstd-seekable codebase imported the zstd-rs code without preserving git history, so it's a bit hard to figure out what's going on there.

@Nemo157
Member

Nemo157 commented Apr 30, 2024

> I wrote a test program which compares async_compression vs the existing zstd crate, and it's just async_compression that fails to decompress archives with skippable frames.

By default, async-compression only decodes a single frame; if you want it to decode more, you'll need to set multiple_members(true). I'm surprised it worked with the zstd crate if it doesn't support skippable frames, though. I would have expected an "unknown magic number" error when it reaches the frame.
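As a rough illustration of why a reader that does not know about skippable frames would hit an "unknown magic number" at a frame boundary, here is a small sketch that classifies the magic at the start of each frame and steps over skippable ones. It assumes the magic values from the zstd format specification (0xFD2FB528 for regular frames, 0x184D2A50 through 0x184D2A5F for skippable frames). The names `classify_frame` and `skip_skippable_frames` are hypothetical and do not come from async-compression or the zstd crate.

```rust
/// Magic number that starts a regular zstd frame.
const ZSTD_MAGIC: u32 = 0xFD2F_B528;
/// Skippable frames use any magic in 0x184D2A50..=0x184D2A5F.
const SKIPPABLE_BASE: u32 = 0x184D_2A50;

#[derive(Debug, PartialEq)]
pub enum FrameKind {
    Zstd,
    Skippable { payload_len: u32 },
}

#[derive(Debug, PartialEq)]
pub enum FrameError {
    Truncated,
    UnknownMagic(u32),
}

/// Reads a little-endian u32 at `at`, or reports truncated input.
fn read_u32_le(buf: &[u8], at: usize) -> Result<u32, FrameError> {
    match buf.get(at..at + 4) {
        Some(b) => Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]])),
        None => Err(FrameError::Truncated),
    }
}

/// Looks at the first bytes of `input` and says what kind of frame starts there.
pub fn classify_frame(input: &[u8]) -> Result<FrameKind, FrameError> {
    let magic = read_u32_le(input, 0)?;
    if magic == ZSTD_MAGIC {
        Ok(FrameKind::Zstd)
    } else if magic & 0xFFFF_FFF0 == SKIPPABLE_BASE {
        Ok(FrameKind::Skippable { payload_len: read_u32_le(input, 4)? })
    } else {
        // A reader without the skippable check above would land here instead.
        Err(FrameError::UnknownMagic(magic))
    }
}

/// Returns the offset of the first regular zstd frame (or the end of input),
/// stepping over any skippable frames before it.
pub fn skip_skippable_frames(input: &[u8]) -> Result<usize, FrameError> {
    let mut offset = 0usize;
    while offset < input.len() {
        match classify_frame(&input[offset..])? {
            FrameKind::Zstd => break,
            FrameKind::Skippable { payload_len } => {
                // Header is 8 bytes: magic plus payload length.
                let next = (payload_len as usize)
                    .checked_add(8)
                    .and_then(|n| offset.checked_add(n))
                    .ok_or(FrameError::Truncated)?;
                if next > input.len() {
                    return Err(FrameError::Truncated);
                }
                offset = next;
            }
        }
    }
    Ok(offset)
}
```

A decoder that stops after the first frame never reaches this check for later frames, which is consistent with the single-frame default described above.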

@Fuuzetsu

FWIW, there's some existing code in zstd-seekable-s3 that uses the zstd-seekable crate and turns it into a stream; see https://github.com/Fuuzetsu/zstd-seekable-s3/blob/master/src/compress.rs for an example.

In hindsight, this should have lived in a crate on its own because it's useful regardless of the S3 part...
