Block-Level Sequence Producer API #3333
Changes since review:
CI is passing except for a single test, which seems unrelated to this PR (and is failing on other open PRs). I'm going to merge! :)
Congrats on landing this @embg. I am excited about the use of hardware-accelerated matchfinders in zstd.
The external sequence producer idea is brilliant, although processing the source in 128KB chunks limits the compression ratio and possibly hardware performance. As an alternative, I noticed this in contrib/seqBench:
https://github.com/facebook/zstd/blob/dev/contrib/seqBench/seqBench.c#L29
Here, one could call the external producer instead of ZSTD_generateSequences(), and the caller would not be limited to the 128KB block size. I verified this experimentally: I concatenated a file with itself and compressed the result using seqBench.c.
Hi @adalib, thanks for reaching out! I would love to learn more about why you are interested in this API and the use cases you are targeting. Feel free to reply here or email me at
You are correct that
The motivation for block-level offload is that it integrates with existing compression APIs such as
So, the block-level API does support offsets larger than 128KB. External sequence producer functions are passed a
The missing component is access to the actual content of the history window. In the future, that will be provided by the
This feature is on our long-term roadmap; it simply hasn't been implemented yet. If there is a real-world need for this feature, we can add it in the near future.
Your concern is absolutely valid. Breaking the input into 128KB chunks may require more back-and-forth communication with the hardware. This is a trade-off we made to maintain compatibility with existing APIs. Please let me know if you have any further questions!
This PR introduces an API for external block-level sequence producers to plug into zstd. The user provides a function pointer and state object for the external sequence producer, and zstd will call it to generate sequences for each block. Entropy compression of the sequences remains entirely within the library's internal functions.
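To make the "function pointer plus state object" contract concrete, here is a minimal sketch that compiles without libzstd. It is an assumption-laden illustration, not the library's code: `Seq` mirrors the field layout of `ZSTD_Sequence`, `SEQ_PRODUCER_ERROR` stands in for `ZSTD_SEQUENCE_PRODUCER_ERROR`, and the callback's parameter list is modeled on `ZSTD_sequenceProducer_F` as it appears in recent versions of zstd.h. `ProducerState` and `literalsOnlyProducer` are hypothetical names. The producer finds no matches. It emits a single block delimiter (offset 0, match length 0) whose literals cover the whole block, and it uses its opaque state pointer to count calls.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the field layout of ZSTD_Sequence in zstd.h so this sketch builds without libzstd. */
typedef struct {
    unsigned int offset;      /* distance back to the match source; 0 for the block delimiter */
    unsigned int litLength;   /* number of literal bytes preceding the match */
    unsigned int matchLength; /* 0 for the block delimiter */
    unsigned int rep;         /* left at 0 in this sketch */
} Seq;

/* Stand-in for ZSTD_SEQUENCE_PRODUCER_ERROR. */
#define SEQ_PRODUCER_ERROR ((size_t)-1)

/* Parameter list modeled on ZSTD_sequenceProducer_F. */
typedef size_t (*SeqProducerFn)(void* state,
                                Seq* outSeqs, size_t outSeqsCapacity,
                                const void* src, size_t srcSize,
                                const void* dict, size_t dictSize,
                                int compressionLevel, size_t windowSize);

/* Hypothetical user state: the same opaque pointer is handed back on every call. */
typedef struct {
    unsigned calls; /* number of blocks this producer has been asked to parse */
} ProducerState;

/* Emits no matches: one delimiter sequence whose literals cover the whole block. */
static size_t literalsOnlyProducer(void* state,
                                   Seq* outSeqs, size_t outSeqsCapacity,
                                   const void* src, size_t srcSize,
                                   const void* dict, size_t dictSize,
                                   int compressionLevel, size_t windowSize)
{
    ProducerState* ps = (ProducerState*)state;
    (void)src; (void)dict; (void)dictSize; (void)compressionLevel; (void)windowSize;
    ps->calls++;
    if (outSeqsCapacity < 1) return SEQ_PRODUCER_ERROR; /* report failure to the caller */
    outSeqs[0].offset = 0;
    outSeqs[0].litLength = (unsigned int)srcSize;
    outSeqs[0].matchLength = 0;
    outSeqs[0].rep = 0;
    return 1; /* number of sequences written */
}
```

With the real library, you would define `ZSTD_STATIC_LINKING_ONLY` and include zstd.h. You would then pass the state and callback to `ZSTD_registerSequenceProducer(cctx, &state, producer)` before compressing with `ZSTD_compress2()`. As we understand the current documentation, returning the error value makes zstd fall back to its internal matchfinder only when `ZSTD_c_enableSeqProducerFallback` is set. Otherwise, compression fails. Check zstd.h at a recent commit for the exact contract.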
Potential applications of the API include hardware-accelerated sequence producers and sequence producers specialized to particular types of data.
There are some subtleties around fallback, sequence validation, memory ownership, etc. Users should read all of the documentation added by this PR to zstd.h before using the API. Note: that documentation has been updated in subsequent PRs, so make sure to look at a recent commit.
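One of the subtleties mentioned above is sequence validation. The hypothetical checker below makes these rules concrete as we understand them: the sequences must be block-delimited, with the last one having offset 0 and match length 0. Literals plus matches must cover exactly `srcSize` bytes. Every match must be at least 3 bytes, which is the Zstandard format's minimum, and must not reach further back than the available history. `validateSequences`, `Seq`, and the `historySize` parameter are illustrative assumptions, not zstd functions. This sketch also rejects intermediate delimiters, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the field layout of ZSTD_Sequence in zstd.h. */
typedef struct {
    unsigned int offset;
    unsigned int litLength;
    unsigned int matchLength;
    unsigned int rep;
} Seq;

#define MIN_MATCH 3 /* minimum match length in the Zstandard format */

/* Hypothetical checker: returns 1 if seqs[0..nbSeqs) is a well-formed,
 * block-delimited parse of a srcSize-byte block, else 0.
 * historySize: bytes before the block start that matches may reference (assumption). */
static int validateSequences(const Seq* seqs, size_t nbSeqs,
                             size_t srcSize, size_t historySize)
{
    size_t pos = 0; /* block bytes covered so far */
    if (nbSeqs == 0) return 0; /* at least the delimiter is required */
    for (size_t i = 0; i < nbSeqs; i++) {
        const Seq* s = &seqs[i];
        if (i + 1 == nbSeqs) {
            /* the final sequence must be a delimiter carrying only trailing literals */
            if (s->offset != 0 || s->matchLength != 0) return 0;
        } else {
            if (s->matchLength < MIN_MATCH || s->offset == 0) return 0;
            /* the match is copied from offset bytes before position pos + litLength */
            if ((size_t)s->offset > pos + s->litLength + historySize) return 0;
        }
        pos += (size_t)s->litLength + s->matchLength;
        if (pos > srcSize) return 0;
    }
    return pos == srcSize;
}
```

For example, for the 15-byte block `abcdabcdabcdXYZ`, the parse {offset 4, 4 literals, match 8} followed by a delimiter with 3 literals passes. A lone match with no trailing delimiter fails.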
An example program is provided (see contrib/externalSequenceProducer) which demonstrates how to use the API with a simple LZ parser.
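The contrib example's parser is not reproduced here. As an illustration of the same idea, the following is a hypothetical greedy, hash-based LZ parser written against the callback shape sketched earlier. `greedyProducer`, `GreedyState`, and `HASH_LOG` are our own names, and `Seq`/`SEQ_PRODUCER_ERROR` again stand in for `ZSTD_Sequence`/`ZSTD_SEQUENCE_PRODUCER_ERROR`. The parser keeps one candidate position per hash bucket in the state object. It extends any 4-byte match as far as it goes and ends with the block delimiter. It looks only inside the current block and ignores `dict` and `windowSize`, so it never emits offsets into earlier blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned int offset, litLength, matchLength, rep; /* mirrors ZSTD_Sequence */
} Seq;

#define SEQ_PRODUCER_ERROR ((size_t)-1) /* stand-in for ZSTD_SEQUENCE_PRODUCER_ERROR */
#define HASH_LOG 12
#define MIN_MATCH 4 /* this sketch compares 4-byte prefixes; zstd accepts >= 3 */

/* Hypothetical producer state: a hash table of block-relative positions. */
typedef struct {
    uint32_t table[1u << HASH_LOG];
} GreedyState;

static uint32_t read32(const unsigned char* p) {
    uint32_t v;
    memcpy(&v, p, sizeof v); /* unaligned-safe load */
    return v;
}

static uint32_t hash4(const unsigned char* p) {
    return (read32(p) * 2654435761u) >> (32 - HASH_LOG); /* multiplicative hash */
}

/* Greedy single-candidate LZ parse of one block; ignores dict and prior blocks. */
static size_t greedyProducer(void* statePtr, Seq* out, size_t cap,
                             const void* src, size_t srcSize,
                             const void* dict, size_t dictSize,
                             int compressionLevel, size_t windowSize)
{
    GreedyState* st = (GreedyState*)statePtr;
    const unsigned char* const base = (const unsigned char*)src;
    size_t ip = 0, anchor = 0, n = 0;
    (void)dict; (void)dictSize; (void)compressionLevel; (void)windowSize;

    memset(st->table, 0xFF, sizeof st->table); /* every slot = UINT32_MAX = empty */
    while (ip + MIN_MATCH <= srcSize) {
        const uint32_t h = hash4(base + ip);
        const uint32_t cand = st->table[h];
        st->table[h] = (uint32_t)ip;
        if (cand != UINT32_MAX && read32(base + cand) == read32(base + ip)) {
            size_t len = MIN_MATCH;
            while (ip + len < srcSize && base[cand + len] == base[ip + len]) len++;
            if (n + 1 >= cap) return SEQ_PRODUCER_ERROR; /* reserve a slot for the delimiter */
            out[n].offset = (unsigned int)(ip - cand);
            out[n].litLength = (unsigned int)(ip - anchor);
            out[n].matchLength = (unsigned int)len;
            out[n].rep = 0;
            n++;
            ip += len;
            anchor = ip;
        } else {
            ip++;
        }
    }
    if (n >= cap) return SEQ_PRODUCER_ERROR;
    out[n].offset = 0; /* block delimiter: trailing literals only */
    out[n].litLength = (unsigned int)(srcSize - anchor);
    out[n].matchLength = 0;
    out[n].rep = 0;
    return n + 1;
}
```

Storing the hash table in the state object rather than on the stack mirrors the API's intent that per-producer scratch memory belongs to the user. Resetting the table per call matches this sketch's block-local view of the input.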
Note: the original version of this PR used the term "External Matchfinder API". The above summary has been updated to use the new term "Block-Level Sequence Producer API", but the code in this PR still uses old symbol names. Updated symbol names were introduced to the code in #3484.