
Block-Level Sequence Producer API #3333

Merged: 60 commits into facebook:dev, Dec 28, 2022

Conversation

@embg (Contributor) commented Dec 7, 2022

This PR introduces an API that lets external block-level sequence producers plug into zstd. The user provides a function pointer and a state object for the external sequence producer, and zstd calls it to generate sequences for each block. Entropy compression of the sequences remains entirely within the library's internal functions.

Potential applications of the API include hardware-accelerated sequence producers and sequence producers specialized to particular types of data.

There are some subtleties around fallback, sequence validation, memory ownership, etc. Users should read all of the documentation added by this PR to zstd.h before using the API. Note: that documentation has been updated in subsequent PRs, so make sure to look at a recent commit.

An example program is provided (see contrib/externalSequenceProducer) which demonstrates how to use the API with a simple LZ parser.


Note: the original version of this PR used the term "External Matchfinder API". The above summary has been updated to use the new term "Block-Level Sequence Producer API", but the code in this PR still uses old symbol names. Updated symbol names were introduced to the code in #3484.

tests/zstreamtest.c: review thread (outdated, resolved)
@embg embg marked this pull request as ready for review December 21, 2022 23:00
@embg embg requested a review from Cyan4973 December 21, 2022 23:01
lib/zstd.h: review thread (outdated, resolved)
@embg (Contributor, Author) commented Dec 28, 2022

Changes since review:

  • Rebase onto upstream/dev
  • Additional docs on API limitations: 8052b10
  • Fix minor @Cyan4973 nits: 1e60543
  • Refactor maxNbSeq calculation into a helper function: 49cd2e8
  • Fix copyright: 241f2a7

CI is passing except for a single test, which seems unrelated to this PR (and is failing on other open PRs).

I'm going to merge! :)

@embg embg merged commit 2a40262 into facebook:dev Dec 28, 2022
@nadavrot commented:

Congrats on landing this @embg. I am excited about the use of hardware-accelerated matchfinders in zstd.

@embg embg changed the title External matchfinder API External Sequence Producer API Feb 8, 2023
@embg embg changed the title External Sequence Producer API Block-Level Sequence Producer API Feb 9, 2023
@Cyan4973 Cyan4973 mentioned this pull request Feb 9, 2023
@abalib commented Oct 18, 2023

The External Sequence Producer idea is brilliant, although processing the source in 128KB chunks is limiting in terms of compression ratio and possibly hardware performance.

As an alternative, I did see this in contrib/seqBench:

```c
ZSTD_generateSequences(zc, seqs, seqsSize, inBuf, inBufSize);
ZSTD_CCtx_setParameter(zc, ZSTD_c_blockDelimiters, ZSTD_sf_explicitBlockDelimiters);
size_t outBufSize = ZSTD_compressSequences(zc, outBuf, inBufSize, seqs, seqsSize, inBuf, inBufSize);
```

https://github.com/facebook/zstd/blob/dev/contrib/seqBench/seqBench.c#L29

Here, one could call the external producer instead of ZSTD_generateSequences(), and the caller is not limited to the 128KB block size. I verified this experimentally: I concatenated the same file twice and compressed it with seqBench.c. The last six lines of the output below show multiple blocks and offsets reaching as far back as 200,000 bytes (> 128K):

```
zstd/contrib/seqBench$ head -c 200000 junk >> junk1
zstd/contrib/seqBench$ head -c 200000 junk >> junk1
zstd/contrib/seqBench$ ./seqBench junk1
LL      ML      OFFS    REP
9       91      1       1
14      13      8       0
...
0       22      19457   0
1       4       19457   1
0       13      572     0
2       62144   200000  0
0       0       0       0
1       131071  200000  1
0       0       0       0
1       6783    200000  1
0       0       0       0
```

@embg embg deleted the offload branch October 18, 2023 23:54
@embg (Contributor, Author) commented Oct 19, 2023

Hi @abalib, thanks for reaching out! I would love to learn more about why you are interested in this API and the use-cases you are targeting. Feel free to reply here or email me at [my github username]@meta.com.

You are correct that ZSTD_compressSequences() can be used with externally-generated sequences. This is the approach taken by Intel's QATzip.

The motivation for block-level offload is that it integrates with existing compression APIs such as ZSTD_compress2() and ZSTD_compressStream2(). This is particularly important for streaming compression, which is impossible with ZSTD_compressSequences(). But even for small compressions which don't require streaming, maintaining compatibility with the common APIs used in production is an important feature. That's why we added the block-level API, which is used by Intel's zstd plugin.

> processing the source in 128KB chunks is limiting in terms of compression ratio

So, the block-level API does support offsets larger than 128KB. External sequence producer functions are passed a windowSize parameter and are allowed to produce any offset which is compatible with that history window. The precise requirements on sequences returned by an external callback are given here.

The missing component is access to the actual content of the history window. In the future, that will be provided by the dict and dictSize parameters. Currently we pass in NULL as the dictionary buffer, but we could instead provide access to the previous windowSize bytes of history.

This feature is on our long-term roadmap; it simply hasn't been implemented yet. If there is a real-world need for it, we can add it in the near future.

> ...and possibly hardware performance.

Your concern is absolutely valid. Breaking the input into 128KB chunks may require more back-and-forth communication with the hardware. This is a trade-off we made to maintain compatibility with existing APIs. ZSTD_compressSequences() is an option to potentially use hardware more efficiently, but that API also has significant downsides (as discussed above).

Please let me know if you have any further questions!
