
Fuzz large offsets through sequence compression api #3447

Merged

Conversation

daniellerozenblit
Contributor

@daniellerozenblit daniellerozenblit commented Jan 23, 2023

This PR introduces a few changes to the sequence compression API fuzzer in order to better test cases with large offsets without generating many MB of input data.

  • Rather than generating a new dictionary of maximum size 256 KB on each fuzzer call, we now generate a huge global dictionary of size 1 << ZSTD_WINDOWLOG_MAX_32 that is reused between calls. The dictionary is calloc()'d in order to fit the memory constraints of the oss-fuzz environment.
  • We now use a maximum window size of ZSTD_WINDOWLOG_MAX, rather than ZSTD_WINDOWLOG_MAX_32.
  • We now generate the literalBuffer using a randomly generated seed, rather than using the same seed (0) each time.

This PR also fixes a bug exposed by #3439 and ensures that the seqStore bounds check is accurate for ZSTD_copySequencesToSeqStoreNoBlockDelim().

  • We now increment the sequence idx after checking whether or not we have room to store an additional sequence.

Comment on lines 334 to 342
FILE* dictFile;
ZSTD_compressionParameters cParams;

/* Generate a large dictionary file and mmap to buffer */
generateDictFile(ZSTD_FUZZ_GENERATED_DICT_MAXSIZE, producer);
dictFile = fopen(ZSTD_FUZZ_DICT_FILE, "r");
dictBuffer = mmap(NULL, ZSTD_FUZZ_GENERATED_DICT_MAXSIZE, PROT_READ, MAP_PRIVATE, fileno(dictFile), 0);
FUZZ_ASSERT(dictBuffer);
fclose(dictFile);
Contributor

@terrelln terrelln Jan 23, 2023


@daniellerozenblit let's simplify this a little bit and just calloc() the dictBuffer.

Suggested change:

-FILE* dictFile;
-ZSTD_compressionParameters cParams;
-/* Generate a large dictionary file and mmap to buffer */
-generateDictFile(ZSTD_FUZZ_GENERATED_DICT_MAXSIZE, producer);
-dictFile = fopen(ZSTD_FUZZ_DICT_FILE, "r");
-dictBuffer = mmap(NULL, ZSTD_FUZZ_GENERATED_DICT_MAXSIZE, PROT_READ, MAP_PRIVATE, fileno(dictFile), 0);
-FUZZ_ASSERT(dictBuffer);
-fclose(dictFile);
+dictBuffer = calloc(ZSTD_FUZZ_GENERATED_DICT_MAXSIZE, 1);

This should drastically reduce the memory usage, and should make the initialization basically free, as long as we never write to this memory (which we don't).

Any generality we lose through this simplification is worth it, and we can always extend it later if we need to.

@@ -6327,7 +6327,7 @@ ZSTD_copySequencesToSeqStoreNoBlockDelim(ZSTD_CCtx* cctx, ZSTD_sequencePosition*
     /* Move to the next sequence */
     endPosInSequence -= currSeq.litLength + currSeq.matchLength;
     startPosInSequence = 0;
-    idx++;
+    idx++; /* Next Sequence */
 } else {
Contributor


I don't think this check is right when the else branch is taken and we split the sequence (we set finalMatchSplit = 1; on line 6352), since we don't increment idx but still store a sequence.

You could simplify this a bit by moving the idx++ to the bottom of the loop and doing:

if (!finalMatchSplit)
    ++idx;

Contributor Author


Oh thanks, great catch!

@daniellerozenblit daniellerozenblit marked this pull request as ready for review January 24, 2023 16:29
@daniellerozenblit daniellerozenblit marked this pull request as draft January 24, 2023 16:43
@daniellerozenblit daniellerozenblit marked this pull request as ready for review January 24, 2023 16:59
Contributor

@terrelln terrelln left a comment


Awesome!

@daniellerozenblit daniellerozenblit merged commit f3255bf into facebook:dev Jan 25, 2023
@daniellerozenblit daniellerozenblit deleted the fuzz-sequence-compression branch March 8, 2023 15:39