Fuzz large offsets through sequence compression api #3447
Conversation
```c
FILE* dictFile;
ZSTD_compressionParameters cParams;

/* Generate a large dictionary file and mmap to buffer */
generateDictFile(ZSTD_FUZZ_GENERATED_DICT_MAXSIZE, producer);
dictFile = fopen(ZSTD_FUZZ_DICT_FILE, "r");
dictBuffer = mmap(NULL, ZSTD_FUZZ_GENERATED_DICT_MAXSIZE, PROT_READ, MAP_PRIVATE, fileno(dictFile), 0);
FUZZ_ASSERT(dictBuffer);
fclose(dictFile);
```
@daniellerozenblit let's simplify this a little bit and just `calloc()` the `dictBuffer`.
Suggested change:

```diff
-FILE* dictFile;
 ZSTD_compressionParameters cParams;

-/* Generate a large dictionary file and mmap to buffer */
-generateDictFile(ZSTD_FUZZ_GENERATED_DICT_MAXSIZE, producer);
-dictFile = fopen(ZSTD_FUZZ_DICT_FILE, "r");
-dictBuffer = mmap(NULL, ZSTD_FUZZ_GENERATED_DICT_MAXSIZE, PROT_READ, MAP_PRIVATE, fileno(dictFile), 0);
-FUZZ_ASSERT(dictBuffer);
-fclose(dictFile);
+dictBuffer = calloc(ZSTD_FUZZ_GENERATED_DICT_MAXSIZE, 1);
```
This should drastically reduce the memory usage, and should make the initialization basically free, as long as we never write to this memory (which we don't).
Any generality we lose through this simplification should be worth it for the simplicity. And we can always extend it later if we need to.
…ng a random buffer

Force-pushed from 260dc7e to 7fc00c1
Force-pushed from eeb6bd5 to fe06ffa
```diff
@@ -6327,7 +6327,7 @@ ZSTD_copySequencesToSeqStoreNoBlockDelim(ZSTD_CCtx* cctx, ZSTD_sequencePosition*
             /* Move to the next sequence */
             endPosInSequence -= currSeq.litLength + currSeq.matchLength;
             startPosInSequence = 0;
-            idx++;
+            idx++; /* Next Sequence */
         } else {
```
I don't think this check is right when the `else` branch is taken and we split the sequence (we set `finalMatchSplit = 1;` on line 6352), since we don't increment `idx` but still store a sequence.

You could simplify this a bit by moving the `idx++` to the bottom of the loop and doing:

```c
if (!finalMatchSplit)
    ++idx;
```
Oh thanks, great catch!
Force-pushed from fe06ffa to 7d600c6
Awesome!
This PR introduces a few changes to the sequence compression API fuzzer in order to better test cases with large offsets without generating many MB of input data.

Rather than generating up to 256KB of input data with each fuzzer call, we now generate a huge global dictionary of size `1 << ZSTD_WINDOWLOG_MAX_32` that is reused between calls. The dictionary is calloc'd in order to fit the memory constraints of the oss-fuzz environment. The fuzzer can now exercise window logs up to `ZSTD_WINDOWLOG_MAX`, rather than `ZSTD_WINDOWLOG_MAX_32`.

This PR also fixes a bug exposed by #3439 and ensures that the `seqStore` bounds check is accurate for `ZSTD_copySequencesToSeqStoreNoBlockDelim()`: we now increment `idx` after checking whether or not we have room to store an additional sequence.