-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix root cause of #3416 #3419
fix root cause of #3416 #3419
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a test case that would trigger this bug?
The fix LGTM
ZSTD_memcpy(nextHuf, prevHuf, sizeof(*prevHuf)); | ||
return ZSTD_compressRleLiteralsBlock(dst, dstCapacity, src, srcSize); | ||
} | ||
if ((srcSize >= 8) || allBytesIdentical(src, srcSize)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment as to why this check is necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
This is a really hard test to define. |
Ok, it'd be great if we had it, but if it is too hard, we have the fuzzers, and we know that they will quickly catch any regressions. |
A minor change in 5434de0 changed a `<=` into a `<`, and as an indirect consequence allowed compression attempt of literals when there are only 6 literals to compress (previous limit was effectively 7 literals). This is not in itself a problem, as the threshold is merely an heuristic, but it emerged a bug that has always been there, and was just never triggered so far due to the previous limit. This bug would make the literal compressor believes that all literals are the same symbol, but for the exact case where nbLiterals==6, plus a pretty wild combination of other limit conditions, this outcome could be false, resulting in data corruption. Replaced the blind heuristic by an actual test for all limit cases, so that even if the threshold is changed again in the future, the detection of RLE mode will remain reliable.
as requested by @terrelln
A minor update in 5434de0 changed a
<=
into a<
, and as an indirect consequence allowed compression attempt of literals when there are only 6 literals to compress (previous limit was effectively 7 literals).This is not in itself a problem, as the threshold is merely an heuristic, but it emerged a bug that has always been there, and was just never triggered so far due to the previous limit. This bug would make the literal compressor believes that all literals are the same symbol, but for the exact case where
nbLiterals==6
, plus a pretty wild combination of several exceptional conditions, this outcome could be false, resulting in data corruption.Replaced the blind heuristic by an actual test for all limit cases. Even if the threshold is changed again in the future, the detection of Repeated Literal mode will remain reliable.
fix #3416