json:parse_bjdata_fuzzer reaches assertion #3475
Comments
This is embarrassing. I assume this keeps triggering fuzzer errors because I do not fully understand the exception handling in the code. Right now, if I use json.cpp with the #3461 patch, both the above code and the adapted fuzzer code in #3461 output the message below: …

Upon reading the code, I do not think it should be triggered. Here is the part throwing this error: …

… because there is no … Can you explain to me what …

Never mind, I was running the wrong fuzzer code. With the above code, the error is …

Tracing it now.

OK, essentially this is the same bug that triggered #3461: it seems that in both cases, the … If I replace … I am still trying to understand why the …
Unfortunately, I currently can't get AFL to work locally. I think these bugs can be triggered by letting it run for a few minutes.

I've just installed AFL. If you give me instructions or pointers on what to do next, I can give it a shot. Edit: I found the make targets. We should add one for BJData.

I haven't tried this in a while. LibFuzzer also worked once; I should add proper documentation for this... (added #3477)

Fuzzer is running. Will report back when it finds something. Edit: I switched to AFL++, and the fuzzer has now been running for almost 5 hours with only one crash, caused by a stack overflow.

I added documentation on how to use the fuzzers: #3478. Feedback welcome!
FYI: LibFuzzer immediately detects similar errors in the BJData parser: …

Indeed. It fails within seconds. Now I'm wondering if I made a mistake with AFL++...

You should be able to pass the failing example from libFuzzer to the AFL binary and see how it reacts. I only played around with AFL++ in Docker long enough to write the documentation; I did not wait for an error either.

I've tried again using the Redqueen mutator (also in persistent mode, which is significantly faster), and I'm getting thousands of crashes in a few minutes (6 unique ones so far).

Can persistent mode be used together with libFuzzer, or do we still need a wrapper?

If by wrapper you're referring to …
@fangq Triggered by this input:

```cpp
std::vector<std::uint8_t> vec1{0x5b, 0x5b, 0x5d, 0x5b, 0x24, 0x44, 0x23, 0x69, 0xe5};
```

Near the end of the snippet there's a call to …

```cpp
bool get_ubjson_array()
{
    std::pair<std::size_t, char_int_type> size_and_type;
    if (JSON_HEDLEY_UNLIKELY(!get_ubjson_size_type(size_and_type)))
    {
        return false;
    }

    // detect and encode bjdata ndarray as an object in JData annotated array format (https://github.com/NeuroJSON/jdata):
    // {"_ArrayType_" : "typeid", "_ArraySize_" : [n1, n2, ...], "_ArrayData_" : [v1, v2, ...]}
    if (input_format == input_format_t::bjdata && size_and_type.first != string_t::npos && size_and_type.first >= (1ull << (sizeof(std::size_t) * 8 - 1)))
    {
        std::map<char_int_type, string_t> bjdtype = {{'U', "uint8"}, {'i', "int8"}, {'u', "uint16"}, {'I', "int16"},
            {'m', "uint32"}, {'l', "int32"}, {'M', "uint64"}, {'L', "int64"}, {'d', "single"}, {'D', "double"}, {'C', "char"}
        };

        string_t key = "_ArrayType_";
        if (JSON_HEDLEY_UNLIKELY(bjdtype.count(size_and_type.second) == 0 || !sax->key(key) || !sax->string(bjdtype[size_and_type.second]) ))
        {
            return false;
        }
        // ... (rest of the ND-array branch not shown)
    }
    // ... (rest of the function not shown)
}
```

Edit:

```cpp
string_t key = "_ArrayType_";
if (JSON_HEDLEY_UNLIKELY(bjdtype.count(size_and_type.second) == 0 || !sax->start_object(2) || !sax->key(key) || !sax->string(bjdtype[size_and_type.second]) ))
{
    return false;
}
```
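For illustration, here is a minimal sketch of how the high-bit test in `get_ubjson_array()` can misfire. It assumes that the offending counts are negative signed BJData integers converted directly to `std::size_t`, which is what the fix commit's message ("change bjdata ndarray flag to detect negative size") suggests. BJData integers are little-endian. In Niels' input `0x5b, 0x23, 0x49, 0x20, 0xff` (`[`, `#`, `I`), the int16 count `0x20, 0xff` is therefore -224. In the input above, the int8 count after `i` is `0xe5`, i.e. -27. Once converted, both values have the highest bit set, and unlike -1 neither equals `npos`. All helper names below are hypothetical and not part of the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decode a little-endian int16 (BJData byte order),
// reinterpreting the bits via memcpy instead of a narrowing cast.
std::int16_t decode_le_int16(std::uint8_t lo, std::uint8_t hi)
{
    const std::uint16_t raw = static_cast<std::uint16_t>(lo | (hi << 8));
    std::int16_t value;
    std::memcpy(&value, &raw, sizeof value);
    return value;
}

// Hypothetical helper: decode one signed byte (BJData 'i' marker).
std::int8_t decode_int8(std::uint8_t byte)
{
    std::int8_t value;
    std::memcpy(&value, &byte, sizeof value);
    return value;
}

// Mirrors the condition in get_ubjson_array(): any size other than npos with
// the highest bit of std::size_t set is taken as the ND-array marker.
bool looks_like_ndarray_flag(std::size_t size)
{
    const std::size_t npos = static_cast<std::size_t>(-1);  // stands in for string_t::npos
    const std::size_t high_bit = std::size_t{1} << (sizeof(std::size_t) * 8 - 1);
    return size != npos && size >= high_bit;
}

// A negative count converts to std::size_t modulo 2^N, which sets the high
// bit, so it is mistaken for the ND-array marker.
bool negative_count_trips_flag(std::int64_t count)
{
    return looks_like_ndarray_flag(static_cast<std::size_t>(count));
}
```

Non-negative counts below 2^(N-1) never pass this test, which is why well-formed inputs did not hit the ND-array branch by accident.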
@falbrechtskirchinger, thanks for troubleshooting this. When an ND-array construct is parsed, the … Three keys are expected: … The caller … `_ArraySize_`), and then just keep writing the remaining two keys.

I suppose using the highest bit of …
@fangq Ah, thanks for clarifying. That makes sense. Otherwise, I'm sure a missing …

I think I know what happened: in all of these cases, the length following … looks like … I should find another way to indicate that I am in the middle of decoding an ND array, or throw an error in place when the length in a non-ND-array construct … Can I query whether I am inside an object using …?

I was just going to write that. :-) Unfortunately, the SAX interface is one-way only; you cannot query it. I think you might want to create a struct to replace the …
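A minimal sketch of what such a struct could look like, assuming the goal is to carry the ND-array marker in its own field instead of in the high bit of the size. `container_header`, `read_count` and `make_ndarray_header` are hypothetical names, not the library's API. Here `int` stands in for the library's `char_int_type`, and the code needs C++17 for `std::optional`. Negative counts are rejected rather than wrapped around to a huge `std::size_t`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical replacement for std::pair<std::size_t, char_int_type>.
// Because the ND-array marker has its own field, no size value can be
// mistaken for it.
struct container_header
{
    static constexpr std::size_t unknown_size = static_cast<std::size_t>(-1);  // like string_t::npos

    std::size_t size = unknown_size;  // element count, if one was given after '#'
    int type = 0;                     // optimized element type after '$', 0 = none
    bool is_ndarray = false;          // true only when the count was a dimension vector
    std::vector<std::size_t> dims;    // ND-array dimensions when is_ndarray is true
};

// Hypothetical check applied to every signed count before it becomes a size:
// a negative value is malformed input, not a huge length.
std::optional<std::size_t> read_count(std::int64_t count)
{
    if (count < 0)
    {
        return std::nullopt;  // the caller should report a parse error here
    }
    return static_cast<std::size_t>(count);
}

// Hypothetical: build the header for an ND-array from its dimensions.
// The total element count is the product of the dimensions.
container_header make_ndarray_header(int type, const std::vector<std::size_t>& dims)
{
    container_header h;
    h.type = type;
    h.is_ndarray = true;
    h.dims = dims;
    h.size = 1;
    for (std::size_t d : dims)
    {
        h.size *= d;  // overflow checking omitted in this sketch
    }
    return h;
}
```

With this layout, the ND-array decision depends only on `is_ndarray`, so a malformed length can fail the `read_count` check without ever reaching the ND-array branch.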
I have the feeling there is still a lot in flux for BJData, in particular since the written format is not yet fully optimized. Any idea how long it would take to complete this?

@nlohmann and @falbrechtskirchinger, the fuzzer errors should now be fixed with the above commit. All CI tests have passed.

I noticed that the BJData output is not the only one with room for compression; UBJSON output could also be smaller if the … For example, … because …
As promised, here's my list of test inputs that are triggering assertion failures. Edit: They are indeed all fixed by #3479, including the one reported by Niels. :-)

```cpp
// Each vec1 below is a separate fuzzer input; use one at a time.

// (1) Niels' input from the issue description
std::vector<std::uint8_t> vec1{0x5b, 0x23, 0x49, 0x20, 0xff};
// (2) fixed by #3479
std::vector<std::uint8_t> vec1{0x5b, 0x5b, 0x5d, 0x5b, 0x24, 0x44, 0x23, 0x69, 0xe5};
// can be simplified to
std::vector<std::uint8_t> vec1{0x5b, 0x24, 0x44, 0x23, 0x69, 0xe5};
// (3) fixed by #3479
std::vector<std::uint8_t> vec1{0x7b, 0x23, 0x4d, 0x76, 0xdd, 0x5d, 0x48, 0x5b, 0x00, 0x49, 0x83, 0x83, 0x49, 0x7b, 0xdd, 0x5b, 0x49, 0x5d, 0x6c, 0x22, 0x5b, 0x4d, 0x83, 0xdd, 0x5d, 0x5d, 0x6c, 0x22, 0x5b, 0x00, 0x48, 0x5b};
// (4) fixed by #3479
std::vector<std::uint8_t> vec1{0x7b, 0x23, 0x4d, 0x83, 0x2c, 0x5b, 0x4d, 0x83, 0xff, 0xff, 0xff, 0xff};
// (5) fixed by #3479
std::vector<std::uint8_t> vec1{0x5b, 0x24, 0x49, 0x23, 0x69, 0xd9, 0x7b, 0x71, 0x65, 0x38, 0x49, 0x62, 0x00, 0x04, 0x7b, 0x49, 0x00, 0x62, 0x94, 0x61, 0x2d, 0x56, 0x00, 0x69, 0xd0, 0x2b, 0xd4, 0xca, 0xfc, 0x7f, 0x00, 0x00, 0x00, 0x00, 0x00, 0x1f, 0xff, 0xfb, 0x00, 0x00, 0x00, 0x00, 0xde, 0xde, 0xfa, 0xde, 0xde, 0xde, 0xde, 0xc7, 0xde, 0xde, 0xde, 0xde, 0xde, 0xde, 0xde, 0xde, 0xde, 0xde, 0xde, 0xde, 0xde, 0xde, 0xde, 0x00, 0x49, 0x01, 0x00, 0xf7, 0xff, 0xff, 0xff, 0x00, 0x48, 0x00, 0x00, 0x00, 0x00, 0xff, 0x7f, 0x5b, 0x74, 0x31, 0x39, 0x5f, 0x53, 0x70, 0x5f, 0x6d, 0x61, 0x6b, 0x65, 0x5f, 0x73, 0x68, 0x67, 0x5f, 0x6b, 0x53, 0x74, 0x31, 0x43, 0x69, 0x61, 0x87, 0x65, 0xff, 0xff, 0x74, 0x61, 0x67};
// (6) fixed by #3479
std::vector<std::uint8_t> vec1{0x7b, 0x23, 0x4d, 0xed, 0xed, 0xed, 0xed, 0xed, 0xed, 0xed, 0xed, 0xed};
// (7) fixed by #3479
std::vector<std::uint8_t> vec1{0x5b, 0x24, 0x49, 0x23, 0x69, 0xf4, 0x40, 0x00, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x58, 0x49, 0x69, 0x49, 0x49, 0x31, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x49, 0x02, 0xd2};
// (8) fixed by #3479
std::vector<std::uint8_t> vec1{0x7b, 0x23, 0x4d, 0x83, 0x22, 0x5b, 0x4d, 0x83, 0xdd, 0x5d, 0xff, 0xff, 0x00, 0xff, 0xff, 0xff, 0xff, 0x7f, 0x5d, 0x6c};
```
It seems to me that various inputs result in a discarded value without raising an exception.

@nlohmann That is only expected if any SAX call returns … For example, these lines:

```cpp
std::pair<std::size_t, char_int_type> size_and_type;
if (JSON_HEDLEY_UNLIKELY(!get_ubjson_size_type(size_and_type)))
{
    return false;
}

if (input_format == input_format_t::bjdata && size_and_type.first != string_t::npos && size_and_type.first >= (1ull << (sizeof(std::size_t) * 8 - 1)))
{
    return false;
}
```

Likewise, statements like these

```cpp
if (JSON_HEDLEY_UNLIKELY(!get_ubjson_string(key) || !sax->key(key)))
{
    return false;
}
```

should be separated so that a parse error can be raised when …
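A toy sketch of the difference follows. It assumes a handler whose `parse_error()` flags the failure and returns `false`; here it simply stores the message, whereas the library's DOM-building handler throws instead when exceptions are enabled. `toy_sax`, `check_size_silent`, `check_size_reported` and `read_key_separated` are hypothetical names. A bare `return false` leaves nothing recorded, so a caller that ends in `res ? result : discarded` can only hand back a discarded value. Routing malformed input through `parse_error()` gives the caller a reason to report.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a SAX handler that does not throw:
// parse_error() stores the message and returns false.
struct toy_sax
{
    std::string error;
    bool key(const std::string&) { return true; }  // this toy handler accepts every key
    bool parse_error(const std::string& msg)
    {
        error = msg;
        return false;
    }
};

// Bare "return false" on bad input: the caller sees a failure, but nothing
// was reported, so the result can only be discarded silently.
bool check_size_silent(toy_sax&, std::size_t size, std::size_t limit)
{
    if (size > limit)
    {
        return false;
    }
    return true;
}

// Same check, but the problem is reported through the handler first.
bool check_size_reported(toy_sax& sax, std::size_t size, std::size_t limit)
{
    if (size > limit)
    {
        return sax.parse_error("count exceeds limit: " + std::to_string(size));
    }
    return true;
}

// With the two conditions separated, a read failure becomes a parse error,
// while a SAX veto (key() returning false) is still passed through as-is.
bool read_key_separated(toy_sax& sax, bool read_ok, const std::string& key)
{
    if (!read_ok)
    {
        return sax.parse_error("could not read key");
    }
    return sax.key(key);
}
```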
Yes, the binary parsers all end with `return res ? result : basic_json(value_t::discarded);`, where …

It seems you're right.

I think I was wrong about that. AFL++ ships with a libFuzzer driver that supposedly uses persistent mode.
…3479)

* change bjdata ndarray flag to detect negative size, fix #3475
* fix CI error
* fix CI on 32bit windows
* remove platform specific out_of_range error messages
* Incorporate suggestions from @nlohmann and @falbrechtskirchinger
* fix CI errors
* add coverage
* fix sax event order
* fix coverage

I got a confirmation from OSS-Fuzz that the issue is fixed now. Thanks @fangq and @falbrechtskirchinger for the quick help!
Description

The input `0x5b, 0x23, 0x49, 0x20, 0xff` triggers an assertion in the fuzzer for BJData.

Reproduction steps

… `0x5b, 0x23, 0x49, 0x20, 0xff`.

Expected vs. actual results

No assertion should be triggered. Either the fuzzer must be made more robust against …, or there is a bug in the library that must be fixed.
Minimal code example

Adapted fuzzer: …

Error messages

Assertion … is triggered. In the original context, the following stack trace is produced: …
Compiler and operating system
macOS 12.3.1, Apple clang version 13.1.6
Library version
develop
Validation

… `develop` branch is used.