
Quadratic random from mangle_getLen overflows causing failure to corrupt past 64 KiB #360

Closed
gamozolabs opened this issue Aug 4, 2020 · 7 comments


@gamozolabs

gamozolabs commented Aug 4, 2020

Due to an integer overflow in mangle_getLen, the function largely used to pick the offsets to corrupt in the input file, there is a hard limit of 65536 on its return value. Thus, for input files larger than 64 KiB, the fuzzer will no longer be able to corrupt past the 64 KiB boundary (except when it expands/splices past it).

For example, mangle_Bit uses mangle_getOffSet to get the index of the byte to corrupt. This goes through mangle_getLen, which overflows if the input size exceeds 64 KiB, and thus fails to generate offsets past the 64 KiB boundary.

Due to the nature of the overflow, the numerator overflows while the divisor does not (until the input exceeds ~2.52 MiB, at which point even the cubed divisor overflows). This is ultimately because the numerator can be the input length raised to the fourth power, which overflows a 64-bit integer once the input reaches 64 KiB. When the numerator overflows but the divisor does not, the window of indices provided by mangle_getOffSet shrinks even further; with a 1 MiB input, it cannot corrupt past even the first 16 bytes.
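To make the boundary concrete, here's a minimal sketch (my own illustration, not honggfuzz's actual code) of why a fourth power wraps 64-bit arithmetic exactly at 64 KiB, while a cubed divisor survives until roughly 2.52 MiB:

```rust
fn main() {
    // At exactly 64 KiB, len^4 == 2^64, which no longer fits in 64 bits;
    // C arithmetic wraps silently here, while checked_pow reports it.
    let len: u64 = 1 << 16; // 65536
    assert!(len.checked_pow(4).is_none());

    // One byte below the boundary, the fourth power still fits.
    assert!((len - 1).checked_pow(4).is_some());

    // A cubed value holds out longer: cbrt(2^64) ~= 2642245, i.e. inputs
    // up to ~2.52 MiB before the cube also overflows a u64.
    assert!(2_642_245u64.checked_pow(3).is_some());
    assert!(2_642_246u64.checked_pow(3).is_none());
}
```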

Here's the possible indices returned, and thus the indices which can be used to corrupt the input based on the input file size.

[Figure: possible corruption indices returned by mangle_getOffSet vs. input file size]

@robertswiecki
Collaborator

Thanks for analyzing this. Yeah.. big fail on my part. Will fix soon!

@gamozolabs
Author

gamozolabs commented Aug 4, 2020

Hehe, no problem. I've had my fair share of silly mistakes!

I've been porting some of the mutators in Rust and based on the input size (and what turns into x^4), I either use a u32, u64, or u128 for the exprand.

Historically for fuzzing I've used rand() % (rand() % size + 1), which has some exponential properties, but it doesn't have as nice a shape as the honggfuzz rand. The honggfuzz exprand converges to no worse than 1/2 the probability of a uniform selection, which is super nice.

I'm not sure how viable it is, but I've also used the following, which gives some exponential bias but can never be more than 2x worse than a uniform distribution. There's also no risk of overflow or division by zero as long as size > 0:

let rand = if rand() % 2 == 0 {
    // "exp" rand: biased toward low offsets
    rand() % (rand() % size + 1)
} else {
    // Uniform rand
    rand() % size
};

Here's the difference between the honggfuzz rand and my rand. The x axis is the byte offset, and the y axis is the ratio of selection frequency compared to uniform selection. Thus, y=0.5 means the value is selected half as often as it would be under a uniform distribution, and y=20 means it's observed 20x more frequently than uniform.

They seem fairly similar, but there's definitely enough of a difference that it could matter.

[Figure: selection-frequency ratio vs. byte offset for the honggfuzz rand and the mixed exp/uniform rand]
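Curves like these can be roughly reproduced numerically. Below is a self-contained sketch (PRNG choice and names are mine, purely illustrative) that Monte Carlos the mixed exp/uniform scheme and checks the two endpoint ratios against their analytic values: the first offset lands at about 0.5 + H(size)/2 times uniform (H being the harmonic number), and the last at about 0.5 + 1/(2*size) times uniform.

```rust
// Simple xorshift64 PRNG (Marsaglia); stands in for "rand()" above.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

fn main() {
    let size: u64 = 16;
    let trials: u64 = 1_000_000;
    let mut rng = XorShift64(0x1234_5678_9abc_def0);
    let mut counts = vec![0u64; size as usize];

    for _ in 0..trials {
        let offset = if rng.next() % 2 == 0 {
            // "exp" branch: biased toward low offsets
            rng.next() % (rng.next() % size + 1)
        } else {
            // uniform branch
            rng.next() % size
        };
        counts[offset as usize] += 1;
    }

    // Ratio of observed frequency vs. a uniform pick of each offset.
    let uniform = trials as f64 / size as f64;
    let first = counts[0] as f64 / uniform;
    let last = counts[size as usize - 1] as f64 / uniform;

    // Analytically for size = 16: first ~= 0.5 + H(16)/2 ~= 2.19,
    // last ~= 0.5 + 1/32 ~= 0.53 -- never below half of uniform.
    assert!(first > 1.9 && first < 2.5);
    assert!(last > 0.45 && last < 0.62);
}
```

The assertions show the property discussed above: the most-favored offset is only about 2x uniform, and the least-favored never drops below half of uniform.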

@vanhauser-thc
Contributor

rand() uses futexes, and that hurts performance a lot. There are PRNGs that are faster, have better randomness, and don't bottleneck.
In afl++ we use a xoroshiro variant, but random123 or a PCG variant are fine too.
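For reference, a lock-free generator of that family fits in a few lines. This is a sketch of the public-domain xoroshiro128+ algorithm (Blackman/Vigna), seeded through splitmix64; an illustration, not afl++'s actual code:

```rust
// Public-domain xoroshiro128+ (Blackman & Vigna), seeded with splitmix64.
struct Xoroshiro128Plus {
    s: [u64; 2],
}

impl Xoroshiro128Plus {
    fn new(seed: u64) -> Self {
        // splitmix64 expands one 64-bit seed into the 128-bit state.
        let mut z = seed;
        let mut sm = || {
            z = z.wrapping_add(0x9E37_79B9_7F4A_7C15);
            let mut x = z;
            x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
            x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
            x ^ (x >> 31)
        };
        Self { s: [sm(), sm()] }
    }

    fn next(&mut self) -> u64 {
        let s0 = self.s[0];
        let mut s1 = self.s[1];
        let result = s0.wrapping_add(s1);
        s1 ^= s0;
        self.s[0] = s0.rotate_left(55) ^ s1 ^ (s1 << 14);
        self.s[1] = s1.rotate_left(36);
        result
    }
}

fn main() {
    let mut rng = Xoroshiro128Plus::new(42);
    let a = rng.next();
    let b = rng.next();
    // A handful of ALU ops per draw, no shared state, no locking.
    assert_ne!(a, b);
}
```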

@gamozolabs
Author

gamozolabs commented Aug 4, 2020

Oh yeah, when I say rand(), I mean a random function. I don't think anyone here is using libc rand(). The cost of nearly any rand implementation (xorshift64, xoroshiro, doesn't really matter) is "free" WRT the cost of the modulo, it shouldn't matter at all.

@vanhauser-thc
Contributor

> I don't think anyone here is using libc rand().

well, guess what google afl is using :)

@robertswiecki
Collaborator

Of course I will fix it, though this approach of favoring lower offsets over others is at best handwavy, based more on my experience than on any solid data at hand :)

The idea from #360 (comment) seems interesting, since it "looks" like something we'd want. I will experiment a bit with it and implement it or something similar.

I've looked through various exprand implementations in other projects, and they mostly seem to be implemented using floats. Since I don't want to spend too much time analyzing whether float resolution will cause similar problems (e.g. skipping certain values, because resolution becomes lower at higher values), I'd rather stick to an implementation based on integers.

robertswiecki added a commit that referenced this issue Aug 4, 2020
offsets for fuzzing.

It didn't do a proper job (as per #360), so it's been replaced by something simpler. It still prefers smaller values, but the distribution is now step-function-like.
@robertswiecki
Collaborator

robertswiecki commented Aug 4, 2020

I've replaced it with something simpler - c950e75 - which still has properties that are useful from my subjective PoV, but less of a chance of misbehaving.

Feel free to discuss the topic in this thread anyway, if you'd like - it's an interesting topic, though I worry that we might not have more substantial data for evaluating such offset/mangling strategies than just our experience with various file data formats.

And, thanks for the report and for the analysis, which was very enlightening.
