Implement round-trip fuzzers for finding correctness bugs #4559
I saw your amazing work on Ruff! This is some amazing work that I want to steal for my project https://github.com/Boshen/oxc as well 😁 |
nice
I'm noticing that the formatter is introducing syntax errors even into the samples I'm pulling from the repository. I think my assumptions about the formatter properties are too strict... Definitely need a second set of eyes here. |
Great PR! Have you seen astral-sh/ruff#3721 (comment) ? |
Certainly. This solution is more oriented towards CI pipelines and continuous testing, but its test case minimisation strategy could be used to reduce the broken files generated in the other issue into minimal reproductions. That said, I do think this strategy potentially covers all the test cases that the file generator will. We would just need to add fuzzers for the linter. The fuzzers here will find violations that the other testing strategy cannot, because they check the property oracles while also potentially triggering crashes. Potentially, the best thing we could do is use the broken file generator to create a corpus of inputs. There's not a lot of TypeScript source code corpora out there 🙂

Let me clarify this a bit further. Recently, there was a work called Fuzztruction, which showed that erroneous input generation can accelerate a fuzzer's exploration of program coverage. In some cases, the input generator on its own was able to explore coverage well, but in many cases it required a fuzzer to be used in parallel. Moreover, I've evaluated this work on my own and found that its performance drops significantly when there are not many generators in use in parallel, being squarely outperformed by input gen + fuzzer. Note also that the corpora used for the experiments in this paper are very small; the results may not be comparable to a high-performing fuzzer with a strong corpus.

Since the ultimate purpose of this is to identify bugs in code for which there are insufficient unit tests, we want to keep these runs small and use a relatively small amount of compute resources (that way, we can put it in CI). Input generation combined with fuzzing works well for long runs with high parallelism, but a strong corpus and a simple fuzzer will outperform even the combination of the two in short runs. |
How can we reproduce the issue? Do you have some sample of broken code, so we can help? |
I'd like to suggest one more variant to check: "formatted code should pass lint". |
What happens if the original code doesn't pass lint? |
Oh, it's a good point. |
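One way to address that objection is to make the check conditional: only require lint-clean output when the input was already lint-clean. Below is a minimal sketch of that idea. The `lint` and `format` functions are hypothetical toys (a trailing-whitespace rule and a whitespace-stripping formatter), not Rome's APIs and not the code in this PR.

```rust
/// Hypothetical lint rule: reports every line that ends in whitespace.
fn lint(source: &str) -> Vec<String> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| line.ends_with(' ') || line.ends_with('\t'))
        .map(|(i, _)| format!("line {}: trailing whitespace", i + 1))
        .collect()
}

/// Hypothetical formatter: strips trailing whitespace from each line.
fn format(source: &str) -> String {
    source
        .lines()
        .map(str::trim_end)
        .collect::<Vec<_>>()
        .join("\n")
}

/// Property: if the original passes lint, the formatted code must too.
/// Inputs that already fail lint are skipped, since they tell us nothing
/// about whether the formatter introduced the violation.
fn formatted_code_passes_lint(source: &str) -> Result<(), Vec<String>> {
    if !lint(source).is_empty() {
        return Ok(()); // precondition not met: skip this input
    }
    let diagnostics = lint(&format(source));
    if diagnostics.is_empty() {
        Ok(())
    } else {
        Err(diagnostics)
    }
}
```

With this guard, a violation reported by the property can only come from the formatter, which keeps the fuzzer's findings actionable.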
You can run
These three test cases all cause the formatter to introduce a syntax error. Let me update the fuzzer to emit a text diff so this is easier to see. |
This is good 🙂 Let me try it. |
I believe this is an example of #4553 |
Ah, on my system, sh is symlinked to bash. I will fix. |
@denbezrukov I believe I have implemented your suggestion. Check it out 🙂 |
The formatter fuzzers are now extremely aggressive, and flag many failure cases. However, I'm not sure if all of these failure cases are considered bugs. Would a maintainer inspect the |
- Formatting code twice will have the same result as formatting code once

In this way, we verify the [idempotency](https://en.wikipedia.org/wiki/Idempotence) and syntax
💜
@@ -11,13 +11,15 @@ fi

if [ ! -d corpus/rome_format_all ]; then
would a build.rs be crazy here? instead of a shell?
Seems like the fuzzer couldn't compile https://github.com/rome/tools/actions/runs/5249047207/jobs/9485118431#step:4:1 |
Bleh, yeah, I've seen this locally. Some linkage issue on 1.69; it works just fine on 1.70. |
I hope to get this merged soon #4563 |
Compiler is OOMing for the fuzzer 😬 I'll try to resolve this locally. |
Summary
This PR implements fuzzers for testing the correctness of the parser and the formatter. These fuzzers will identify invalid UTF-8 indexing issues, panics/unreachables, logic errors, and violations of the round-trip property of parsing and formatting. Much of the source code and layout is based on the recent ruff fuzzer.
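To make the properties concrete, here is a minimal sketch of the kind of oracle such a fuzzer might evaluate on each input. `parses_ok` and `format` are hypothetical toy stand-ins (a balanced-parentheses check and a whitespace normaliser), not Rome's parser or formatter, and `check_properties` is an illustrative name rather than a function from this PR.

```rust
/// Toy "parser" (hypothetical): accepts input only if parentheses balance.
fn parses_ok(source: &str) -> bool {
    let mut depth: i64 = 0;
    for c in source.chars() {
        match c {
            '(' => depth += 1,
            ')' => {
                depth -= 1;
                if depth < 0 {
                    return false; // closing paren with no matching opener
                }
            }
            _ => {}
        }
    }
    depth == 0
}

/// Toy "formatter" (hypothetical): collapses whitespace runs to one space.
fn format(source: &str) -> String {
    source.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Checks two properties on a single input:
/// 1. formatting valid code must not introduce a syntax error;
/// 2. formatting twice gives the same result as formatting once.
fn check_properties(source: &str) -> Result<(), String> {
    if !parses_ok(source) {
        return Ok(()); // invalid input: the properties do not apply
    }
    let once = format(source);
    if !parses_ok(&once) {
        return Err(format!("formatter introduced a syntax error: {once:?}"));
    }
    let twice = format(&once);
    if once != twice {
        return Err(format!("not idempotent: {once:?} vs {twice:?}"));
    }
    Ok(())
}
```

In a fuzz harness, a function like this would be called on every generated input, and an `Err` would be turned into a panic so that the fuzzer records the input as a failing test case.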
I will open issues detailing bugs identified by the fuzzer in the coming days. At time of writing, it is quite late at night. I have marked this PR as a draft as I still need to add some documentation regarding the fuzzers both in the source files and in the README, and add the fuzzer builds to the CI.
Test Plan
This adds additional testing features.
Changelog
Documentation