-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a script to fuzz the parser (courtesy of pysource-codegen
)
#11015
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea if this is the correct place to put this script! This adds a whole new scripts/
directory to the crate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a scripts
directory at the project root, I would just put it there in a subdirectory along with a requirements.in
(and generated requirements.txt
) file. And, the module docstring could go in the README.md
of that directory. Refer to scripts/release
or scripts/benchmarks
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, if you prefer to keep it in ruff_python_parser
, we could rename it to fuzz
instead to make the intent clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it to the scripts/
directory. I'd rather keep the module docstring, though, as it currently doubles as the text that's shown if you pass -h
or --help
as a CLI argument to the script:
(ruff) (add-fuzzing-script) % py scripts/fuzz-parser/fuzz.py -h ~/dev/ruff
usage: fuzz.py [-h] [--only-new-bugs] [--quiet] seeds [seeds ...]
Run the parser on randomly generated (but syntactically valid) Python source-code files.
To install all dependencies for this script into an environment using `uv`, run:
uv pip install -r scripts/fuzz-parser/requirements.txt
Example invocations of the script:
- Run the fuzzer using seeds 0, 1, 2, 78 and 93 to generate the code:
`python scripts/fuzz-parser/fuzz.py 0-2 78 93`
- Run the fuzzer concurrently using seeds in range 0-10 inclusive,
but only reporting bugs that are new on your branch:
`python scripts/fuzz-parser/fuzz.py 0-10 --new-bugs-only`
- Run the fuzzer concurrently on 10,000 different Python source-code files,
and only print a summary at the end:
`python scripts/fuzz-parser/fuzz.py 1-10000 --quiet
N.B. The script takes a few seconds to get started, as the script needs to compile
your checked out version of ruff with `--release` as a first step before it
can actually start fuzzing.
positional arguments:
seeds Either a single seed, or an inclusive range of seeds in the format `0-5`
options:
-h, --help show this help message and exit
--only-new-bugs Only report bugs if they exist on the current branch, but *didn't* exist on the released version of Ruff installed into the Python environment we're running in
--quiet Print fewer things to the terminal while running the fuzzer
There also doesn't actually seem to be a README.md
file for the scripts/
directory right now
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good to me! Comments are just a typo and something that's probably not worth acting on.
The location of the script seems reasonable to me, but you could wait for someone who actually knows something about preferred ruff directory structure to comment on that, if you want to.
|
||
@dataclass | ||
class ResolvedCliArgs: | ||
seeds: list[Seed] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking we'd have to actually partition the list of seeds, but we don't; we just feed them to the executors one by one, which is a lot simpler (and balances better, too.)
That also means technically we wouldn't have to fully materialize the list of seeds like we do now; instead we could just store the list[int | range]
here, and have a method that yields seeds one at a time.
But I think this doesn't really matter, and for smaller sizes of list, materializing is probably faster than a generator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you
if len(args.seeds) <= 5: | ||
bugs = run_fuzzer_sequentially(args) | ||
else: | ||
bugs = run_fuzzer_concurrently(args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did I nerd snip you into making it concurrent :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe 😁 @carljm and I looked at it in our pairing session yesterday
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making it concurrent was a very good call, though. It's much faster now :)
parser.add_argument( | ||
"--only-new-bugs", | ||
action="store_true", | ||
help=( | ||
"Only report bugs if they exist on the current branch, " | ||
"but *didn't* exist on the released version of Ruff " | ||
"installed into the Python environment we're running in" | ||
), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice. So we can also just use it to test the parser in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a scripts
directory at the project root, I would just put it there in a subdirectory along with a requirements.in
(and generated requirements.txt
) file. And, the module docstring could go in the README.md
of that directory. Refer to scripts/release
or scripts/benchmarks
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Cc. @15r10nk -- |
That's great 🚀. It makes me really happy to know that pysource-codegen is useful for you. |
Summary
This PR adds a script that can be used to fuzz the parser against randomly generated, syntactically correct Python source-code files.
Test Plan
I reverted the
crates/ruff_python_parser
directory back to how it was as of 13ffb5b (in order to deliberately reintroduce some parser bugs), then ran the script several times with various invocations. Example output with the bugs reintroduced:Screenshot to show how it looks with colour: