Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create fuzzers for testing correctness of parsing, linting and fixing #4822

Merged
merged 21 commits into from
Jun 7, 2023

Conversation

addisoncrump
Copy link
Contributor

Summary

This PR introduces multiple fuzzers which test the correctness of Ruff. Namely:

  • ruff_parse_simple, which attempts to simply crash the parser (though is mainly useful as a utility for generating inputs)
  • ruff_parse_idempotency, which searches for idempotency violations in the parse/unparse utilities of Ruff
  • ruff_fix_validity, which checks that fixes applied by Ruff do not introduce syntax violations (using ruff::test::test_snippet)

Test Plan

This introduces new tests. I will open PRs with a few of the bugs discovered by the fuzzer and link them to this PR to demonstrate some of the things it is able to find.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 3, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      6.2±0.02ms     6.6 MB/sec    1.19      7.3±0.02ms     5.5 MB/sec
formatter/numpy/ctypeslib.py               1.00   1257.7±9.70µs    13.2 MB/sec    1.14   1439.6±2.26µs    11.6 MB/sec
formatter/numpy/globals.py                 1.00    145.1±1.20µs    20.3 MB/sec    1.09    157.8±0.74µs    18.7 MB/sec
formatter/pydantic/types.py                1.00      2.7±0.01ms     9.4 MB/sec    1.16      3.1±0.03ms     8.1 MB/sec
linter/all-rules/large/dataset.py          1.02     15.0±0.12ms     2.7 MB/sec    1.00     14.7±0.07ms     2.8 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.02      3.6±0.01ms     4.6 MB/sec    1.00      3.6±0.00ms     4.7 MB/sec
linter/all-rules/numpy/globals.py          1.01    364.7±0.89µs     8.1 MB/sec    1.00    362.4±0.82µs     8.1 MB/sec
linter/all-rules/pydantic/types.py         1.01      6.2±0.01ms     4.1 MB/sec    1.00      6.1±0.01ms     4.1 MB/sec
linter/default-rules/large/dataset.py      1.01      7.4±0.03ms     5.5 MB/sec    1.00      7.3±0.09ms     5.5 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00   1535.2±3.31µs    10.8 MB/sec    1.01   1550.2±3.51µs    10.7 MB/sec
linter/default-rules/numpy/globals.py      1.00    164.5±0.16µs    17.9 MB/sec    1.01    165.6±0.61µs    17.8 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.3±0.01ms     7.7 MB/sec    1.01      3.3±0.00ms     7.7 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      6.1±0.18ms     6.6 MB/sec    1.00      6.1±0.08ms     6.6 MB/sec
formatter/numpy/ctypeslib.py               1.00  1232.2±44.55µs    13.5 MB/sec    1.01  1250.1±48.50µs    13.3 MB/sec
formatter/numpy/globals.py                 1.00    138.4±2.74µs    21.3 MB/sec    1.03    142.3±3.75µs    20.7 MB/sec
formatter/pydantic/types.py                1.00      2.7±0.05ms     9.5 MB/sec    1.01      2.7±0.07ms     9.4 MB/sec
linter/all-rules/large/dataset.py          1.01     14.6±0.15ms     2.8 MB/sec    1.00     14.5±0.12ms     2.8 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.01      3.7±0.04ms     4.5 MB/sec    1.00      3.7±0.04ms     4.5 MB/sec
linter/all-rules/numpy/globals.py          1.00   435.5±11.61µs     6.8 MB/sec    1.00    434.4±6.63µs     6.8 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.2±0.07ms     4.1 MB/sec    1.00      6.2±0.14ms     4.1 MB/sec
linter/default-rules/large/dataset.py      1.00      7.2±0.06ms     5.6 MB/sec    1.01      7.3±0.07ms     5.6 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00  1531.6±23.47µs    10.9 MB/sec    1.01  1542.7±25.13µs    10.8 MB/sec
linter/default-rules/numpy/globals.py      1.00    174.5±4.46µs    16.9 MB/sec    1.00    174.8±6.41µs    16.9 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.3±0.04ms     7.8 MB/sec    1.01      3.3±0.04ms     7.8 MB/sec

@zanieb
Copy link
Member

zanieb commented Jun 3, 2023

This looks pretty cool! I'm looking forward to seeing this uncover more bugs — the two you linked already are nice finds.

@addisoncrump
Copy link
Contributor Author

This looks pretty cool! I'm looking forward to seeing this uncover more bugs — the two you linked already are nice finds.

Thanks! I hope it gets some good use -- manually sifting through the bugs right now to deduplicate, but hopefully once all the low-hanging fruit issues are out of the way we can integrate it into the standard testing process. 😁

@addisoncrump
Copy link
Contributor Author

Also, it should be fairly straightforward to extend the idempotency ideas to #4798 for testing the formatter as well, when this is completed.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome and very well done. Thank you so much. Also thank you for filling some issues already.

I've a few comments but overall looking good.

- name: "Install Rust toolchain"
run: rustup show
- uses: Swatinem/rust-cache@v2
- run: cargo install cargo-fuzz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unfamiliar with bininstall. I will try this 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, looking at the CI, it seems that none of the other installations in ci.yaml use binstall. Perhaps we should make this a separate PR instead.

Copy link
Member

@MichaReiser MichaReiser Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. They probably should. But I understand we want to remove the CI step for now anyway because there would be too many false positives. I recommend either dropping the CI step or adding it in its own PR to resolve the conversation for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to keep the build step. In the future, we should add the fuzz runs as another CI step as well.

crates/ruff/src/rules/airflow/mod.rs Outdated Show resolved Hide resolved
contents: &str,
path: &Path,
settings: &Settings,
max_iterations: usize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for test_path. We should extract the logic into a TestContentsRunner and use it inside of test_contents. This way, it becomes possible for you to use the advanced runner API without having to change all call sites (and reduces the number of arguments).

crates/ruff/src/test.rs Outdated Show resolved Hide resolved
fuzz/Cargo.toml Show resolved Hide resolved
fuzz/fuzz_targets/ruff_fix_validity.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/ruff_fix_validity.rs Outdated Show resolved Hide resolved
fuzz/fuzz_targets/ruff_parse_idempotency.rs Outdated Show resolved Hide resolved
fuzz/init-fuzzer.sh Show resolved Hide resolved
.github/workflows/ci.yaml Show resolved Hide resolved
Copy link
Member

@charliermarsh charliermarsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is impressive work.

@addisoncrump
Copy link
Contributor Author

This last commit adds some sugar so I can fuzz locally with libafl (disclosure: this is a fuzzer I am a maintainer of), which is orders of magnitude faster than libfuzzer but has less support. It's not default and shouldn't affect normal users.

@jvoisin
Copy link

jvoisin commented Jun 6, 2023

It would be glorious to add this to OSS-Fuzz <3

@addisoncrump
Copy link
Contributor Author

That's the plan 😉

crates/ruff/src/rules/airflow/mod.rs Outdated Show resolved Hide resolved
@addisoncrump
Copy link
Contributor Author

Potentially, yes 🙂 Though, this seems really roundabout. Why does this need to be a global/constant? Should this perhaps be a Setting entry? This pattern also appears here: https://github.com/charliermarsh/ruff/blob/1ed5d7e437a1dda0f4c2ddae09f5a7aa7d713bde/crates/ruff/src/linter.rs#L247

@MichaReiser
Copy link
Member

MichaReiser commented Jun 6, 2023

Potentially, yes slightly_smiling_face Though, this seems really roundabout. Why does this need to be a global/constant? Should this perhaps be a Setting entry? This pattern also appears here:

https://github.com/charliermarsh/ruff/blob/1ed5d7e437a1dda0f4c2ddae09f5a7aa7d713bde/crates/ruff/src/linter.rs#L247

The settings is what we use in production. I would prefer to keep max iterations local to the testing infrastructure.

@addisoncrump
Copy link
Contributor Author

I see. This would work in my case, yes. Applying the change!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal An internal refactor or improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants