Create fuzzers for testing correctness of parsing, linting and fixing #4822

addisoncrump · 2023-06-03T03:54:36Z

Summary

This PR introduces multiple fuzzers which test the correctness of Ruff. Namely:

ruff_parse_simple, which attempts to simply crash the parser (though it is mainly useful as a utility for generating inputs)
ruff_parse_idempotency, which searches for idempotency violations in the parse/unparse utilities of Ruff
ruff_fix_validity, which checks that fixes applied by Ruff do not introduce syntax violations (using ruff::test::test_snippet)
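
The round-trip property that ruff_parse_idempotency searches for can be sketched as follows. This is a minimal illustration under stated assumptions, not the fuzz target itself: toy_parse, toy_unparse and buggy_unparse are hypothetical stand-ins for Ruff's parser and code generator, and check_idempotency is a name invented for this example.

```rust
/// Runs parse -> unparse twice and reports a violation if the second
/// rendering differs from the first. All names here are hypothetical.
fn check_idempotency<T, E>(
    input: &str,
    parse: impl Fn(&str) -> Result<T, E>,
    unparse: impl Fn(&T) -> String,
) -> Result<(), (String, String)> {
    // Inputs that do not parse are not interesting for this property.
    let Ok(first_tree) = parse(input) else {
        return Ok(());
    };
    let first = unparse(&first_tree);

    // Generated code must itself parse again.
    let Ok(second_tree) = parse(&first) else {
        return Err((first, String::from("<reparse failed>")));
    };
    let second = unparse(&second_tree);

    // A stable generator produces the same text on the second pass.
    if first == second {
        Ok(())
    } else {
        Err((first, second))
    }
}

/// Toy "parser": splits the source into whitespace-separated tokens.
fn toy_parse(src: &str) -> Result<Vec<String>, String> {
    Ok(src.split_whitespace().map(str::to_owned).collect())
}

/// Toy "generator": joins tokens with single spaces (idempotent).
fn toy_unparse(tokens: &Vec<String>) -> String {
    tokens.join(" ")
}

/// Deliberately broken generator: wraps the output in parentheses on every pass.
fn buggy_unparse(tokens: &Vec<String>) -> String {
    format!("({})", tokens.join(" "))
}
```

With these toy functions, check_idempotency("x  =  1", toy_parse, toy_unparse) succeeds, while swapping in buggy_unparse fails because each pass adds another pair of parentheses.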

Test Plan

This introduces new tests. I will open PRs with a few of the bugs discovered by the fuzzer and link them to this PR to demonstrate some of the things it is able to find.

github-actions · 2023-06-03T04:54:05Z

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      6.2±0.02ms     6.6 MB/sec    1.19      7.3±0.02ms     5.5 MB/sec
formatter/numpy/ctypeslib.py               1.00   1257.7±9.70µs    13.2 MB/sec    1.14   1439.6±2.26µs    11.6 MB/sec
formatter/numpy/globals.py                 1.00    145.1±1.20µs    20.3 MB/sec    1.09    157.8±0.74µs    18.7 MB/sec
formatter/pydantic/types.py                1.00      2.7±0.01ms     9.4 MB/sec    1.16      3.1±0.03ms     8.1 MB/sec
linter/all-rules/large/dataset.py          1.02     15.0±0.12ms     2.7 MB/sec    1.00     14.7±0.07ms     2.8 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.02      3.6±0.01ms     4.6 MB/sec    1.00      3.6±0.00ms     4.7 MB/sec
linter/all-rules/numpy/globals.py          1.01    364.7±0.89µs     8.1 MB/sec    1.00    362.4±0.82µs     8.1 MB/sec
linter/all-rules/pydantic/types.py         1.01      6.2±0.01ms     4.1 MB/sec    1.00      6.1±0.01ms     4.1 MB/sec
linter/default-rules/large/dataset.py      1.01      7.4±0.03ms     5.5 MB/sec    1.00      7.3±0.09ms     5.5 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00   1535.2±3.31µs    10.8 MB/sec    1.01   1550.2±3.51µs    10.7 MB/sec
linter/default-rules/numpy/globals.py      1.00    164.5±0.16µs    17.9 MB/sec    1.01    165.6±0.61µs    17.8 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.3±0.01ms     7.7 MB/sec    1.01      3.3±0.00ms     7.7 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      6.1±0.18ms     6.6 MB/sec    1.00      6.1±0.08ms     6.6 MB/sec
formatter/numpy/ctypeslib.py               1.00  1232.2±44.55µs    13.5 MB/sec    1.01  1250.1±48.50µs    13.3 MB/sec
formatter/numpy/globals.py                 1.00    138.4±2.74µs    21.3 MB/sec    1.03    142.3±3.75µs    20.7 MB/sec
formatter/pydantic/types.py                1.00      2.7±0.05ms     9.5 MB/sec    1.01      2.7±0.07ms     9.4 MB/sec
linter/all-rules/large/dataset.py          1.01     14.6±0.15ms     2.8 MB/sec    1.00     14.5±0.12ms     2.8 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.01      3.7±0.04ms     4.5 MB/sec    1.00      3.7±0.04ms     4.5 MB/sec
linter/all-rules/numpy/globals.py          1.00   435.5±11.61µs     6.8 MB/sec    1.00    434.4±6.63µs     6.8 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.2±0.07ms     4.1 MB/sec    1.00      6.2±0.14ms     4.1 MB/sec
linter/default-rules/large/dataset.py      1.00      7.2±0.06ms     5.6 MB/sec    1.01      7.3±0.07ms     5.6 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00  1531.6±23.47µs    10.9 MB/sec    1.01  1542.7±25.13µs    10.8 MB/sec
linter/default-rules/numpy/globals.py      1.00    174.5±4.46µs    16.9 MB/sec    1.00    174.8±6.41µs    16.9 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.3±0.04ms     7.8 MB/sec    1.01      3.3±0.04ms     7.8 MB/sec

zanieb · 2023-06-03T05:06:15Z

This looks pretty cool! I'm looking forward to seeing this uncover more bugs — the two you linked already are nice finds.

addisoncrump · 2023-06-03T05:16:44Z

Thanks! I hope it gets some good use -- manually sifting through the bugs right now to deduplicate, but hopefully once all the low-hanging fruit issues are out of the way we can integrate it into the standard testing process. 😁

addisoncrump · 2023-06-03T05:19:47Z

Also, it should be fairly straightforward to extend the idempotency ideas to #4798 for testing the formatter as well, when this is completed.

MichaReiser

This is awesome and very well done. Thank you so much. Also, thank you for filing some issues already.

I have a few comments, but overall this looks good.

MichaReiser · 2023-06-05T06:32:20Z

.github/workflows/ci.yaml

+      - name: "Install Rust toolchain"
+        run: rustup show
+      - uses: Swatinem/rust-cache@v2
+      - run: cargo install cargo-fuzz


Does cargo-binstall support cargo-fuzz? It could help speed up the CI step.

https://github.com/charliermarsh/ruff/blob/466719247bc68c0b608bc85be341c5aa5e2a5ec2/.github/workflows/benchmark.yaml#L81-L85

I am unfamiliar with binstall. I will try this 🙂

Hm, looking at the CI, it seems that none of the other installations in ci.yaml use binstall. Perhaps we should make this a separate PR instead.

Good point. They probably should. But I understand we want to remove the CI step for now anyway because there would be too many false positives. I recommend either dropping the CI step or adding it in its own PR to resolve the conversation for now.

We want to keep the build step. In the future, we should add the fuzz runs as another CI step as well.

crates/ruff/src/rules/airflow/mod.rs

MichaReiser · 2023-06-05T06:39:55Z

crates/ruff/src/test.rs

+    contents: &str,
+    path: &Path,
+    settings: &Settings,
+    max_iterations: usize,


Same as for test_path. We should extract the logic into a TestContentsRunner and use it inside of test_contents. This way, it becomes possible for you to use the advanced runner API without having to change all call sites (and reduces the number of arguments).
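
One way the suggested extraction could look is sketched below. This is only an illustration of the idea, not the code that was merged: the fields, the default of 10 iterations, and the fix_once callback (which stands in for one lint-and-fix pass) are all assumptions made for this example.

```rust
/// Hypothetical runner that keeps the iteration limit inside the test
/// infrastructure instead of adding another argument to every call site.
struct TestContentsRunner<'a> {
    contents: &'a str,
    max_iterations: usize,
}

impl<'a> TestContentsRunner<'a> {
    fn new(contents: &'a str) -> Self {
        // Assumed default; the real limit is not stated in this thread.
        Self { contents, max_iterations: 10 }
    }

    /// Advanced callers (for example a fuzz target) can override the limit.
    fn max_iterations(mut self, max_iterations: usize) -> Self {
        self.max_iterations = max_iterations;
        self
    }

    /// `fix_once` returns `Some(new_source)` when it changed something and
    /// `None` once no fixes remain. A fix that never settles is an error.
    fn run(self, fix_once: impl Fn(&str) -> Option<String>) -> Result<String, String> {
        let mut source = self.contents.to_owned();
        for _ in 0..self.max_iterations {
            match fix_once(&source) {
                Some(fixed) => source = fixed,
                None => return Ok(source),
            }
        }
        Err(format!(
            "fixes did not converge after {} iterations",
            self.max_iterations
        ))
    }
}

/// Existing call sites keep a short signature and get the default limit.
fn test_contents(
    contents: &str,
    fix_once: impl Fn(&str) -> Option<String>,
) -> Result<String, String> {
    TestContentsRunner::new(contents).run(fix_once)
}
```

Ordinary tests would keep calling test_contents, while a fuzzer could use TestContentsRunner::new(code).max_iterations(n).run(...) to treat a non-converging fix as a failure.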

charliermarsh

This is impressive work.

addisoncrump · 2023-06-06T10:10:55Z

This last commit adds some sugar so I can fuzz locally with LibAFL (disclosure: this is a fuzzer I am a maintainer of), which is orders of magnitude faster than libFuzzer but has less support. It's not the default and shouldn't affect normal users.

jvoisin · 2023-06-06T12:02:41Z

It would be glorious to add this to OSS-Fuzz <3

addisoncrump · 2023-06-06T12:04:09Z

That's the plan 😉

crates/ruff/src/rules/airflow/mod.rs

addisoncrump · 2023-06-06T15:44:37Z

Potentially, yes 🙂 Though, this seems really roundabout. Why does this need to be a global/constant? Should this perhaps be a Setting entry? This pattern also appears here: https://github.com/charliermarsh/ruff/blob/1ed5d7e437a1dda0f4c2ddae09f5a7aa7d713bde/crates/ruff/src/linter.rs#L247

MichaReiser · 2023-06-06T16:53:06Z

The settings are what we use in production. I would prefer to keep max iterations local to the testing infrastructure.

addisoncrump · 2023-06-06T16:56:35Z

I see. This would work in my case, yes. Applying the change!

This was referenced Jun 3, 2023

F523: Single variable with single quotes triggers syntax violation #4823

Closed

Parser Idempotency: Generator drops indentation on nested def #4825

Closed

addisoncrump mentioned this pull request Jun 3, 2023

F523: Single variable leads to literal rendering of newline #4826

Closed

This was referenced Jun 3, 2023

F541: Invalid replacement with string concatenation #4827

Closed

E703: Incorrect detection and replacement of semicolon #4828

Closed

MichaReiser reviewed Jun 5, 2023

addisoncrump mentioned this pull request Jun 5, 2023

F523: Regression: index out-of-bounds panic on invalid parameter indices #4863

Closed

addisoncrump force-pushed the main branch from 86cee83 to 3532ad2 Compare June 5, 2023 13:02

addisoncrump mentioned this pull request Jun 5, 2023

F523 should not be marked as always-fixable #4865

Closed

addisoncrump force-pushed the main branch from a21aedd to ecd1d3c Compare June 5, 2023 17:29

charliermarsh reviewed Jun 5, 2023

addisoncrump mentioned this pull request Jun 6, 2023

F522 should not be marked always-fixable #4892

Closed

addisoncrump force-pushed the main branch from ecd1d3c to a730f92 Compare June 6, 2023 07:40

This was referenced Jun 6, 2023

F601: Duplicated keys with different parenthesisation fixed incorrectly #4897

Closed

F504: Carriage return prevents expression extraction #4899

Closed

F841: unused local in 'with' statement with parethesised value does not converge #4901

Closed

MichaReiser approved these changes Jun 6, 2023

addisoncrump added 3 commits June 6, 2023 19:00

init fuzzers

ea53932

init fuzzers

dd578ad

fixup readme

6110667

addisoncrump added 13 commits June 6, 2023 19:00

minor updates for parser idempotency, fix README

35a61b2

fix for CI, again

c5f1a63

init fuzzer with test suite, regardless of whether dataset is downloaded

0bc2cfa

Optional => OnceLock, add timeout to usage to catch infinite loops

ba921fd

remove accidental paste

a2f1b21

add text diffing for idempotency

83ab834

add reinit-fuzzer for new unit tests

bf2a2e4

actually, we should use fix_validity to merge because it is the most complicated (more precision)

7819134

change base corpus

d4a6c84

oh we don't need those deps at all!

0aac4f0

add author

798afb2

add libafl compatibility

3659ec9

revert big test changes

2bc1640

addisoncrump force-pushed the main branch from 6c2b158 to 2bc1640 Compare June 6, 2023 17:31

fixup test complaints

918241c

This was referenced Jun 7, 2023

E731: Introduction of invalid indentation #4924

Open

E712: Invalid parentheses removal #4925

Closed

Use cargo binstall

b61830d

MichaReiser added the internal label (An internal refactor or improvement) Jun 7, 2023

MichaReiser merged commit 2f125f4 into astral-sh:main Jun 7, 2023

konstin mentioned this pull request Jun 7, 2023

Use taiki-e/install-action to install cargo fuzz #4928

Merged

charliermarsh mentioned this pull request Jun 8, 2023

Fuzzer support #3507

Closed

This was referenced Jun 8, 2023

Use ruff_fix_validity to catch regressions in CI not detected by unit tests #4972

Open

Implement round-trip fuzzers for finding correctness bugs rome/tools#4559

Merged

konstin pushed a commit that referenced this pull request Jun 13, 2023

Create fuzzers for testing correctness of parsing, linting and fixing (#4822) Co-authored-by: Micha Reiser <micha@reiser.io>

bbeea9f

This was referenced Jun 17, 2023

F401 Deletes import, cause EOF error #5156

Closed

F541 deletes fstring w/o adding additional whitespace #5281

Closed

MahnurA mentioned this pull request Jul 11, 2023

Parser Idempotency issue: braces added every time after each pass boa-dev/boa#3133

Closed

0xalpharush mentioned this pull request Aug 3, 2023

fuzz testing the parser NomicFoundation/slang#440

Open

addisoncrump mentioned this pull request Jan 10, 2024

ruff: initial integration google/oss-fuzz#11471

Closed
