Flag potentially duplicate rules in CI #12481

Open
dylwil3 opened this issue Jul 23, 2024 · 3 comments
Labels: ci (Related to internal CI tooling)

Comments

dylwil3 (Collaborator) commented on Jul 23, 2024

I wonder if it would be helpful to add a test or CI step to check for potentially duplicate rules.

The most naive version of this I can think of is as follows:

CI fails if there is a pair of rules, Rule A and Rule B, which identify identical spans in crates/ruff_linter/resources/test/.../A.py as violations.

In this case I would guess that either Rule A and Rule B are identical, or else it should be possible to disambiguate them by adding more test cases. (But maybe there is an error in that logic.) If this isn't true, perhaps a list of exceptional pairs could be maintained.
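As a rough sketch (not a definitive implementation), the check could run ruff over the fixtures and compare the per-rule span sets. I'm assuming here that ruff's JSON output exposes `code`, `filename`, `location`, and `end_location` fields on each diagnostic; the exact shape may differ by version.

```python
import json
import subprocess
from collections import defaultdict

# Run every rule over the test fixtures and collect diagnostics as JSON.
result = subprocess.run(
    [
        "ruff", "check", "--select", "ALL", "--no-cache",
        "--output-format", "json", "crates/ruff_linter/resources/test/",
    ],
    capture_output=True,
    text=True,
)
diagnostics = json.loads(result.stdout)

# For each rule, collect the set of spans it flags.
spans: dict[str, set[tuple]] = defaultdict(set)
for diag in diagnostics:
    if diag.get("code"):  # skip records without a rule code
        spans[diag["code"]].add((
            diag["filename"],
            diag["location"]["row"],
            diag["location"]["column"],
            diag["end_location"]["row"],
            diag["end_location"]["column"],
        ))

# Fail if any two rules flag exactly the same set of spans.
codes = sorted(spans)
duplicates = [
    (a, b)
    for i, a in enumerate(codes)
    for b in codes[i + 1 :]
    if spans[a] == spans[b]
]
if duplicates:
    raise SystemExit(f"Potentially duplicate rule pairs: {duplicates}")
```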

Presumably there is also a clever way to implement this so that the first run is slow but subsequent runs are fast.

(Bonus points: Apply the same principle to speed up/automate the detection of rules from other packages which have already been implemented in Ruff.)

This is tangentially related to the rule re-categorization effort (#1774) and, of course, to the question of aliasing (#2186).

I'd love to know if there's interest, or if anyone has suggestions. If so, I can try to implement this.

MichaReiser (Member) commented

An automated way of detecting duplicated rules would be very helpful, but I don't have a good feeling for how well the described approach would work regarding false positives and false negatives. I think a good first step would be to identify duplicate rules that were later merged/removed and play it through manually.

MichaReiser added the ci (Related to internal CI tooling) label on Jul 24, 2024
dylwil3 (Collaborator, Author) commented on Jul 24, 2024

Glad to hear there is some interest!

I did a quick experiment:

Methodology

  • Step 1: Run all Ruff rules on all test fixtures and export results to json. (Note: I did this just by selecting "ALL", which removes a few conflicting rules, but I'm ignoring that for now.)
  • Step 2: For each rule, record all tuples of the form (filename, start_line, end_line) where a check occurs.
  • Step 3: Compute the Jaccard similarity (the intersection over the union) of these sets of "check locations" for each pair of rules. (A rough sketch of Steps 2 and 3 follows this list.)
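For concreteness, here's a minimal sketch of Steps 2 and 3 in Python. It assumes the diagnostics were exported to a file (`results.json` is just an illustrative name) and that each record exposes `code`, `filename`, `location`, and `end_location` fields, which is my reading of ruff's JSON output; the exact shape may vary by version.

```python
import json
from collections import defaultdict
from itertools import combinations

# Step 1 exported ruff's diagnostics, e.g. with something like:
#   ruff check --select ALL --output-format json \
#       crates/ruff_linter/resources/test/ > results.json
with open("results.json") as f:
    diagnostics = json.load(f)

# Step 2: each rule's "profile" is the set of (filename, start_line, end_line)
# tuples where it reports a violation.
profiles: dict[str, set[tuple[str, int, int]]] = defaultdict(set)
for diag in diagnostics:
    if diag.get("code"):  # skip records without a rule code
        profiles[diag["code"]].add((
            diag["filename"],
            diag["location"]["row"],
            diag["end_location"]["row"],
        ))

def jaccard(a: set, b: set) -> float:
    """Intersection over union; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Step 3: rank rule pairs by profile similarity; near matches are duplicate
# candidates worth a manual look.
ranked = sorted(
    (
        (jaccard(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    ),
    reverse=True,
)
for score, a, b in ranked[:10]:
    print(f"{a} / {b}: {score:.1%}")
```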

Results

There appear to be no exact matches between existing rules, but we can look for interesting "near matches". The top 10 most similar pairs (which happen to be exactly the pairs with similarity over 50%) are as follows:

  • D400 (ends-in-period) and D415 (ends-in-punctuation) (94.20%)
    • An obvious relationship: one set of violations contains the other.
  • INP001 (implicit-namespace-package) and D100 (undocumented-public-module) (73.10%)
    • Probably an artifact of the file structure of test/fixtures.
  • TD003 (missing-todo-link) and FIX002 (line-contains-todo) (71.80%)
    • One set of violations contains the other.
  • F509 (percent-format-unsupported-format-character) and PLE1300 (bad-string-format-character) (66.70%)
  • TD001 (invalid-todo-tag) and FIX001 (line-contains-fixme) (64.70%)
    • I believe the set of possible violations of the first contains the set of possible violations of the second.
  • ANN201 (missing-return-type-undocumented-public-function) and D103 (undocumented-public-function) (63.90%)
    • Artifact of the test cases.
  • PTH101 (os-chmod) and S103 (bad-file-permissions) (63.60%)
    • Artifact of the test cases.
  • TD003 (missing-todo-link) and TD002 (missing-todo-author) (60.30%)
    • Artifact of the test cases.
  • G004 (logging-f-string) and TRY401 (verbose-log-message) (53.30%)
    • Artifact of the test cases.
  • FIX002 (line-contains-todo) and TD002 (missing-todo-author) (50.70%)
    • One set of violations contains the other.

The number of false positives for this test is either a feature or a bug, depending on whether or not it's helpful to spot places where additional test cases should be added to the fixtures.

One can play with the threshold, depending on how many candidate pairs you want to consider. There are:

  • 10 pairs with >50% match
  • 13 pairs with >40% match
  • 30 pairs with >30% match
  • 56 pairs with >20% match
  • 135 pairs with >10% match
  • 300 pairs with >5% match
  • 2045 pairs with >0% match
  • 217470 pairs total

Next?

So perhaps an automated check against a similarity threshold would be helpful? If CI is too unwieldy for this, one could also run a script manually before releases, or periodically.

The JSON and notebook used are in this repo if you'd like to play around yourself.

Let me know if you'd like me to go further with this (e.g., continue experimenting, add something to the scripts folder, etc.).

An example of an additional experiment might be to augment the set of files used to generate each rule's "profile" by including one or all of the codebases in the ecosystem checks. This has the advantage of comparing rule check behavior in a more natural setting, but the disadvantage that rarely tripped rules may get buried.

MichaReiser (Member) commented

Sorry for the late reply. Overall, this does seem useful, but it seems too noisy for a CI job, since it requires a manual review of the flagged rules.

The time I would find such a check the most useful is when there's a PR for a new rule, similar to the ecosystem check. But I must admit it's unclear to me how to design the check so that it is approachable, understandable, and has a good false-positive/false-negative ratio.
