Add fast-path for comment detection #9808

Merged (1 commit) on Feb 5, 2024

@charliermarsh charliermarsh commented Feb 3, 2024

Summary

When we fall through to parsing, the comment-detection rule accounts for a significant portion of lint time. This PR adds a fast heuristic: we bail out early if a comment contains two consecutive name tokens (detected via the zero-allocation lexer), since such a comment can't be commented-out code. For ctypeslib.py, which has a few cases that are now caught by this heuristic, it's a 2.5x speedup for the rule (and a 20% speedup for token-based rules).
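As a rough illustration of the heuristic described above, here is a minimal, self-contained sketch. It does not use ruff's `SimpleTokenizer`; the `Kind` enum and the `tokens` and `has_consecutive_names` functions are hypothetical names invented for this example, and the lexing is deliberately coarse (single-line strings only, no number handling).

```rust
/// Coarse token kinds for this sketch (hypothetical; not ruff's `SimpleTokenKind`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Name,
    Keyword,
    Other,
}

/// Returns true for Python keywords (hard keywords plus a few soft ones).
fn is_keyword(word: &str) -> bool {
    matches!(
        word,
        "False" | "None" | "True" | "and" | "as" | "assert" | "async" | "await"
            | "break" | "class" | "continue" | "def" | "del" | "elif" | "else"
            | "except" | "finally" | "for" | "from" | "global" | "if" | "import"
            | "in" | "is" | "lambda" | "nonlocal" | "not" | "or" | "pass"
            | "raise" | "return" | "try" | "while" | "with" | "yield"
            | "match" | "case" | "type"
    )
}

/// Lazily splits `text` into coarse tokens without allocating.
fn tokens(text: &str) -> impl Iterator<Item = Kind> + '_ {
    let mut rest = text;
    std::iter::from_fn(move || {
        rest = rest.trim_start();
        let first = rest.chars().next()?;
        if first.is_alphabetic() || first == '_' {
            // Identifier or keyword: consume up to the first non-identifier character.
            let end = rest
                .find(|c: char| !(c.is_alphanumeric() || c == '_'))
                .unwrap_or(rest.len());
            let word = &rest[..end];
            rest = &rest[end..];
            Some(if is_keyword(word) { Kind::Keyword } else { Kind::Name })
        } else if first == '"' || first == '\'' {
            // Skip a single-line string literal so words inside it are not counted.
            let mut chars = rest.char_indices().skip(1);
            let mut end = rest.len();
            while let Some((i, c)) = chars.next() {
                if c == '\\' {
                    chars.next(); // skip the escaped character
                } else if c == first {
                    end = i + c.len_utf8();
                    break;
                }
            }
            rest = &rest[end..];
            Some(Kind::Other)
        } else {
            // Any other single character (operators, digits, `#`, ...).
            rest = &rest[first.len_utf8()..];
            Some(Kind::Other)
        }
    })
}

/// Fast path: two adjacent non-keyword names never occur in valid Python,
/// so such a comment can be rejected without invoking the parser.
fn has_consecutive_names(comment: &str) -> bool {
    let mut prev_was_name = false;
    for kind in tokens(comment) {
        let is_name = kind == Kind::Name;
        if is_name && prev_was_name {
            return true;
        }
        prev_was_name = is_name;
    }
    false
}
```

Under these assumptions, `has_consecutive_names("# TODO fix this")` is true (so the comment is treated as prose and parsing is skipped), while `has_consecutive_names("# x = compute(y)")` is false and the comment would still fall through to the slower parsing check.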

codspeed-hq bot commented Feb 3, 2024

CodSpeed Performance Report

Merging #9808 will improve performance by 4.84%

Comparing charlie/eradicate (c63bc5e) with main (b47f85e)

Summary

⚡ 2 improvements
✅ 28 untouched benchmarks

Benchmarks breakdown

| Benchmark | main | charlie/eradicate | Change |
| --- | --- | --- | --- |
| linter/all-with-preview-rules[numpy/ctypeslib.py] | 24.4 ms | 23.2 ms | +4.84% |
| linter/all-rules[numpy/ctypeslib.py] | 21.7 ms | 20.7 ms | +4.44% |

crates/ruff_linter/src/rules/eradicate/detection.rs (outdated, resolved)
@@ -182,7 +182,7 @@ fn to_keyword_or_other(source: &str) -> SimpleTokenKind {
         "case" => SimpleTokenKind::Case,
         "with" => SimpleTokenKind::With,
         "yield" => SimpleTokenKind::Yield,
-        _ => SimpleTokenKind::Other, // Potentially an identifier, but only if it isn't a string prefix. We can ignore this for now https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
+        _ => SimpleTokenKind::Name, // Potentially an identifier, but only if it isn't a string prefix. We can ignore this for now https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

How can we avoid returning a Name for a string prefix?
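One possible direction for the question above, sketched under assumptions: after lexing an identifier, peek at the next character and treat the identifier as a string prefix only if it is one of Python's documented prefixes (case-insensitive `r`, `u`, `b`, `f`, `br`, `rb`, `fr`, `rf`) and is immediately followed by a quote. The `IdentKind` enum and both functions below are hypothetical and are not part of `SimpleTokenizer`; whether this fits the tokenizer without a performance cost would need to be measured.

```rust
/// Result of classifying a lexed identifier (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum IdentKind {
    Name,
    StringPrefix,
}

/// Python's string and bytes prefixes, compared case-insensitively and without allocating.
fn is_string_prefix(ident: &str) -> bool {
    const PREFIXES: [&str; 8] = ["r", "u", "b", "f", "br", "rb", "fr", "rf"];
    PREFIXES.iter().any(|&p| ident.eq_ignore_ascii_case(p))
}

/// Classifies the identifier at `source[start..end]` by peeking one character ahead:
/// a valid prefix directly followed by a quote starts a string literal, not a name.
fn classify_identifier(source: &str, start: usize, end: usize) -> IdentKind {
    let ident = &source[start..end];
    let next = source[end..].chars().next();
    if is_string_prefix(ident) && matches!(next, Some('"' | '\'')) {
        IdentKind::StringPrefix
    } else {
        IdentKind::Name
    }
}
```

With this check, `rb` in `rb'abc'` is classified as a string prefix, while `rb` in `rb = 1` remains a name. The extra work per identifier is a short length-bounded comparison plus a single-character lookahead.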

@charliermarsh charliermarsh force-pushed the charlie/eradicate branch 6 times, most recently from 03a1844 to a78aec3 on February 4, 2024 21:32
@charliermarsh charliermarsh marked this pull request as ready for review February 4, 2024 21:37
@charliermarsh charliermarsh added the performance Potential performance improvement label Feb 4, 2024
github-actions bot commented Feb 4, 2024

ruff-ecosystem results

Linter (stable)

ℹ️ ecosystem check encountered linter errors. (no lint changes; 1 project error)

sphinx-doc/sphinx (error)

ruff failed
  Cause: Selection of unstable rules without the `--preview` flag is not allowed. Enable preview or remove selection of:
	- FURB113
	- FURB131
	- FURB132

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 2 project errors)

sphinx-doc/sphinx (error)

ruff format --no-preview --exclude tests/roots/test-pycode/cp_1251_coded.py

ruff failed
  Cause: Selection of unstable rules without the `--preview` flag is not allowed. Enable preview or remove selection of:
	- FURB113
	- FURB131
	- FURB132

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to parse examples/dalle/Image_generations_edits_and_variations_with_DALL-E.ipynb:3:7:8: Unexpected token 'prompt'

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to parse examples/dalle/Image_generations_edits_and_variations_with_DALL-E.ipynb:3:7:8: Unexpected token 'prompt'

@MichaReiser MichaReiser left a comment

I don't feel comfortable having such an important constraint in an inline comment that violates the basic properties of SimpleTokenizer. We should explore if we can support proper name lexing in SimpleTokenizer without degrading performance.

crates/ruff_python_trivia/src/tokenizer.rs (3 review threads, outdated and resolved)
@charliermarsh charliermarsh merged commit 9781563 into main Feb 5, 2024
17 checks passed
@charliermarsh charliermarsh deleted the charlie/eradicate branch February 5, 2024 16:00