Move file-level rule exemption to lexer-based approach #5567
Conversation
I thought this might end up being faster, but given that it's slower, IDK, open to not merging. I do think handling …
PR Check Results
Ecosystem: ✅ ecosystem check detected no changes.
Benchmark: Linux and Windows results (tables not captured).
Force-pushed 3c26030 to 92f34e5
Neat. As mentioned in the other PR, an alternative would have been to write a more "traditional" lexer that returns a sequence of Tokens instead (with their ranges). See `SimpleTokenizer` for an example. But this works too.
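The token-sequence shape the reviewer describes could look something like this minimal sketch (illustrative only — `Token` and `tokenize` are hypothetical names, and ruff's actual `SimpleTokenizer` is more involved):

```rust
// Hypothetical sketch of a "traditional" lexer: emit tokens with their byte
// ranges rather than consuming the string positionally.
#[derive(Debug, PartialEq)]
enum Token {
    Hash,         // '#'
    Colon,        // ':'
    Word(String), // a run of non-whitespace, non-'#', non-':' characters
}

fn tokenize(line: &str) -> Vec<(Token, std::ops::Range<usize>)> {
    let mut tokens = Vec::new();
    let bytes = line.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        match bytes[i] {
            b'#' => { i += 1; tokens.push((Token::Hash, start..i)); }
            b':' => { i += 1; tokens.push((Token::Colon, start..i)); }
            c if c.is_ascii_whitespace() => { i += 1; } // skip whitespace
            _ => {
                // Word boundaries are ASCII bytes, which in UTF-8 only occur
                // as standalone characters, so the slice below is valid.
                while i < bytes.len()
                    && !bytes[i].is_ascii_whitespace()
                    && bytes[i] != b':'
                    && bytes[i] != b'#'
                {
                    i += 1;
                }
                tokens.push((Token::Word(line[start..i].to_string()), start..i));
            }
        }
    }
    tokens
}
```

A downstream parser would then match on the token sequence (e.g. `Hash`, `Word("noqa")`, `Colon`, …) instead of repeatedly trimming the string.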
crates/ruff/src/noqa.rs
Outdated
```diff
@@ -256,38 +257,148 @@ enum ParsedFileExemption<'a> {
 impl<'a> ParsedFileExemption<'a> {
     /// Return a [`ParsedFileExemption`] for a given comment line.
     fn try_extract(line: &'a str) -> Option<Self> {
-        let line = line.trim_whitespace_start();
+        let line = ParsedFileExemption::lex_whitespace(line);
```
nit: `ParsedFileExemption` -> `Self`
crates/ruff/src/noqa.rs
Outdated
```rust
let mut chars = line.chars();
if chars
    .next()
    .map_or(false, |c| c.to_ascii_lowercase() == 'n')
```
I'm surprised there isn't a better way to write this, but at least I found the issue for the missing method: rust-lang/rfcs#2566
We could match on the byte slice, because we only compare against byte characters. But then you still need to uphold the invariant, which is annoying?

```rust
match line.as_bytes() {
    [b'n' | b'N', b'o' | b'O', b'q' | b'Q', b'a' | b'A'] => tada,
    _ => nope,
}
```
The problem is that we want to select the string after "noqa", and converting back from bytes isn't safe.

Which reminds me, would `line.strip_prefix("noqa")` work?
I don't think `line.strip_prefix("noqa")` matches case-insensitively, right?
> the problem is that we want to select the string after "noqa" and converting back from bytes isn't safe.

I think it would be safe here, because we know that all variations of `noqa` have an exact length of 4 bytes.

Anyway, I think the current code is fine. It's verbose, but it does the job.
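For the record, one way to get a case-insensitive prefix check without byte-pattern matching or unsafe byte-to-string conversion is `eq_ignore_ascii_case` on a fixed-width slice. This is a hedged sketch, not the code in this PR (`strip_noqa_prefix` is an illustrative name):

```rust
// Illustrative sketch: case-insensitively strip a leading "noqa" and return
// the remainder. Safe because "noqa" is ASCII (exactly 4 bytes), and `get`
// returns None rather than panicking if index 4 is not a char boundary.
fn strip_noqa_prefix(line: &str) -> Option<&str> {
    let head = line.get(..4)?;
    head.eq_ignore_ascii_case("noqa").then(|| &line[4..])
}
```

This selects the string after "noqa" directly, so no conversion back from bytes is needed.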
I like this, I changed it.

How did you compute those ns timings?
I use

```rust
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
use ruff::noqa::ParsedFileExemption;

pub fn directive_benchmark(c: &mut Criterion) {
    let mut group = c.benchmark_group("Directive");
    for i in [
        "# noqa: F401",
        "# noqa: F401, F841",
        "# flake8: noqa: F401, F841",
        "# ruff: noqa: F401, F841",
        "# flake8: noqa",
        "# ruff: noqa",
        "# noqa",
        "# type: ignore # noqa: E501",
        "# type: ignore # nosec",
        "# some very long comment that # is interspersed with characters but # no directive",
    ]
    .iter()
    {
        group.bench_with_input(BenchmarkId::new("Regex", i), i, |b, _i| {
            b.iter(|| ParsedFileExemption::try_regex(black_box(i)))
        });
        group.bench_with_input(BenchmarkId::new("Lexer", i), i, |b, _i| {
            b.iter(|| ParsedFileExemption::try_extract(black_box(i)))
        });
    }
    group.finish();
}

criterion_group!(benches, directive_benchmark);
criterion_main!(benches);
```

with

```toml
[[bench]]
name = "benchmark"
harness = false
```

in `Cargo.toml`.
Force-pushed 92f34e5 to 5f78342
## Summary Similar to #5567, we can remove the use of regex, plus simplify the representation (use `Option`), add snapshot tests, etc. This is about 100x faster than using a regex for cases that match (2.5ns vs. 250ns). It's obviously not a hot path, but I prefer the consistency with other similar comment-parsing. I may DRY these up into some common functionality later on.
Summary

In addition to `# noqa` codes, we also support file-level exemptions, which look like:

- `# flake8: noqa` (ignore all rules in the file, for compatibility)
- `# ruff: noqa` (ignore all rules in the file)
- `# ruff: noqa: F401` (ignore `F401` in the file; Flake8 doesn't support this)

This PR moves that logic to something that looks a lot more like our `# noqa` parser. Performance is actually quite a bit worse than the previous approach (lexing `# flake8: noqa` goes from 2ns to 11ns; lexing `# ruff: noqa: F401, F841` is about the same; lexing `# type: ignore # noqa: E501` goes from 4ns to 6ns), but the numbers are very small, so it's... maybe worth it?

The primary benefit here is that we now properly support flexible whitespace, like `#flake8:noqa`. Previously, we required exact string matching, and we also didn't support all case-insensitive variants of `noqa`.
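The flexible-whitespace, case-insensitive behavior described above can be sketched as a small hand-rolled lexer. This is a hypothetical illustration, not ruff's implementation — `parse_file_exemption` and `strip_ci` are made-up names, and the real parser returns richer types with ranges:

```rust
// Case-insensitively strip `prefix` from the front of `line`.
// `get` returns None on a non-char-boundary, so this never panics.
fn strip_ci<'a>(line: &'a str, prefix: &str) -> Option<&'a str> {
    let head = line.get(..prefix.len())?;
    head.eq_ignore_ascii_case(prefix)
        .then(|| &line[prefix.len()..])
}

// Parse a file-level exemption with flexible whitespace and a
// case-insensitive `noqa`, e.g. "#flake8:noqa" or "# RUFF: NOQA: F401".
// Returns None for "no exemption", Some(None) for a blanket exemption,
// and Some(Some(codes)) for a code-specific exemption.
fn parse_file_exemption(line: &str) -> Option<Option<Vec<&str>>> {
    let line = line.trim_start().strip_prefix('#')?.trim_start();
    // Accept either the "flake8" or "ruff" prefix.
    let line = strip_ci(line, "flake8").or_else(|| strip_ci(line, "ruff"))?;
    let line = line.trim_start().strip_prefix(':')?.trim_start();
    let rest = strip_ci(line, "noqa")?.trim_start();
    match rest.strip_prefix(':') {
        // "# ruff: noqa: F401, F841" — specific codes.
        Some(codes) => Some(Some(codes.split(',').map(str::trim).collect())),
        // "# flake8: noqa" — blanket exemption.
        None => Some(None),
    }
}
```

Because each step trims whitespace before matching the next token, `#flake8:noqa` and `#  flake8 :  NOQA` parse the same way, which is exactly what the regex-free approach buys over exact string matching.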