-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve heuristics whether format_args
string is a source literal
#106195
Improve heuristics whether format_args
string is a source literal
#106195
Conversation
r? @oli-obk (rustbot has picked a reviewer for you, use r? to override) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, r=me with or without comments addressed since they're minor
unescape::unescape_literal(string, unescape::Mode::Str, &mut |_, unescaped_char| { | ||
match unescaped_char { | ||
Ok(c) => buf.push(c), | ||
Err(_) => {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, should we handle escape errors and bail in find_width_map_from_snippet
if we see any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think it matters much but doesnt hurt
// Macros like `println` add a newline at the end. That technically doens't make them "literals" anymore, but it's fine | ||
// since we will never need to point our spans there, so we lie about it here by ignoring it. | ||
// Since there might actually be newlines in the source code, we need to normalize away all trailing newlines. | ||
let input_no_nl = input.trim_end_matches('\n'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we only want to trim only one \n
off the literal to deal with the println!
case, then check strict literal equality? Or am I missing something more subtle here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are just like I was when I tried this at first.
If we have format!("uwu\n")
, then trimming it off the input will cause a mismatch, as it's still present in the snippet.
Doing "only trim it when the snippet doesn't end with a newline" also doesn't work because it could be println!("uwu\n")
.
We would need to actually count the trailing newlines and match and only trim it off when the input has one more, but doing it like I did is a little simpler.
I will explain this better in the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, I guess the important part is that we can't distinguish when the newline is added by the macro or already present in the literal. Oh well.
I also added a new enum for |
b2ab034
to
7e682cb
Compare
This comment has been minimized.
This comment has been minimized.
Previously, it only checked whether there was _a_ literal at the span of the first argument, not whether the literal actually matched up. This caused issues when a proc macro was generating a different literal with the same span. This requires an annoying special case for literals ending in `\n` because otherwise `println` wouldn't give detailed diagnostics anymore which would be bad.
This makes the relationship between the vec and the boolean clearer.
7e682cb
to
31b490d
Compare
@bors r+ |
☀️ Test successful - checks-actions |
Finished benchmarking commit (0c0b403): comparison URL. Overall result: ❌ regressions - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
…ther-it-really-is-a-literal, r=compiler-errors Revert "Improve heuristics whether `format_args` string is a source literal" This reverts commit e6c02aa (from rust-lang#106195). Keeps the code improvements from the PR and the test (as a known-bug). Works around rust-lang#106408 while a proper fix is discussed more thoroughly in rust-lang#106505, as proposed by `@tmandry.` Reopens rust-lang#106191 r? compiler-errors
…-episode-2, r=petrochenkov Properly allow macro expanded `format_args` invocations to uses captures Originally, this was kinda half-allowed. There were some primitive checks in place that looked at the span to see whether the input was likely a literal. These "source literal" checks are needed because the spans created during `format_args` parsing only make sense when it is indeed a literal that was written in the source code directly. This is orthogonal to the restriction that the first argument must be a "direct literal", not being exanpanded from macros. This restriction was imposed by [RFC 2795] on the basis of being too confusing. But this was only concerned with the argument of the invocation being a literal, not whether it was a source literal (maybe in spirit it meant it being a source literal, this is not clear to me). Since the original check only really cared about source literals (which is good enough to deny the `format_args!(concat!())` example), macros expanding to `format_args` invocations were able to use implicit captures if they spanned the string in a way that lead back to a source string. The "source literal" checks were not strict enough and caused ICEs in certain cases (see rust-lang#106191). So I tightened it up in rust-lang#106195 to really only work if it's a direct source literal. This caused the `indoc` crate to break. `indoc` transformed the source literal by removing whitespace, which made it not a "source literal" anymore (which is required to fix the ICE). But since `indoc` spanned the literal in ways that made the old check think that it's a literal, it was able to use implicit captures (which is useful and nice for the users of `indoc`). This commit properly seperates the previously introduced concepts of "source literal" and "direct literal" and therefore allows `indoc` invocations, which don't create "source literals" to use implicit captures again. Fixes rust-lang#106191 [RFC 2795]: https://rust-lang.github.io/rfcs/2795-format-args-implicit-identifiers.html#macro-hygiene
Previously, it only checked whether there was a literal at the span of the first argument, not whether the literal actually matched up. This caused issues when a proc macro was generating a different literal with the same span.
This requires an annoying special case for literals ending in
\n
because otherwiseprintln
wouldn't give detailed diagnostics anymore which would be bad.Fixes #106191