Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve heuristics whether format_args string is a source literal #106195

Conversation

Noratrieb
Copy link
Member

Previously, it only checked whether there was a literal at the span of the first argument, not whether the literal actually matched up. This caused issues when a proc macro was generating a different literal with the same span.

This requires an annoying special case for literals ending in \n because otherwise println wouldn't give detailed diagnostics anymore which would be bad.

Fixes #106191

@rustbot
Copy link
Collaborator

rustbot commented Dec 27, 2022

r? @oli-obk

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 27, 2022
Copy link
Member

@compiler-errors compiler-errors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, r=me with or without comments addressed since they're minor

unescape::unescape_literal(string, unescape::Mode::Str, &mut |_, unescaped_char| {
match unescaped_char {
Ok(c) => buf.push(c),
Err(_) => {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, should we handle escape errors and bail in find_width_map_from_snippet if we see any?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think it matters much but doesnt hurt

// Macros like `println` add a newline at the end. That technically doens't make them "literals" anymore, but it's fine
// since we will never need to point our spans there, so we lie about it here by ignoring it.
// Since there might actually be newlines in the source code, we need to normalize away all trailing newlines.
let input_no_nl = input.trim_end_matches('\n');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we only want to trim only one \n off the literal to deal with the println! case, then check strict literal equality? Or am I missing something more subtle here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are just like I was when I tried this at first.
If we have format!("uwu\n"), then trimming it off the input will cause a mismatch, as it's still present in the snippet.
Doing "only trim it when the snippet doesn't end with a newline" also doesn't work because it could be println!("uwu\n").
We would need to actually count the trailing newlines and match and only trim it off when the input has one more, but doing it like I did is a little simpler.

I will explain this better in the comment.

Copy link
Member

@compiler-errors compiler-errors Dec 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, I guess the important part is that we can't distinguish when the newline is added by the macro or already present in the literal. Oh well.

@Noratrieb
Copy link
Member Author

I also added a new enum for find_width_map_from_snippet because I didnt like the tuple return type

@Noratrieb Noratrieb force-pushed the no-more-being-clueless-whether-it-really-is-a-literal branch from b2ab034 to 7e682cb Compare December 27, 2022 22:08
@rust-log-analyzer

This comment has been minimized.

Previously, it only checked whether there was _a_ literal at the span of
the first argument, not whether the literal actually matched up. This
caused issues when a proc macro was generating a different literal with
the same span.

This requires an annoying special case for literals ending in `\n`
because otherwise `println` wouldn't give detailed diagnostics anymore
which would be bad.
This makes the relationship between the vec and the boolean clearer.
@Noratrieb Noratrieb force-pushed the no-more-being-clueless-whether-it-really-is-a-literal branch from 7e682cb to 31b490d Compare December 28, 2022 16:43
@compiler-errors
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Dec 28, 2022

📌 Commit 31b490d has been approved by compiler-errors

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 28, 2022
@bors
Copy link
Contributor

bors commented Dec 29, 2022

⌛ Testing commit 31b490d with merge 0c0b403...

@bors
Copy link
Contributor

bors commented Dec 29, 2022

☀️ Test successful - checks-actions
Approved by: compiler-errors
Pushing 0c0b403 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 29, 2022
@bors bors merged commit 0c0b403 into rust-lang:master Dec 29, 2022
@rustbot rustbot added this to the 1.68.0 milestone Dec 29, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (0c0b403): comparison URL.

Overall result: ❌ regressions - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
1.6% [1.6%, 1.6%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.2% [-3.2%, -0.8%] 3
Improvements ✅
(secondary)
-2.2% [-4.7%, -0.7%] 19
All ❌✅ (primary) -2.2% [-3.2%, -0.8%] 3

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.2% [-1.2%, -1.2%] 1
All ❌✅ (primary) - - 0

@Noratrieb Noratrieb deleted the no-more-being-clueless-whether-it-really-is-a-literal branch December 29, 2022 15:13
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 18, 2023
…ther-it-really-is-a-literal, r=compiler-errors

Revert "Improve heuristics whether `format_args` string is a source literal"

This reverts commit e6c02aa (from rust-lang#106195).

Keeps the code improvements from the PR and the test (as a known-bug).

Works around rust-lang#106408 while a proper fix is discussed more thoroughly in rust-lang#106505, as proposed by `@tmandry.`

Reopens rust-lang#106191

r? compiler-errors
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 14, 2023
…-episode-2, r=petrochenkov

Properly allow macro expanded `format_args` invocations to uses captures

Originally, this was kinda half-allowed. There were some primitive checks in place that looked at the span to see whether the input was likely a literal. These "source literal" checks are needed because the spans created during `format_args` parsing only make sense when it is indeed a literal that was written in the source code directly.

This is orthogonal to the restriction that the first argument must be a "direct literal", not being exanpanded from macros. This restriction was imposed by [RFC 2795] on the basis of being too confusing. But this was only concerned with the argument of the invocation being a literal, not whether it was a source literal (maybe in spirit it meant it being a source literal, this is not clear to me).

Since the original check only really cared about source literals (which is good enough to deny the `format_args!(concat!())` example), macros expanding to `format_args` invocations were able to use implicit captures if they spanned the string in a way that lead back to a source string.

The "source literal" checks were not strict enough and caused ICEs in certain cases (see rust-lang#106191). So I tightened it up in rust-lang#106195 to really only work if it's a direct source literal.

This caused the `indoc` crate to break. `indoc` transformed the source literal by removing whitespace, which made it not a "source literal" anymore (which is required to fix the ICE). But since `indoc` spanned the literal in ways that made the old check think that it's a literal, it was able to use implicit captures (which is useful and nice for the users of `indoc`).

This commit properly seperates the previously introduced concepts of "source literal" and "direct literal" and therefore allows `indoc` invocations, which don't create "source literals" to use implicit captures again.

Fixes rust-lang#106191

[RFC 2795]: https://rust-lang.github.io/rfcs/2795-format-args-implicit-identifiers.html#macro-hygiene
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid span created when using format_args with modified literal span and multibyte chars
7 participants