Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added byte position range for proc_macro::Span #109002

Merged
merged 7 commits into from
Apr 21, 2023
Merged

Added byte position range for proc_macro::Span #109002

merged 7 commits into from
Apr 21, 2023

Conversation

michaelvanstraten
Copy link
Contributor

@michaelvanstraten michaelvanstraten commented Mar 10, 2023

Currently, the Debug implementation for proc_macro::Span calls the debug function implemented in the trait implementation of server::Span for the type Rustc in the rustc-expand crate.

The current implementation, of the referenced function, looks something like this:

fn debug(&mut self, span: Self::Span) -> String {
    if self.ecx.ecfg.span_debug {
        format!("{:?}", span)
    } else {
        format!("{:?} bytes({}..{})", span.ctxt(), span.lo().0, span.hi().0)
    }
}

It returns the byte position of the Span as an interpolated string.

Because this is currently the only way to get a spans position in the file, I might lead someone, who is interested in this information, to parsing this interpolated string back into a range of bytes, which I think is a very non-rusty way.

The proposed position(), method implemented in this PR, gives the ability to directly get this info.
It returns a std::ops::Range wrapping the lowest and highest byte of the Span.

I put it behind the proc_macro_span feature flag because many of the other functions that have a similar footprint also are annotated with it, I don't actually know if this is right.

It would be great if somebody could take a look at this, thank you very much in advanced.

@rustbot
Copy link
Collaborator

rustbot commented Mar 10, 2023

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @petrochenkov (or someone else) soon.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 10, 2023
@rustbot
Copy link
Collaborator

rustbot commented Mar 10, 2023

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@michaelvanstraten
Copy link
Contributor Author

Sorry, I did not check any of the src/tools, could somebody tell me how I can compile all tools using x?

@est31
Copy link
Member

est31 commented Mar 10, 2023

@michaelvanstraten I think the failing CI job just runs ./x check. In general you can get help in this zulip stream.

@michaelvanstraten
Copy link
Contributor Author

@michaelvanstraten I think the failing CI job just runs ./x check. In general you can get help in this zulip stream.

Thanks, got the same error locally.

@rustbot
Copy link
Collaborator

rustbot commented Mar 10, 2023

Some changes occurred in src/tools/rust-analyzer

cc @rust-lang/wg-rls-2

@est31
Copy link
Member

est31 commented Mar 11, 2023

Personally I think the API itself is a good addition, but for other things I wonder about different design:

  • the position name isn't intuitive, especially as it returns a range, while position implies a point. It also doesn't specify the unit. Maybe byte_range is better?
  • I'm not sure about the chosen type of u32: it means that one cannot process files that are larger than 4 gigabytes. This is already a restriction inside rustc, and it's fine to have that restriction there, as one can change it whenever the need arises. But with position we are talking about an API that will hopefully be stable one day, so more caution needs to be applied. LineColumn uses usize as the type, which might be the best solution here for consistency reasons. With u64 one would be at the safe side but inconsistent with LineColumn. I think we should use usize for now and then decide later about maybe switching all types to u64.

@michaelvanstraten
Copy link
Contributor Author

I get where you are coming from with the name, I had an issue with that as well.

I cast the u32 to an usize to be consistent with LineColumn, but that will probably be only temporary.

@michaelvanstraten
Copy link
Contributor Author

But with position we are talking about an API that will hopefully be stable one day, so more caution needs to be applied.

I follow the stability of various features on the proc-macro crate for a wile now and I don't really understand why many of the things related to the Span have not been stabilized by now.

@petrochenkov
Copy link
Contributor

How is the returned value supposed to be used?
It's meaningless outside of rustc internals (especially the absolute values).
@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 13, 2023
@petrochenkov
Copy link
Contributor

If the byte range is from the "root" lib.rs/main.rs file then it should be interpretable by a human from what I understand (and therefore make sense in debug printing), but otherwise it's basically an arbitrary number.

@michaelvanstraten
Copy link
Contributor Author

michaelvanstraten commented Mar 13, 2023

It's meaningless outside of rustc internals (especially the absolute values).

It has some meaning in the context of the proc-macro crate in the sense that you get a relative token stream. If you take any token from that stream, get its byte range, and then subtract the lower byte index of the first token in the stream, you get a relative position.

Making the position relative is first of all tedious to implementation and second of all you lose the ability to get a absolute positive of the token.

@michaelvanstraten
Copy link
Contributor Author

michaelvanstraten commented Mar 13, 2023

Let me give some context, I am working on a crate which parses the source text of a token stream using some grammar rules. The parser produces tokens with a relative offset byte offset to the start of the input string slice. If want to translate these tokens span back to the original i need to know relative byte offset of each input span.

Why would I like to translate back to the original spans? The first reason is the issue of error reporting, which, if you have worked with proc-macro, is terrible if you don't have the original tokens to work of. And the second reason is because I what some of the original token from the input stream in the output stream.

I know that my problem is mainly caused by the lack of custom error reporting available in the proc-macro crate but the issue that addresses this has not been updated for many years and this pull request offers a simple solution.

Also, nothing prevents anybody to use the Debug implementation to parse this information programmatically even though it should be interpreted by a human.

@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 13, 2023
@petrochenkov
Copy link
Contributor

Also, nothing prevents anybody to use the Debug implementation to parse this information programmatically even though it should be interpreted by a human.

Except it being entirely undocumented and unstable, and being able to change at any moment.

@petrochenkov
Copy link
Contributor

I still don't understand the use case from #109002 (comment).

crate which parses the source text of a token stream using some grammar rules.

What is the "source text" of a token stream?
Token stream is the input, how is it turned into a text? Why does that text need to be parsed again if we already have the token stream with all the spans as the input?
Why is there a need to translate back to the original spans in the original spans are already available?

Having an example would be much more clear.
@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 14, 2023
@michaelvanstraten
Copy link
Contributor Author

Having an example would be much more clear. @rustbot author

let mut input_stream_iter = input.into_iter()
let input_span = match (input_stream_iter.next(), input_stream_iter.last()) {
    (Some(first_token), Some(last_token)) => first_token
        .span()
        .join(last_token.span())
        .expect("the two spans are from the same file"),
    (Some(first_token), None) => first_token.span(),
    _ => return Ok(...),
};

let Some(source_text) = input_span.source_text() else {
    return Err(...);
};

I extract the source text from a given token stream, which then gets parsed using some sort of grammar.
I currently use Pest for this.

The parsed grammar then gets validated and in case of an error I want to attribute that error back to a Span in the input token stream.

@petrochenkov
Copy link
Contributor

petrochenkov commented Mar 17, 2023

@michaelvanstraten

Would this be mergeable if this the byte range would be relative?

Could you give an example?
That sounds very similar to span subtraction.

@michaelvanstraten
Copy link
Contributor Author

michaelvanstraten commented Mar 17, 2023

@petrochenkov

Could you give an example? That sounds very similar to span substraction.

With relative byte range, I mean that the lowest byte of the first span in a token stream passed to a proc-macro would be zero and every other span would have a relative byte range to the first.

@michaelvanstraten
Copy link
Contributor Author

They do not return absolute positions, they return file-relative positions (which is also bad for incremental compilation possiblities, but less bad, and it also makes sense outside of rustc internals). Absolute positions as implemented in this PR are positions in a "source map", which is something that is glued together from all source files in a project.

I am sorry I did not know that, that really makes the absolute value meaningless.
But then the question arises why it is included in the debug output, I mean the "source map" would never be read by a human @petrochenkov?

@petrochenkov
Copy link
Contributor

petrochenkov commented Mar 17, 2023

I don't know 🤷
As mentioned in #109002 (comment) it may make sense in the root file of a project (because it happens to go first in the source map), or for a compiler developer, need to check the history.

@michaelvanstraten
Copy link
Contributor Author

Okay, still if the byte span values are meaningless in the file they still give relative information.

Could we merge this with a notice that the values are from the start of the source map?

@michaelvanstraten
Copy link
Contributor Author

I think this would be translatable using the lookup_byte_offset() method on the SourceMap.

@michaelvanstraten
Copy link
Contributor Author

The byte offset should now be relative to the source file.

Is this now mergeable @petrochenkov?

@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 19, 2023
@petrochenkov
Copy link
Contributor

Should be fine as an unstable method, but I wouldn't want to stabilize neither this method nor start/end returning lines and columns, nor anything else that is not relative to some existing span.

Span difference and length (#109002 (comment)) are better stabilized instead to address your use case.

@bors r+

@bors
Copy link
Contributor

bors commented Apr 21, 2023

📌 Commit 342c5fb has been approved by petrochenkov

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 21, 2023
@bors
Copy link
Contributor

bors commented Apr 21, 2023

⌛ Testing commit 342c5fb with merge 1151ea6...

@bors
Copy link
Contributor

bors commented Apr 21, 2023

☀️ Test successful - checks-actions
Approved by: petrochenkov
Pushing 1151ea6 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 21, 2023
@bors bors merged commit 1151ea6 into rust-lang:master Apr 21, 2023
@rustbot rustbot added this to the 1.71.0 milestone Apr 21, 2023
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (1151ea6): comparison URL.

Overall result: ❌ regressions - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.4% [0.4%, 0.4%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.4% [0.4%, 0.4%] 1

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
4.3% [4.3%, 4.3%] 1
Improvements ✅
(primary)
-0.5% [-0.6%, -0.4%] 2
Improvements ✅
(secondary)
-2.3% [-2.3%, -2.3%] 1
All ❌✅ (primary) -0.5% [-0.6%, -0.4%] 2

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.1% [3.1%, 3.1%] 1
Regressions ❌
(secondary)
8.2% [8.0%, 8.5%] 3
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 3.1% [3.1%, 3.1%] 1

bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 5, 2023
…nkov

Added byte position range for `proc_macro::Span`

Currently, the [`Debug`](https://doc.rust-lang.org/beta/proc_macro/struct.Span.html#impl-Debug-for-Span) implementation for [`proc_macro::Span`](https://doc.rust-lang.org/beta/proc_macro/struct.Span.html#) calls the debug function implemented in the trait implementation of `server::Span` for the type `Rustc` in the `rustc-expand` crate.

The current implementation, of the referenced function, looks something like this:
```rust
fn debug(&mut self, span: Self::Span) -> String {
    if self.ecx.ecfg.span_debug {
        format!("{:?}", span)
    } else {
        format!("{:?} bytes({}..{})", span.ctxt(), span.lo().0, span.hi().0)
    }
}
```

It returns the byte position of the [`Span`](https://doc.rust-lang.org/beta/proc_macro/struct.Span.html#) as an interpolated string.

Because this is currently the only way to get a spans position in the file, I might lead someone, who is interested in this information, to parsing this interpolated string back into a range of bytes, which I think is a very non-rusty way.

The proposed `position()`, method implemented in this PR, gives the ability to directly get this info.
It returns a [`std::ops::Range`](https://doc.rust-lang.org/std/ops/struct.Range.html#) wrapping the lowest and highest byte of the [`Span`](https://doc.rust-lang.org/beta/proc_macro/struct.Span.html#).

I put it behind the `proc_macro_span` feature flag because many of the other functions that have a similar footprint also are annotated with it, I don't actually know if this is right.

It would be great if somebody could take a look at this, thank you very much in advanced.
bors added a commit to rust-lang/rust-analyzer that referenced this pull request Jun 6, 2023
Add span to group.

This appears to fix #14959, but I've never contributed to rust-analyzer before and there were some things that confused me:

- I had to add the `fn byte_range` method to get it to build. This was added to rust in [April](rust-lang/rust#109002), so I don't understand why it wasn't needed until now
- When testing, I ran into the fact that rust recently updated its `METADATA_VERSION`, so I had to test this with nightly-2023-05-20. But then I noticed that rust has its own copy of `rust-analyzer`, and the metadata version bump has already been [handled there](rust-lang/rust@60e95e7). So I guess I don't really understand the relationship between the code there and the code here.
bors added a commit to rust-lang/rust-analyzer that referenced this pull request Jun 9, 2023
Add span to group.

This appears to fix #14959, but I've never contributed to rust-analyzer before and there were some things that confused me:

- I had to add the `fn byte_range` method to get it to build. This was added to rust in [April](rust-lang/rust#109002), so I don't understand why it wasn't needed until now
- When testing, I ran into the fact that rust recently updated its `METADATA_VERSION`, so I had to test this with nightly-2023-05-20. But then I noticed that rust has its own copy of `rust-analyzer`, and the metadata version bump has already been [handled there](rust-lang/rust@60e95e7). So I guess I don't really understand the relationship between the code there and the code here.
bors added a commit to rust-lang/rust-analyzer that referenced this pull request Jun 9, 2023
Add span to group.

This appears to fix #14959, but I've never contributed to rust-analyzer before and there were some things that confused me:

- I had to add the `fn byte_range` method to get it to build. This was added to rust in [April](rust-lang/rust#109002), so I don't understand why it wasn't needed until now
- When testing, I ran into the fact that rust recently updated its `METADATA_VERSION`, so I had to test this with nightly-2023-05-20. But then I noticed that rust has its own copy of `rust-analyzer`, and the metadata version bump has already been [handled there](rust-lang/rust@60e95e7). So I guess I don't really understand the relationship between the code there and the code here.
bors added a commit to rust-lang/rust-analyzer that referenced this pull request Jun 10, 2023
Add span to group.

This appears to fix #14959, but I've never contributed to rust-analyzer before and there were some things that confused me:

- I had to add the `fn byte_range` method to get it to build. This was added to rust in [April](rust-lang/rust#109002), so I don't understand why it wasn't needed until now
- When testing, I ran into the fact that rust recently updated its `METADATA_VERSION`, so I had to test this with nightly-2023-05-20. But then I noticed that rust has its own copy of `rust-analyzer`, and the metadata version bump has already been [handled there](rust-lang/rust@60e95e7). So I guess I don't really understand the relationship between the code there and the code here.
bors added a commit to rust-lang/rust-analyzer that referenced this pull request Jun 10, 2023
Add span to group.

This appears to fix #14959, but I've never contributed to rust-analyzer before and there were some things that confused me:

- I had to add the `fn byte_range` method to get it to build. This was added to rust in [April](rust-lang/rust#109002), so I don't understand why it wasn't needed until now
- When testing, I ran into the fact that rust recently updated its `METADATA_VERSION`, so I had to test this with nightly-2023-05-20. But then I noticed that rust has its own copy of `rust-analyzer`, and the metadata version bump has already been [handled there](rust-lang/rust@60e95e7). So I guess I don't really understand the relationship between the code there and the code here.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants