Tracking issue for RFC 2151, Raw Identifiers #48589

Centril · 2018-02-27T18:53:22Z

This is a tracking issue for RFC 2151 (rust-lang/rfcs#2151).

Steps:

Implement the RFC (Implementation of RFC 2151, Raw Identifiers #48942)
Adjust documentation (see instructions on forge)
Stabilization PR (see instructions on forge)
Settle on a final syntax for raw identifiers.

Unresolved questions:

Do macros need any special care with such identifier tokens?
Probably not.
Should diagnostics use the r# syntax when printing identifiers that overlap keywords?
Depends on the edition?
Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*


nikomatsakis · 2018-02-28T16:49:10Z

@Centril you rock! @petrochenkov, think you could supply some mentoring instructions here?

Lymia · 2018-03-06T15:48:03Z

I'd like to take a shot at this. It seems like it'd be a decent way to learn how rustc works.

Manishearth · 2018-03-07T01:39:06Z

The relevant code is probably this, you'll want to make the ('r', Some('#'), _) case allow for the third character to be alphabetic or an underscore, and in that case skip the r and the # before running ident_continue.

We could add an is_raw boolean to token::Ident as well.

You'll also need to feature gate this but we can do that later.

lmk if you have questions
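As a rough illustration of the lexer change described above, here is a standalone sketch. It is not rustc's lexer: `LexedIdent`, `lex_ident`, `is_ident_start` and `is_ident_continue` are hypothetical names, and the character classes are simplified. It only shows the shape of the idea: when the input starts with `r#` and the third character can start an identifier, skip the two prefix characters and record that the identifier was raw.

```rust
/// Result of lexing one identifier: its name plus whether it was written raw.
/// (Hypothetical type for this sketch, not a rustc type.)
#[derive(Debug, PartialEq)]
struct LexedIdent<'a> {
    name: &'a str,
    is_raw: bool,
}

// Simplified character classes; the real lexer uses Unicode XID rules.
fn is_ident_start(c: char) -> bool {
    c.is_alphabetic() || c == '_'
}

fn is_ident_continue(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Lex an identifier at the start of `src`, returning it and the remaining input.
fn lex_ident(src: &str) -> Option<(LexedIdent<'_>, &str)> {
    let mut chars = src.chars();
    let (first, second, third) = (chars.next(), chars.next(), chars.next());
    let (body, is_raw) = match (first, second, third) {
        // `r#` followed by an identifier start: skip the `r` and the `#`.
        (Some('r'), Some('#'), Some(c)) if is_ident_start(c) => (&src[2..], true),
        // Ordinary identifier.
        (Some(c), _, _) if is_ident_start(c) => (src, false),
        _ => return None,
    };
    // Consume identifier-continue characters.
    let end = body
        .char_indices()
        .find(|&(_, c)| !is_ident_continue(c))
        .map_or(body.len(), |(i, _)| i);
    Some((LexedIdent { name: &body[..end], is_raw }, &body[end..]))
}
```

Under these assumptions, `lex_ident("r#match x")` yields the name `match` with `is_raw` set, while `lex_ident("radius")` yields an ordinary identifier. Raw strings such as `r#"..."#` would need to be handled before this function is reached.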

petrochenkov · 2018-03-07T09:02:42Z

We could add an is_raw boolean to token::Ident as well.

This is possible, but would be unfortunate. Idents are used everywhere and supposed to be small.

Ideally we should limit the effect of r# to lexer, for example by interning r#keyword identifiers into separate slots (like gensyms) so r#keyword and keyword have different ~~Name~~Symbols.

EDIT: The first paragraph is about ast::Ident, token::Ident may actually be the appropriate place.

petrochenkov · 2018-03-07T09:10:27Z

Some clarifications are needed:

How r# affects context-dependent identifiers (aka weak keywords) like default.
Do they lose their special context-dependent properties and turn into "normal identifiers"?

// `union` is a normal ident, this is not an error
union U {
    ....
}

// `union` is a raw ident, is this an error?
r#union U {
    ...
}

How does r# affect keywords that are "semantically special" and not "syntactically special"?
I'm talking about path segment keywords specifically.
For example, Self in Self::A::B is already treated as normal identifier during parsing, it only gains special abilities during name resolution when we resolve identifiers named Self (or self/super/etc) in a special way.

#[derive(Default)]
struct S;

impl S {
    fn f() -> S {
        r#Self::default() // Is this an error?
    }  
}

Manishearth · 2018-03-07T09:16:18Z

oh, I didn't realize we reuse Ident from the lexer.

I think r#union is an error (when used to create a union). We'll need the ident lexing step to return a bool on the lexed ident's raw-ness.

I think it's ok for r#Self to work; but don't mind either way

petrochenkov · 2018-03-07T20:48:04Z

Also, lifetime identifiers weren't covered by the RFC - r#'ident or 'r#ident.
(One more case of ident vs lifetime mismatch caused by lifetime token being a separate entity rather than a combination of ' and identifier, cc https://internals.rust-lang.org/t/pre-rfc-splitting-lifetime-into-two-tokens/6716).

Manishearth · 2018-03-07T21:05:51Z

I think it's fine if we don't have raw lifetime identifiers. Lifetimes are crate-local, their identifiers never need to be used by consumers of your crate, so lifetimes clashing with keywords can simply be fixed on epochs. Admittedly, writing a lint that makes that automatic may be tricky.

Raw identifiers are primarily necessary because people may need to call e.g. functions named catch() in crates on an older epoch. This problem doesn't occur for lifetimes.
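To make the cross-epoch use case concrete, here is a small sketch. The `legacy` module stands in for a crate written for an older epoch, and `try` is used as the example name because it is reserved from the 2018 edition onward; both choices are illustrative assumptions, not something taken from this thread.

```rust
// Stand-in for a dependency written for an older epoch.
mod legacy {
    // In a real 2015-edition crate this could be spelled plain `try`;
    // the raw form is used here so the sketch compiles on any edition.
    pub fn r#try(input: &str) -> Option<i32> {
        input.trim().parse().ok()
    }
}

// A caller on a newer epoch has to use the raw form to name the function.
fn parse_with_legacy(input: &str) -> Option<i32> {
    legacy::r#try(input)
}
```

Here `parse_with_legacy(" 42 ")` goes through `legacy::r#try` and returns `Some(42)`.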

petrochenkov · 2018-03-07T21:57:22Z

Yeah, it's mostly a consistency question rather than a practical issue.

Lymia · 2018-03-07T22:17:44Z

From what I've been seeing while looking around the codebase, I think the best way to implement this is to add a new parameter to token::Ident, rather than messing with the Symbol itself?

I think this would make implementing epoch-specific keywords easier, since there's no question of what Symbol should be used when, and in what epoch. (For example, you'd have to make sure the Symbol for catch being used as an identifier in 2015 epoch code is the same as the Symbol for r#catch being used in an epoch where it's a full keyword.) This was already something I wasn't sure how to handle with contextual keywords.

My main questions, right now, would be:

Does this actually sound like the best approach?
Reading the code, it looks like most feature gating is done on the AST after parsing, and not during parsing. Since nothing in the AST would reflect the raw identifiers being there at all with this approach, the feature check would have to be in parser.rs. Would adding one there be an issue? How would I go about doing that, considering that module doesn't have other feature checks that I can use as a template?
A minor code style point: right now, token::Ident is declared as Ident(ast::Ident). To add an is_raw field would mean having a mystery unnamed bool field in a tuple struct, or making it use named fields, in which case, matching on token::Idents becomes nastier. One idea that did come to mind is adding a RawIdent(ast::Ident) variant, but then the compiler can't help me find places I might need to worry about raw identifiers. Any advice on this?

I'll implement lifetime parameters if it turns out to be easy to, I guess. As Manishearth said, it's not something you really need to escape ever.

Manishearth · 2018-03-07T22:28:34Z

Yes, we should not be affecting Symbol.

Regarding the feature gate, we can solve the problem later, but I was thinking of doing a delayed error or something since we don't know what feature gates are available whilst lexing

a mystery unnamed bool field in a tuple struct

I think that's fine. Folks usually do this as Ident(ast::Ident, /* is_raw */ bool)
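A minimal sketch of that convention, using a hypothetical `Token` enum rather than rustc's actual `token::Token` (the `String` payload, the `Pound` variant, and the keyword list are all placeholders):

```rust
// Hypothetical token type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    // The inline comment documents the otherwise unnamed bool.
    Ident(String, /* is_raw */ bool),
    Pound,
}

// A small illustrative subset of keywords, not the full list.
const STRICT_KEYWORDS: &[&str] = &["fn", "match", "struct"];

/// A token counts as a keyword only if it is a non-raw identifier
/// whose name is in the keyword list.
fn is_keyword(tok: &Token) -> bool {
    match tok {
        Token::Ident(name, /* is_raw */ false) => STRICT_KEYWORDS.contains(&name.as_str()),
        _ => false,
    }
}
```

Because the flag is a positional field, every existing `Token::Ident(..)` pattern stops compiling until it is updated, which is the compiler assistance a separate `RawIdent` variant would not give.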

Manishearth · 2018-03-07T22:29:32Z

I think it's best if we don't allow this to work for lifetime parameters, actually. We restrict the number of places where raw identifiers are allowed at a first pass, and if we need this for a later epoch, we add it then

Manishearth · 2018-03-07T22:29:50Z

But yeah, your plan sounds good otherwise

Lymia · 2018-03-07T23:35:34Z

.... right, that makes sense. The lexer obviously doesn't know what feature flags there are, because it's busy lexing them. :D

nikomatsakis · 2018-03-08T16:14:12Z

@petrochenkov

In my opinion, r#foo should always be a "generic identifier" and hence not eligible as a contextual keyword. So:

How r# affects context-dependent identifiers (aka weak keywords) like default.

They are not special anymore. r#union Foo would be an error.

How does r# affect keywords that are "semantically special" and not "syntactically special"?

Is this just about self and Self? Does this apply to super too? We currently talk about self and Self as if they were keywords. I am therefore inclined to think that r#self would be "just another name" and not have the special properties that self ordinarily has.

So e.g. use r#self::foo would be an absolute path.

cc @rust-lang/lang -- do others agree?

scottmcm · 2018-03-08T18:42:19Z

@nikomatsakis That's exactly what I'd have expected.

That said, I don't know what that means with macros, since apparently this works:

macro_rules! foo {
    ($i:ident) => {
        $i Foo {
            x: u32,
            y: i32,
        }
    }
}

foo!(union);

(I wish it didn't, but it might be too late?)

cuviper · 2018-03-08T18:50:01Z

I hesitate about treating self and r#self as distinct things, because it seems like this would allow overlapping names in the same scope, like:

impl Foo {
    fn foo(self, r#self: Bar) {
        println!("I have distinct {} and {} at the same time?", self, r#self);
    }
}

Maybe that's in fact OK, but if so it's at least a corner case to test...

(In general, foo and r#foo are supposed to refer to the same thing.)
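For names that are not strict keywords, the "same thing" rule can be seen directly. The following sketch uses `union`, which is only a contextual keyword, and a struct field named `type`; the function and struct names are made up for the example.

```rust
struct Column {
    // `type` is a strict keyword, so the field must be declared raw.
    r#type: &'static str,
}

fn same_binding() -> (u32, u32) {
    let union = 3; // plain spelling is fine: `union` is a contextual keyword
    let doubled = r#union * 2; // the raw spelling names the same variable
    (union, doubled)
}

fn column_type() -> &'static str {
    let c = Column { r#type: "u8" };
    c.r#type
}
```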

cuviper · 2018-03-08T18:52:14Z

@scottmcm I'd expect your foo!(union) to work, but not foo!(r#union). That is, macros will have to preserve the metadata whether a particular :ident is raw or not.
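A small sketch of why the raw-ness has to travel with the `:ident` fragment: if the macro below lost the `r#` on expansion, the generated item would be named with a bare keyword and fail to parse. The macro and function names are hypothetical.

```rust
// Defines a zero-argument function with the given name.
macro_rules! define_getter {
    ($name:ident, $value:expr) => {
        fn $name() -> i32 {
            $value
        }
    };
}

// A raw identifier is accepted by the `ident` fragment and stays raw
// when substituted, so this defines a function named `match`.
define_getter!(r#match, 7);
define_getter!(plain, 1);

fn sum() -> i32 {
    r#match() + plain()
}
```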

Lymia · 2018-03-09T06:06:42Z

I've found some other unexpected places where raw identifiers might show up while implementing this. Should these be allowed?

#[r#struct]
macro_rules! foo { ($r#struct:expr) => { r#expr } }

There's some weirdness with the built-in procedural macros taking raw identifiers too:

concat_idents!(r#abc, r#def)
format!("{struct}", r#struct = 0);

Also, libproc_macro seems to need to be able to deal with raw identifiers somehow too. I've been creating artificial Symbols with content like r#struct, but this doesn't seem like a very good solution since these Symbols wouldn't be used anywhere else. Any advice here?
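For reference, the `format!` case can be tried in isolation. This sketch assumes, as the example above suggests, that a raw identifier used as a named argument is matched by its name without the `r#` prefix; the function name is made up.

```rust
fn render_named() -> String {
    // The argument is written `r#struct`, the placeholder is `{struct}`.
    format!("{struct}", r#struct = 0)
}
```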

Manishearth · 2018-03-09T06:09:13Z

Procedural macros see them after lexing, so that will Just Work.

I don't think libproc_macro needs to know anything? Again, this is all after lexing.

I personally think it's fine to allow all those. Simplifies things.

Lymia · 2018-03-09T06:15:23Z

libproc_macro has an unstable enum that represent a token: https://doc.rust-lang.org/nightly/proc_macro/enum.TokenNode.html

This needs to know about the new field in token::Ident to properly serialize/deserialize it. I don't know how much unstable stuff depends on it.

Manishearth · 2018-03-09T06:16:34Z

Yeah, fair. As long as it's unstable we can tweak it.

Lymia · 2018-03-09T06:22:25Z

That said, I don't know what that means with macros, since apparently this works:

macro_rules! foo {
    ($i:ident) => {
        $i Foo {
            x: u32,
            y: i32,
        }
    }
}

foo!(union);

Actually, checking on the playground, it looks like this isn't even unique to contextual keywords: https://play.rust-lang.org/?gist=98bba154f78cd9aba5838bf82ac2fbb4&version=stable

Lymia · 2018-03-10T04:15:42Z

Hrm, another strange case that came up while writing tests:

Given this macro definition, which branch should test_macro!(r#a) match:

macro_rules! test_macro {
    (a) => { ... };
    (r#a) => { ... };
}
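The question can be probed with a compilable variant of that macro. The sketch below assumes the behaviour I understand current compilers to have, where a literal `a` and a literal `r#a` in a matcher are compared as different tokens; the string results and the helper function are placeholders.

```rust
macro_rules! test_macro {
    (a) => {
        "plain"
    };
    (r#a) => {
        "raw"
    };
}

// Returns which arm each spelling selected.
fn which_branch() -> (&'static str, &'static str) {
    (test_macro!(a), test_macro!(r#a))
}
```

Under that assumption, each invocation selects the arm with the matching spelling, so the second arm is reachable.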

Manishearth · 2018-03-10T04:17:39Z

@nikomatsakis ^ ?

Could even make this change based on the epoch. idk.

Manishearth · 2018-03-10T04:17:56Z

macro matching is kinda-sorta-breakable already

rfcbot · 2018-07-24T19:56:17Z

🔔 This is now entering its final comment period, as per the review above. 🔔

eddyb · 2018-07-31T16:00:13Z

You can't detect $foo properly in a macro_rules macro, but if you have a proc macro, it should work (at least now, maybe it has had issues in the past).

rfcbot · 2018-08-03T19:57:38Z

The final comment period, with a disposition to merge, as per the review above, is now complete.

Mark-Simulacrum · 2018-08-08T15:19:53Z

I've nominated this mostly so that someone (lang team, maybe) can find or select someone to write up docs and stabilize this feature.

cramertj · 2018-08-09T19:13:04Z

This is in need of a stabilization PR! There's a stabilization guide on the forge. This feature is already documented in the edition guide, so much of that documentation can probably be reused in the stabilization PRs. Please post here if you plan to take this issue! (I'll circulate it around and see if anyone wants to take it as a good first or third PR)

alexreg · 2018-08-09T20:16:17Z

I'll have a go. This looks pretty straightforward.

alexreg · 2018-08-09T20:55:00Z

When it comes to new documentation, is there anything that should be updated besides the Reference? I'm not sure it belongs in the Book.

@cramertj

Stabilise raw_identifiers feature * [Reference PR](rust-lang/reference#395) * [Book PR](rust-lang/book#1480) * [Rust by Example PR](rust-lang/rust-by-example#1095) Closes #48589. r? @cramertj CC @cuviper @Centril

alexreg · 2018-08-21T16:36:19Z

Merged! I think some boxes can be ticked off now, @Centril. :-)

alexreg · 2018-08-21T17:09:32Z

As for the unresolved questions:

Do macros need any special care with such identifier tokens?
No, I can't see how this would affect macros. Perhaps @petrochenkov could second this, however.
Should diagnostics use the r# syntax when printing identifiers that overlap keywords?
Yes, I think so, although possibly we should not do this if we're using an old edition and the keyword was only introduced in a later one.
Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*
I'm not sure of this. What do you mean by pub use old_epoch::*, @Centril?

cuviper · 2018-08-21T17:25:01Z

Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*
I'm not sure of this. What do you mean by pub use old_epoch::*

For instance, if the old_epoch crate had a fn catch() which it could declare without bothering with raw identifiers, and a new_epoch crate used and re-exported it. The new crate would have had to use raw identifiers if it had declared that function itself. Should this affect the way rustdoc presents it?

alexreg · 2018-08-21T17:31:10Z

@cuviper Right, makes sense. I think the rules should be the same as what I proposed for diagnostics, in that case.

thatname · 2021-04-22T08:03:15Z

@Centril
I think it's still impossible for Rust to reference external C++ symbols even with raw identifiers,
because identifier names mangled by the MS C++ ABI contain characters such as '?' and '@', which are not valid in Rust identifiers.
For example, this is valid C++ code:

#ifdef _MSC_VER 
	// MS C++ ABI mangled identifier
	void* vtbl = __identifier("??_7BaseShader@pe3@@6B@");
#else
	// Itanium C++ ABI mangled name, all *nix toolchains use this.
	extern "C" void * _ZTVN3pe310BaseShaderE;
	void* vtbl = _ZTVN3pe310BaseShaderE ;
#endif

The __identifier("??_7BaseShader@pe3@@6B@") is a valid raw identifier in MS C++,
but r#??_7BaseShader@pe3@@6B@ is not valid in Rust.
Will raw identifiers be relaxed to accept these characters?

Nemo157 · 2021-04-22T08:26:04Z

Inter-language-operability was not a goal of this RFC, it was purely limited to inter-edition-operability within Rust. AFAIK #[link_name] should be enough for FFI purposes, there's no need to be able to have the identifiers in the Rust code identical to what is used to link to them.
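A minimal sketch of the `#[link_name]` approach. To keep it linkable without extra libraries it binds the C library's `strlen` rather than a mangled C++ symbol; the Rust-side name `c_string_length` and the wrapper are made up. It is written with `unsafe extern`, which assumes Rust 1.82 or later; older toolchains would spell the block as plain `extern "C"`.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

unsafe extern "C" {
    // The Rust-side name is arbitrary; `link_name` supplies the symbol
    // the linker actually looks up, which may contain characters that
    // are not valid in a Rust identifier.
    #[link_name = "strlen"]
    fn c_string_length(s: *const c_char) -> usize;
}

fn length_via_ffi(s: &CStr) -> usize {
    // SAFETY: `s` is a valid NUL-terminated string for the duration of the call.
    unsafe { c_string_length(s.as_ptr()) }
}
```

The same attribute would carry a mangled name such as the one in the question as an ordinary string literal, so the identifier grammar never comes into play.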

Centril added B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. T-lang Relevant to the language team, which will review and decide on the PR/issue. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Feb 27, 2018

Centril mentioned this issue Feb 27, 2018

RFC: Raw Identifiers rust-lang/rfcs#2151

Merged

rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Jul 24, 2018

rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Aug 3, 2018

Mark-Simulacrum added I-nominated P-high High priority labels Aug 7, 2018

cramertj self-assigned this Aug 9, 2018

cramertj added E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. labels Aug 9, 2018

Centril removed the I-nominated label Aug 11, 2018

bors closed this as completed in #53236 Aug 21, 2018
