Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syntax_pos::symbol::Symbol::gensym() is incompatible with stable hashing. #49300

Closed
michaelwoerister opened this issue Mar 23, 2018 · 19 comments · Fixed by #64623
Closed

syntax_pos::symbol::Symbol::gensym() is incompatible with stable hashing. #49300

michaelwoerister opened this issue Mar 23, 2018 · 19 comments · Fixed by #64623
Assignees
Labels
A-incr-comp Area: Incremental compilation A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-resolve Area: Name resolution A-syntaxext Area: Syntax extensions C-bug Category: This is a bug. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@michaelwoerister
Copy link
Member

The current scheme for "gensym"ing names via Symbol::gensym() is rather clever and elegant but unfortunately it defies being hashed in a stable way as is needed for incremental compilation. So far, incr. comp. has erroneously treated differing Symbols with equal string contents as being the same. Now situations are starting to arise where this leads to problems (#48923, #49065).

I consider this a somewhat urgent issue.

cc @rust-lang/compiler, esp. @petrochenkov & @jseyfried

@michaelwoerister michaelwoerister added A-syntaxext Area: Syntax extensions A-resolve Area: Name resolution A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-incr-comp Area: Incremental compilation C-bug Category: This is a bug. labels Mar 23, 2018
@michaelwoerister michaelwoerister added I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 23, 2018
@eddyb
Copy link
Member

eddyb commented Mar 23, 2018

I generally consider gensym problematic and would prefer it if all hygiene was done through Span.
There are maybe a few non-hygiene uses but they should probably be done in other ways.

@petrochenkov
Copy link
Contributor

petrochenkov commented Mar 23, 2018

We should investigate implementing gensyms via SyntaxContext (which means via Span after #49154).
In theory, every situation when symbol != gensym(symbol) makes sense should include hygiene into comparison as well.
If "gensymness" is kept in SyntaxContext side tables somehow, then symbol1 == symbol2 will always be equivalent to string_content(symbol1) == string_content(symbol2), that also looks like a simplification of the mental model to me.

@petrochenkov
Copy link
Contributor

petrochenkov commented Mar 30, 2018

Blocked on removing emulation of hygiene with gensyms (can't emulate gensyms with hygiene if we are doing exactly the opposite thing right now!) (https://github.com/petrochenkov/rust/tree/spident2, in progress), which in its turn is waiting for #49154 and #49718.

@oli-obk
Copy link
Contributor

oli-obk commented Apr 12, 2018

Could gensym generate a name with a stable prefix (proc macro crate name + version), a stable ID (proc macro invocation local) and some character that is illegal in names, so noone can name it? Maybe even throw in the stable hash of the code it is invoked upon?

@nikomatsakis
Copy link
Contributor

Categorizing as P-high so that this keeps annoying us, even though we don't have anyone assigned to it or actively working on it.

@michaelwoerister
Copy link
Member Author

We've been coming back to this issue in the @rust-lang/compiler meeting without progress for a few weeks in a row now. It would be great if we could come up with some action items. @petrochenkov , the two blocking PRs you mentioned above have be merged in the meantime. What would be the next steps? Or broader question: Do we know what to do and just need to implement it or are there still open questions about how this can be solved?

@petrochenkov
Copy link
Contributor

petrochenkov commented May 26, 2018

@michaelwoerister

Or broader question: Do we know what to do and just need to implement it or are there still open questions about how this can be solved?

With regards to the first part (removing emulation of hygiene with gensyms) we know what to do and just need to implement it.
I've opened #51072 that does this for fields, but there are also associated items and lifetimes.
This is mostly a mechanical work, but it's also a great opportunity to do audit of identifier use and fix hygiene bugs (this part is not so mechanical).

The good news is that I have a few free days before I leave on vacation on May 31, and I was going to spend them on Macro 1.2 stuff, but there's a good chance I'll have time left on Identification of HIR.
If something remains unfinished after May 31 someone else can do it without identifier audit.

With regards to the second part, I didn't yet think about how exactly gensyms should be encoded in HygieneData structures so they are compatible with stable hashing. I'm not sure what are requirements - stable representation for different compilation sessions for the same crate, different compilation sessions for different crates?

bors added a commit that referenced this issue May 26, 2018
Use `Ident`s for fields in HIR

Continuation of #49718, part of #49300
@michaelwoerister
Copy link
Member Author

That sounds like good news, thanks @petrochenkov!

Regarding stability: The main requirement is that a gensymed symbol hashes the same way in two different compilation sessions regardless of what other HIR items have been added before or after it. If something in the item that contains the symbol has changed, then the hash can also change.

An example what we need for stable hashing are HirIds. They consist of a DefIndex (for which we hash the DefPath which does not change, even if we insert other items) and an ItemLocalId, which just numbers all nodes within the given item.

bors added a commit that referenced this issue Jun 11, 2018
Use `Ident`s in HIR and remove emulation of hygiene with gensyms

continuation of #51072, part of #49300

Not all `Name`s in HIR are replaced with `Ident`s, only those needed for hygiene or already having attached spans.
bors added a commit that referenced this issue Jun 28, 2018
Use `Ident`s in HIR and remove emulation of hygiene with gensyms

continuation of #51072, part of #49300

Not all `Name`s in HIR are replaced with `Ident`s, only those needed for hygiene or already having attached spans.
@pnkfelix
Copy link
Member

visiting for triage.

assigning to @michaelwoerister to ensure that it gets addressed (potentially via further delegation).

@petrochenkov
Copy link
Contributor

petrochenkov commented Jul 12, 2018

Some status update: 94ef9f5 was merged, so all macros (including legacy builtin ones) can now easily generate identifiers with def-site hygiene.
So most of gensyms (although perhaps not all of them) should now be replaceable with hygienic identifiers.

@michaelwoerister
Copy link
Member Author

Thanks for the awesome work so far, @petrochenkov!

The current situation is that we don't use gensym anymore but we are still not hashing things properly: #51492 (comment)

I haven't had time to think about the various solution strategies yet.

@pnkfelix
Copy link
Member

pnkfelix commented Aug 2, 2018

visiting for triage. No progress yet. Should consider either reassigning or lowering priority if we are still stalled next week.

@nikomatsakis
Copy link
Contributor

The problem remains, but at least we have some kind of assertion that should catch problems related to it. Going to downgrade to P-medium for now.

@nikomatsakis nikomatsakis added P-medium Medium priority and removed P-high High priority labels Aug 9, 2018
@michaelwoerister
Copy link
Member Author

... but at least we have some kind of assertion that should catch problems related to it

In particular, this one:

// If the following assertion triggers, it can have two reasons:
// 1. Something is wrong with DepNode creation, either here or
// in DepGraph::try_mark_green()
// 2. Two distinct query keys get mapped to the same DepNode
// (see for example #48923)
assert!(!self.dep_graph.dep_node_exists(&dep_node),
"Forcing query with already existing DepNode.\n\
- query-key: {:?}\n\
- dep-node: {:?}",
key, dep_node);

@petrochenkov
Copy link
Contributor

I haven't seen this written out explicitly, but expansion/hygiene data can be stored in metadata/incremental-data in a stable way using the same DefPath approach as other things.

The basic thing to encode here is ExpnId and it has a well-defined parent DefIndex (currently available through the Definitions::invocation_parents table).
If multiple expansions share the parent def-path, then, I think, they should be able to just get local disambuguators.

mod m {
    fn foo() { // def-path: m::foo
        // Here's how the closures' `NodeId`s get a stable representation.
        let closure1 = || {}; // m::foo(1)
        let closure2 = || {}; // m::foo(2)

        // Same idea, but for macro invocations and their `ExpnId`s instead of `NodeId`s
        mac1!(); // m::foo!(1)
        mac2!(); // m::foo!(2)
    }
}

If we can encode ExpnId, then we can encode SyntaxContext as well (a chain of ExpnIds), then Span and everything else.
ExpnId is the only ID-like thing in this area, other components of hygiene structures are plain data.


#63919 should be one of few last steps in the removal of gensyms, after that we'll only have hygiene, which should be encodable as described above, if I'm not missing anything.

@michaelwoerister
Copy link
Member Author

Hm, making macro invocations a regular DefPath component... I haven't thought about this deeply yet but it might be genius :) Then DefPath could already be assigned during parsing, with macros just adding more of them without conflicts. cc @eddyb @nikomatsakis

@eddyb
Copy link
Member

eddyb commented Aug 28, 2019

@michaelwoerister Yeah, that's also roughly my vision for moving incremental further back, I think I discussed it with @nikomatsakis before.

I was supposed to write about this months ago but other things came up.

In short, there'd be "Def" layer that parsing, expansion and cross-crate metadata would all interact with, with AST/HIR remaining mainly for the intra-definition syntactic categories and miscellaneous syntax.

A first step could be splitting HIR into "HIR Def" and "everything else", but moving def paths by themselves further back would also make sense.

@petrochenkov
Copy link
Contributor

Def paths are currently created during expansion, when a freshly expanded AST fragment is integrated into the partially built AST.
That's already early enough for representing expansions with def paths.

bors added a commit that referenced this issue Sep 29, 2019
Remove last uses of gensyms

Bindings are now indexed in resolve with an additional disambiguator that's used for underscore bindings.  This is the last use of gensyms in the compiler.

I'm not completely happy with this approach, so suggestions are welcome. Moving undescore bindings into their own map didn't turn out any better: master...matthewjasper:remove-underscore-gensyms.

closes #49300
cc #60869

r? @petrochenkov
@bors bors closed this as completed in af3d9e5 Oct 16, 2019
@michaelwoerister
Copy link
Member Author

🎉 🎉 🎉 🎉 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-incr-comp Area: Incremental compilation A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-resolve Area: Name resolution A-syntaxext Area: Syntax extensions C-bug Category: This is a bug. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants