Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streamline Symbol, InternedString, and LocalInternedString. #60869

Closed
nnethercote opened this issue May 16, 2019 · 33 comments
Closed

Streamline Symbol, InternedString, and LocalInternedString. #60869

nnethercote opened this issue May 16, 2019 · 33 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nnethercote
Copy link
Contributor

nnethercote commented May 16, 2019

We currently have three closely-related symbol types.

Symbol is the fundamental type. A Symbol is an index. All operations work on that index. StableHash is not implemented for it, but there's no reason why it couldn't be. A Symbol can be a gensym, which gets special treatment -- it's a guaranteed unique index, even if its chars have been seen before.

InternedString is a thin wrapper around Symbol. You can convert a Symbol to an InternedString. It has two differences with Symbol.

  • Its PartialOrd/Ord/Hash impls use the chars, rather than the index.
  • Gensym-ness is ignored/irrelevant.

LocalInternedString is an alternative that contains a &str. You can convert both Symbol and InternedString to LocalInternedString. Its PartialOrd/Ord/Hash impls (plus PartialEq/Eq) naturally work on chars. Its main use is to provide a way to look some or all of the individual chars within a Symbol or InternedString, which is sometimes necessary.

I have always found the differences between these types confusing and hard to remember. Furthermore, the distinction between Symbol and InternedString is subtle and has caused
bugs.

Also, gensyms in general make things a lot more complicated, and it would be great to eliminate them.

Here's what I would like as a final state.

  • Symbol exists.
  • InternedString does not exist.
  • LocalInternedString perhaps exists, but is only used temporarily when code needs access to the chars within a Symbol. Alternatively, Symbol could provide a with() method (like InternedString currently has) that provides access to the chars, and then LocalInternedString wouldn't be needed.
  • Symbol's impl of Hash uses the index, and its impl of StableHash uses the chars.
  • Not sure about Symbol's impl of PartialOrd/Ord. If a stable ordering is really needed (perhaps for error messages?) we could introduce a StableOrd trait and use that in the relevant places, or do a custom sort, or something.
  • Gensyms don't really exist. They are simulated: when you call gensym(), it just appends a unique suffix. It's worth noting that gensyms are always identifiers, and so the unique suffix can use a non-identifier char. And Interner could keep a counter. So "foo" would gensym to something lke "foo$1", "foo$2", etc. Once the suffix is added, they would just be treated as normal symbols (in terms of hashing, etc.) I would hope that identifier gensyms would never be compared with non-identifier symbols, so a false positive equality match should be impossible. (Different types for identifier symbols and non-identifier symbols would protect against that, but might cause other difficulties.) Alternatively, syntax_pos::symbol::Symbol::gensym() is incompatible with stable hashing. #49300 talks about other ways of dealing with gensyms.
  • All this should also help performance, because we'd end up with more operations on indexes, and only the necessary ones on chars (which require TLS lookups).

I haven't even touched on the way lifetimes work in the interner, which are subtle and error-prone. But having fewer types would only make improvements on that front simpler.

Thoughts?

CC @petrochenkov @Zoxc @eddyb @Mark-Simulacrum @michaelwoerister

@estebank estebank added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 16, 2019
@petrochenkov
Copy link
Contributor

StableHash is not implemented for it, but there's no reason why it couldn't be.

Small correction: StableHash is implemented for Symbol, see impl<'a> HashStable<StableHashingContext<'a>> for ast::Name in impls_syntax.rs.

(ast::Name is an alias of Symbol conventionally used for identifier symbols (in AST/HIR/etc. coming from token::Ident, lowered from ast::Ident.))

@michaelwoerister
Copy link
Member

In general, I'm in favor of simplifying the setup here (and properly documenting it in the process). The main problem here, from the point of view of incremental compilation, is that gensym depends on the state of the entire crate, so changing something in one source file will change gensyms in (potentially) all other source files. As long as gensym indices are generated from a global counter, every tiny change will look to incremental compilation as if you'd renamed half things in your code base.

I think that:

  • the gensym stuff should be factored out of the string interning infrastructure entirely (there's no fundamental reason to intertwine the two), and
  • gensym index generation should use a more stable scheme rather than a global counter.

@petrochenkov
Copy link
Contributor

I'm all for reducing the number of types if that's possible.
I'd rather use a separate "stable" set of operations/traits used by incremental (and query?) infra, than introduce separate types that have to leak into all other parts of the compiler, if that's possible.

I'm not sure the scheme with simply incrementing a session-local counter will work for gensyms.
Gensyms can come from metadata from other crates, encoded as strings and "foo$3" from other crate may mean entirely different thing than "foo$3" based on the local counter.
Perhaps appending long enough hashes will work better.
Anyway, I think the proper solution is to use hygiene and move gensymness from Symbol to Ident, so I'm not sure any intermediate solutions would improve significantly enough on the current situation to implement them. (But you can certainly try.)

@michaelwoerister
Copy link
Member

StableHash is not implemented for it, but there's no reason why it couldn't be.

Small correction: StableHash is implemented for Symbol, see impl<'a> > HashStable<StableHashingContext<'a>> for ast::Name in impls_syntax.rs.

(ast::Name is an alias of Symbol conventionally used for identifier symbols (in AST/HIR/etc. coming from token::Ident, lowered from ast::Ident.))

Ah, interesting. I think the real problem is actually that StableHash must be equivalent to PartialEq, that is:

  • x == y implies stable_hash(x) == stable_hash(y), and
  • x != y implies stable_hash(x) != stable_hash(y).

That second condition makes it more strict than Hash (i.e. no hash collisions are allowed). I going to open PRs trying to document this.

@nnethercote
Copy link
Contributor Author

I tried doing the "foo$3" approach but bootstrapping failed very early on with blatant bugs in name resolution. I suspect it's a dead end for the reasons mentioned above.

I agree that moving the gensym stuff from the symbol/interning level to the ident level sounds like the right idea. (Plenty of the interned strings are not identifiers.) What would "use hygiene" look like for dealing with gensyms? I know that Spans can have a hygiene element, but I don't know what data would be added to the hygiene element.

@petrochenkov
Copy link
Contributor

petrochenkov commented May 16, 2019

What would "use hygiene" look like for dealing with gensyms? I know that Spans can have a hygiene element, but I don't know what data would be added to the hygiene element.

"Gensym span" == Span::def_site() of a unique freshly introduced macro.
This means a unique fresh outer_mark in SyntaxContextData specifically.

Most of gensyms used in the compiler (those introduced by built-in derives or desugarings) don't even need the "unique fresh" part, they can just use Span::def_site() of the macro/desugaring they are introduced by.

A relevant recent thread - #60106.

@nnethercote
Copy link
Contributor Author

I think the proper solution is to use hygiene and move gensymness from Symbol to Ident

#60903 is a first step: it moves the gensym operations from Symbol to Ident. (The gensym implementation details are still within Interner.)

@matthewjasper
Copy link
Contributor

matthewjasper commented May 21, 2019

So I've had a look through the code and done some experimentation with replacing gensyms with hygiene. To give a summary

Don't need to be unique identifiers at all

As far as I can tell, these don't escape into the AST, so the gensym call is unnecessary:

let lhs_nm = ast::Ident::from_str("lhs").gensym();
let rhs_nm = ast::Ident::from_str("rhs").gensym();

I also can't see the point of the module declared here:

let name = Ident::from_str_and_span(&format!("__register_diagnostic_{}", code), span).gensym();

Refactor out of existence

This gensym avoids an assert, before it get converted to an interned string. It could be removed now, but there's probably a larger set of refactorings around type/lifetime parameters so that we aren't using the name of a parameter to determine whether it's Self or not.

let ident = if param.ident.name == keywords::SelfUpper.name() {
param.ident.gensym()
} else {
param.ident
};

Once the async fn desugaring can be moved to HIR lowering, we can also avoid gensyms there.

Could be replaced with _

This doesn't really avoid the gensym, so much as make someone else do it.

Subject to a decision on #61019, we could use _ as the rename instead.

(orig_name_ident.gensym(), Some(orig_name_sym))

Likewise, we could use _ here as well:

Some(Ident::from_str_and_span("__dummy", new_span).gensym()),

Can use hygiene

Making built in derives have def-site hygiene appears to work (from some simple testing). format_args! appears to be similar.

Don't have an associated macro

Global allocators, tests and proc macros have AST passes that use a gensym to hide a global module. We could use hygiene, but we don't have a mark associated to a macro expansion, so the compiler will ICE here:

let def_id = self.macro_defs[&expansion];

Tests

Here we run in to trouble with because the obvious way to add hygiene runs into the problem with this code:

#![feature(decl_macro)]

macro a() {
    pub struct A;

    mod module {
        use super::A; // Fails to resolve, since the the context of `A` is stripped when we try to resolve it in the root.
    }
}

a!();

Resolve and _

Finally we use gensyms to allow multiple items called _ in the same module. I'm not sure what the motivation for using gensyms rather than a special case in resolve is. But this seems to be the case that is most likely to need "gensym spans"

@michaelwoerister
Copy link
Member

Wow, nice write-up, @matthewjasper!

@eddyb
Copy link
Member

eddyb commented May 22, 2019

Global allocators, tests and proc macros have AST passes that use a gensym to hide a global module. We could use hygiene, but we don't have a mark associated to a macro expansion, so the compiler will ICE here:

Could we get the Mark from the fact that in all of these cases there is an attribute?
And perform all the work as expanding that attribute?

@matthewjasper
Copy link
Contributor

There's one attribute per test. Since a user isn't obligated to write at least one test we don't necessarily have a mark but we still have to hide the main test runner function.

Global allocators could probably use the attribute.

@petrochenkov
Copy link
Contributor

petrochenkov commented May 22, 2019

I global allocators can be formulated in terms of a proc macro, that would be great (one less special case).

If not, then we can create fresh marks for them (Mark::fresh, which is already used in various non-macro places, even too liberally perhaps), it just needs to be "registered" in resolve, so it has a module parent and other associated data, and doesn't ICE on attempts to access them.

@eddyb
Copy link
Member

eddyb commented May 22, 2019

@matthewjasper You can make --test inject a #![test] on the crate (or #![test_harness] etc.)
I don't remember who was working in this area, but there were some plans to make it all less magical, for various reasons (fixing bugs and allowing custom test harnesses) - cc @Manishearth

@petrochenkov
Copy link
Contributor

#[global_allocator] is turned into a macro in #62735.
I'm pretty sure now that the gensym in it can be removed together with the whole generated allocator_abi module and some imports.

bors added a commit that referenced this issue Aug 13, 2019
Don't special case the `Self` parameter by name

This results in a couple of small diagnostic regressions. They could be avoided by keeping the special case just for diagnostics, but that seems worse.

closes #50125
cc #60869
Centril added a commit to Centril/rust that referenced this issue Aug 15, 2019
…ves, r=petrochenkov

Opaque builtin derive macros

* Buiilt-in derives are now opaque macros
    * This required limiting the visibility of some previously unexposed functions in `core`.
    * This also required the change to `Ident` serialization.
* All gensyms are replaced with hygienic identifiers
* Use hygiene to avoid most other name-resolution issues with buiilt-in derives.
    *  As far as I know the only remaining case that breaks is an ADT that has the same name as one of its parameters. Fixing this completely seemed to be more effort than it's worth.
* Remove gensym in `Ident::decode`, which lead to linker errors due to `inline` being gensymmed.
    * `Ident`now panics if incremental compilation tries to serialize it (it currently doesn't).
    * `Ident` no longer uses `gensym` to emulate cross-crate hygiene. It only applied to reexports.
    * `SyntaxContext` is no longer serializable.
    * The long-term fix for this is to properly implement cross-crate hygiene, but this seemed to be acceptable for now.
* Move type/const parameter shadowing checks to `resolve`
    * This was previously split between resolve and type checking. The type checking pass compared `InternedString`s, not Identifiers.
* Removed the `SyntaxContext` from `{ast, hir}::{InlineAsm, GlobalAsm}`

cc rust-lang#60869
r? @petrochenkov
Centril added a commit to Centril/rust that referenced this issue Aug 15, 2019
…ves, r=petrochenkov

Opaque builtin derive macros

* Buiilt-in derives are now opaque macros
    * This required limiting the visibility of some previously unexposed functions in `core`.
    * This also required the change to `Ident` serialization.
* All gensyms are replaced with hygienic identifiers
* Use hygiene to avoid most other name-resolution issues with buiilt-in derives.
    *  As far as I know the only remaining case that breaks is an ADT that has the same name as one of its parameters. Fixing this completely seemed to be more effort than it's worth.
* Remove gensym in `Ident::decode`, which lead to linker errors due to `inline` being gensymmed.
    * `Ident`now panics if incremental compilation tries to serialize it (it currently doesn't).
    * `Ident` no longer uses `gensym` to emulate cross-crate hygiene. It only applied to reexports.
    * `SyntaxContext` is no longer serializable.
    * The long-term fix for this is to properly implement cross-crate hygiene, but this seemed to be acceptable for now.
* Move type/const parameter shadowing checks to `resolve`
    * This was previously split between resolve and type checking. The type checking pass compared `InternedString`s, not Identifiers.
* Removed the `SyntaxContext` from `{ast, hir}::{InlineAsm, GlobalAsm}`

cc rust-lang#60869
r? @petrochenkov
Centril added a commit to Centril/rust that referenced this issue Aug 15, 2019
…ves, r=petrochenkov

Opaque builtin derive macros

* Buiilt-in derives are now opaque macros
    * This required limiting the visibility of some previously unexposed functions in `core`.
    * This also required the change to `Ident` serialization.
* All gensyms are replaced with hygienic identifiers
* Use hygiene to avoid most other name-resolution issues with buiilt-in derives.
    *  As far as I know the only remaining case that breaks is an ADT that has the same name as one of its parameters. Fixing this completely seemed to be more effort than it's worth.
* Remove gensym in `Ident::decode`, which lead to linker errors due to `inline` being gensymmed.
    * `Ident`now panics if incremental compilation tries to serialize it (it currently doesn't).
    * `Ident` no longer uses `gensym` to emulate cross-crate hygiene. It only applied to reexports.
    * `SyntaxContext` is no longer serializable.
    * The long-term fix for this is to properly implement cross-crate hygiene, but this seemed to be acceptable for now.
* Move type/const parameter shadowing checks to `resolve`
    * This was previously split between resolve and type checking. The type checking pass compared `InternedString`s, not Identifiers.
* Removed the `SyntaxContext` from `{ast, hir}::{InlineAsm, GlobalAsm}`

cc rust-lang#60869
r? @petrochenkov
Centril added a commit to Centril/rust that referenced this issue Aug 15, 2019
…ves, r=petrochenkov

Opaque builtin derive macros

* Buiilt-in derives are now opaque macros
    * This required limiting the visibility of some previously unexposed functions in `core`.
    * This also required the change to `Ident` serialization.
* All gensyms are replaced with hygienic identifiers
* Use hygiene to avoid most other name-resolution issues with buiilt-in derives.
    *  As far as I know the only remaining case that breaks is an ADT that has the same name as one of its parameters. Fixing this completely seemed to be more effort than it's worth.
* Remove gensym in `Ident::decode`, which lead to linker errors due to `inline` being gensymmed.
    * `Ident`now panics if incremental compilation tries to serialize it (it currently doesn't).
    * `Ident` no longer uses `gensym` to emulate cross-crate hygiene. It only applied to reexports.
    * `SyntaxContext` is no longer serializable.
    * The long-term fix for this is to properly implement cross-crate hygiene, but this seemed to be acceptable for now.
* Move type/const parameter shadowing checks to `resolve`
    * This was previously split between resolve and type checking. The type checking pass compared `InternedString`s, not Identifiers.
* Removed the `SyntaxContext` from `{ast, hir}::{InlineAsm, GlobalAsm}`

cc rust-lang#60869
r? @petrochenkov
bors added a commit that referenced this issue Aug 17, 2019
…ochenkov

Opaque builtin derive macros

* Buiilt-in derives are now opaque macros
    * This required limiting the visibility of some previously unexposed functions in `core`.
    * This also required the change to `Ident` serialization.
* All gensyms are replaced with hygienic identifiers
* Use hygiene to avoid most other name-resolution issues with buiilt-in derives.
    *  As far as I know the only remaining case that breaks is an ADT that has the same name as one of its parameters. Fixing this completely seemed to be more effort than it's worth.
* Remove gensym in `Ident::decode`, which lead to linker errors due to `inline` being gensymmed.
    * `Ident`now panics if incremental compilation tries to serialize it (it currently doesn't).
    * `Ident` no longer uses `gensym` to emulate cross-crate hygiene. It only applied to reexports.
    * `SyntaxContext` is no longer serializable.
    * The long-term fix for this is to properly implement cross-crate hygiene, but this seemed to be acceptable for now.
* Move type/const parameter shadowing checks to `resolve`
    * This was previously split between resolve and type checking. The type checking pass compared `InternedString`s, not Identifiers.
* Removed the `SyntaxContext` from `{ast, hir}::{InlineAsm, GlobalAsm}`

cc #60869
r? @petrochenkov
bors added a commit that referenced this issue Aug 19, 2019
Don't special case the `Self` parameter by name

This results in a couple of small diagnostic regressions. They could be avoided by keeping the special case just for diagnostics, but that seems worse.

closes #50125
cc #60869
Centril added a commit to Centril/rust that referenced this issue Sep 7, 2019
…etrochenkov

Use hygiene for AST passes

AST passes are now able to have resolve consider their expansions as if they were opaque macros defined either in some module in the current crate, or a fake empty module with `#[no_implicit_prelude]`.

* Add an ExpnKind for AST passes.
* Remove gensyms in AST passes.
* Remove gensyms in`#[test]`, `#[bench]` and `#[test_case]`.
* Allow opaque macros to define tests.
* Move tests for unit tests to their own directory.
* Remove `Ident::{gensym, is_gensymed}` - `Ident::gensym_if_underscore` still exists.

cc rust-lang#60869, rust-lang#61019

r? @petrochenkov
bors added a commit that referenced this issue Sep 29, 2019
Remove last uses of gensyms

Bindings are now indexed in resolve with an additional disambiguator that's used for underscore bindings.  This is the last use of gensyms in the compiler.

I'm not completely happy with this approach, so suggestions are welcome. Moving undescore bindings into their own map didn't turn out any better: master...matthewjasper:remove-underscore-gensyms.

closes #49300
cc #60869

r? @petrochenkov
tmandry added a commit to tmandry/rust that referenced this issue Oct 15, 2019
…=petrochenkov

Remove last uses of gensyms

Underscore bindings now use unique `SyntaxContext`s to avoid collisions. This was the last use of gensyms in the compiler, so this PR also removes them.

closes rust-lang#49300
cc rust-lang#60869

r? @petrochenkov
@matthewjasper
Copy link
Contributor

Gensyms are now gone, that should unblock the other clean up here.

@nnethercote
Copy link
Contributor Author

LocalInternedString perhaps exists, but is only used temporarily when code needs access to the chars within a Symbol. Alternatively, Symbol could provide a with() method (like InternedString currently has) that provides access to the chars, and then LocalInternedString wouldn't be needed.

#64141 and #65426 greatly reduced the usage and capability of LocalInternedString, getting it pretty close to this desired state. I also tried adding Symbol::with()... it works in principle, but in practice it's pretty annoying, and there are hundreds of use sites that would need changing. So I gave up on that.

@nnethercote
Copy link
Contributor Author

I'm looking now into removing InternedString. To do this, we must effectively merge it with Symbol. There are two obvious possibilities:

  • Change Symbol::{Ord,Hash} to work with the symbol chars. This appears to work, but I get some performance regressions of up to 3%. The regressions come much more from Hash than from Ord; the char-based hashing is more expensive, plus we have to access TLS to get the chars in the first place (which is also bad for the parallel compiler).
  • Change InternedString::{Ord,Hash} to work with the symbol index. I get a few test errors this way, on tests relating to stability of names. This would be my preferred option if I can get it to work. I suspect only a handful of places actually need the char-based operations.

@nnethercote
Copy link
Contributor Author

nnethercote commented Oct 17, 2019

Here are the first test failures I get when I change InternedString::Ord to work with the symbol index instead of the chars:

---- [run-make] run-make-fulldeps/reproducible-build stdout ----

error: make failed
status: exit code: 2
command: "make"
stdout:
------------------------------------------
rm -rf /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build && mkdir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build  linker.rs -O
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build  reproducible-build-aux.rs
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build  reproducible-build.rs -C linker=/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build  reproducible-build.rs -C linker=/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker
diff -u "/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker-arguments1" "/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker-arguments2"
--- /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker-arguments1	2019-10-18 09:31:45.248924701 +1100
+++ /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/linker-arguments2	2019-10-18 09:31:45.456923894 +1100
@@ -5,7 +5,7 @@
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.0.rcgu.o: 9024332235029870339
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.1.rcgu.o: 8252971569739609253
-/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.2.rcgu.o: 3534221490709993710
+/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.2.rcgu.o: 1155913470043241769
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.3.rcgu.o: 10237418006570950210
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.4.rcgu.o: 15369966276953066318
 /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build/reproducible-build/reproducible-build.reproducible_build.7rcbfp3g-cgu.5.rcgu.o: 320128326741088814

------------------------------------------
stderr:
------------------------------------------
make: *** [Makefile:21: smoke] Error 1

------------------------------------------


---- [run-make] run-make-fulldeps/reproducible-build-2 stdout ----

error: make failed
status: exit code: 2
command: "make"
stdout:
------------------------------------------
rm -rf /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2 && mkdir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2 -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2  reproducible-build-aux.rs
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2 -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2  reproducible-build.rs -C lto=fat
cp /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build-a
LD_LIBRARY_PATH="/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/lib:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0-bootstrap-tools/x86_64-unknown-linux-gnu/release/deps:/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage0/lib:/home/njn/local/lib:" '/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/stage2/bin/rustc' --out-dir /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2 -L /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2  reproducible-build.rs -C lto=fat
cmp "/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build-a" "/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build" || exit 1
/home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build-a /home/njn/moz/rust3/build/x86_64-unknown-linux-gnu/test/run-make-fulldeps/reproducible-build-2/reproducible-build-2/reproducible-build differ: byte 781, line 1

------------------------------------------
stderr:
------------------------------------------
make: *** [Makefile:17: fat_lto] Error 1

------------------------------------------

failures:
    [run-make] run-make-fulldeps/reproducible-build
    [run-make] run-make-fulldeps/reproducible-build-2

test result: FAILED. 0 passed; 2 failed; 199 ignored; 0 measured; 0 filtered out

@nnethercote
Copy link
Contributor Author

Here is a representative failure I get when I change InternedString::Hash to work with the symbol index instead of the chars:

---- [ui] ui/symbol-names/basic.rs#legacy stdout ----
diff of stderr:

-	error: symbol-name(_ZN5basic4main17hd72940ef9669d526E)
+	error: symbol-name(_ZN5basic4main17h0e00d3497edec60eE)
2	  --> $DIR/basic.rs:7:1
3	   |
4	LL | #[rustc_symbol_name]

5	   | ^^^^^^^^^^^^^^^^^^^^
6	
-	error: demangling(basic::main::hd72940ef9669d526)
+	error: demangling(basic::main::h0e00d3497edec60e)
8	  --> $DIR/basic.rs:7:1
9	   |
10	LL | #[rustc_symbol_name]

AFAICT the symbol suffix changed in this test is produced by get_symbol_hash(), which is interesting because it appears to only be using stable hashing, which still uses the chars:

fn get_symbol_hash<'tcx>(
tcx: TyCtxt<'tcx>,
// instance this name will be for
instance: Instance<'tcx>,
// type of the item, without any generic
// parameters substituted; this is
// included in the hash as a kind of
// safeguard.
item_type: Ty<'tcx>,
instantiating_crate: Option<CrateNum>,
) -> u64 {
let def_id = instance.def_id();
let substs = instance.substs;
debug!(
"get_symbol_hash(def_id={:?}, parameters={:?})",
def_id, substs
);
let mut hasher = StableHasher::new();
let mut hcx = tcx.create_stable_hashing_context();
record_time(&tcx.sess.perf_stats.symbol_hash_time, || {
// the main symbol name is not necessarily unique; hash in the
// compiler's internal def-path, guaranteeing each symbol has a
// truly unique path
tcx.def_path_hash(def_id).hash_stable(&mut hcx, &mut hasher);
// Include the main item-type. Note that, in this case, the
// assertions about `needs_subst` may not hold, but this item-type
// ought to be the same for every reference anyway.
assert!(!item_type.has_erasable_regions());
hcx.while_hashing_spans(false, |hcx| {
hcx.with_node_id_hashing_mode(NodeIdHashingMode::HashDefPath, |hcx| {
item_type.hash_stable(hcx, &mut hasher);
});
});
// If this is a function, we hash the signature as well.
// This is not *strictly* needed, but it may help in some
// situations, see the `run-make/a-b-a-linker-guard` test.
if let ty::FnDef(..) = item_type.kind {
item_type.fn_sig(tcx).hash_stable(&mut hcx, &mut hasher);
}
// also include any type parameters (for generic items)
assert!(!substs.has_erasable_regions());
assert!(!substs.needs_subst());
substs.hash_stable(&mut hcx, &mut hasher);
if let Some(instantiating_crate) = instantiating_crate {
(&tcx.original_crate_name(instantiating_crate).as_str()[..])
.hash_stable(&mut hcx, &mut hasher);
(&tcx.crate_disambiguator(instantiating_crate)).hash_stable(&mut hcx, &mut hasher);
}
// We want to avoid accidental collision between different types of instances.
// Especially, VtableShim may overlap with its original instance without this.
discriminant(&instance.def).hash_stable(&mut hcx, &mut hasher);
});
// 64 bits should be enough to avoid collisions.
hasher.finish::<u64>()
}

But maybe some non-stable hashing is sneaking into that computation somehow?

@Mark-Simulacrum
Copy link
Member

It would be worrisome for non-stable hashing to be sneaking in, but I suspect not impossible. I'm not sure how to try and track that down. (Maybe @michaelwoerister has thoughts?).

If the new hash is still stable though it seems fine to switch over to that? i.e., if the test just needs updating but not in a constant manner.

@nnethercote
Copy link
Contributor Author

Here is a representative failure I get when I change InternedString::Hash to work with the symbol index instead of the chars:

I changed the InternedStrings within DefPathData to Symbols and this failure reproduces, so that narrows it down a lot. I think def_path_hash() is involved somehow, and maybe Definitions::next_disambiguator. I think the new hash value is stable. So if these suffixes aren't supposed to be stable across different rustc versions then updating the test seems ok.

bors added a commit that referenced this issue Oct 18, 2019
[DO NOT MERGE] Remove `InternedString`

This is a proof of concept relating to #60869. It does the following:
- Makes `Symbol` equivalent to `InternedString`, primarily by Changing `Symbol`'s `PartialOrd`, `Ord`, and `Hash` impls to work on the chars instead of the index.
- Removes `InternedString`.

It shows that this approach works, but causes some performance regressions.

r? @ghost
@nnethercote
Copy link
Contributor Author

I have confirmed that changing Symbol to work with chars (for ordering and hashing) works: #65543 has the code. That would be the easiest path toward eliminating InternedString, but I don't think it's the best one, because it's pre-emptively giving up on the performance advantages of Symbol.

I am also working on converting InternedString occurrences to Symbol in chunks, in order to work out which conversions are problematic.

@nnethercote
Copy link
Contributor Author

nnethercote commented Oct 18, 2019

Here are the first test failures I get when I change InternedString::Ord to work with the symbol index instead of the chars:

I can reproduce these just by changing SymbolName::name from InternedString to Symbol.

@eddyb
Copy link
Member

eddyb commented Oct 18, 2019

Should either of the types even implement Ord? As for stable hashing, try removing the Hash impl from InternedString and see what errors, only StableHash should be used.

@nnethercote
Copy link
Contributor Author

Should either of the types even implement Ord?

It's used in various places, mostly sorting things for error messages.

try removing the Hash impl from InternedString and see what errors, only StableHash should be used.

Definitions::next_disambiguator seems to be the only use.

@eddyb
Copy link
Member

eddyb commented Oct 18, 2019

Sounds like next_disambiguator should be using StableHash.

It's used in various places, mostly sorting things for error messages.

Might make more sense to use String for sorting errors. cc @nikomatsakis @estebank

@nnethercote
Copy link
Contributor Author

I looked more closely, and my description of how Ord and Hash are used was woefully incomplete. I will give a more comprehensive description on Monday.

Centril added a commit to Centril/rust that referenced this issue Oct 19, 2019
…=petrochenkov

More symbol cleanups

Some minor improvements, mostly aimed at reducing unimportant differences between `Symbol` and `InternedString`. Helps a little with rust-lang#60869.

r? @petrochenkov
Centril added a commit to Centril/rust that referenced this issue Oct 19, 2019
…=petrochenkov

More symbol cleanups

Some minor improvements, mostly aimed at reducing unimportant differences between `Symbol` and `InternedString`. Helps a little with rust-lang#60869.

r? @petrochenkov
Centril added a commit to Centril/rust that referenced this issue Oct 19, 2019
…=petrochenkov

More symbol cleanups

Some minor improvements, mostly aimed at reducing unimportant differences between `Symbol` and `InternedString`. Helps a little with rust-lang#60869.

r? @petrochenkov
@nnethercote
Copy link
Contributor Author

Here is how the various impls are used. These lists may be incomplete.

Symbol::Hash needed for:

  • Ident::Hash
  • Stability::Hash
  • StabilityLevel::Hash
  • RustcDeprecation::Hash
  • BUILTIN_ATTRIBUTE_MAP
  • edition_enabled_features
  • probably other things too

Symbol::{PartialOrd,Ord} needed for:

  • ProjectionElem::{PartialOrd,Ord}
  • BoundNameCollector::regions: BTreeSet

InternedString::Hash needed for:

  • DefKey::compute_stable_hash calls name.hash() on a name obtained from a
    DisambiguatedDefPathData
  • InternedString is used within DefPathData, which derives Hash
    • Required for Definitions::next_disambiguator
  • CompileCodegenUnit/codegen_unit queries
    • Because QueryConfig::Key requires Hash, for QueryCache

InternedString::{PartialOrd,Ord} needed for:

  • ToStableHashKey impl uses Ord, for various sorts
    • Perhaps it should really use OrdStable, like it uses HashStable?

@eddyb
Copy link
Member

eddyb commented Oct 21, 2019

  • Symbol's Hash should probably be kept as-is, for O(1) hashing
  • InternedString's Hash can probably be replaced by Symbol's StableHash

Why does ProjectionElem need Ord? Sounds like uses of Symbol's Ord actually should be using InternedString's semantics, especially if user-facing.

OrdStable on Symbol and derived on containing types sounds good, if we can't get away with manually computing contents-based Orderings in a couple places (or using String in perf-uncritical code).

@nnethercote
Copy link
Contributor Author

I have a patch stack that eliminates InternedString entirely by converting to Symbol and using LocalInternedString in the handful of places where we need stability. I will file a PR once I do some more clean-ups and perf measurements.

@nnethercote
Copy link
Contributor Author

After #65657, I have pretty much everything I want:

  • InternedString is gone.
  • LocalInternedString is very limited.
  • Symbols's Hash, PartialOrd and Ord impls all use the index.

@nnethercote
Copy link
Contributor Author

nnethercote commented Oct 24, 2019

#65657 finished this off.

@nnethercote
Copy link
Contributor Author

#65776 is a final follow-up that just cleans some stuff up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants