Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICH: Replace old, transitive metadata hashing with direct hashing approach. #41141

Merged
merged 3 commits into from
Apr 12, 2017

Conversation

michaelwoerister
Copy link
Member

This PR replaces the old crate metadata hashing strategy with a new one that directly (but stably) hashes all values we encode into the metadata. Previously we would track what data got accessed during metadata encoding and then hash the input nodes (HIR and upstream metadata) that were transitively reachable from the accessed data. While this strategy was sound, it had two major downsides:

  1. It was susceptible to generating false positives, i.e. some input node might have changed without actually affecting the content of the metadata. That metadata entry would still show up as changed.
  2. It was susceptible to quadratic blow-up when many metadata nodes shared the same input nodes, which would then get hashed over and over again.

The new method does not have these disadvantages and it's also a first step towards caching more intermediate results in the compiler.

Metadata hashing/cross-crate incremental compilation is still kept behind the -Zincremental-cc flag even after this PR. Once the new method has proven itself with more tests, we can remove the flag and enable cross-crate support by default again.

r? @nikomatsakis
cc @rust-lang/compiler

{
if let Some((ref mut hcx, ref mut hasher)) = self.hcx {
let items: Vec<T> = iter.into_iter().collect();
items.hash_stable(hcx, hasher);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, why would this be needed? Can't you do it with an iterator adapter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we want to hash the length of the sequence before its elements. But maybe I could only copy out into a Vec if size_hint() is ambiguous...

Copy link
Member

@eddyb eddyb Apr 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, that is annoying. It's not the only way to hash a sequence (you could imagine having 1, element, 1, element, ..., 1, element, 0 or something along those lines), but, yeah, size_hint. It might not be available in a few cases, but IIRC we have to collect anyway for a different reason.
Btw, if you do use size_hint, assert that it matches up.

_: &mut CTX,
_: &mut StableHasher<W>) {
// There's nothing to do. Whatever got encoded within this Lazy<>
// wrapper has already been hashed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange indentation (here and elsewhere).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix that.

@bors
Copy link
Contributor

bors commented Apr 7, 2017

☔ The latest upstream changes (presumably #41138) made this pull request unmergeable. Please resolve the merge conflicts.

@michaelwoerister michaelwoerister force-pushed the direct-metadata-ich-final branch from ae57ed1 to 8a96ad3 Compare April 10, 2017 09:48
@michaelwoerister
Copy link
Member Author

Rebased and @eddyb's concerns addressed (but see 8a96ad3#diff-f55f5d2e86bc0e08c341b51fc78b27e4R1162, not sure if that's an improvement).

lower_bound.hash_stable(hcx, hasher);
let ret = self.ecx.lazy_seq(iter.inspect(|item| item.hash_stable(hcx, hasher)));
debug!("metadata-hash: {:?}", hasher);
ret
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should assert that the number of times the inspect callback gets called is equal to lower_bound.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Genius!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@michaelwoerister michaelwoerister force-pushed the direct-metadata-ich-final branch 2 times, most recently from 6dd5053 to b41d98d Compare April 10, 2017 12:07
Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't finish this review but here were some (outdated-ish) comments


// The other fields just provide fast access to information that is
// also contained in `sty`, so no need to hash them.
..
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to write out the names of those fields explicitly. My test for this is usually "if I added a new field, is it possible that it would require changes here?" And it seems plausible, though unlikely, that the answer would be "yes" in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to do that but one of the fields is private. I guess I have to bite the bullet and move the impl to the defining module.

}

TyError |
TyInfer(..) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Horray for being exhaustive :)

ref fru_field_types,

ref cast_kinds,
lints: _,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems... fine to skip these, at least for the purpose of metadata hashing, but not obviously fine for other possible uses (i.e., the red/green incremental algorithm). However, by then I hope that we've refactored lints into "just another case of diagnostic output". Maybe a FIXME pointing at ... idk, some issue? Perhaps not necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to spend time on something that does not make a difference now and will probably change in the future, but creating a FIXME issue is definitely a good idea.

@@ -96,6 +149,7 @@ impl<W> StableHasher<W> {
let mut buf = [0; 16];
let len = write_unsigned_leb128_to_buf(&mut buf, value);
self.state.write(&buf[..len]);
self.debug.write(&buf[..len]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to replace self.state with (in debug mode) a wrapper? This seems good but a bit easy to overlook.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into it. Sounds like an improvement.

@michaelwoerister
Copy link
Member Author

All comments so far are addressed now, I think.

Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this looks pretty good but I think there is one simplification that can still be made. In particular, when we do the graph compression, we consider metadata nodes as "outputs". This means that graph compression will ensure that we track their (transitive) inputs. I think that with this PR we do not need to consider those metadata nodes as outputs, since we no longer need to enumerate their transitive inputs.

@michaelwoerister
Copy link
Member Author

With the changes here, we don't create output metadata nodes at all anymore, so we never should hit the code path you linked. But I'll clean it up, make it explicit that we don't expect to find these kinds of nodes.

In the future, if we ever implement a way to update metadata incrementally and in place (and there are some compelling reasons for which I should write up in an issue), we might want to start tracking output metadata nodes again.

@michaelwoerister michaelwoerister force-pushed the direct-metadata-ich-final branch from c2dcf2a to d78453f Compare April 11, 2017 09:11
@nikomatsakis
Copy link
Contributor

In the future, if we ever implement a way to update metadata incrementally and in place (and there are some compelling reasons for which I should write up in an issue), we might want to start tracking output metadata nodes again.

Yes.

@nikomatsakis
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Apr 11, 2017

📌 Commit d78453f has been approved by nikomatsakis

@bors
Copy link
Contributor

bors commented Apr 12, 2017

☔ The latest upstream changes (presumably #41237) made this pull request unmergeable. Please resolve the merge conflicts.

…roach.

Instead of collecting all potential inputs to some metadata entry and
hashing those, we directly hash the values we are storing in metadata.
This is more accurate and doesn't suffer from quadratic blow-up when
many entries have the same dependencies.
@michaelwoerister michaelwoerister force-pushed the direct-metadata-ich-final branch from d78453f to ca2dce9 Compare April 12, 2017 09:47
@michaelwoerister
Copy link
Member Author

@bors r=nikomatsakis

Rebased.

@bors
Copy link
Contributor

bors commented Apr 12, 2017

📌 Commit ca2dce9 has been approved by nikomatsakis

TimNN added a commit to TimNN/rust that referenced this pull request Apr 12, 2017
…h-final, r=nikomatsakis

ICH: Replace old, transitive metadata hashing with direct hashing approach.

This PR replaces the old crate metadata hashing strategy with a new one that directly (but stably) hashes all values we encode into the metadata. Previously we would track what data got accessed during metadata encoding and then hash the input nodes (HIR and upstream metadata) that were transitively reachable from the accessed data. While this strategy was sound, it had two major downsides:

1. It was susceptible to generating false positives, i.e. some input node might have changed without actually affecting the content of the metadata. That metadata entry would still show up as changed.
2. It was susceptible to quadratic blow-up when many metadata nodes shared the same input nodes, which would then get hashed over and over again.

The new method does not have these disadvantages and it's also a first step towards caching more intermediate results in the compiler.

Metadata hashing/cross-crate incremental compilation is still kept behind the `-Zincremental-cc` flag even after this PR. Once the new method has proven itself with more tests, we can remove the flag and enable cross-crate support by default again.

r? @nikomatsakis
cc @rust-lang/compiler
bors added a commit that referenced this pull request Apr 12, 2017
Rollup of 9 pull requests

- Successful merges: #41063, #41087, #41141, #41166, #41183, #41205, #41206, #41232, #41243
- Failed merges:
@bors bors merged commit ca2dce9 into rust-lang:master Apr 12, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants