-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ICH: Replace old, transitive metadata hashing with direct hashing approach. #41141
ICH: Replace old, transitive metadata hashing with direct hashing approach. #41141
Conversation
{ | ||
if let Some((ref mut hcx, ref mut hasher)) = self.hcx { | ||
let items: Vec<T> = iter.into_iter().collect(); | ||
items.hash_stable(hcx, hasher); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, why would this be needed? Can't you do it with an iterator adapter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we want to hash the length of the sequence before its elements. But maybe I could only copy out into a Vec
if size_hint()
is ambiguous...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, that is annoying. It's not the only way to hash a sequence (you could imagine having 1, element, 1, element, ..., 1, element, 0
or something along those lines), but, yeah, size_hint
. It might not be available in a few cases, but IIRC we have to collect anyway for a different reason.
Btw, if you do use size_hint
, assert that it matches up.
src/librustc_metadata/schema.rs
Outdated
_: &mut CTX, | ||
_: &mut StableHasher<W>) { | ||
// There's nothing to do. Whatever got encoded within this Lazy<> | ||
// wrapper has already been hashed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange indentation (here and elsewhere).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix that.
☔ The latest upstream changes (presumably #41138) made this pull request unmergeable. Please resolve the merge conflicts. |
ae57ed1
to
8a96ad3
Compare
Rebased and @eddyb's concerns addressed (but see 8a96ad3#diff-f55f5d2e86bc0e08c341b51fc78b27e4R1162, not sure if that's an improvement). |
lower_bound.hash_stable(hcx, hasher); | ||
let ret = self.ecx.lazy_seq(iter.inspect(|item| item.hash_stable(hcx, hasher))); | ||
debug!("metadata-hash: {:?}", hasher); | ||
ret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should assert that the number of times the inspect
callback gets called is equal to lower_bound
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Genius!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
6dd5053
to
b41d98d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't finish this review but here were some (outdated-ish) comments
src/librustc/ich/impls_ty.rs
Outdated
|
||
// The other fields just provide fast access to information that is | ||
// also contained in `sty`, so no need to hash them. | ||
.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to write out the names of those fields explicitly. My test for this is usually "if I added a new field, is it possible that it would require changes here?" And it seems plausible, though unlikely, that the answer would be "yes" in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to do that but one of the fields is private. I guess I have to bite the bullet and move the impl to the defining module.
} | ||
|
||
TyError | | ||
TyInfer(..) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Horray for being exhaustive :)
src/librustc/ich/impls_ty.rs
Outdated
ref fru_field_types, | ||
|
||
ref cast_kinds, | ||
lints: _, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems... fine to skip these, at least for the purpose of metadata hashing, but not obviously fine for other possible uses (i.e., the red/green incremental algorithm). However, by then I hope that we've refactored lints into "just another case of diagnostic output". Maybe a FIXME pointing at ... idk, some issue? Perhaps not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to spend time on something that does not make a difference now and will probably change in the future, but creating a FIXME issue is definitely a good idea.
@@ -96,6 +149,7 @@ impl<W> StableHasher<W> { | |||
let mut buf = [0; 16]; | |||
let len = write_unsigned_leb128_to_buf(&mut buf, value); | |||
self.state.write(&buf[..len]); | |||
self.debug.write(&buf[..len]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to replace self.state
with (in debug mode) a wrapper? This seems good but a bit easy to overlook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into it. Sounds like an improvement.
b41d98d
to
c2dcf2a
Compare
All comments so far are addressed now, I think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this looks pretty good but I think there is one simplification that can still be made. In particular, when we do the graph compression, we consider metadata nodes as "outputs". This means that graph compression will ensure that we track their (transitive) inputs. I think that with this PR we do not need to consider those metadata nodes as outputs, since we no longer need to enumerate their transitive inputs.
With the changes here, we don't create output metadata nodes at all anymore, so we never should hit the code path you linked. But I'll clean it up, make it explicit that we don't expect to find these kinds of nodes. In the future, if we ever implement a way to update metadata incrementally and in place (and there are some compelling reasons for which I should write up in an issue), we might want to start tracking output metadata nodes again. |
c2dcf2a
to
d78453f
Compare
Yes. |
@bors r+ |
📌 Commit d78453f has been approved by |
☔ The latest upstream changes (presumably #41237) made this pull request unmergeable. Please resolve the merge conflicts. |
…roach. Instead of collecting all potential inputs to some metadata entry and hashing those, we directly hash the values we are storing in metadata. This is more accurate and doesn't suffer from quadratic blow-up when many entries have the same dependencies.
d78453f
to
ca2dce9
Compare
@bors r=nikomatsakis Rebased. |
📌 Commit ca2dce9 has been approved by |
…h-final, r=nikomatsakis ICH: Replace old, transitive metadata hashing with direct hashing approach. This PR replaces the old crate metadata hashing strategy with a new one that directly (but stably) hashes all values we encode into the metadata. Previously we would track what data got accessed during metadata encoding and then hash the input nodes (HIR and upstream metadata) that were transitively reachable from the accessed data. While this strategy was sound, it had two major downsides: 1. It was susceptible to generating false positives, i.e. some input node might have changed without actually affecting the content of the metadata. That metadata entry would still show up as changed. 2. It was susceptible to quadratic blow-up when many metadata nodes shared the same input nodes, which would then get hashed over and over again. The new method does not have these disadvantages and it's also a first step towards caching more intermediate results in the compiler. Metadata hashing/cross-crate incremental compilation is still kept behind the `-Zincremental-cc` flag even after this PR. Once the new method has proven itself with more tests, we can remove the flag and enable cross-crate support by default again. r? @nikomatsakis cc @rust-lang/compiler
This PR replaces the old crate metadata hashing strategy with a new one that directly (but stably) hashes all values we encode into the metadata. Previously we would track what data got accessed during metadata encoding and then hash the input nodes (HIR and upstream metadata) that were transitively reachable from the accessed data. While this strategy was sound, it had two major downsides:
The new method does not have these disadvantages and it's also a first step towards caching more intermediate results in the compiler.
Metadata hashing/cross-crate incremental compilation is still kept behind the
-Zincremental-cc
flag even after this PR. Once the new method has proven itself with more tests, we can remove the flag and enable cross-crate support by default again.r? @nikomatsakis
cc @rust-lang/compiler