-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory-map the dep-graph instead of reading it up front #95543
Conversation
(rust-highfive has picked a reviewer for you, use r? to override) |
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit e830280c8b3c26f05f2cbef989089caaf59d09ee with merge b4d4e4dac6803298c682aa051f69e224d2b1a41b... |
☀️ Try build successful - checks-actions |
Queued b4d4e4dac6803298c682aa051f69e224d2b1a41b with parent 0677edc, future comparison URL. |
Finished benchmarking commit (b4d4e4dac6803298c682aa051f69e224d2b1a41b): comparison url. Summary: This benchmark run shows 158 relevant improvements 🎉 but 8 relevant regressions 😿 to instruction counts.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never |
Thanks for the PR, @cjgillot. This looks very promising! I'll take a close look next week. |
In addition to the perf report: the "dep_graph.bin" file is roughly ~30% larger with this PR. This is the same order of magnitude as #83322. |
impl<'a, T> !MmapSafe for &'a T {} | ||
impl<'a, T> !MmapSafe for &'a mut T {} | ||
impl<T> !MmapSafe for *const T {} | ||
impl<T> !MmapSafe for *mut T {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also impl !MmapSafe
for UnsafeCell
?
let mask = align - 1; | ||
let extra = pos & mask; | ||
let padding = if extra == 0 { 0 } else { align - extra }; | ||
self.write_all(&ALIGN[..padding])?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe skip this call altogether when padding is 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also wondering if it would be better to just assert that the data will be aligned, instead of adding padding. Then would just add some padding before the entire array? But I'm not sure if that's actually better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, we can't assert alignment. When encoding a Fingerprint
(align 8) after SerializedDepNodeIndex
es (align 4), there is no reason we should expect the encoder to be aligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good point.
Here are some thoughts:
Adapted approach to encoding fingerprint & dependenciesI think we might be able to keep on-disk size somewhat in check and avoid adding the unsafe
|
Using odht: |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit c65e12234de16a1f4a6083a4ae782601951702ea with merge c3448cac0af5bae4050e40f9c4578d364e24ab99... |
☀️ Try build successful - checks-actions |
Queued c3448cac0af5bae4050e40f9c4578d364e24ab99 with parent 8c1fb2e, future comparison URL. |
⌛ Trying commit c7d99a215aa7aef065d6475b64a85771abb8384d with merge 7402e377e7bb74b8a133d3b7d9b88184bb0e7039... |
☀️ Try build successful - checks-actions |
Queued 7402e377e7bb74b8a133d3b7d9b88184bb0e7039 with parent f99f9e4, future comparison URL. |
Finished benchmarking commit (7402e377e7bb74b8a133d3b7d9b88184bb0e7039): comparison url. Instruction count
Max RSS (memory usage)Results
CyclesResults
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Footnotes |
Hey y'all, we briefly looked at this PR in the T-compiler triage meeting and I'm curious what the overall trajectory is. (zulip chathttps://rust-lang.zulipchat.com/#narrow/stream/238009-t-compiler.2Fmeetings/topic/.5Bweekly.5D.202022-07-28/near/291197831) Namely, it seemed to me like there were really promising performance gains back in March. But then all the changes since then seem like they've decreased those gains, and have culminated in outright regressions now. Any idea what's best to do here? |
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit 9b4c844 with merge b9d04ffd4dccc08a93273e66f1a77a0b48597ce6... |
☀️ Try build successful - checks-actions |
Queued b9d04ffd4dccc08a93273e66f1a77a0b48597ce6 with parent 9fa62f2, future comparison URL. |
Finished benchmarking commit (b9d04ffd4dccc08a93273e66f1a77a0b48597ce6): comparison url. Instruction count
Max RSS (memory usage)Results
CyclesResults
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Footnotes |
Seems it's not worth it any more. |
An incremental compilation session starts by reading the dep-graph from disk. This step allocates a lot of memory to store the whole graph. Most of this memory will be used at most once: a node's dependencies are only checked when before trying to force it, and its fingerprint once it has been re-computed.
This PR proposes to skip reading the fingerprints and dependencies at the beginning of the compilation session, and to only load them when necessary. To achieve that, the dep-graph file remains accessible through a memmap throughout the compilation session.
In opposition, the list of dep-nodes, along with the in-file positions of the unread fingerprints are pushed to the end of the dep-graph file. This list is also accessed through the memmap. This list is still read immediately to construct the inverse index
DepNode -> SerializedDepNodeIndex
, and remains available afterwards.Along the way, the inverse index is refactored to use a hashbrown
RawTable
. This avoids having to store all theDepNode
s twice.