const_eval: Predetermine the layout of all locals when pushing a stack frame #57677
(rust_highfive has picked a reviewer for you, use r? to override)
@bors try
⌛ Trying commit fafeda6caed392ac71ac7e2b0ccc873ec86e558a with merge 4f2ef2516cd3f1e875bb589402bf1319967ce3a5...
Thanks for the PR, @dotdash! Looking forward to the perf results
@rust-timer build 4f2ef2516cd3f1e875bb589402bf1319967ce3a5
Success: Queued 4f2ef2516cd3f1e875bb589402bf1319967ce3a5 with parent ceb2512, comparison URL.
Finished benchmarking try commit 4f2ef2516cd3f1e875bb589402bf1319967ce3a5
This looks pretty good. For the synthetic const-eval-heavy tests it even looks spectacular. I'll give it a proper review tomorrow. Meanwhile, could you check whether there's a problem with compiling the unicode_normalization crate? perf.rlo shows no numbers for it, which might indicate a compilation error. cc @oli-obk @rust-lang/wg-compiler-performance
The numbers are only missing for the parent commit. I suppose it just predates the addition of that benchmark, which was yesterday.
let local_ty = self.monomorphize(local_ty, frame.instance.substs);
self.layout_of(local_ty)
) -> TyLayout<'tcx> {
frame.local_layouts[local]
While definitely an improvement for all currently possible const eval code, I think this is suboptimal for code with branching, as a lot of locals will never be touched during evaluation. I think it would be better to make `local_layouts` be `IndexVec<mir::Local, Option<TyLayout<'tcx>>>` and only fill it in on demand. To make this really work with reading, I presume it would even need to be `IndexVec<mir::Local, RefCell<Option<TyLayout<'tcx>>>>`, at which point we can just move to `IndexVec<mir::Local, Once<TyLayout<'tcx>>>` or some non-`Sync` variant of that.
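A minimal sketch of the on-demand variant suggested above, using `std::cell::OnceCell` (a non-`Sync` once-cell that was stabilized in Rust long after this PR) in place of rustc's `Once`. `Layout`, `Frame`, `Interp` and the string-based types are hypothetical stand-ins, not the compiler's `TyLayout` or `IndexVec`; the counter only exists to show how many layout queries actually run.

```rust
use std::cell::{Cell, OnceCell};

// Hypothetical stand-in for `TyLayout<'tcx>`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Layout {
    size: u64,
}

// Hypothetical frame: one type and one lazily filled layout slot per local.
struct Frame {
    local_tys: Vec<&'static str>,
    local_layouts: Vec<OnceCell<Layout>>,
}

impl Frame {
    fn new(local_tys: Vec<&'static str>) -> Self {
        // All slots start empty; nothing is computed when the frame is pushed.
        let local_layouts: Vec<OnceCell<Layout>> =
            local_tys.iter().map(|_| OnceCell::new()).collect();
        Frame { local_tys, local_layouts }
    }
}

struct Interp {
    // Counts how often the (expensive) layout query actually runs.
    layout_queries: Cell<usize>,
}

impl Interp {
    // Hypothetical layout query keyed on a type name.
    fn layout_of(&self, ty: &str) -> Layout {
        self.layout_queries.set(self.layout_queries.get() + 1);
        let size = match ty {
            "u8" => 1,
            "u32" => 4,
            "u64" => 8,
            _ => 0,
        };
        Layout { size }
    }

    // Works through shared references: `get_or_init` fills the slot on first
    // use, so locals that are never touched are never queried.
    fn layout_of_local(&self, frame: &Frame, local: usize) -> Layout {
        *frame.local_layouts[local].get_or_init(|| self.layout_of(frame.local_tys[local]))
    }
}
```

In this sketch only the locals that are actually read cost a query, whereas the eager version queries every local when the frame is pushed.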
@@ -463,12 +463,10 @@ impl<'a, 'mir, 'tcx, M: Machine<'a, 'mir, 'tcx>> EvalContext<'a, 'mir, 'tcx, M>
&self,
drive-by fix: `access_local` doesn't need to be `pub` anymore
@oli-obk's remark regarding branching code makes sense. Would it be possible to make some or most
The problem is that we separated reading and writing into
OK, I'm fine with the I'd be interested though in how using
Mirroring the mutability of the virtual miri memory in the mutability of the memory data structure itself allows various things, like holding references to different parts of the virtual memory at the same time. If we made everything take mutable references, we could only ever access one miri allocation at a time, even if we just wanted to write to one. We could have worked around this with unsafe code (and we still have to in one situation: rust/src/librustc_mir/interpret/memory.rs, lines 725 to 763 in 38650b6).
Also, it just "felt right" to make sure we don't accidentally modify our virtual memory in functions that should only read from it.
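A toy illustration of the `&self` versus `&mut self` point, assuming a hypothetical `Memory` that maps allocation ids to byte vectors (not miri's actual memory type): the read accessor takes `&self`, so two allocations can be borrowed at once, while the write accessor takes `&mut self`, so the borrow checker rejects writes while any read borrow is still alive.

```rust
use std::collections::HashMap;

// Hypothetical miniature of an interpreter's virtual memory.
struct Memory {
    allocs: HashMap<u64, Vec<u8>>,
}

impl Memory {
    // Read access only needs `&self`.
    fn get(&self, id: u64) -> &[u8] {
        &self.allocs[&id]
    }

    // Write access needs `&mut self`.
    fn get_mut(&mut self, id: u64) -> &mut Vec<u8> {
        self.allocs.get_mut(&id).expect("unknown allocation id")
    }
}

// Holds shared borrows of two different allocations at the same time.
// If `get` took `&mut self`, the second call would not compile while `a` is still in use.
fn first_bytes_equal(mem: &Memory, x: u64, y: u64) -> bool {
    let a = mem.get(x);
    let b = mem.get(y);
    a[0] == b[0]
}
```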
Yeah, I can see how this can become tricky. Thanks for the explanation, @oli-obk!
using
Is there some magic in the stable hash macro that might affect this?
Seems that the constraint on the stable hash impl was just swapped around, asking for `'mir` to outlive `'tcx` instead of the other way around.
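A minimal illustration of the bound direction, assuming hypothetical `Body` and `Frame` types (not rustc's) and using `fmt::Display` as a stand-in for the stable-hash trait: a frame borrows its body for `'mir`, and the body borrows from the longer-lived `'tcx`, so the bound that matches the data is `'tcx: 'mir`. The impl repeats that bound; written as `'mir: 'tcx`, it would combine with the implied `'tcx: 'mir` and effectively force both lifetimes to be equal.

```rust
use std::fmt;

// Hypothetical stand-in for MIR that borrows from a long-lived `'tcx` arena.
struct Body<'tcx> {
    name: &'tcx str,
}

// A frame borrows its body for the shorter lifetime `'mir`, hence `'tcx: 'mir`.
struct Frame<'mir, 'tcx: 'mir> {
    body: &'mir Body<'tcx>,
}

// The impl repeats the struct's bound: `'tcx` outlives `'mir`, not the reverse.
impl<'mir, 'tcx: 'mir> fmt::Display for Frame<'mir, 'tcx> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "frame of {}", self.body.name)
    }
}
```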
@bors try
We should measure this again. My local setup seems a bit funky right now and showed a massive slowdown for this version, which seems unlikely to be right.
⌛ Trying commit 98d4f33 with merge a0b24f2d5d611007296d56748caf31cfa8e5af32...
☀️ Test successful - checks-travis
@rust-timer build a0b24f2d5d611007296d56748caf31cfa8e5af32
Success: Queued a0b24f2d5d611007296d56748caf31cfa8e5af32 with parent 4db2394, comparison URL.
Finished benchmarking try commit a0b24f2d5d611007296d56748caf31cfa8e5af32
Perf is great (though no effect on unicode-normalization). Impl looks good. The commit message is still slightly outdated.
Ah, I had updated the commit message, but not the message for the PR itself. Or did I still miss something in the commit message as well?
📌 Commit 98d4f33 has been approved by
const_eval: Predetermine the layout of all locals when pushing a stack frame

Usually the layout of a local is required at least three times: once when it becomes live, once when it is written to, and once when it is read from. By adding a cache for them, we can reduce the number of layout queries, speeding up code that is heavy on const_eval.
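The caching idea in this commit message can be sketched outside the compiler, assuming hypothetical `Layout`, `Frame` and `Interp` types (the real code uses rustc's `TyLayout` and `IndexVec`): every local's layout is queried once in `push_stack_frame`, and the "becomes live", "written to" and "read from" steps then index into the cached vector instead of re-running the query.

```rust
use std::cell::Cell;

// Hypothetical stand-in for `TyLayout<'tcx>`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Layout {
    size: u64,
}

// Hypothetical frame holding one precomputed layout per local.
struct Frame {
    local_layouts: Vec<Layout>,
}

struct Interp {
    // Counts how often the (expensive) layout query actually runs.
    layout_queries: Cell<usize>,
    stack: Vec<Frame>,
}

impl Interp {
    // Hypothetical layout query keyed on a type name.
    fn layout_of(&self, ty: &str) -> Layout {
        self.layout_queries.set(self.layout_queries.get() + 1);
        let size = match ty {
            "u8" => 1,
            "u32" => 4,
            "u64" => 8,
            _ => 0,
        };
        Layout { size }
    }

    // Queries every local's layout exactly once, when the frame is pushed.
    fn push_stack_frame(&mut self, local_tys: &[&str]) {
        let local_layouts: Vec<Layout> = local_tys.iter().map(|ty| self.layout_of(ty)).collect();
        self.stack.push(Frame { local_layouts });
    }

    // Later uses (storage live, writes, reads) are plain indexing: no new queries.
    fn layout_of_local(&self, local: usize) -> Layout {
        self.stack.last().expect("no stack frame").local_layouts[local]
    }
}
```

Here nine lookups cost three queries; the trade-off raised in review is that locals in branches that are never taken are queried as well.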
Rollup of 9 pull requests

Successful merges:

- #57537 (Small perf improvement for fmt)
- #57552 (Default images)
- #57604 (Make `str` indexing generic on `SliceIndex`.)
- #57667 (Fix memory leak in P::filter_map)
- #57677 (const_eval: Predetermine the layout of all locals when pushing a stack frame)
- #57791 (Add regression test for #54582)
- #57798 (Corrected spelling inconsistency)
- #57809 (Add powerpc64-unknown-freebsd)
- #57813 (fix validation range printing when encountering undef)

Failed merges:

r? @ghost
@@ -510,7 +504,7 @@ impl<'a, 'mir, 'tcx, M: Machine<'a, 'mir, 'tcx>> EvalContext<'a, 'mir, 'tcx, M>
// FIXME: do some more logic on `move` to invalidate the old location
Copy(ref place) |
Move(ref place) =>
self.eval_place_to_op(place, layout)?,
self.eval_place_to_op(place)?,
Hm, but this means we might have to compute the layout even if we already know it, right?
To change this, we'd need to pass a precomputed layout to `layout_of_local`, which probably wouldn't make a difference, because `layout_of_local` will already have been called on any initialized local before. We can still benchmark it and see if it changes anything.
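A sketch of what passing a precomputed layout could look like, assuming a hypothetical `layout_of_local(slot, ty, hint)` helper with one `OnceCell` slot per local (not the signature in this PR). It also shows why little gain is expected: once a local's slot has been filled, the hint is never consulted.

```rust
use std::cell::{Cell, OnceCell};

// Hypothetical stand-in for `TyLayout<'tcx>`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Layout {
    size: u64,
}

struct Interp {
    // Counts how often the (expensive) layout query actually runs.
    layout_queries: Cell<usize>,
}

impl Interp {
    // Hypothetical layout query keyed on a type name.
    fn layout_of(&self, ty: &str) -> Layout {
        self.layout_queries.set(self.layout_queries.get() + 1);
        Layout { size: if ty == "u64" { 8 } else { 4 } }
    }

    // `hint` is a layout the caller already knows, e.g. from an operand.
    // It is only used if the local's slot is still empty.
    fn layout_of_local(&self, slot: &OnceCell<Layout>, ty: &str, hint: Option<Layout>) -> Layout {
        let layout = *slot.get_or_init(|| match hint {
            Some(known) => known,
            None => self.layout_of(ty),
        });
        if let Some(known) = hint {
            // A correct hint must agree with the cached layout.
            debug_assert_eq!(known, layout);
        }
        layout
    }
}
```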
@@ -72,6 +72,7 @@ fn mk_eval_cx_inner<'a, 'mir, 'tcx>(
ecx.stack.push(interpret::Frame {
block: mir::START_BLOCK,
locals: IndexVec::new(),
local_layouts: IndexVec::new(),
Can't the information be included in `locals` itself?
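One way this could look, sketched with a hypothetical `LocalState` (not the struct in this PR): keep the cached layout next to the local's value as a `Cell<Option<Layout>>`, so it can be filled lazily through a shared reference and no parallel vector is needed.

```rust
use std::cell::Cell;

// Hypothetical stand-in for `TyLayout<'tcx>`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Layout {
    size: u64,
}

// Hypothetical per-local state: the value plus its lazily cached layout.
struct LocalState {
    value: Option<u64>,           // `None` while the local is dead/uninitialized
    layout: Cell<Option<Layout>>, // filled on first use, through `&self`
}

impl LocalState {
    fn new() -> Self {
        LocalState { value: None, layout: Cell::new(None) }
    }

    fn is_live(&self) -> bool {
        self.value.is_some()
    }

    // Returns the cached layout, running `compute` only the first time.
    fn layout(&self, compute: impl FnOnce() -> Layout) -> Layout {
        if let Some(layout) = self.layout.get() {
            return layout;
        }
        let layout = compute();
        self.layout.set(Some(layout));
        layout
    }
}
```

Locals that are never used keep `None` in this sketch and never cost a query.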