-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attempt to make Box::default() construct in-place + Box::new_in_place() #64062
Conversation
r? @kennytm (rust_highfive has picked a reviewer for you, use r? to override) |
/// FIXME: This is intended to work via return-value-optimization, but due to compiler | ||
/// limitations can't reliably do that yet. | ||
#[inline(always)] | ||
fn new_in_place(f: impl FnOnce() -> T) -> Box<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is semantically more like new_please_try_to_do_it_in_place_but_I_am_not_expecting_any_guarantee
?
(Such a guarantee is not something we can reliably or would make on a language (not LLVM) level.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point is, if we had guaranteed copy elision, I think this is as close as you could get with an API that looked something like this given Rust's semantics. Even with out-references it's not clear to me you could get semantics that nicely guarantee no copies in an easy way, as you could still mess up and wind up with a copy prior to writing it to the out-reference.
Like I said above, it's not clear to me this is actually a good idea. But it does seem novel. :)
edit: fixed thinko: we do have return-value-optimization; it's the guaranteed copy elision that we're missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we do not guarantee copy elision we'll need to at minimum call this something other than new_in_place
which suggests that there actually is a guarantee. For example: new_hint_in_place
.
It might be a good idea to just use this internally for now... let's see how this measures up in terms of perf for fn default
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be a good idea to just use this internally for now
Yup, it is private right now, as I didn't want to give people false hope even in nightly. I'd suggest that we leave it that way until RVO is more reliable (specifically named-return-value-optimization if I understand the terminology right?).
@bors try |
Attempt to make Box::default() construct in-place + Box::new_in_place() To start with, I'll say straight up this is more of a sketch of a possible API then something I'm sure we should merge. The first part here is the `Box::new_in_place()` API. Basically the idea is that we could provide a way to reliably create large boxed values in-place via return-value-optimization. This is something that often is *never* possible to optimize, even in theory, because allocation is fallible and if the creation of the value is itself fallible, the optimizer can't move the allocation to happen first because that would be an observable change. Unfortunately this doesn't quite work if the closure can't be inlined sufficiently due to optimizer limitations. But hopefully this at least shows a way forward that might be interesting to others. Secondly, I use this `Box::new_in_place()` API to optimize `Box::default()`. I'm not totally sure this is a good idea to actually merge, as currently this is kind of an observable change in behavior for things like large arrays that would otherwise blow up the stack; people might rely on it, and currently the optimization is unreliable. Finally, this of course this could be extended to `Rc` and `Arc` as well in much the same fashion.
☀️ Try build successful - checks-azure |
@rust-timer build 9d1bd60 |
Success: Queued 9d1bd60 with parent e29faf0, comparison URL. |
Finished benchmarking try commit 9d1bd60, comparison URL. |
Looks like this is on average very slightly slower than before. I'll bet you this is a function of caching: by allocating first, you have to access allocation-related memory flushing both registers and cache, followed by doing the work to create the object. As most objects are fairly small anyway, having to reload registers and cache to do the work of creating it may be slightly more costly on average than the extra copy. |
#![feature(box_syntax)]
fn main() {
let _: Box<String> = box String::new();
} // [...]
_2 = Box(std::string::String);
(*_2) = const std::string::String::new() -> [return: bb2, unwind: bb4];
// [...] |
Ha, that's embarassing! Yeah, I wrote the initial version of this for Rc/Arc, and I was sure I'd double-checked the assembly for the Box version... But I guess not. @Centril Just pushed another commit which adds Rc/Arc if you want to try that benchmarking thing again. I suppose since the performance isn't significantly changed |
Attempt to make Box::default() construct in-place + Box::new_in_place() To start with, I'll say straight up this is more of a sketch of a possible API then something I'm sure we should merge. The first part here is the `Box::new_in_place()` API. Basically the idea is that we could provide a way to reliably create large boxed values in-place via return-value-optimization. This is something that often is *never* possible to optimize, even in theory, because allocation is fallible and if the creation of the value is itself fallible, the optimizer can't move the allocation to happen first because that would be an observable change. Unfortunately this doesn't quite work if the closure can't be inlined sufficiently due to optimizer limitations. But hopefully this at least shows a way forward that might be interesting to others. Secondly, I use this `Box::new_in_place()` API to optimize `Box::default()`. I'm not totally sure this is a good idea to actually merge, as currently this is kind of an observable change in behavior for things like large arrays that would otherwise blow up the stack; people might rely on it, and currently the optimization is unreliable. Finally, this of course this could be extended to `Rc` and `Arc` as well in much the same fashion.
☀️ Try build successful - checks-azure |
@rust-timer build a683cc0 |
Success: Queued a683cc0 with parent f257c40, comparison URL. |
Finished benchmarking try commit a683cc0, comparison URL. |
Heh, I wonder if that benchmark suit even has a single case of Anyway, might be too rare to be worth optimizing. |
@petertodd "The benchmark suite" is not those crates having |
Ah ok, so basically my question boils down to whether or not the compiler uses Rc/Arc default() Anyway, if you guys think this is worth merging, let me know and I'll fix up the comments appropriately. |
Basically yeah.
I'd drop the r? @nnethercote |
The general comments I can make are as follows.
Based on those observations, my inclination is that these commits do not have enough of a benefit to warrant landing. I am not comfortable reviewing the specifics of the code, because my understanding of things like r? @pnkfelix |
Just testing highfive. |
r? @pnkfelix |
Sorry y'all for the useless pings, trying to figure out why highfive didn't work earlier. r? @nnethercote |
Ok, apparently it's working now? The weird thing is I can't find any errors that happened when the last two r? failed, everything is green... r? @pnkfelix |
ping from triage - Thanks |
@pnkfelix Now that you're back, can you have a look at this please? // triage |
Pinging again from triage |
after reading @nnethercote 's evaluation above, I am closing this PR. Some extra notes. I'll leave it up to the reader to decide how relevant they are.
|
@pnkfelix Seems reasonable. Thanks for the notes! |
To start with, I'll say straight up this is more of a sketch of a possible API then something I'm sure we should merge.
The first part here is the
Box::new_in_place()
API. Basically the idea is that we could provide a way to reliably create large boxed values in-place via return-value-optimization. This is something that often is never possible to optimize, even in theory, because allocation is fallible and if the creation of the value is itself fallible, the optimizer can't move the allocation to happen first because that would be an observable change.Unfortunately this doesn't quite work if the closure can't be inlined sufficiently due to optimizer limitations. But hopefully this at least shows a way forward that might be interesting to others.
Secondly, I use this
Box::new_in_place()
API to optimizeBox::default()
. I'm not totally sure this is a good idea to actually merge, as currently this is kind of an observable change in behavior for things like large arrays that would otherwise blow up the stack; people might rely on it, and currently the optimization is unreliable.Finally, this of course this could be extended to
Rc
andArc
as well in much the same fashion.