-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generator size: unwinding and drops force extra generator state allocation #59123
Comments
@rustbot modify labels: A-generators and T-compiler. |
Is this related to #52924? |
@MSleepyPanda in as much as it’s about generators being too big. The specific optimisation proposed there won’t help here as |
cc @tmandry |
Another example, without drop: #![feature(generators, generator_trait)]
use std::ops::Generator;
struct Foo([u8; 1024]);
fn simple() -> impl Generator<Yield = (), Return = ()> {
static || {
let first = Foo([0; 1024]);
let _second = first;
yield;
}
}
fn complex() -> impl Generator<Yield = (), Return = ()> {
static || {
fn foo(_: &mut Foo) {}
let mut first = Foo([0; 1024]);
foo(&mut first);
let mut second = first;
foo(&mut second);
let mut third = second;
foo(&mut third);
let mut _fourth = third;
yield;
}
}
fn main() {
dbg!(std::mem::size_of_val(&simple()));
dbg!(std::mem::size_of_val(&complex()));
} outputs
|
I was thinking, we can solve this by adding the following rules to our
We must not optimize away storage of locals that are mutably borrowed, because as @matthewjasper notes in #61430, it isn't decided that the following is UB: let mut x = String::new();
let p = &mut x as *mut String;
let y = x;
p.write(String::new()); It's an open question of whether we can say "the local hasn't been mutably borrowed up to here" when evaluating rule 3. I'd prefer to make the optimization as smart as we can, but MIR probably allows borrowing a moved-from value and mutating it. Is this sound? |
The "mutably" of "mutably borrowed" is a red herring IMO, unless you want to check for |
@tmandry Interesting. How bad would it be to relax this to "if a local is moved from and never has had its address taken"? Then we can be sure without any assumptions about Stacked Borrows that direct accesses to the local variable are the only way to observe it, and those will be prevented after a move. This would also alleviate @eddyb's concern I think. Also, what is the granularity here? Without Stacked Borrows assumptions we can only do this on a per-local level, not on a per-field-of-a-struct level. Taking the address of one field leaks the address of other fields (if layout assumptions are made). |
Oh, really? I suppose that makes sense in a "list of things you can do" sense, but it's surprising and seems like it would make optimizations like SROA hard. |
Well, Stacked Borrows might restrict this. But then we get into rust-lang/unsafe-code-guidelines#134. So for now I'd say as compiler authors we should assume that it does leak the entire local. |
Oh right, well
Yes, but how would we enforce this? A borrow can be passed to another function, which can convert to a pointer. I'd rather not disable the optimization when a borrow escapes the function. One practical reason for this is that
I'm only proposing to do this on a per-local level, for now. |
I think that @RalfJung was including |
"the address taken" = "has the Basically what @cramertj said. Doing "better than this" requires committing to parts of Stacked Borrows. I am not sure if @rust-lang/lang is ready for that. For sure it shouldn't happen en passant as part of a PR like this. |
Then yes, that's easy to enforce.
Can you expand on this? How would checking for |
@cramertj indeed that's what I mean. This would be the first new exploitation of "reference-based UB" since... forever, I think. It should happen deliberately and prominently and have its own discussion. |
@RalfJung Hmm okay, then I think we should start having that discussion soon (in a different issue, probably). This relates to the maturity of |
@tmandry notice that the moment your future is generic or has a |
Also, won't it usually |
I'm not sure I understand this bit-- couldn't the optimization be monomorphization-dependent? |
FWIW I'm probably reverting all changes to the generator transform in that PR. The |
@RalfJung What I'm trying to optimize is this: let mut x = get_future();
dbg!(std::mem::size_of_val(&x));
x.await which today expands to let mut x = get_future();
dbg!(std::mem::size_of_val(&x));
{
let mut pinned = x; // <--- This move
loop {
// use `&mut pinned`
yield;
}
} The move Based on our discussion, it seems pretty straightforward to allow this optimization when the |
For what it's worth, that move ( |
A secondary question I have is, can we rely on the ordering between the move and the borrow? e.g. let mut x = String::new();
move_from(x);
// ...later...
x = bar();
let addr_x = &x as *const String;
move_from(x);
ptr::write(addr_x as *mut String, String::new()); Since the first instance of I don't want to get bogged down in this question though, |
Actually i think we can do this today-- |
@cramertj Hmm, good to know, but this also affects the size of |
@tmandry we can get away with a similar thing there by only putting references in the new state machine. It's higher overhead because of the additional pointers, but it'll prevent ever having to copy into a new location, and save space for large types, even those whose address had been taken. |
Still catching up, some comments on the way:
Oh so everything we are discussing here happens post-monomorphization? Never mind that part then.
Agreed. And if it's just Even with full Stacked Borrows, we'd still have an "uncertainty principle" (the physicist in me dies a little in the face of this very inaccurate analogy but whatever ;) when there is any interior mutability in a local of the future. That still seems bad enough for debugging, does it not? |
Aaaand this doesn't work because it requires that the binding be mutable: macro_rules! drop_with_guaranteed_move_elision {
($val:ident) => {
unsafe {
std::ptr::drop_in_place(&mut $val);
std::mem::forget($val);
}
}
}
fn main() {
let mut x = "foo".to_string(); // errors if `x` isn't `mut`
drop_with_guaranteed_move_elision!(x);
} |
Mm yeah, I guess the true fix for this rough edge is MIR inlining, with a special case for Still, with regard to optimization I would like to support as many cases as possible. |
Sure you do, we all do. Just remember that you are paying for this optimization with the One thing I am wondering about: in a situation like let mut x = get_future();
dbg!(std::mem::size_of_val(&x));
{
let mut pinned = x; // <--- This move
loop {
// use `&mut pinned`
yield;
}
} why can't MIR building insert the |
Closures have a similar problem with bindings having to be mutable for some stuff, and that's why we have "unique" borrows in the compiler. So maybe a similar technique could be used here?
let mut x = String::new();
move_from(x);
// ...later...
x = bar();
let addr_x = &x as *const String;
move_from(x);
ptr::write(addr_x as *mut String, String::new()); You can rely on there not being any access to This is subtle but I see nothing wrong with this reasoning.^^ |
Fair enough :) Personally I'm just as wary of painting ourselves into a corner in terms of classes of optimizations we can do, though. One is a higher immediate cost, and the other is a smaller cost which will be paid by all future users of Rust. But anyway, I'll stop waxing philosophical about optimizations for now ;)
To me it seems just as hard as doing the analysis, because you have to make sure not to do this in instances where
Great. In MIR is it possible to write code that takes the address of a local after it's been moved from (before it's reinitialized)? If so it should be easy to handle, we just mark locals as needing storage as soon as any part of them is borrowed. |
I guess this might be already understood, but There is quite a bit of code around which moves futures just in order to make sure users can't fiddle with them anymore, e.g. With @tmandry's last optimization I got a not too large future based program already from a a 70kB Future to a 30kB Future without further changes. However with e.g. replacing some await based mechanisms with manual combinators I brought it down further to less than 10kB - and I'm pretty sure there is a lot more potential since the program is actually pretty tiny. I can't comment on what can be done in the compiler to improve this, but I definitely encourage all further investigations and improvements on this. The impact seems to be huge! |
No, that's not what I mean. The idea is to aggressively insert The reason I think we need this is that validity of pointers is not the only concern here. Currently, the following (entirely safe) code will definitely return let mut x = String::new();
let addr_x = &x as *const _ as usize;
drop(x);
// later
x = String::new();
let addr_x2 = &x as *const _ as usize;
return addr_x == addr_x2; If we want to do optimizations like yours here (and I am totally sympathetic to that), we have to explain in the "Rust Abstract Machine" (and in Miri) why this program might return This is a topic that @nikomatsakis, @eddyb and me have touched on several times already, without ever hashing out a full plan. But in the current state of affairs, the only mechanism we have to "defeat" pointer equality tests like the above is to make sure that this is not the same allocation any more. So, one thing we might do is to do
Maybe. Better assume so. Just look out for any |
Okay, that sounds like something worth pursuing, but it also seems like it would have potential fallout we need to check for. I'm not sure I understand all the implications myself. Given that I have an optimization pass mostly working without it, maybe it would be good to set up a separate issue to track all the considerations involved. EDIT: Opened #61849 Also, I forgot to mention.. just adding We would need something that adds |
Moving to "deferred" by the same logic as #52924 (comment). @rust-lang/lang feel free to put the labels back if you disagree. |
I'd personally like to see this fixed before stabilization. It can cause the same exponential size effects as we were seeing before #60187, albeit in different contexts. I'm hoping to have a fix up for review soon. |
@RalfJung But wasn't the point of " |
The only thing I want is a precise definition of the semantics in a way that can be dynamically checked (e.g. in Miri), and ideally I also want the semantics to not be full of weird special cases. ;)
How would that work? Wouldn't that mean that legal MIR code (that uses some kind of trick to reinitialize after a move) becomes UB in LLVM? |
No, I was referring to the case where we want the semantics of I don't understand your position now. Are you saying you don't mind if source-level moves invalidate borrows, you just don't want it be be encoded into |
Let's continue at #61849 (comment). |
…, r=matthewjasper Don't store locals that have been moved from in generators This avoids reserving storage in generators for locals that are moved out of (and not re-initialized) prior to yield points. Fixes rust-lang#59123. This adds a new dataflow analysis, `RequiresStorage`, to determine whether the storage of a local can be destroyed without being observed by the program. The rules are: 1. StorageLive(x) => mark x live 2. StorageDead(x) => mark x dead 3. If a local is moved from, _and has never had its address taken_, mark it dead 4. If (any part of) a local is initialized, mark it live' This is used to determine whether to save a local in the generator object at all, as well as which locals can be overlapped in the generator layout. Here's the size in bytes of all testcases included in the change, before and after the change: async fn test |Size before |Size after -----------------|------------|---------- single | 1028 | 1028 single_with_noop | 2056 | 1032 joined | 5132 | 3084 joined_with_noop | 8208 | 3084 generator test |Size before |Size after ----------------------------|------------|---------- move_before_yield | 1028 | 1028 move_before_yield_with_noop | 2056 | 1032 overlap_move_points | 3080 | 2056 ## Future work Note that there is a possible extension to this optimization, which modifies rule 3 to read: "If a local is moved from, _**and either has never had its address taken, or is Freeze and has never been mutably borrowed**_, mark it dead." This was discussed at length in rust-lang#59123 and then rust-lang#61849. Because this would cause some behavior to be UB which was not UB before, it's a step that needs to be taken carefully. A more immediate priority for me is inlining `std::mem::size_of_val(&x)` so it becomes apparent that the address of `x` is not taken. This way, using `size_of_val` to look at the size of your inner futures does not affect the size of your outer future. cc @cramertj @eddyb @Matthias247 @nikomatsakis @RalfJung @Zoxc
…jasper Don't store locals that have been moved from in generators This avoids reserving storage in generators for locals that are moved out of (and not re-initialized) prior to yield points. Fixes #59123. This adds a new dataflow analysis, `RequiresStorage`, to determine whether the storage of a local can be destroyed without being observed by the program. The rules are: 1. StorageLive(x) => mark x live 2. StorageDead(x) => mark x dead 3. If a local is moved from, _and has never had its address taken_, mark it dead 4. If (any part of) a local is initialized, mark it live' This is used to determine whether to save a local in the generator object at all, as well as which locals can be overlapped in the generator layout. Here's the size in bytes of all testcases included in the change, before and after the change: async fn test |Size before |Size after -----------------|------------|---------- single | 1028 | 1028 single_with_noop | 2056 | 1032 joined | 5132 | 3084 joined_with_noop | 8208 | 3084 generator test |Size before |Size after ----------------------------|------------|---------- move_before_yield | 1028 | 1028 move_before_yield_with_noop | 2056 | 1032 overlap_move_points | 3080 | 2056 ## Future work Note that there is a possible extension to this optimization, which modifies rule 3 to read: "If a local is moved from, _**and either has never had its address taken, or is Freeze and has never been mutably borrowed**_, mark it dead." This was discussed at length in #59123 and then #61849. Because this would cause some behavior to be UB which was not UB before, it's a step that needs to be taken carefully. A more immediate priority for me is inlining `std::mem::size_of_val(&x)` so it becomes apparent that the address of `x` is not taken. This way, using `size_of_val` to look at the size of your inner futures does not affect the size of your outer future. cc @cramertj @eddyb @Matthias247 @nikomatsakis @RalfJung @Zoxc
The two generators returned by
simple
andcomplex
should be equivalent, butcomplex
takes twice as much space:Dumping out the MIR (with
rustc 1.34.0-nightly (f66e4697a 2019-02-20)
) shows an issue with how unwinding fromfoo
interacts with the two stack slots forfirst
and_second
, using a dynamic drop flag means thatfirst
is "live" through the path that goes through the yield, even though the drop flag is guaranteed to be false. (The below graph shows the basic blocks, with the psuedo-code run in them and which variables are alive when exiting the block):The text was updated successfully, but these errors were encountered: