unboxed closures #77
+1, I think we want something like this. I'm gathering some statistics now that will better inform our decision. |
My gut says we'd really want some form of capture lists to handle the by value vs by ref issue. I'm not sure how often it'll actually be necessary to manually specify a by-ref capture though, so maybe we can revisit later if it becomes a problem. |
I struggled to come up with an unambiguous way of defining capture lists, and I think any approach would be very ugly with the current closure syntax. Since references are just regular values, you can capture by-reference without using capture sugar at all, so I don't think a capture list would make things significantly clearer. The closures proposed here are purely sugar for simple cases, so it should be as painless as possible to write them. An explicit It's not perfect, but In C++, capture lists are the defining characteristic of the closure syntax, because the simplest closure ( Capture lists may be intended to make capturing many values more clear, but I find that I always stop listing out the captures by hand where there are more than 2 because it's less painful to just capture anything that's referenced. I think there's a fine line between a helpful feature (dare I say |
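The point that references are just regular values can be illustrated with a minimal sketch. It uses the `move` keyword from present-day Rust, which is an assumption on my part and not the syntax proposed in this RFC; the function name `count_with_borrow` is hypothetical.

```rust
/// Hypothetical helper: captures a reference *by value* inside a closure.
fn count_with_borrow(items: &[i32]) -> usize {
    // `r` is an ordinary value whose type happens to be a reference.
    let r = items;
    // `move` copies the reference into the closure; the underlying data is
    // neither moved nor copied, so no capture sugar is needed.
    let len_of = move || r.len();
    len_of()
}
```

Calling `count_with_borrow(&v)` leaves `v` usable afterwards, because only the reference was captured.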
currently require a concrete type signature so this proposal alone is not enough to return them. The
restriction on functions could be relaxed to permit the return type to be an anonymous type
implementing a trait, or in other words an "unboxed trait object". This is future work for another
proposal.
Theoretically, couldn't this be done without unboxed trait objects? A function that returns a closure isn't referring to an unboxed trait object but instead referring to a specific anonymous struct. The function's implementation would then use this anonymous struct as the type for whatever closure is actually returned. The two big downsides I could see are:
- You can't conditionally pick from several closures to return, and doing so would probably cause a very confusing error (because only one can actually be the correct type).
- I'm not sure how this would work if the function is generic. Does the anonymous struct then become generic too, using the same type parameters as the function? What if the function isn't generic, but it's part of a generic impl? This seems potentially problematic.
I'm not suggesting this is the way to go. Unboxed trait objects seem like a better solution. I just wanted to mention it as a potential alternative.
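For readers who want something concrete, here is a rough sketch of both situations described above. It relies on return-position `impl Trait` and `Box<dyn Fn>`, which exist in present-day Rust but were not available when this thread was written; the names `make_adder` and `make_op` are hypothetical.

```rust
/// The return type is an anonymous type implementing `Fn(i32) -> i32`.
fn make_adder(n: i32) -> impl Fn(i32) -> i32 {
    // The closure captures `n` by value; its concrete type has no name.
    move |x| x + n
}

/// Picking between two closures needs one common type, because each closure
/// has its own anonymous type. Boxing is one way to get that common type.
fn make_op(add: bool, n: i32) -> Box<dyn Fn(i32) -> i32> {
    if add {
        Box::new(move |x| x + n)
    } else {
        Box::new(move |x| x - n)
    }
}
```

Returning the two closures of `make_op` directly through `impl Fn(i32) -> i32` would be rejected with a type mismatch, which is the first downside listed above.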
I'm just using unboxed trait objects as jargon to refer to an anonymous type implementing some trait. Implementing it with sane error messages seems like it would go a long way to implementing this as a general purpose feature.
I was considering my suggestion to actually be the inverse of rust-lang/rust#11455. In that proposal, a function returning an unboxed Iterator would actually return the concrete type of whatever iterator the implementation returns. I was thinking of this as the opposite: the return type would define the anonymous type and the closure in the implementation would be forced to be that same type.
But it turns out that's rather unnecessary (and overly complicated, because the function doesn't know what upvars there will be so it can't actually fully define the type). Treating it exactly the same way as the Iterator case works just as well. So forget I suggested anything.
+1 on the general idea; the by-ref capture syntax is awkward, but no better approach is readily apparent. |
Here are some statistics on closures (click through the charts at the bottom to see charts): https://docs.google.com/spreadsheets/d/1AMNPHhUxa7HKnoPAUD0Lj_fHxsScEa9rfWgK3JLYSjM/edit?usp=sharing |
Out of 6,145 closures, 4,654 (75.7%) contain no or only immutable, copyable borrows, and would therefore have identical semantics if upvars were moved by default (excluding Most of the rest (19.2% of the total, 79.0% of the rest) contain at least one unique-immutable borrow. I have not yet analyzed for how many of them those uniquely-immutably-borrowed variables are used after the construction of the closure. |
Overall I'm very strongly in favor of this proposal. I was initially very against the One question I have about this is the traits for
|
Yes, I want it to be possible for user-defined types to implement the trait(s) via this sugar.
If it has the same number of arguments, it will report an error for a duplicate implementation of the trait. If it has a different number of arguments, it will report an error when you attempt to call it. This is already implemented as an error message when two traits are in scope implementing a method with the same name, and using a different number of arguments can be considered an implementation of a separate trait. If variadic generics ever exist, it could be mapped to a single trait rather than multiple traits, but it's not necessary and the sugar would still be how it's used, so it would only influence the error messages. |
What about the same number of arguments, but different types? In any case, it sounds like you're saying "it would be ambiguous", which is what I wanted to hear. |
@kballard: If it's the same number of arguments with different types, the compiler would reject it as a duplicate implementation of the trait via the coherence rules. |
@pcwalton what is a "unique-immutable" capture? |
I took that to mean a capture of an affine (non- |
Wouldn't that be covered by "immutable, noncopyable" and "mutable, noncopyable"? |
Ah, then I don't know what it is either. |
I love love love that this proposal avoids forcing us to re-introduce capture clauses. It's important to me that lambda syntax be very lightweight for the common case, a property that mandatory capture clauses would obliterate. @pcwalton, what code did you analyze to produce those numbers, and what tool did you use to generate them? |
Random thought: we can possibly support C++14 generic lambdas relatively naturally. We currently have the I'm mainly writing this here so that it's recorded somewhere, and because it seems reasonable to avoid closing off this path, if possible. (If there is interest in discussing this outside of what's relevant to this RFC I'll make a post on /r/rust or the mailing list, to avoid this RFC being sidetracked.) |
@thestinger @huonw Unique-immutable means that
@bstrie What I'm a bit concerned about is having to write |
@pcwalton: I don't understand when moving or re-borrowing wouldn't be enough when it's being captured by-value. I think using |
Yes, Niko told me about the reborrowing option for On Friday, May 16, 2014, Daniel Micay notifications@github.com wrote:
|
Can someone elaborate on reborrowing in this context? |
|
@pcwalton My understanding is that a |
Ok, so it sounds like this sugar is mapping to an infinite series of traits:
trait<R> Callable0<R> { fn call(&mut self) -> R; }
trait<R,A1> Callable1<R,A1> { fn call(&mut self, A1) -> R; }
trait<R,A1,A2> Callable2<R,A1,A2> { fn call(&mut self, A1, A2) -> R; }
// etc
which makes sense. I was interpreting each |
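A minimal sketch of the arity-indexed traits in syntax that compiles today, with a hand-written type implementing two of them. The trait names follow the comment above; `Accumulator` and `run_accumulator` are hypothetical, and this is not necessarily how the RFC's lang items would be defined.

```rust
// One trait per arity, as in the comment above.
trait Callable0<R> {
    fn call(&mut self) -> R;
}

trait Callable1<R, A1> {
    fn call(&mut self, a1: A1) -> R;
}

// A user-defined "closure environment" holding mutable state.
struct Accumulator {
    total: i32,
}

impl Callable0<i32> for Accumulator {
    // Zero arguments: report the running total.
    fn call(&mut self) -> i32 {
        self.total
    }
}

impl Callable1<i32, i32> for Accumulator {
    // One argument: add it to the running total.
    fn call(&mut self, a1: i32) -> i32 {
        self.total += a1;
        self.total
    }
}

fn run_accumulator() -> (i32, i32) {
    let mut acc = Accumulator { total: 0 };
    // Fully qualified paths pick the trait explicitly.
    let after_add = <Accumulator as Callable1<i32, i32>>::call(&mut acc, 5);
    let current = <Accumulator as Callable0<i32>>::call(&mut acc);
    (after_add, current)
}
```

With both traits in scope, a plain `acc.call(5)` is reported as ambiguous by current compilers, so the sketch uses qualified paths. That matches the earlier remark that implementing several arities on one type produces an error at the call site.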
Looking at the categories of captures in pcwalton's spreadsheet, can someone confirm how each would map to the capture syntax proposed here? My best intuition:
Also, are the parens in the latter case mandatory for disambiguation? |
Correct. Or if the value is sufficiently large, you can use
If moving is acceptable, then that's correct. Otherwise, you can use
Yeah. Or you can just use
I believe it's unambiguous without them, but I also think it's better code style to use them. So I would imagine they wouldn't actually be required by the parser, but we'd still use them in code examples. |
For reference, if you want to reproduce the exact same behavior we have today, then you'd use either |
The parentheses wouldn't always be required, I just used them because my example wasn't as trivial as |
@kballard, thanks. I was just trying to get an intuition for how many upvars would opt out of by-value capturing, though since it's not really a 1:1 mapping I suppose it doesn't really answer my question (but is informative regardless). |
One more clarification regarding the syntax: presumably, after capturing by reference, you would not require the capture syntax on subsequent uses of the variable? i.e.
let mut v = Vec::new();
let c = || { (ref mut v).push(3); v.push(6); }
...instead of:
let mut v = Vec::new();
let c = || { (ref mut v).push(3); (ref mut v).push(6); } |
@bstrie: You would need to repeat |
I can live with repeating the capture type. More verbose, but less magical, so it's a wash. And |
I prefer repeating. It allows expressing both a by-value and a by-ref capture in the same closure, which is theoretically useful (well, by-mut-ref is more theoretically useful in this scenario).
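To make the mixed case concrete, here is a sketch of one closure holding both a by-value capture and a by-mut-ref capture. It is written in present-day Rust, where the reference is created outside the closure and then moved in, not with the `ref mut` sugar under discussion; `mixed_captures` is a hypothetical name.

```rust
fn mixed_captures() -> Vec<usize> {
    let mut log = Vec::new();
    let name = String::from("closure");
    {
        // By-reference capture, made explicit as an ordinary value.
        let log_ref = &mut log;
        // `move` takes ownership of both `log_ref` (a reference) and
        // `name` (an owned String), so the closure mixes both kinds.
        let mut record = move || log_ref.push(name.len());
        record();
        record();
    }
    // The mutable borrow has ended, so `log` can be returned.
    log
}
```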
|
I broadly like the proposal and agree capture clauses are pretty ugly. However, I really dislike annotating actual variables with properties of the formals, even if the formals are implicit. This is a big change from anything currently in Rust or (afaik) any other mainstream programming language. It seems confusing as well as not scaling beyond the smallest closures. There was a suggestion floating around (I think Niko's) that we make closures capture all arguments by value by default, we have some sugary syntax for a closure which captures all its arguments by reference (of course the defaults could be reversed if desired), and for the remaining case we use the by-value form and you have to explicitly create references in the caller or to use capture clauses. I'm not sure about the details of that, but it sits much better with me. |
I think large closures venture outside of the use case for closure syntax. The use case for this syntax is defining small callable objects with state inline with the body of another function. If it's a large amount of code, then it makes more sense to define it outside of the function as a
This will lead to performing unwanted shallow copies (potentially large ones) or unwanted by-reference captures simply because it's the only clean way to do it. It will encourage writing inefficient code. The use of manual by-reference captures via |
You can also do |
I agree with @thestinger, sprawling closures are an anti-pattern and should be discouraged.
My mental model of closures doesn't consider upvars to be formal parameters, and I'm willing to venture this is how most other people conceptualize it as well. Capture clauses themselves are already far removed from most programming languages (aside from C++, I can't think of any other language, niche or mainstream, that has syntax to allow you to determine the manner in which upvars are captured).
I would like to read this proposal, but at first blush it seems too coarse-grained to force a choice between 1) every upvar by-val, 2) every upvar by-ref, 3) a mixture by making manual bindings outside the function. pcwalton could weigh in here by giving us stats on what percentage of closures have at least one by-val and one by-ref upvar. |
I feel like we may not need any syntax for upvar captures. Fewer than 10% of closures in both Rust and Servo use anything other than the default, assuming we reborrow @thestinger was concerned about people accidentally capturing large structs by value, but I bet it's rare. I'll do some analysis to see how big the set of upvars is. |
Would the desugaring described allow for movement of owned captured variables into the closure? For example, today this fails to compile:
let a = box 1;
(|| drop(a))()
With this proposal, the closure would capture |
@alexcrichton I believe you are correct. You can move the It doesn't make sense to be able to |
Unboxed closures can be run-once, if the "callable" trait they implement takes (The |
regular `call` method can not safely move out of the captured environment. A separate trait would be
required, perhaps using the reserved `once` keyword as a prefix to the closure type sugar. It could
also be done via a future implementation of variadic generics without the sugar, but it would be
significantly uglier and would still be a magical lang item.
Is this really different enough that it deserves to be deferred for a future RFC? Because if you're just looking to avoid a syntax bikeshed...
spawn(proc(x) { ... }); // old
spawn(once |x| { ... }); // new
...then IMO this looks like it would be an improvement. Still not quite as pretty as the old do spawn, but I was never comfortable with how non-closurelike the proc syntax looks. :P
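The separate run-once trait mentioned in the excerpt can be sketched by hand. The names `CallOnce`, `DropBox`, and `run_once` are hypothetical, and this only illustrates why a by-value `self` permits moving out of the environment; it is not the trait the RFC would define.

```rust
// A run-once callable: `self` is taken by value, so the call consumes it.
trait CallOnce<R> {
    fn call_once(self) -> R;
}

// Stand-in for a closure environment owning a heap allocation.
struct DropBox {
    value: Box<i32>,
}

impl CallOnce<i32> for DropBox {
    fn call_once(self) -> i32 {
        // Moving out of the environment is fine because `self` is owned.
        // A `&mut self` method could not do this safely.
        let boxed = self.value;
        *boxed
    }
}

fn run_once() -> i32 {
    let env = DropBox { value: Box::new(1) };
    // `env` is consumed here; a second call would not compile.
    env.call_once()
}
```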
@huonw Ok sure, but the run-once variant uses a by-value |
@kballard ... that's exactly my point. Generalised closures aren't forced to have |
I did some analysis on the size of upvars being captured. Results are here: https://docs.google.com/spreadsheets/d/1AMNPHhUxa7HKnoPAUD0Lj_fHxsScEa9rfWgK3JLYSjM/edit?usp=sharing The tl;dr is that, on a 64-bit machine in the Rust compiler and standard libraries, 95.5% of upvars are 16 bytes or fewer, 97.2% of upvars are 32 bytes or fewer, and 98.2% of upvars are 64 bytes or fewer. |
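For a feel of what those sizes mean in practice, the sketch below measures a few closure environments with `std::mem::size_of_val`. Closure layout is not specified by the language, so the numbers are whatever the compiler in use produces; `capture_sizes` is a hypothetical name and this is not the tool used to gather the statistics above.

```rust
use std::mem::size_of_val;

/// Returns the sizes of three closure environments:
/// (captures a 64-byte array by reference, the same array by value, a u64 by value).
fn capture_sizes() -> (usize, usize, usize) {
    let small: u64 = 7;
    let big = [0u8; 64];

    // Holds only a reference to `big`.
    let by_ref = || big.len();
    // Holds its own copy of the 64-byte array.
    let by_val = move || big.len();
    // Holds a copy of the 8-byte integer.
    let tiny = move || small + 1;

    (size_of_val(&by_ref), size_of_val(&by_val), size_of_val(&tiny))
}
```

The by-value closure is at least as large as the array it captures, which is the cost the concern about accidentally capturing large structs refers to.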
Did you calculate those percentages correctly? That data gives me even better results:
|
@alexcrichton: You can move into the closure, but it's not possible to move out via |
I like this proposal, for all the reasons @thestinger has mentioned. In Rust, working by-value is simply the most composable option, and with the optional |
This is now updated to treat |
One alternative to the proposed by-reference capture system (the size of rust-lang/rust#14501 implies that this should not be thrown out) would be replacing
Downsides:
|
@gereeter: That would make the reference operators have different behavior depending on the context where they are written, which apart from being confusing would also prevent the user from being able to choose whether they want to capture by reference, or to capture by value and take a reference to that. The advantage of the |
Closing in favor of the most recent unboxed closures RFC #114 |