Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unbox almost all the closures #19467

Merged
merged 92 commits into from
Dec 14, 2014
Merged

Unbox almost all the closures #19467

merged 92 commits into from
Dec 14, 2014

Conversation

japaric
Copy link
Member

@japaric japaric commented Dec 2, 2014

This PR moves almost all our current uses of closures, both in public API and internal uses, to the new "unboxed" closures system.

In most cases, downstream code that only uses closures will continue to work as it is. The reason is that the || {} syntax can be inferred either as a boxed or an "unboxed" closure according to the context. For example the following code will continue to work:

some_option.map(|x| x.transform_with(upvar))

And will get silently upgraded to an "unboxed" closure.

In some other cases, it may be necessary to "annotate" which Fn* trait the closure implements:

// Change this
|x| { /* body */}
// to either of these
|: x| { /* body */}  // closure implements the FnOnce trait
|&mut : x| { /* body */}  // FnMut
|&: x| { /* body */}  // Fn

This mainly occurs when the closure is assigned to a variable first, and then passed to a function/method.

let closure = |: x| x.transform_with(upvar);
some.option.map(closure)

(It's very likely that in the future, an improved inference engine will make this annotation unnecessary)

Other cases that require annotation are closures that implement some trait via a blanket impl, for example:

  • std::finally::Finally
  • regex::Replacer
  • std::str::CharEq
string.trim_left_chars(|c: char| c.is_whitespace())
//~^ ERROR: the trait `Fn<(char,), bool>` is not implemented for the type `|char| -> bool`
string.trim_left_chars(|&: c: char| c.is_whitespace())  // OK

Finally, all implementations of traits that contain boxed closures in the arguments of their methods are now broken. And will need to be updated to use unboxed closures. These are the main affected traits:

  • serialize::Decoder
  • serialize::DecoderHelpers
  • serialize::Encoder
  • serialize::EncoderHelpers
  • rustrt::ToCStr

For example, change this:

// libserialize/json.rs
impl<'a> Encoder<io::IoError> for Encoder<'a> {
    fn emit_enum(&mut self,
                 _name: &str,
                 f: |&mut Encoder<'a>| -> EncodeResult) -> EncodeResult {
        f(self)
    }
}

to:

// libserialize/json.rs
impl<'a> Encoder<io::IoError> for Encoder<'a> {
    fn emit_enum<F>(&mut self, _name: &str, f: F) -> EncodeResult where
        F: FnOnce(&mut Encoder<'a>) -> EncodeResult
    {
        f(self)
    }
}

[breaking-change]


How the Fn* bound has been selected

I've chosen the bounds to make the functions/structs as "generic as possible", i.e. to let them allow the maximum amount of input.

  • An F: FnOnce bound accepts the three kinds of closures: |:|, |&mut:| and |&:|.
  • An F: FnMut bound only accepts "non-consuming" closures: |&mut:| and |&:|.
  • An F: Fn bound only accept the "immutable environment" closures: |&:|.

This means that whenever possible the FnOnce bound has been used, if the FnOnce bound couldn't be used, then the FnMut was used. The Fn bound was never used in the whole repository.

The FnMut bound was the most used, because it resembles the semantics of the current boxed closures: the closure can modify its environment, and the closure may be called several times.

The FnOnce bound allows new semantics: you can move out the upvars when the closure is called. This can be effectively paired with the move || {} syntax to transfer ownership from the environment to the closure caller.

In the case of trait methods, is hard to select the "right" bound since we can't control how the trait may be implemented by downstream users. In these cases, I have selected the bound based on how we use these traits in the repository. For this reason the selected bounds may not be ideal, and may require tweaking before stabilization.

r? @aturon

@Gankra
Copy link
Contributor

Gankra commented Dec 2, 2014

😍

@alexcrichton
Copy link
Member

@japaric this looks amazing, awesome work!

@aturon
Copy link
Member

aturon commented Dec 3, 2014

@japaric

First, thanks as always for your amazing work moving the libraries forward with these new language features!

I talked with @alexcrichton and @nikomatsakis a bit about our strategy, and we had a few thoughts:

  • Functions consuming closures should essentially always take them generically (as you're doing)
  • Functions yielding closures should, in general, be using a newtype to hide the implementation choice. In particular, the various pub type definitions for iterators like Values should all be changed to be newtypes hiding their implementation and providing their own Iterator impl. (This is a workaround until we have a better mechanism for doing this hiding.) This newtype work is something we're doing as part of API stabilization, but in many cases we haven't gotten to it yet. Please convert cases like this (Keys, Values, etc) to newtypes.
  • The use of internal free fns to have a nameable type as you're using in a few places here shouldn't be necessary: we should be able to provide an implementation of Fn(X) -> Y for Box<Fn(X) -> Y>, for example, and so you should be able to just write .map(box |(k, _)| k). (As with your current implementation, this will involve dynamic dispatch, which is bad, but there's essentially no way around that with our current plans for what's available on the Stable channel).

So, I'd suggest adding an implementation of the Fn traits for boxes Fn values and using box rather than manually creating local bare fns -- which should be less work anyway :-)

Please let me know if any of that is unclear.

@japaric
Copy link
Member Author

japaric commented Dec 3, 2014

Functions yielding closures should, in general, be using a newtype to hide the implementation choice.

Could this be done in a follow up PR? This quite a bit of work, not only we must make the wrapper implement Iterator, but also DoubleEndedIterator, RandomAccessIterator, ExactSizeIterator, Clone, and/or any other trait that the type alias "transparently" implements today.

you should be able to just write .map(box |(k, _)| k)

Which implementation should be in the standard library:

  • impl FnOnce<Args, Results> for Box<F> where F: FnOnce<Args, Results> { .. }, or
  • impl FnOnce<Args, Results> for Box<FnOnce<Args, Results> + 'static> { .. }

If the later, then won't we need to write .map(box |(k, _)| k as Box<FnOnce(_) -> _ + 'static) instead? I don't think that box || {} gets automatically coerced to a trait object.

Anyhow, due to #19032/#18835, I can't implement either right now. Can we move forward with the bare functions? Or should we block on #19032 getting fixed?

@aturon
Copy link
Member

aturon commented Dec 4, 2014

@japaric

Yes, I think it's fine to move forward with this as-is; I didn't realize that the impls were currently blocked. The transition to bare fns is relatively rare and can be cleaned up later. The newtype wrappers for various Iterator types will fall out of stabilization regardless.

@nikomatsakis
Copy link
Contributor

@aturon @japaric I think the most likely fix to #19032 is to convert the Fn traits to use associated types. We should start investigating this. Hurrah for multiplying risk!

@nikomatsakis
Copy link
Contributor

(Or at least, the most likely workaround to #19032 for the short term)

@@ -163,7 +163,7 @@ pub trait IteratorExt<A>: Iterator<A> {
/// ```
#[inline]
#[unstable = "waiting for unboxed closures"]
fn map<'r, B>(self, f: |A|: 'r -> B) -> Map<'r, A, B, Self> {
fn map<'r, B, F: FnMut(A) -> B>(self, f: F) -> Map<A, B, Self, F> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this still have 'r as a lifetime parameter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it shouldn't. I'll remove it. Thanks for pointing it out!

@japaric
Copy link
Member Author

japaric commented Dec 6, 2014

I'm deferring unboxing TrieMap methods to a later PR, because of the "recursion limit during monomorphization" issue (#19596)


/// Returns an iterator over subslices separated by elements that match
/// `pred` limited to splitting at most `n` times. This starts at the end of
/// the slice and works backwards. The matched element is not contained in
/// the subslices.
#[unstable = "waiting on unboxed closures, iterator type name conventions"]
fn rsplitn_mut<'a>(&'a mut self, n: uint, pred: |&T|: 'a -> bool) -> SplitsN<MutSplits<'a, T>>;
fn rsplitn_mut<'a, P>(&'a mut self, n: uint, pred: P) -> SplitsN<MutSplits<'a, T, P>> where
P: FnMut(&T) -> bool;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistically I've personally been pushing towards something like:

fn foo(...)
       where ... {
}

I think this came up on a previous PR though, @aturon do you know if we've settled on a convention for this just yet?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really :-)

In fact, we don't currently have a way to settle on these kinds of conventions -- the ones in the Guidelines retain a very informal status and haven't gone through an RFC.

FWIW, I personally prefer @japaric's style here. But ultimately, we're going to need to sit down and design a full style guide -- probably between a release candidate and the 1.0 final.

@japaric japaric changed the title [WIP] Unbox all the closures Unbox almost all the closures Dec 10, 2014
@japaric
Copy link
Member Author

japaric commented Dec 10, 2014

@alexcrichton @aturon This is pretty much ready to go. There are a few closures that still need to be unboxed, but most of them are blocked on #19596 or #18835. I'll do one final pass marking them with FIXMEs. I've also updated this PR message, be sure to check it out.

@aturon
Copy link
Member

aturon commented Dec 11, 2014

@japaric I'm speechless; this is amazing. I can't thank you enough for the quality and quantity of work you've been doing on the libraries. I will review this PR ASAP.

/// let v: Vec<&str> = "Mary had a little lamb".split(' ').collect();
/// assert_eq!(v, vec!["Mary", "had", "a", "little", "lamb"]);
///
/// let v: Vec<&str> = "abc1def2ghi".split(|c: char| c.is_numeric()).collect();
/// let v: Vec<&str> = "abc1def2ghi".split(|&: c: char| c.is_numeric()).collect();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this is needed. I tried a similar test case on nightly without the &: and it seemed to go through, but perhaps I'm missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised this is needed.

I removed the impl CharEq for |char| -> bool (note: that's a boxed closure).

I tried a similar test case on nightly without the &: and it seemed to go through

Are you sure it wasn't using a boxed closure instead of an unboxed one?


Note that the split method takes an CharEq implementor (not an Fn* implementor). And last time I checked the || {} syntax only gets interpreted as an unboxed closure if it's used right were a Fn* implementor is expected [1]. But it has been a few days, perhaps that changed already?

[1] For example:

fn f<F>(_: F) where F: Fn() {}

fn main() {
    f(|| {});  // OK
    let closure = || {};  // Interpreted as a boxed closure at this point
    f(closure);  // ERROR: type `||` doesn't implement the `Fn()` trait
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahha -- I just wasn't paying quite close enough attention: I was accidentally looking at the Vec split method. Thanks.

@aturon
Copy link
Member

aturon commented Dec 12, 2014

OK, I've made it all the way through the PR. Wonderful, epic work -- I can't wait to land this.

I'm basically happy to r+ as is (after a rebase), but I'm curious whether all uses of explicit closure notation (|&: ...| and move) are needed.

This also leaves me somewhat curious what our long term conventions around boxing will be. For example, this change adds type parameters to a fair number of structs that we would probably be equally happy to just put boxed closures in. But I think this is a question we can resolve over time, and for now getting off the old-style closures is the main thing -- we can always box up the new-style ones over time. Most public APIs should stay unboxed in any case, I think.

@japaric
Copy link
Member Author

japaric commented Dec 13, 2014

I'm curious whether all uses of explicit closure notation (|&: ...| and move) are needed.

Well, in most cases I added the explicit notation to avoid compiler errors, but I may have over annotated in some places.

Also, IIRC, @nikomatsakis mentioned (in his blog post, I think) that he wanted/planned to improve the inference of the "generic" || {} syntax to not require the explict move/&: notation. But, I'm not sure how hard it would be implement that (i.e. if it can be done for 1.0). I do expect that the following code will work as soon as we drop the boxed closures:

fn f<F>(_: F) where F: Fn() {}

fn main() {
    let closure = || {};   // currently, this defaults to a boxed closure
    f(closure);  // currently, errors with `Fn` trait is not implemented for `||` type
}

Which is one of main cases for explicit closure annotation (&:).

If we can improve the || {} inference, then I think we can remove most of the explicit annotations.

This also leaves me somewhat curious what our long term conventions around boxing will be.

So, one "advantage" of storing a bare function (fn()) or an unboxed closure instead of a boxed closure (Box<Fn()>) in a struct like Bytes, an alias of Map, is that in the former case the struct can implement Copy. Though not everyone likes implicitly copyable structs, specially iterators.

There are also some places where can simply not use boxed closures, like libcore, since there is no Box there.


I'll rebase this ASAP.

(BTW, I also wanted to add #[deriving(Clone, Copy)] to several structs that now contain unboxed closures (this would fix issues like #12677), but this patch is already too big. So, I'll do that in another PR.)

@japaric
Copy link
Member Author

japaric commented Dec 13, 2014

@aturon rebased

bors added a commit that referenced this pull request Dec 13, 2014
This PR moves almost all our current uses of closures, both in public API and internal uses, to the new "unboxed" closures system.

In most cases, downstream code that *only uses* closures will continue to work as it is. The reason is that the `|| {}` syntax can be inferred either as a boxed or an "unboxed" closure according to the context. For example the following code will continue to work:

``` rust
some_option.map(|x| x.transform_with(upvar))
```

And will get silently upgraded to an "unboxed" closure.

In some other cases, it may be necessary to "annotate" which `Fn*` trait the closure implements:

```
// Change this
|x| { /* body */}
// to either of these
|: x| { /* body */}  // closure implements the FnOnce trait
|&mut : x| { /* body */}  // FnMut
|&: x| { /* body */}  // Fn
```

This mainly occurs when the closure is assigned to a variable first, and then passed to a function/method.

``` rust
let closure = |: x| x.transform_with(upvar);
some.option.map(closure)
```

(It's very likely that in the future, an improved inference engine will make this annotation unnecessary)

Other cases that require annotation are closures that implement some trait via a blanket `impl`, for example:

- `std::finally::Finally`
- `regex::Replacer`
- `std::str::CharEq`

``` rust
string.trim_left_chars(|c: char| c.is_whitespace())
//~^ ERROR: the trait `Fn<(char,), bool>` is not implemented for the type `|char| -> bool`
string.trim_left_chars(|&: c: char| c.is_whitespace())  // OK
```

Finally, all implementations of traits that contain boxed closures in the arguments of their methods are now broken. And will need to be updated to use unboxed closures. These are the main affected traits:

- `serialize::Decoder`
- `serialize::DecoderHelpers`
- `serialize::Encoder`
- `serialize::EncoderHelpers`
- `rustrt::ToCStr`

For example, change this:

``` rust
// libserialize/json.rs
impl<'a> Encoder<io::IoError> for Encoder<'a> {
    fn emit_enum(&mut self,
                 _name: &str,
                 f: |&mut Encoder<'a>| -> EncodeResult) -> EncodeResult {
        f(self)
    }
}
```

to:

``` rust
// libserialize/json.rs
impl<'a> Encoder<io::IoError> for Encoder<'a> {
    fn emit_enum<F>(&mut self, _name: &str, f: F) -> EncodeResult where
        F: FnOnce(&mut Encoder<'a>) -> EncodeResult
    {
        f(self)
    }
}
```

[breaking-change]

---

### How the `Fn*` bound has been selected

I've chosen the bounds to make the functions/structs as "generic as possible", i.e. to let them allow the maximum amount of input.

- An `F: FnOnce` bound accepts the three kinds of closures: `|:|`, `|&mut:|` and `|&:|`.
- An `F: FnMut` bound only accepts "non-consuming" closures: `|&mut:|` and `|&:|`.
- An `F: Fn` bound only accept the "immutable environment" closures: `|&:|`.

This means that whenever possible the `FnOnce` bound has been used, if the `FnOnce` bound couldn't be used, then the `FnMut` was used. The `Fn` bound was never used in the whole repository.

The `FnMut` bound was the most used, because it resembles the semantics of the current boxed closures: the closure can modify its environment, and the closure may be called several times.

The `FnOnce` bound allows new semantics: you can move out the upvars when the closure is called. This can be effectively paired with the `move || {}` syntax to transfer ownership from the environment to the closure caller.

In the case of trait methods, is hard to select the "right" bound since we can't control how the trait may be implemented by downstream users. In these cases, I have selected the bound based on how we use these traits in the repository. For this reason the selected bounds may not be ideal, and may require tweaking before stabilization.

r? @aturon
@bors bors closed this Dec 14, 2014
@bors bors merged commit b8e0b81 into rust-lang:master Dec 14, 2014
@japaric japaric deleted the uc branch December 16, 2014 02:09
@japaric
Copy link
Member Author

japaric commented Dec 17, 2014

The binary size of the Rust distribution (bin + lib folders) went up 3% after this change. Here's a table with per file binary size change, and here's the raw data.

cc @aturon @brson @huonw

@huonw
Copy link
Member

huonw commented Dec 17, 2014

Sounds like my concern was unfounded! Thanks for investigating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants