Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more custom folding to core::iter adaptors #44856

Merged
merged 1 commit into from
Sep 29, 2017

Conversation

cuviper
Copy link
Member

@cuviper cuviper commented Sep 26, 2017

Many of the iterator adaptors will perform faster folds if they forward
to their inner iterator's folds, especially for inner types like Chain
which are optimized too. The following types are newly specialized:

Type fold rfold
Enumerate
Filter
FilterMap
FlatMap exists
Fuse
Inspect
Peekable N/A¹
Skip N/A²
SkipWhile N/A¹

¹ not a DoubleEndedIterator

² Skip::next_back doesn't pull skipped items at all, but this couldn't
be avoided if Skip::rfold were to call its inner iterator's rfold.

Benchmarks

In the following results, plain _sum computes the sum of a million
integers -- note that sum() is implemented with fold(). The
_ref_sum variants do the same on a by_ref() iterator, which is
limited to calling next() one by one, without specialized fold.

The chain variants perform the same tests on two iterators chained
together, to show a greater benefit of forwarding fold internally.

test iter::bench_enumerate_chain_ref_sum  ... bench:   2,216,264 ns/iter (+/- 29,228)
test iter::bench_enumerate_chain_sum      ... bench:     922,380 ns/iter (+/- 2,676)
test iter::bench_enumerate_ref_sum        ... bench:     476,094 ns/iter (+/- 7,110)
test iter::bench_enumerate_sum            ... bench:     476,438 ns/iter (+/- 3,334)

test iter::bench_filter_chain_ref_sum     ... bench:   2,266,095 ns/iter (+/- 6,051)
test iter::bench_filter_chain_sum         ... bench:     745,594 ns/iter (+/- 2,013)
test iter::bench_filter_ref_sum           ... bench:     889,696 ns/iter (+/- 1,188)
test iter::bench_filter_sum               ... bench:     667,325 ns/iter (+/- 1,894)

test iter::bench_filter_map_chain_ref_sum ... bench:   2,259,195 ns/iter (+/- 353,440)
test iter::bench_filter_map_chain_sum     ... bench:   1,223,280 ns/iter (+/- 1,972)
test iter::bench_filter_map_ref_sum       ... bench:     611,607 ns/iter (+/- 2,507)
test iter::bench_filter_map_sum           ... bench:     611,610 ns/iter (+/- 472)

test iter::bench_fuse_chain_ref_sum       ... bench:   2,246,106 ns/iter (+/- 22,395)
test iter::bench_fuse_chain_sum           ... bench:     634,887 ns/iter (+/- 1,341)
test iter::bench_fuse_ref_sum             ... bench:     444,816 ns/iter (+/- 1,748)
test iter::bench_fuse_sum                 ... bench:     316,954 ns/iter (+/- 2,616)

test iter::bench_inspect_chain_ref_sum    ... bench:   2,245,431 ns/iter (+/- 21,371)
test iter::bench_inspect_chain_sum        ... bench:     631,645 ns/iter (+/- 4,928)
test iter::bench_inspect_ref_sum          ... bench:     317,437 ns/iter (+/- 702)
test iter::bench_inspect_sum              ... bench:     315,942 ns/iter (+/- 4,320)

test iter::bench_peekable_chain_ref_sum   ... bench:   2,243,585 ns/iter (+/- 12,186)
test iter::bench_peekable_chain_sum       ... bench:     634,848 ns/iter (+/- 1,712)
test iter::bench_peekable_ref_sum         ... bench:     444,808 ns/iter (+/- 480)
test iter::bench_peekable_sum             ... bench:     317,133 ns/iter (+/- 3,309)

test iter::bench_skip_chain_ref_sum       ... bench:   1,778,734 ns/iter (+/- 2,198)
test iter::bench_skip_chain_sum           ... bench:     761,850 ns/iter (+/- 1,645)
test iter::bench_skip_ref_sum             ... bench:     478,207 ns/iter (+/- 119,252)
test iter::bench_skip_sum                 ... bench:     315,614 ns/iter (+/- 3,054)

test iter::bench_skip_while_chain_ref_sum ... bench:   2,486,370 ns/iter (+/- 4,845)
test iter::bench_skip_while_chain_sum     ... bench:     633,915 ns/iter (+/- 5,892)
test iter::bench_skip_while_ref_sum       ... bench:     666,926 ns/iter (+/- 804)
test iter::bench_skip_while_sum           ... bench:     444,405 ns/iter (+/- 571)

Many of the iterator adaptors will perform faster folds if they forward
to their inner iterator's folds, especially for inner types like `Chain`
which are optimized too.  The following types are newly specialized:

| Type        | `fold` | `rfold` |
| ----------- | ------ | ------- |
| `Enumerate` | ✓      | ✓       |
| `Filter`    | ✓      | ✓       |
| `FilterMap` | ✓      | ✓       |
| `FlatMap`   | exists | ✓       |
| `Fuse`      | ✓      | ✓       |
| `Inspect`   | ✓      | ✓       |
| `Peekable`  | ✓      | N/A¹    |
| `Skip`      | ✓      | N/A²    |
| `SkipWhile` | ✓      | N/A¹    |

¹ not a `DoubleEndedIterator`

² `Skip::next_back` doesn't pull skipped items at all, but this couldn't
be avoided if `Skip::rfold` were to call its inner iterator's `rfold`.

Benchmarks
----------

In the following results, plain `_sum` computes the sum of a million
integers -- note that `sum()` is implemented with `fold()`.  The
`_ref_sum` variants do the same on a `by_ref()` iterator, which is
limited to calling `next()` one by one, without specialized `fold`.

The `chain` variants perform the same tests on two iterators chained
together, to show a greater benefit of forwarding `fold` internally.

    test iter::bench_enumerate_chain_ref_sum  ... bench:   2,216,264 ns/iter (+/- 29,228)
    test iter::bench_enumerate_chain_sum      ... bench:     922,380 ns/iter (+/- 2,676)
    test iter::bench_enumerate_ref_sum        ... bench:     476,094 ns/iter (+/- 7,110)
    test iter::bench_enumerate_sum            ... bench:     476,438 ns/iter (+/- 3,334)

    test iter::bench_filter_chain_ref_sum     ... bench:   2,266,095 ns/iter (+/- 6,051)
    test iter::bench_filter_chain_sum         ... bench:     745,594 ns/iter (+/- 2,013)
    test iter::bench_filter_ref_sum           ... bench:     889,696 ns/iter (+/- 1,188)
    test iter::bench_filter_sum               ... bench:     667,325 ns/iter (+/- 1,894)

    test iter::bench_filter_map_chain_ref_sum ... bench:   2,259,195 ns/iter (+/- 353,440)
    test iter::bench_filter_map_chain_sum     ... bench:   1,223,280 ns/iter (+/- 1,972)
    test iter::bench_filter_map_ref_sum       ... bench:     611,607 ns/iter (+/- 2,507)
    test iter::bench_filter_map_sum           ... bench:     611,610 ns/iter (+/- 472)

    test iter::bench_fuse_chain_ref_sum       ... bench:   2,246,106 ns/iter (+/- 22,395)
    test iter::bench_fuse_chain_sum           ... bench:     634,887 ns/iter (+/- 1,341)
    test iter::bench_fuse_ref_sum             ... bench:     444,816 ns/iter (+/- 1,748)
    test iter::bench_fuse_sum                 ... bench:     316,954 ns/iter (+/- 2,616)

    test iter::bench_inspect_chain_ref_sum    ... bench:   2,245,431 ns/iter (+/- 21,371)
    test iter::bench_inspect_chain_sum        ... bench:     631,645 ns/iter (+/- 4,928)
    test iter::bench_inspect_ref_sum          ... bench:     317,437 ns/iter (+/- 702)
    test iter::bench_inspect_sum              ... bench:     315,942 ns/iter (+/- 4,320)

    test iter::bench_peekable_chain_ref_sum   ... bench:   2,243,585 ns/iter (+/- 12,186)
    test iter::bench_peekable_chain_sum       ... bench:     634,848 ns/iter (+/- 1,712)
    test iter::bench_peekable_ref_sum         ... bench:     444,808 ns/iter (+/- 480)
    test iter::bench_peekable_sum             ... bench:     317,133 ns/iter (+/- 3,309)

    test iter::bench_skip_chain_ref_sum       ... bench:   1,778,734 ns/iter (+/- 2,198)
    test iter::bench_skip_chain_sum           ... bench:     761,850 ns/iter (+/- 1,645)
    test iter::bench_skip_ref_sum             ... bench:     478,207 ns/iter (+/- 119,252)
    test iter::bench_skip_sum                 ... bench:     315,614 ns/iter (+/- 3,054)

    test iter::bench_skip_while_chain_ref_sum ... bench:   2,486,370 ns/iter (+/- 4,845)
    test iter::bench_skip_while_chain_sum     ... bench:     633,915 ns/iter (+/- 5,892)
    test iter::bench_skip_while_ref_sum       ... bench:     666,926 ns/iter (+/- 804)
    test iter::bench_skip_while_sum           ... bench:     444,405 ns/iter (+/- 571)
@rust-highfive
Copy link
Collaborator

r? @dtolnay

(rust_highfive has picked a reviewer for you, use r? to override)

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work.

@dtolnay
Copy link
Member

dtolnay commented Sep 26, 2017

@bors r+

@bors
Copy link
Contributor

bors commented Sep 26, 2017

📌 Commit 13724fa has been approved by dtolnay

@arielb1 arielb1 added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Sep 26, 2017
@bors
Copy link
Contributor

bors commented Sep 29, 2017

⌛ Testing commit 13724fa with merge 09ee9b7...

bors added a commit that referenced this pull request Sep 29, 2017
Add more custom folding to `core::iter` adaptors

Many of the iterator adaptors will perform faster folds if they forward
to their inner iterator's folds, especially for inner types like `Chain`
which are optimized too.  The following types are newly specialized:

| Type        | `fold` | `rfold` |
| ----------- | ------ | ------- |
| `Enumerate` | ✓      | ✓       |
| `Filter`    | ✓      | ✓       |
| `FilterMap` | ✓      | ✓       |
| `FlatMap`   | exists | ✓       |
| `Fuse`      | ✓      | ✓       |
| `Inspect`   | ✓      | ✓       |
| `Peekable`  | ✓      | N/A¹    |
| `Skip`      | ✓      | N/A²    |
| `SkipWhile` | ✓      | N/A¹    |

¹ not a `DoubleEndedIterator`

² `Skip::next_back` doesn't pull skipped items at all, but this couldn't
be avoided if `Skip::rfold` were to call its inner iterator's `rfold`.

Benchmarks
----------

In the following results, plain `_sum` computes the sum of a million
integers -- note that `sum()` is implemented with `fold()`.  The
`_ref_sum` variants do the same on a `by_ref()` iterator, which is
limited to calling `next()` one by one, without specialized `fold`.

The `chain` variants perform the same tests on two iterators chained
together, to show a greater benefit of forwarding `fold` internally.

    test iter::bench_enumerate_chain_ref_sum  ... bench:   2,216,264 ns/iter (+/- 29,228)
    test iter::bench_enumerate_chain_sum      ... bench:     922,380 ns/iter (+/- 2,676)
    test iter::bench_enumerate_ref_sum        ... bench:     476,094 ns/iter (+/- 7,110)
    test iter::bench_enumerate_sum            ... bench:     476,438 ns/iter (+/- 3,334)

    test iter::bench_filter_chain_ref_sum     ... bench:   2,266,095 ns/iter (+/- 6,051)
    test iter::bench_filter_chain_sum         ... bench:     745,594 ns/iter (+/- 2,013)
    test iter::bench_filter_ref_sum           ... bench:     889,696 ns/iter (+/- 1,188)
    test iter::bench_filter_sum               ... bench:     667,325 ns/iter (+/- 1,894)

    test iter::bench_filter_map_chain_ref_sum ... bench:   2,259,195 ns/iter (+/- 353,440)
    test iter::bench_filter_map_chain_sum     ... bench:   1,223,280 ns/iter (+/- 1,972)
    test iter::bench_filter_map_ref_sum       ... bench:     611,607 ns/iter (+/- 2,507)
    test iter::bench_filter_map_sum           ... bench:     611,610 ns/iter (+/- 472)

    test iter::bench_fuse_chain_ref_sum       ... bench:   2,246,106 ns/iter (+/- 22,395)
    test iter::bench_fuse_chain_sum           ... bench:     634,887 ns/iter (+/- 1,341)
    test iter::bench_fuse_ref_sum             ... bench:     444,816 ns/iter (+/- 1,748)
    test iter::bench_fuse_sum                 ... bench:     316,954 ns/iter (+/- 2,616)

    test iter::bench_inspect_chain_ref_sum    ... bench:   2,245,431 ns/iter (+/- 21,371)
    test iter::bench_inspect_chain_sum        ... bench:     631,645 ns/iter (+/- 4,928)
    test iter::bench_inspect_ref_sum          ... bench:     317,437 ns/iter (+/- 702)
    test iter::bench_inspect_sum              ... bench:     315,942 ns/iter (+/- 4,320)

    test iter::bench_peekable_chain_ref_sum   ... bench:   2,243,585 ns/iter (+/- 12,186)
    test iter::bench_peekable_chain_sum       ... bench:     634,848 ns/iter (+/- 1,712)
    test iter::bench_peekable_ref_sum         ... bench:     444,808 ns/iter (+/- 480)
    test iter::bench_peekable_sum             ... bench:     317,133 ns/iter (+/- 3,309)

    test iter::bench_skip_chain_ref_sum       ... bench:   1,778,734 ns/iter (+/- 2,198)
    test iter::bench_skip_chain_sum           ... bench:     761,850 ns/iter (+/- 1,645)
    test iter::bench_skip_ref_sum             ... bench:     478,207 ns/iter (+/- 119,252)
    test iter::bench_skip_sum                 ... bench:     315,614 ns/iter (+/- 3,054)

    test iter::bench_skip_while_chain_ref_sum ... bench:   2,486,370 ns/iter (+/- 4,845)
    test iter::bench_skip_while_chain_sum     ... bench:     633,915 ns/iter (+/- 5,892)
    test iter::bench_skip_while_ref_sum       ... bench:     666,926 ns/iter (+/- 804)
    test iter::bench_skip_while_sum           ... bench:     444,405 ns/iter (+/- 571)
@bors
Copy link
Contributor

bors commented Sep 29, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: dtolnay
Pushing 09ee9b7 to master...

@bors bors merged commit 13724fa into rust-lang:master Sep 29, 2017
@cuviper cuviper deleted the more-fold branch October 19, 2017 23:29
bors added a commit that referenced this pull request Nov 17, 2017
Short-circuiting internal iteration with Iterator::try_fold & try_rfold

These are the core methods in terms of which the other methods (`fold`, `all`, `any`, `find`, `position`, `nth`, ...) can be implemented, allowing Iterator implementors to get the full goodness of internal iteration by only overriding one method (per direction).

Based off the `Try` trait, so works with both `Result` and `Option` (:tada: #42526).  The `try_fold` rustdoc examples use `Option` and the `try_rfold` ones use `Result`.

AKA continuing in the vein of PRs #44682 & #44856 for more of `Iterator`.

New bench following the pattern from the latter of those:
```
test iter::bench_take_while_chain_ref_sum          ... bench:   1,130,843 ns/iter (+/- 25,110)
test iter::bench_take_while_chain_sum              ... bench:     362,530 ns/iter (+/- 391)
```

I also ran the benches without the `fold` & `rfold` overrides to test their new default impls, with basically no change.  I left them there, though, to take advantage of existing overrides and because `AlwaysOk` has some sub-optimality due to #43278 (which 45225 should fix).

If you're wondering why there are three type parameters, see issue #45462

Thanks for @bluss for the [original IRLO thread](https://internals.rust-lang.org/t/pre-rfc-fold-ok-is-composable-internal-iteration/4434) and the rfold PR and to @cuviper for adding so many folds, [encouraging me](#45379 (comment)) to make this PR, and finding a catastrophic bug in a pre-review.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants