Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Iterator::for_each #42782

Merged
merged 3 commits into from
Jun 30, 2017
Merged

Add Iterator::for_each #42782

merged 3 commits into from
Jun 30, 2017

Conversation

cuviper
Copy link
Member

@cuviper cuviper commented Jun 20, 2017

This works like a for loop in functional style, applying a closure to
every item in the Iterator. It doesn't allow break/continue like
a for loop, nor any other control flow outside the closure, but it may
be a more legible style for tying up the end of a long iterator chain.

This was tried before in #14911, but nobody made the case for using it
with longer iterators. There was also Iterator::advance at that time
which was more capable than for_each, but that no longer exists.

The itertools crate has Itertools::foreach with the same behavior,
but thankfully the names won't collide. The rayon crate also has a
ParallelIterator::for_each where simple for loops aren't possible.

I really wish we had for_each on seq iterators. Having to use a
dummy operation is annoying. - @nikomatsakis

This works like a `for` loop in functional style, applying a closure to
every item in the `Iterator`.  It doesn't allow `break`/`continue` like
a `for` loop, nor any other control flow outside the closure, but it may
be a more legible style for tying up the end of a long iterator chain.

This was tried before in rust-lang#14911, but nobody made the case for using it
with longer iterators.  There was also `Iterator::advance` at that time
which was more capable than `for_each`, but that no longer exists.

The `itertools` crate has `Itertools::foreach` with the same behavior,
but thankfully the names won't collide.  The `rayon` crate also has a
`ParallelIterator::for_each` where simple `for` loops aren't possible.

> I really wish we had `for_each` on seq iterators. Having to use a
> dummy operation is annoying.  - [@nikomatsakis][1]

[1]: https://github.com/nikomatsakis/rayon/pull/367#issuecomment-308455185
@rust-highfive
Copy link
Collaborator

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@cuviper
Copy link
Member Author

cuviper commented Jun 20, 2017

@bluss I'm curious about your "interesting reasons" to use fold in Itertools::foreach. What sort of optimizations come out of that?

@alexcrichton
Copy link
Member

👍 I'm game!

@rust-lang/libs, any others have thoughts?

@alexcrichton alexcrichton added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jun 20, 2017
@aturon
Copy link
Member

aturon commented Jun 21, 2017

Works for me!

@sfackler
Copy link
Member

I seem to remember there being philosophical objections to this back in the day, but I don't feel particularly strongly.

@cuviper
Copy link
Member Author

cuviper commented Jun 21, 2017 via email

@cuviper
Copy link
Member Author

cuviper commented Jun 21, 2017

Woah, for_each based on fold is a clear win over a for loop when chain is involved. I'll update it and see about adding benchmarks to show this benefit, and we should figure out how to express this in the documentation.

The benefit of using internal iteration is shown in new benchmarks:

    test iter::bench_for_each_chain_fold     ... bench:     635,110 ns/iter (+/- 5,135)
    test iter::bench_for_each_chain_loop     ... bench:   2,249,983 ns/iter (+/- 42,001)
    test iter::bench_for_each_chain_ref_fold ... bench:   2,248,061 ns/iter (+/- 51,940)
@cuviper cuviper force-pushed the iterator_for_each branch from c5c238b to 4a8ddac Compare June 21, 2017 20:22
/// #![feature(iterator_for_each)]
///
/// let mut v = vec![];
/// (0..5).for_each(|x| v.push(x * 100));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not necessarily opposed to the current example you have written, but i find the current example slightly less idiomatic (subjectively) than something like:

let v: Vec<_> = (0..5).map(|x| x * 100).collect();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I was just aiming for something simple and testable. I would definitely use collect or extend for that in real code. Any ideas for something more meaningful?

Similarly, the added benchmarks are just sums.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas for something more meaningful?

New cookbook contributors usually have problem with consuming functional flow they have built just for the sake of side effects (if they do not wish to obtain any value like in fold or collect). Switching to imperative for just to obtain side effects feels not idiomatic.

Some artificial examples that might not be any better 😸

let (tx, rx) = channel();
(0..5).map(|x| x * 2 + 1).for_each(|x| { tx.send(x).unwrap(); } );
["1", "2", "lol", "baz", "55"]
    .iter()
    .filter_map(|s| s.parse::<u16>().ok())
    .map(|v| v * v)
    .for_each(|v| println!("{}", v));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@budziq I'm glad to hear of more motivation for this!

Your additional examples are OK, but still not testing anything besides successfully compiling. Note that my first example has an assert_eq with the for-loop result, so we actually get some sanity check that it really works, as trivial as that is. Your channel example could read the rx side to check the result though.

Copy link
Contributor

@budziq budziq Jun 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cuviper so something like that might be ok?

use std::sync::mpsc::channel;

let (tx, rx) = channel();
(0..5).map(|x| x * 2 + 1).for_each(|x| { tx.send(x).unwrap(); } );
assert_eq!(vec![1, 3, 5, 7, 9], rx.iter().take(5).collect::<Vec<_>>());

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work. Do folks like that better? Or perhaps as one more example?

@steveklabnik
Copy link
Member

I personally really wanted a foreach but yeah, it was rejected, what's changed since then?

@cuviper
Copy link
Member Author

cuviper commented Jun 27, 2017

@steveklabnik I think at that time the main point was that short iterators are probably better off with a for loop and more control-flow options. It wasn't properly considered for use with longer iterator chains. When your iterator takes up multiple lines, a for loop either makes awkward formatting or requires saving to a local first, both annoying.

Also, the performance benefit of internal iteration via fold is pretty compelling, and we can implement that in for_each without users having to understand it.

@steveklabnik
Copy link
Member

It wasn't properly considered for use with longer iterator chains.

I must have done a bad job advocating back then, then. Oh well.

@alexcrichton
Copy link
Member

Ok sounds like there's no reason to not experiment with this at this point, @cuviper if you want to update the example (which I think the comments indicate?) then I'll r+

@steveklabnik
Copy link
Member

History time!

I would argue that rust-lang/rfcs#1064 (comment) by @nagisa is a decent summary:

So far none of the points arguing against the RFC have become false:

  • There’s a more general version of this function already available in the standard library as well as various chains that produce the desired behaviour;
  • THere’s the for-loop construct;
  • itertools package still provides a convenience wrapper for the for-loop.

The first bullet point has changed. The other ones are still true.

@cuviper
Copy link
Member Author

cuviper commented Jun 27, 2017

Thanks for finding more context! My search-fu is apparently weak...

I don't think we have to falsify every point against -- only compare them to the points for:

@cuviper
Copy link
Member Author

cuviper commented Jun 27, 2017

In general, I feel like there are a lot of people that do want this, and the people against are just "meh".

Anyway, I updated the first example like @budziq's suggestion.

@nikomatsakis
Copy link
Contributor

Obviously I've already been quoted at the top, but I agree with @cuviper that I think the pros outweigh the cons. I think that there is indeed new information since the last time this was discussed:

  1. The performance benefits of "internal iteration" were not widely discussed at the time (unless I remember incorrectly; I confess I didn't bother to click all of @steveklabnik's links, I'm just going based on my memory).
  2. The fact that, for parallel iteration, for_each is necessary.

The two together mean that encouraging the use of for_each when it is convenient for sequential iteration will make code faster by default, and also facilitate the conversion to parallel execution. And it's more ergonomic to boot! Seems like a win-win to me.

The main argument against is basically "TMWTDI", which -- from my POV -- is a good enough reason to stop things without compelling advantages, but not an absolute blocker.

I also think one of the points made at the time is at least somewhat false:

There’s a more general version of this function already available in the standard library as well as various chains that produce the desired behaviour;

I presume that this is referring to fold or all? I don't consider that a real alternative. It's true that one can model for_each this way, but it's misleading and makes the code harder to read -- you have to realize that the function is being abused for something other than its intended purpose, which imo starts to defeat the point of using iterators. Moreover, as a consequence of that, code written in this style cannot be parallelized as efficiently or as well (e.g., the all combinator will waste time propagating booleans and checking for shortcircuits; fold isn't even available).

///
/// let (tx, rx) = channel();
/// (0..5).map(|x| x * 2 + 1)
/// .for_each(move |x| tx.send(x).unwrap());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the move necessary here? I would not expect so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's too sneaky, but that lets tx drop automatically, and then rx won't block.

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Jun 30, 2017

📌 Commit e72ee6e has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Jun 30, 2017

⌛ Testing commit e72ee6e with merge 919c4a6...

bors added a commit that referenced this pull request Jun 30, 2017
Add `Iterator::for_each`

This works like a `for` loop in functional style, applying a closure to
every item in the `Iterator`.  It doesn't allow `break`/`continue` like
a `for` loop, nor any other control flow outside the closure, but it may
be a more legible style for tying up the end of a long iterator chain.

This was tried before in #14911, but nobody made the case for using it
with longer iterators.  There was also `Iterator::advance` at that time
which was more capable than `for_each`, but that no longer exists.

The `itertools` crate has `Itertools::foreach` with the same behavior,
but thankfully the names won't collide.  The `rayon` crate also has a
`ParallelIterator::for_each` where simple `for` loops aren't possible.

> I really wish we had `for_each` on seq iterators. Having to use a
> dummy operation is annoying.  - [@nikomatsakis][1]

[1]: https://github.com/nikomatsakis/rayon/pull/367#issuecomment-308455185
@bors
Copy link
Contributor

bors commented Jun 30, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 919c4a6 to master...

@bors bors merged commit e72ee6e into rust-lang:master Jun 30, 2017
@cuviper
Copy link
Member Author

cuviper commented Jun 30, 2017

Yay!

Now, since I left it unstable with issue = "0", do we need to open an issue and update that with a PR?

@Mark-Simulacrum
Copy link
Member

Oh, yes, please do. We probably shouldn't have merged yet actually, but no worries!

@cuviper
Copy link
Member Author

cuviper commented Jun 30, 2017

See #42987 for that update.

@phimuemue
Copy link
Contributor

phimuemue commented Apr 2, 2018

Hi, I stumbled upon this, but I wondered why for_each does not support break. I hope I don't miss something essential, but I imagined it would be no problem for for_each to take a function returning a value that determines whether to break or not.

This value could be - as it is now - a () (never causing the loop to break) or a bool indicating whether we want to break or an Option<T> indicatin whether we want to "break with a certain item".

In code, this could be captured by a trait BreakIndicator as follows:

trait BreakIndicator {
    fn is_break(&self) -> bool;
    fn final_continue() -> Self;
}

impl BreakIndicator for () {
    // always continue
    fn is_break(&self) -> bool { false }
    fn final_continue() -> () {}
}

impl BreakIndicator for bool {
    // true means break; false means continue
    fn is_break(&self) -> bool { *self }
    fn final_continue() -> bool { false }
}

impl<T> BreakIndicator for Option<T> {
    // Some(v) means "break wich value v"; None means continue
    fn is_break(&self) -> bool { self.is_some() }
    fn final_continue() -> Option<T> { None }
}

I implemented a fn breaking_for_each on top of Iterator to see how that would work out:

trait IteratorWithBreakingForEach : Iterator {
    fn breaking_for_each<F, BI>(self, f: F) -> BI
        where F: FnMut(Self::Item) -> BI,
              BI: BreakIndicator,
    ;
}

impl<I> IteratorWithBreakingForEach for I where I: Iterator {
    fn breaking_for_each<F, BI>(self, mut f: F) -> BI
        where F: FnMut(Self::Item) -> BI,
              BI: BreakIndicator,
    {
        for item in self {
            let break_indicator = f(item);
            if break_indicator.is_break() {
                return break_indicator;
            }
        }
        BI::final_continue()
    }
}

Usage could be as follows:

fn main() {
    (1..10).breaking_for_each(|i| {
        println!("{}", i) // does not break at all
    });
    (1..10).breaking_for_each(|i| {
        println!("{}", i);
        i>=5 // break if i>=5
    });
    let x = (1..10).breaking_for_each(|i| {
        println!("{}", i);
        if i>=5 {
            Some(i) // break with value
        } else {
            None // continue
        }
    });
    println!("{:?}", x);
}

Has this ever been thought about? And if so, why was it apparently rejected?

@cuviper
Copy link
Member Author

cuviper commented Apr 2, 2018

@phimuemue There's a form of that built around the Try trait with try_for_each (docs). There's also some discussion in #42327 (comment) whether Try should be re-framed more like Break/Continue.

@scottmcm
Copy link
Member

scottmcm commented Apr 2, 2018

@phimuemue As an example of "break with a certain item", check out how find is implemented:

fn find<P>(&mut self, mut predicate: P) -> Option<Self::Item> where
Self: Sized,
P: FnMut(&Self::Item) -> bool,
{
self.try_for_each(move |x| {
if predicate(&x) { LoopState::Break(x) }
else { LoopState::Continue(()) }
}).break_value()
}

(That LoopState type is currently internal, but personally I'd like Try to use it, as @cuviper said.)

If you wanted to do the same in your code today†, it can be done with Result like this:

    self.try_for_each(move |x| { 
        if predicate(&x) { Err(x) } 
        else { Ok(()) } 
    }).err() 

† Well, once someone makes a stabilization PR for try_for_each...

@cuviper cuviper mentioned this pull request Apr 16, 2018
@scottmcm
Copy link
Member

Hello 4 years later! In case anyone else ends up here, the LoopState type mentioned in my previous comment is now available on stable as https://doc.rust-lang.org/stable/std/ops/enum.ControlFlow.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.