Add iterator specialisations for Repeat and Cycle #47370

varkor · 2018-01-12T01:24:57Z

Repeat and Cycle are infinite iterators that make use of the default iterator implementations, which means that for some methods for which there is a determinable return value, they infinitely loop instead.

This adds specialisations so that the iterators return values as expected. Also optimises nth for Repeat.

r? @alexcrichton

sfackler · 2018-01-12T01:26:06Z

Could you clarify the rationale for doing this? Who is asking for the maximum value of an infinite iterator?

varkor · 2018-01-12T01:34:38Z

@sfackler: I think this provides the least surprising behaviour, is semantically correct (if we can determine the result of a method call, I think we should do it, rather than looping) and provides more flexibility for generic functions — it's an edge case, but if it's one the compiler can worry about instead of the user, that seems like a plus.
For Cycle especially this could simplify some code patterns, though for Range it's mostly trying to reduce a potential papercut.

scottmcm

I'm not a fan of this change; it feels very brittle. There are a ton of infinite iterators where this wouldn't be the case, including ones very similar to ones where it applies -- even just .filter(|_| true) stops these updates from working.

scottmcm · 2018-01-12T05:31:57Z

src/libcore/iter/mod.rs

@@ -643,6 +643,41 @@ impl<I> Iterator for Cycle<I> where I: Clone + Iterator {
            _ => (usize::MAX, None)
        }
    }
+
+    #[inline]
+    fn all<F>(&mut self, f: F) -> bool where F: FnMut(Self::Item) -> bool { self.orig.clone().all(f) }


Using a new clone of orig instead of using iter is wrong here, since it'll short-circuit in a different place.

Ah, that's true.

scottmcm · 2018-01-12T05:36:11Z

src/libcore/iter/sources.rs

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) { (usize::MAX, None) }
+
+    #[inline]
+    fn nth(&mut self, _: usize) -> Option<A> { self.next() }


I like this one more than the others here, but it's an observable change if A::clone has side-effects. Is that allowable?

Oh, I didn't consider that. It seems to me that A::clone is an implementation detail that shouldn't be relied upon, but as it's technically observable at the moment, and the documentation doesn't specify otherwise, it might be worth clarifying this.

scottmcm · 2018-01-12T05:38:14Z

src/libcore/iter/sources.rs

+    fn nth(&mut self, _: usize) -> Option<A> { self.next() }
+
+    #[inline]
+    fn all<F>(&mut self, f: F) -> bool where F: FnMut(A) -> bool { self.any(f) }


I'm confused by this one, since it looks like it can still be infinite. If we want this kind of behavior, wouldn't it be f(self.element.clone())?

sfackler · 2018-01-12T06:08:44Z

more flexibility for generic functions

Which generic functions?

hanna-kruppe · 2018-01-12T10:01:39Z

IIUC some of these specializations make functions that previously looped infinitely return some value. Similar changes were previously proposed in #47169 where I and others argued against them. The arguments against behavioral changes apply to these behavioral changes as well (the ones about RangeFrom::next panicking don't).

varkor · 2018-01-12T12:28:42Z

Seeing how this change is a little controversial, maybe I should have opened an issue to discuss it first (#47169 discusses some related aspects, but I think the main issues there were not all the same as here as I'll mention below) — but maybe it's helpful to have an example implementation to look at while discussing. (Sorry, slightly long post, but I do really think this behaviour is more sensible and intuitive than the existing behaviour, though I agree it does need to be discussed before making a decision.)
(Including @alexcrichton in response to RangeFrom::min, which is the same issue as this one.)

@scottmcm: Regarding brittleness — this current approach right now is incomplete in terms of the brittleness, yes. This change only overrides methods that don't return a new iterator (like filter, which currently still causing infinite looping). If this change is approved, then I think the other methods should also be altered to make sure that calling derivatives of Repeat or Cycle also don't loop infinitely.
Regarding brittleness with other infinite iterators: the current behaviour "i.e. just loop if the iterator is unbounded" is an implementation detail, which I don't expect anyone (/ can't believe) to be relying on. It's true that there are cases where we cannot statically determine whether we can return the correct value without looping through every element, but for the cases where we can (and it is obvious to the user that it is possible — like for Repeat and Cycle), I think it totally makes sense to. The standard library sets a precedent in this respect, so if it ensures sensible answers are returned, it's more likely other library authors will do as well. This seems most intuitive to me, but perhaps it could be noted in the documentation for relevant infinite iterators.
I would argue that there is precedent in the case of methods like take — the documentation does not state that calling certain methods on infinite iterators is guaranteed to loop. take does not need to (for any infinite iterator), and neither does min. Granted, take should always halt — but this is not even necessarily the case — the implementation of take for a certain iterator could be specialised to just loop. Iterator makes no guarantees of halting at the moment, so I think there's no reason this behaviour shouldn't be changed.
@rkruppe: There were two arguments against changing RangeFrom::max — one (which was the main issue as far as I understood) was the semantics of the return value None changing, which doesn't apply here. The other was us not being able to provide these guarantees for every iterator. Frankly, I think the documentation here just doesn't exist. In the case of min on Repeat, for instance, there is only one sensible return value. I think regardless of what happens in this case, we should make this clear. The documentation could say: "In the case of unbounded iterators, the following functions never return: [...]" (which I argue is only motivated by the default implementation of Iterator, and is a side-effect of the implementation, which should have already been changed), or, for example, "In the case of unbounded iterators, these methods may loop unless the result is determinable without exhausting the iterator." (and then the documentation for specialised iterators could specify which are guaranteed to return). I agree that RangeFrom::max() == None is unviable, but the results returned by these changes are definitely correct: the issue is that they just change existing behaviour. I think apart from the case of explicit loops, a change from "non-halting → halting" should almost always be non-breaking.
@sfackler: Sorry, I was perhaps a bit cheeky in my inclusion of generic functions: I simply meant any function taking an iterator making use of these methods (max, any, etc.) would now be callable on Repeat- or Cycle-derived iterators, which they are not practically at the moment. (Of course, to make this completely straightforward, as @scottmcm points out, we would want the other derived methods to have the same finite behaviour, which I will be happy to add in a follow-up PR.) I think it's more likely that someone, for example, gets a Cycle and wants to evaluate some property of it (like all), which is currently non-trivial (maybe impossible?) at the moment, than gets confused by the nice behaviour of Cycle and thinks "Oh, Rust must be able to evaluate any infinite iterator method in finite time." Again, documentation would help here, and I'm happy to add it.

hanna-kruppe · 2018-01-12T13:14:23Z

I believe there is a lot of value in the default implementations being "canonical" and all specializations implementing the exact same behavior, just with different performance.¹ (Third party iterators can always screw this up, of course.) It means that programmers don't need to know at all whether a specific iterator overrides some method, not even for reasoning about termination or rare and obscure side effects. It would be extremely surprising and counter-intuitive if, for example, adding a filter to an iterator chain changes a program from working to looping infinitely. Similarly, it's very unlikely that the number of Clone::clone matters for program behavior, but in case it does, I really don't want to be the poor shmuck who has to track down that this is due to an "optimization" buried deep in the standard library. Edit: I think this may be because users, even advanced ones, don't usually think of an iterator as a big pile of individual combinators with their own unique quirks — it's just "an iterator chain", period.

In short, as I already expressed in #47169, I believe it's an attractive nuisance to try and "Do What I Mean" about infinite iterator some of the time, because then the many more complex cases where we can't DWIM will be very confusing.

¹ This means I would actually be fine with overriding Repeat::max with loop { self.item.clone(); } 🤔 Although there's not much motivation to do that manually, LLVM optimizations should be able to do it.

oberien · 2018-01-12T15:36:53Z

src/libcore/iter/mod.rs

+
+    #[inline]
+    fn all<F>(&mut self, f: F) -> bool where F: FnMut(Self::Item) -> bool {
+        self.iter.clone().chain(self.orig.clone()).all(f)


If this short-circuits with an element inside self.orig, this iterator should continue with the next element after the matching one in self.orig. Currently it'll instead restart with the first element in self.orig.

Ah, that's a good point. I'll fix that if the decision is to merge.

oberien · 2018-01-12T15:39:15Z

src/libcore/iter/mod.rs

+
+    #[inline]
+    fn min(self) -> Option<Self::Item> where Self::Item: cmp::Ord {
+        cmp::min(self.iter.min(), self.orig.clone().min())


self.iter is a subset of self.orig, so the minimum of self.orig will be at least as small as the minimum of self.iter.

Yeah — this is necessary to preserve the consistency of the behaviour of min when multiple elements are equally minimum. It's necessary to return the first/leftmost one, which in this case may not be the same as the first one in self.orig.

oberien · 2018-01-12T16:28:50Z

Could similar optimizations be performed for RangeFrom as another unbounded iterator?

varkor · 2018-01-12T16:35:52Z

@oberien: There's a separate PR for RangeFrom, but as the same discussion applies, we're limiting discussion to this thread so it's in one place.

bluss · 2018-01-12T20:04:09Z

👎 on the general idea of replacing an unbounded sequence with a shortcut that is surprising and possibly quickly disappears when the iterator is part of a composition with adaptors.

In particular, I think this breaks equivalence between .for_each() and .all().

(0..10).cycle().all(f) is a loop similar to the equivalent for_each ; and the user can call it and let it run as long as they want. They can break out of it by panic or other means. (Strawman use case, run (0..10).cycle().all(f) as an unbounded loop until the non-pure function f fails in some way and returns false.)

Anything you improve in all should be applied to at least any, but also for_each, try_fold etc as well, and I don't think it can be extended reasonably to those, for the reason already given.

varkor · 2018-01-12T20:55:02Z

Okay, I'm inclined to agree that all isn't a good choice for this optimisation, as it currently can be used like for_each, as @bluss says.

I still think the arguments for max and min hold, as they don't take a user-defined function. max_by_key and max_by are less certain, because although you can use them to modify state, I can't see an alternative use case for these (akin to any as a short-circuiting for_each). Whether max should be specialised if the other max_* functions aren't: I'm less sure. For consistency, "no" seems like a reasonable answer; though I still prefer preventing an infinite loop if it preserves all other semantics (unlike function-taking methods).

Edit: Without defining these specialisations for all relevant methods, it gets messy with chaining. Maybe it is too late to make a change to the semantics now, without introducing explicit new traits.

varkor · 2018-01-13T13:57:14Z

It seems there's a general consensus about the behaviour of methods on infinite iterators, so I'll go ahead and close the PR. I think a documentation tweak is in order to clarify these details though, as following these discussions I don't think there's a single behaviour that is completely obvious.

Thanks for all your input everyone!

One last matter I wanted to clear up regarding nth for Repeat, which didn't really fall into the same category as the others, as it doesn't modify termination behaviour: I strongly feel this should be a permitted optimisation. I think we should be very opposed to encouraging the possibility of relying upon the internal behaviour of a method unless it is really necessary (discouraging information hiding is really unpleasant, both from a language design POV and for anyone working on the standard library). I really feel not guaranteeing the number of times clone will be called for a value is important.

Does anyone feel there are issues with even nth? (Maybe some obscure, but actually very useful, reasons for observing clone in this way?)

hanna-kruppe · 2018-01-13T14:14:12Z

Not quite what you're talking about, but we do take the liberty to omit clone calls for things that are Copy: https://github.com/rust-lang/rfcs/blob/master/text/1521-copy-clone-semantics.md

However, the justification for that is stronger than "Clone should not have side effects period".

qnighy · 2018-01-13T14:15:55Z

There's at least an effort to maintain the clone order for Zip specializations.

varkor · 2018-01-13T15:17:00Z

@rkruppe: That does seem to be in the same vein. Arguably, it makes more sense to have to justify why calling clone may or may not perform other clones than calling nth may perform clones (or any other method).

@qnighy: Sorry, I don't quite see what you're referring to there. Isn't that an optimisation for Copy types, rather than explicitly defining Clone semantics?

qnighy · 2018-01-13T23:04:13Z

@varkor Sorry for less explanation. The TrustedRandomAccess trait is currently only used in Zip to generate an efficient code. There TrustedRandomAccess::may_have_side_effect is used to adjust the number of iterator calls only when the iterator has a side effect. For Cloned example, this is used to ensure e.g.

struct Foo;

impl Clone for Foo {
    fn clone(&self) -> Self {
        println!("Foo::clone()");
        Foo
    }
}

fn main() {
    [Foo, Foo].iter().cloned().zip([0].into_iter()).collect::<Vec<_>>();
}

produces two calls to Foo::clone().

Back to TrustedRandomAccess for Cloned impl, may_have_side_effect is true if it is Clone but not Copy, regardless of the inner iterator. So this implementation is conservative with respect to the number of clones.

Of course, it doesn't necessarily mean that there is a general consensus about the number of clone calls. It's just an example of existing conservative implementation.

aidanhs · 2018-01-17T23:47:02Z

@varkor it sounds like you're planning on closing the PR in #47370 (comment), so I'm going to go ahead and do so now and then you can open a PR for future work.

Feel free to continue any discussions, take them to internals etc! If you want to reopen because you're actually not finished with this, ping me and I'll do so.

varkor · 2018-01-17T23:53:24Z

@aidanhs: that's fine — I was going to close it soon myself!

oberien · 2018-01-18T09:25:36Z

@varkor I definitely think that you should create an RFC for these suggestions. Together with a solution for #47082 , there are several performance benefits to be had. While the exact behaviour is up for discussion, an RFC regarding optimizing adapters on infinite iterators could unveil more edge-cases and further ideas for optimizations.
If you create an RFC, please cc me :)

Add finite iterator specialisations for Repeat

6710669

rust-highfive assigned alexcrichton Jan 12, 2018

varkor mentioned this pull request Jan 12, 2018

Add min specialisation for RangeFrom #47362

Closed

Add finite iterator specialisations for Cycle

d6e2867

varkor force-pushed the iter-repeat-cycle branch from 29ba2f2 to d6e2867 Compare January 12, 2018 01:33

scottmcm reviewed Jan 12, 2018

View reviewed changes

Fix issues with all

71322b1

oberien reviewed Jan 12, 2018

View reviewed changes

shepmaster added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 13, 2018

shepmaster mentioned this pull request Jan 13, 2018

Add UnboundedIterator Trait #47082

Closed

aidanhs closed this Jan 17, 2018

varkor deleted the iter-repeat-cycle branch January 18, 2018 00:08

varkor mentioned this pull request Jan 18, 2018

Add Iterator specialisation for Repeat::nth #47533

Closed

varkor mentioned this pull request Jan 19, 2018

Document the behaviour of infinite iterators on potentially-computable methods #47547

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add iterator specialisations for Repeat and Cycle #47370

Add iterator specialisations for Repeat and Cycle #47370

varkor commented Jan 12, 2018

sfackler commented Jan 12, 2018

varkor commented Jan 12, 2018 •

edited

Loading

scottmcm left a comment

scottmcm Jan 12, 2018

varkor Jan 12, 2018

scottmcm Jan 12, 2018

varkor Jan 12, 2018

scottmcm Jan 12, 2018

sfackler commented Jan 12, 2018

hanna-kruppe commented Jan 12, 2018

varkor commented Jan 12, 2018

hanna-kruppe commented Jan 12, 2018 •

edited

Loading

oberien Jan 12, 2018

varkor Jan 12, 2018

oberien Jan 12, 2018

varkor Jan 12, 2018

oberien commented Jan 12, 2018

varkor commented Jan 12, 2018

bluss commented Jan 12, 2018 •

edited

Loading

varkor commented Jan 12, 2018 •

edited

Loading

varkor commented Jan 13, 2018 •

edited

Loading

hanna-kruppe commented Jan 13, 2018

qnighy commented Jan 13, 2018

varkor commented Jan 13, 2018

qnighy commented Jan 13, 2018

aidanhs commented Jan 17, 2018

varkor commented Jan 17, 2018

oberien commented Jan 18, 2018 •

edited

Loading

Add iterator specialisations for Repeat and Cycle #47370

Add iterator specialisations for Repeat and Cycle #47370

Conversation

varkor commented Jan 12, 2018

sfackler commented Jan 12, 2018

varkor commented Jan 12, 2018 • edited Loading

scottmcm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sfackler commented Jan 12, 2018

hanna-kruppe commented Jan 12, 2018

varkor commented Jan 12, 2018

hanna-kruppe commented Jan 12, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

oberien commented Jan 12, 2018

varkor commented Jan 12, 2018

bluss commented Jan 12, 2018 • edited Loading

varkor commented Jan 12, 2018 • edited Loading

varkor commented Jan 13, 2018 • edited Loading

hanna-kruppe commented Jan 13, 2018

qnighy commented Jan 13, 2018

varkor commented Jan 13, 2018

qnighy commented Jan 13, 2018

aidanhs commented Jan 17, 2018

varkor commented Jan 17, 2018

oberien commented Jan 18, 2018 • edited Loading

varkor commented Jan 12, 2018 •

edited

Loading

hanna-kruppe commented Jan 12, 2018 •

edited

Loading

bluss commented Jan 12, 2018 •

edited

Loading

varkor commented Jan 12, 2018 •

edited

Loading

varkor commented Jan 13, 2018 •

edited

Loading

oberien commented Jan 18, 2018 •

edited

Loading