Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add iterator specialisations for Repeat and Cycle #47370

Closed
wants to merge 3 commits into from

Conversation

varkor
Copy link
Member

@varkor varkor commented Jan 12, 2018

Repeat and Cycle are infinite iterators that make use of the default iterator implementations, which means that for some methods for which there is a determinable return value, they infinitely loop instead.

This adds specialisations so that the iterators return values as expected. Also optimises nth for Repeat.

r? @alexcrichton

@sfackler
Copy link
Member

Could you clarify the rationale for doing this? Who is asking for the maximum value of an infinite iterator?

@varkor varkor force-pushed the iter-repeat-cycle branch from 29ba2f2 to d6e2867 Compare January 12, 2018 01:33
@varkor
Copy link
Member Author

varkor commented Jan 12, 2018

@sfackler: I think this provides the least surprising behaviour, is semantically correct (if we can determine the result of a method call, I think we should do it, rather than looping) and provides more flexibility for generic functions — it's an edge case, but if it's one the compiler can worry about instead of the user, that seems like a plus.
For Cycle especially this could simplify some code patterns, though for Range it's mostly trying to reduce a potential papercut.

Copy link
Member

@scottmcm scottmcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this change; it feels very brittle. There are a ton of infinite iterators where this wouldn't be the case, including ones very similar to ones where it applies -- even just .filter(|_| true) stops these updates from working.

@@ -643,6 +643,41 @@ impl<I> Iterator for Cycle<I> where I: Clone + Iterator {
_ => (usize::MAX, None)
}
}

#[inline]
fn all<F>(&mut self, f: F) -> bool where F: FnMut(Self::Item) -> bool { self.orig.clone().all(f) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a new clone of orig instead of using iter is wrong here, since it'll short-circuit in a different place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's true.

#[inline]
fn size_hint(&self) -> (usize, Option<usize>) { (usize::MAX, None) }

#[inline]
fn nth(&mut self, _: usize) -> Option<A> { self.next() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this one more than the others here, but it's an observable change if A::clone has side-effects. Is that allowable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't consider that. It seems to me that A::clone is an implementation detail that shouldn't be relied upon, but as it's technically observable at the moment, and the documentation doesn't specify otherwise, it might be worth clarifying this.

fn nth(&mut self, _: usize) -> Option<A> { self.next() }

#[inline]
fn all<F>(&mut self, f: F) -> bool where F: FnMut(A) -> bool { self.any(f) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this one, since it looks like it can still be infinite. If we want this kind of behavior, wouldn't it be f(self.element.clone())?

@sfackler
Copy link
Member

more flexibility for generic functions

Which generic functions?

@hanna-kruppe
Copy link
Contributor

IIUC some of these specializations make functions that previously looped infinitely return some value. Similar changes were previously proposed in #47169 where I and others argued against them. The arguments against behavioral changes apply to these behavioral changes as well (the ones about RangeFrom::next panicking don't).

@varkor
Copy link
Member Author

varkor commented Jan 12, 2018

Seeing how this change is a little controversial, maybe I should have opened an issue to discuss it first (#47169 discusses some related aspects, but I think the main issues there were not all the same as here as I'll mention below) — but maybe it's helpful to have an example implementation to look at while discussing. (Sorry, slightly long post, but I do really think this behaviour is more sensible and intuitive than the existing behaviour, though I agree it does need to be discussed before making a decision.)
(Including @alexcrichton in response to RangeFrom::min, which is the same issue as this one.)

  • @scottmcm: Regarding brittleness — this current approach right now is incomplete in terms of the brittleness, yes. This change only overrides methods that don't return a new iterator (like filter, which currently still causing infinite looping). If this change is approved, then I think the other methods should also be altered to make sure that calling derivatives of Repeat or Cycle also don't loop infinitely.
  • Regarding brittleness with other infinite iterators: the current behaviour "i.e. just loop if the iterator is unbounded" is an implementation detail, which I don't expect anyone (/ can't believe) to be relying on. It's true that there are cases where we cannot statically determine whether we can return the correct value without looping through every element, but for the cases where we can (and it is obvious to the user that it is possible — like for Repeat and Cycle), I think it totally makes sense to. The standard library sets a precedent in this respect, so if it ensures sensible answers are returned, it's more likely other library authors will do as well. This seems most intuitive to me, but perhaps it could be noted in the documentation for relevant infinite iterators.
  • I would argue that there is precedent in the case of methods like take — the documentation does not state that calling certain methods on infinite iterators is guaranteed to loop. take does not need to (for any infinite iterator), and neither does min. Granted, take should always halt — but this is not even necessarily the case — the implementation of take for a certain iterator could be specialised to just loop. Iterator makes no guarantees of halting at the moment, so I think there's no reason this behaviour shouldn't be changed.
  • @rkruppe: There were two arguments against changing RangeFrom::max — one (which was the main issue as far as I understood) was the semantics of the return value None changing, which doesn't apply here. The other was us not being able to provide these guarantees for every iterator. Frankly, I think the documentation here just doesn't exist. In the case of min on Repeat, for instance, there is only one sensible return value. I think regardless of what happens in this case, we should make this clear. The documentation could say: "In the case of unbounded iterators, the following functions never return: [...]" (which I argue is only motivated by the default implementation of Iterator, and is a side-effect of the implementation, which should have already been changed), or, for example, "In the case of unbounded iterators, these methods may loop unless the result is determinable without exhausting the iterator." (and then the documentation for specialised iterators could specify which are guaranteed to return). I agree that RangeFrom::max() == None is unviable, but the results returned by these changes are definitely correct: the issue is that they just change existing behaviour. I think apart from the case of explicit loops, a change from "non-halting → halting" should almost always be non-breaking.
  • @sfackler: Sorry, I was perhaps a bit cheeky in my inclusion of generic functions: I simply meant any function taking an iterator making use of these methods (max, any, etc.) would now be callable on Repeat- or Cycle-derived iterators, which they are not practically at the moment. (Of course, to make this completely straightforward, as @scottmcm points out, we would want the other derived methods to have the same finite behaviour, which I will be happy to add in a follow-up PR.) I think it's more likely that someone, for example, gets a Cycle and wants to evaluate some property of it (like all), which is currently non-trivial (maybe impossible?) at the moment, than gets confused by the nice behaviour of Cycle and thinks "Oh, Rust must be able to evaluate any infinite iterator method in finite time." Again, documentation would help here, and I'm happy to add it.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Jan 12, 2018

I believe there is a lot of value in the default implementations being "canonical" and all specializations implementing the exact same behavior, just with different performance.1 (Third party iterators can always screw this up, of course.) It means that programmers don't need to know at all whether a specific iterator overrides some method, not even for reasoning about termination or rare and obscure side effects. It would be extremely surprising and counter-intuitive if, for example, adding a filter to an iterator chain changes a program from working to looping infinitely. Similarly, it's very unlikely that the number of Clone::clone matters for program behavior, but in case it does, I really don't want to be the poor shmuck who has to track down that this is due to an "optimization" buried deep in the standard library. Edit: I think this may be because users, even advanced ones, don't usually think of an iterator as a big pile of individual combinators with their own unique quirks — it's just "an iterator chain", period.

In short, as I already expressed in #47169, I believe it's an attractive nuisance to try and "Do What I Mean" about infinite iterator some of the time, because then the many more complex cases where we can't DWIM will be very confusing.

1 This means I would actually be fine with overriding Repeat::max with loop { self.item.clone(); } 🤔 Although there's not much motivation to do that manually, LLVM optimizations should be able to do it.


#[inline]
fn all<F>(&mut self, f: F) -> bool where F: FnMut(Self::Item) -> bool {
self.iter.clone().chain(self.orig.clone()).all(f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this short-circuits with an element inside self.orig, this iterator should continue with the next element after the matching one in self.orig. Currently it'll instead restart with the first element in self.orig.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's a good point. I'll fix that if the decision is to merge.


#[inline]
fn min(self) -> Option<Self::Item> where Self::Item: cmp::Ord {
cmp::min(self.iter.min(), self.orig.clone().min())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.iter is a subset of self.orig, so the minimum of self.orig will be at least as small as the minimum of self.iter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah — this is necessary to preserve the consistency of the behaviour of min when multiple elements are equally minimum. It's necessary to return the first/leftmost one, which in this case may not be the same as the first one in self.orig.

@oberien
Copy link
Contributor

oberien commented Jan 12, 2018

Could similar optimizations be performed for RangeFrom as another unbounded iterator?

@varkor
Copy link
Member Author

varkor commented Jan 12, 2018

@oberien: There's a separate PR for RangeFrom, but as the same discussion applies, we're limiting discussion to this thread so it's in one place.

@bluss
Copy link
Member

bluss commented Jan 12, 2018

👎 on the general idea of replacing an unbounded sequence with a shortcut that is surprising and possibly quickly disappears when the iterator is part of a composition with adaptors.

In particular, I think this breaks equivalence between .for_each() and .all().

(0..10).cycle().all(f) is a loop similar to the equivalent for_each ; and the user can call it and let it run as long as they want. They can break out of it by panic or other means. (Strawman use case, run (0..10).cycle().all(f) as an unbounded loop until the non-pure function f fails in some way and returns false.)

Anything you improve in all should be applied to at least any, but also for_each, try_fold etc as well, and I don't think it can be extended reasonably to those, for the reason already given.

@varkor
Copy link
Member Author

varkor commented Jan 12, 2018

Okay, I'm inclined to agree that all isn't a good choice for this optimisation, as it currently can be used like for_each, as @bluss says.

I still think the arguments for max and min hold, as they don't take a user-defined function. max_by_key and max_by are less certain, because although you can use them to modify state, I can't see an alternative use case for these (akin to any as a short-circuiting for_each). Whether max should be specialised if the other max_* functions aren't: I'm less sure. For consistency, "no" seems like a reasonable answer; though I still prefer preventing an infinite loop if it preserves all other semantics (unlike function-taking methods).

Edit: Without defining these specialisations for all relevant methods, it gets messy with chaining. Maybe it is too late to make a change to the semantics now, without introducing explicit new traits.

@shepmaster shepmaster added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 13, 2018
@varkor
Copy link
Member Author

varkor commented Jan 13, 2018

It seems there's a general consensus about the behaviour of methods on infinite iterators, so I'll go ahead and close the PR. I think a documentation tweak is in order to clarify these details though, as following these discussions I don't think there's a single behaviour that is completely obvious.

Thanks for all your input everyone!

One last matter I wanted to clear up regarding nth for Repeat, which didn't really fall into the same category as the others, as it doesn't modify termination behaviour: I strongly feel this should be a permitted optimisation. I think we should be very opposed to encouraging the possibility of relying upon the internal behaviour of a method unless it is really necessary (discouraging information hiding is really unpleasant, both from a language design POV and for anyone working on the standard library). I really feel not guaranteeing the number of times clone will be called for a value is important.

Does anyone feel there are issues with even nth? (Maybe some obscure, but actually very useful, reasons for observing clone in this way?)

@hanna-kruppe
Copy link
Contributor

Not quite what you're talking about, but we do take the liberty to omit clone calls for things that are Copy: https://github.com/rust-lang/rfcs/blob/master/text/1521-copy-clone-semantics.md

However, the justification for that is stronger than "Clone should not have side effects period".

@qnighy
Copy link
Contributor

qnighy commented Jan 13, 2018

There's at least an effort to maintain the clone order for Zip specializations.

@varkor
Copy link
Member Author

varkor commented Jan 13, 2018

@rkruppe: That does seem to be in the same vein. Arguably, it makes more sense to have to justify why calling clone may or may not perform other clones than calling nth may perform clones (or any other method).

@qnighy: Sorry, I don't quite see what you're referring to there. Isn't that an optimisation for Copy types, rather than explicitly defining Clone semantics?

@qnighy
Copy link
Contributor

qnighy commented Jan 13, 2018

@varkor Sorry for less explanation. The TrustedRandomAccess trait is currently only used in Zip to generate an efficient code. There TrustedRandomAccess::may_have_side_effect is used to adjust the number of iterator calls only when the iterator has a side effect. For Cloned example, this is used to ensure e.g.

struct Foo;

impl Clone for Foo {
    fn clone(&self) -> Self {
        println!("Foo::clone()");
        Foo
    }
}

fn main() {
    [Foo, Foo].iter().cloned().zip([0].into_iter()).collect::<Vec<_>>();
}

produces two calls to Foo::clone().

Back to TrustedRandomAccess for Cloned impl, may_have_side_effect is true if it is Clone but not Copy, regardless of the inner iterator. So this implementation is conservative with respect to the number of clones.

Of course, it doesn't necessarily mean that there is a general consensus about the number of clone calls. It's just an example of existing conservative implementation.

@aidanhs
Copy link
Member

aidanhs commented Jan 17, 2018

@varkor it sounds like you're planning on closing the PR in #47370 (comment), so I'm going to go ahead and do so now and then you can open a PR for future work.

Feel free to continue any discussions, take them to internals etc! If you want to reopen because you're actually not finished with this, ping me and I'll do so.

@aidanhs aidanhs closed this Jan 17, 2018
@varkor
Copy link
Member Author

varkor commented Jan 17, 2018

@aidanhs: that's fine — I was going to close it soon myself!

@varkor varkor deleted the iter-repeat-cycle branch January 18, 2018 00:08
@oberien
Copy link
Contributor

oberien commented Jan 18, 2018

@varkor I definitely think that you should create an RFC for these suggestions. Together with a solution for #47082 , there are several performance benefits to be had. While the exact behaviour is up for discussion, an RFC regarding optimizing adapters on infinite iterators could unveil more edge-cases and further ideas for optimizations.
If you create an RFC, please cc me :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants