Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for Iterator::flatten #48213

Closed
Centril opened this issue Feb 14, 2018 · 35 comments · Fixed by #51969
Closed

Tracking issue for Iterator::flatten #48213

Centril opened this issue Feb 14, 2018 · 35 comments · Fixed by #51969
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Centril
Copy link
Contributor

Centril commented Feb 14, 2018

Tracking issue for #48115, Iterator::flatten and FlatMap.

@Centril Centril added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Feb 14, 2018
@leonardo-m
Copy link

A flatten could be useful, but I think there are more important/useful functions/iterators to add before a flatten.

@clarfonthey
Copy link
Contributor

@leonardo-m Clearly not important enough for someone to write a PR/RFC before this one, though. ;)

@clarfonthey
Copy link
Contributor

clarfonthey commented Feb 15, 2018

Also I can't list the number of times I've done .flat_map(|x| x), so, I'm glad this was implemented.

One thing that I think should be clarified is that this only flattens one level, i.e. it will not transform [[[1, 2], [3, 4]], [[5, 6], [7, 8]]] into [1, 2, 3, 4, 5, 6, 7, 8] but [[1, 2], [3, 4], [5, 6], [7, 8]].

@Centril
Copy link
Contributor Author

Centril commented Feb 15, 2018

@leonardo-m I don't think it has to be an either-or proposition - if there are more useful additions, and there certainly are some from Itertools, they should be made irrespective of this one. Someone just has to write the PRs for those =)

Regarding .flatten() specifically, I believe it gives us easier-to-express specialization opportunities than .flat_map() does such as, flattening a vec::IntoIter<Vec<_>> with a single allocation.
Also, it was mentioned explicitly that neither of .flat_map(|x| x) nor .flat_map(identity) were rather nice and that .flatten() was clearer at rust-lang/rfcs#2306 (comment), rust-lang/rfcs#2306 (comment), and rust-lang/rfcs#2306 (review).

@clarcharr hmm, it does say in the docs that:

and you want to remove one level of indirection.

Perhaps that's not clear enough? Can we fix this during stabilization perhaps if it is not?

@clarfonthey
Copy link
Contributor

@Centril I mentioned that having not actually read the docs yet. ;)

I honestly think that including an example would be clearer; there are tools for languages such as JavaScript, for example, which will completely flatten an array regardless of how many levels it has. Although such a thing would be very difficult and/or impossible to do with Rust's type system, I think it'd be best for it to be clear that this is not a "deep" flatten, just a shallow one.

@Centril
Copy link
Contributor Author

Centril commented Feb 15, 2018

@clarcharr Updated the docs with your example.

@clarfonthey
Copy link
Contributor

Awesome! Thanks

@yrashk
Copy link

yrashk commented Feb 26, 2018

Not sure, how important this angle is, but just in case -- it looks like this introduces a failure for nightly builds on code that relies on itertools' version of flatten:

https://it.sit-it.org/issue/fe5e68e5-22a1-4bc3-8ebf-36586460ba27
https://travis-ci.org/sit-it/sit/jobs/346132639#L1129

It's a trivial fix for the code that uses it (the patch is in the first link) but I thought it is worth mentioning this aspect.

yrashk added a commit to sit-fyi/sit that referenced this issue Feb 26, 2018
```
   --> sit-web/src/webapp.rs:202:32
       |
   202 |             let record = match sit_core::repository::IssueRecordIter::flatten(issue.record_iter().unwrap()).find(|r| r.encoded_hash() == record) {
       |                                                                       ^^^^^^^ multiple `flatten` found
       |
       = note: candidate #1 is defined in an impl of the trait `std::iter::Iterator` for the type `sit_core::repository::IssueRecordIter<'_>`
       = note: candidate #2 is defined in an impl of the trait `itertools::Itertools` for the type `_`
       = note: candidate #3 is defined in the trait `rayon::iter::ParallelIterator`
```

This is, perhaps, related to the addition of the [`iterator_flatten`
nightly feature](rust-lang/rust#48213).

Solution: use itertools' version explicitly
@eira-fransham
Copy link

I don't think arbitrary-level flattening is impossible with Rust's type system, it's just that it would rely on specialisation and could produce confusing results.

For example, [[Some(1), None], [None, None]].deep_flatten() would produce [1], obvious to a Rust veteran but maybe unintuitive for a newbie, plus implementing Iterator for a type would massively change the meaning of the expression.

Also, this would cause an infinite loop in the compiler when resolving the output type of deep_flatten:

struct Foo;

impl Iterator for Foo {
    type Item = Foo;

    fn next(&mut self) -> Foo {
        unimplemented!()
    }
}

JS is dynamically typed and so deep flattening is useful since you don't know the nesting level until runtime. With Rust you know ahead of time precisely how many levels deep an iterator will be nested.

@SoniEx2
Copy link
Contributor

SoniEx2 commented Feb 28, 2018

@Centril
Copy link
Contributor Author

Centril commented Feb 28, 2018

@SoniEx2 I think the documentation has a sufficient number of examples, more would be too many.

@leonardo-m
Copy link

@leonardo-m I don't think it has to be an either-or proposition - if there are more useful additions, and there certainly are some from Itertools, they should be made irrespective of this one. Someone just has to write the PRs for those =)

I don't agree. So far in my whole codebase I have seen zero places where to use a flatten(), on the other hand from the standard library I miss several iterable things already present in itertools. The API of the std library should be kept small. Adding functions is not free, the more things you add, the more time you need to use the API and find the right function. In my opinion the whole business of adding iterators to the Rust std library is currently broken. It should be based on real data. What are the most used iterators of the itertools crate? Add those to the std library.

@SoniEx2
Copy link
Contributor

SoniEx2 commented Feb 28, 2018

I have written scan+flat_map(|x|x) at least 3 times so far. And I don't even use Rust that often yet!

The more I use rust, the more I run into scan+flatten.

See also: rust-lang/rfcs#1978

It's most useful for string parsing, and partial collects.

@Centril
Copy link
Contributor Author

Centril commented Jun 10, 2018

@SimonSapin This has been in nightly for a few months now; could we stabilize?

@kennytm
Copy link
Member

kennytm commented Jun 10, 2018

We may want another (check-only) crater run before stabilizing due to conflict with Itertools::flatten after that.

@clarfonthey
Copy link
Contributor

Perhaps itertools could do what serde does now to detect new Rust versions and have a build script which automatically aliases the builtin flatten on Rust versions which support it? That way it could be released as a semver-compatible change, but the breakages would go away.

@Centril
Copy link
Contributor Author

Centril commented Jun 11, 2018

In either case, whether it is possible or not, I don't think a nested flatten operation should be called .flatten() since it would lead to surprises. Given that .map corresponds to functor mapping and that .flat_map corresponds to monadic bind, then .flatten should correspond to monadic join.

@reuvenpo
Copy link

I've been thinking about this issue for a while, and i came up with something, but sadly i don't currently have the time to try a full implementation, so I'll write down a sketch of it, maybe it makes some sense (If it does I'll invest a few hours into learning how to properly contribute over the weekend).

Basically, have a trait Nested with a method deep_flatten<I: Iterator>(self) -> I (or similar, i hope you get the idea) which returns the final iterator.
This trait can be implemented for all iterators to just return self. That is, flattening a "flat" iterator does nothing.
For any iterator of iterators (exact type signature missing for lack of author's experience), as a specialization, this trait can be implemented to return an iterator that collects its items from calling deep_flatten on its sub-iterators.
If i understand the type system correctly, this should be equivalent to the "type-level recursion with a stop condition" discussed earlier.

@Centril
Copy link
Contributor Author

Centril commented Jun 11, 2018

I did some experimentation: https://play.rust-lang.org/?gist=1bbdca273415c488e2f058d39fe995b2&version=nightly&mode=debug

@SimonSapin
Copy link
Contributor

@reuvenpo That sounds maybe plausible with specialization of a trait that has an associated type (for the returned iterator). But the design of the trait specialization feature itself in the language is still evolving, so it might be a long time before "deep flatten" is possible at all.

In the mean time, the existing Iterator::flatten can be stabilized very soon. I feel that the potential of a similar-but-different feature to maybe be added later is not enough to block stabilization now with the shorter name.

bors added a commit that referenced this issue Jun 11, 2018
…<try>

Stabilize Iterator::flatten in 1.29, fixes #48213.

This PR stabilizes [`Iterator::flatten`](https://doc.rust-lang.org/nightly/std/iter/trait.Iterator.html#method.flatten) in *version 1.29* (1.28 goes to beta in 10 days, I don't think there's enough time to land it in that time, but let's see...).

Tracking issue is:  #48213.

cc @bluss re. itertools.
r? @SimonSapin
ping @pietroalbini -- let's do a crater run when this passes CI :)
@reuvenpo
Copy link

reuvenpo commented Jun 12, 2018

Would it be appropriate to open a tracking issue for Iterator::unnest (name taken from @Centril 's earlier gist) to discuss possible implementations of this idea?

Or should an RFC be made first?

@eira-fransham
Copy link

eira-fransham commented Jun 13, 2018

Generally it's considered bad practice to have the behaviour rely on specialisation (rather than specialisation being an optimisation) because it means that Box::new(some_val) as Box<Trait> can act differently to just some_val depending on the circumstance.

In JavaScript the unnest method would be useful since you don't know until runtime the level of nesting (and different elements can nest different depths). Essentially, unnest is a tree-flattening operation in a dynamically-typed language. In a strongly-, strictly-typed language like Rust there is very little use for a method like unnest because the tree depth is known at compile-time. I don't mean to say this in a rude way, but I can't think of a place where this would be useful. Of course, if you can find an example of where this would be useful then I would like to see it.

@reuvenpo
Copy link

reuvenpo commented Jun 16, 2018

@Vurich, i'll start by addressing the second part of your message, and then elaborate on it, hopefully answering the first part too.

It's true that in dynamically typed languages this is useful because both the author and the language may not know the depth of nesting for a given object, but it doesn't necessarily mean that in a statically typed language the author would like to care about the level of nesting of a given object, in some contexts.

My example would be anything resembling a tree structure with unbounded depth (For example a directory structure). It may be desirable to have a function generalizes over Nested<String> (A trait) which provides a method unnest returning an iterator (presumably of type Unnest) over all the leaf-nodes in that tree directly(i.e. the names of plain files) instead of manually unnesting the structure.

In this context, say you had a function that received a structure whose concrete type was something like Vec<Vec<Vec<Vec<i32>>>>(but by some generalization it wasn't necessarily known how many layers of vectors there were), but the function would just like to iterate over the i32s inside. That's impossible to do right now (AFAIK). What if instead this function wanted to iterate over Vec<i32>s, or Vec<Vec<i32>>s, etc?

The type information is all there, it's just impossible to use currently, for these purposes.

It would be best if you couldn't call thing.unnest() on a concrete object, but rather had to specify exactly to which level of unnesting you wanted to reach(e.g. thing.unnest::<String>()), but if you were in a generic function where thing was already bound by a Nested<SomeThing>, then you could just unnest() it.

Under this system, Box::new(some_val) as Box<Trait> wouldn't act that differently differently than some_val. If Trait was Nested<Vec<i32>> but some_val was Vec<Vec<Vec<i32>>> then you couldn't call unnest::<Vec<Vec<i32>>>() on the trait-object anymore, but you could still call unnest::<Vec<i32>>() and unnest::<i32>() on it, plus you can now call unnest() on it, which would be equivalent to calling unnest::<Vec<i32>>(). This behavior would make sense in this context, restricting the object's behavior according to the trait bound, and extending it's behavior slightly by the same bound.

Of course, this is all pretty difficult, if at all possible, to implement today. I certainly don't know the "right" way to do this, if it was possible, I just think this interface makes sense and is doable, even if it still requires some features that aren't yet stabilized, or available at all.

P.S. Obviously for cases where the concrete type is known, one can just call .flatten() the necessary number of times to reach the desired type.

EDIT 1: changed Unnestable to Nested

@SoniEx2
Copy link
Contributor

SoniEx2 commented Jun 16, 2018

"unnestable" is weird for a trait name. it sounds like you can't put it inside stuff.

@reuvenpo
Copy link

reuvenpo commented Jun 16, 2018

The name is just a proposal too. It can be called anything else that makes sense.

(Denestable? Nested? Unnesting? I like Nested better now that i think of it.)

@eira-fransham
Copy link

The design where you specify the iterator type still fails for this:

struct Foo;

impl Iterator for Foo {
    type Item = Foo;

    fn next(&mut self) -> Foo {
        Foo
    }
}

@reuvenpo
Copy link

I'm not sure i understand.
Do you mean iterators of a type that always yield iterators of the same type?

(Is that even a useful pattern?)

@SoniEx2
Copy link
Contributor

SoniEx2 commented Jun 18, 2018

It would (should) do the least amount of work to return the desired result.

You want an Iterator<Item=Foo>? Easy, one operation. You want an Foo? Zero operations. You want Iterator<Item=Iterator<Item=Foo>>? Also mostly easy. Etc.

@reuvenpo
Copy link

I don't see where i would use this pattern (since it prevents Foo from being an iterator over anything else, and this is a weird recursive way of doing what std::iter::repeat(...) does) but i agree that in this case a Foo can be considered as a Nested<Foo>, Nested<Iterator<Item=Foo>>, etc, but trying to .unnest() any of these constructs would result in an infinite recursion when trying to yield the first item, because of the recursive definition of Iterator for Foo

But again... What is the use case? Would in that use case it be useful to treat a Foo as a Nested<Foo>?

@Centril
Copy link
Contributor Author

Centril commented Jun 18, 2018

Can we move discussion on flattening nested stuff to internals.rust-lang.org?

@reuvenpo
Copy link

(I'll have to open an account, but)
Sure

kennytm added a commit to kennytm/rust that referenced this issue Jun 30, 2018
…flatten, r=SimonSapin

Stabilize Iterator::flatten in 1.29, fixes rust-lang#48213.

This PR stabilizes [`Iterator::flatten`](https://doc.rust-lang.org/nightly/std/iter/trait.Iterator.html#method.flatten) in *version 1.29* (1.28 goes to beta in 10 days, I don't think there's enough time to land it in that time, but let's see...).

Tracking issue is:  rust-lang#48213.

cc @bluss re. itertools.
r? @SimonSapin
ping @pietroalbini -- let's do a crater run when this passes CI :)
pietroalbini added a commit to pietroalbini/rust that referenced this issue Jul 1, 2018
…flatten, r=SimonSapin

Stabilize Iterator::flatten in 1.29, fixes rust-lang#48213.

This PR stabilizes [`Iterator::flatten`](https://doc.rust-lang.org/nightly/std/iter/trait.Iterator.html#method.flatten) in *version 1.29* (1.28 goes to beta in 10 days, I don't think there's enough time to land it in that time, but let's see...).

Tracking issue is:  rust-lang#48213.

cc @bluss re. itertools.
r? @SimonSapin
ping @pietroalbini -- let's do a crater run when this passes CI :)
bors added a commit that referenced this issue Jul 1, 2018
Rollup of 7 pull requests

Successful merges:

 - #51511 (Stabilize Iterator::flatten in 1.29, fixes #48213.)
 - #51853 (Fix some doc links)
 - #51890 (Fix inconsequential typo in GlobalAlloc doc example)
 - #51920 (use literal span for concrete type suggestion)
 - #51922 (rename the llvm-tools component to llvm-tools-preview and tweak its image)
 - #51936 (rename rustc's lld to rust-lld)
 - #51961 (Fix typo in /src/librustc_resolve/lib.rs)

Failed merges:

r? @ghost
pietroalbini added a commit to pietroalbini/rust that referenced this issue Jul 1, 2018
…flatten, r=SimonSapin

Stabilize Iterator::flatten in 1.29, fixes rust-lang#48213.

This PR stabilizes [`Iterator::flatten`](https://doc.rust-lang.org/nightly/std/iter/trait.Iterator.html#method.flatten) in *version 1.29* (1.28 goes to beta in 10 days, I don't think there's enough time to land it in that time, but let's see...).

Tracking issue is:  rust-lang#48213.

cc @bluss re. itertools.
r? @SimonSapin
ping @pietroalbini -- let's do a crater run when this passes CI :)
bors added a commit that referenced this issue Jul 1, 2018
Rollup of 7 pull requests

Successful merges:

 - #51511 (Stabilize Iterator::flatten in 1.29, fixes #48213.)
 - #51853 (Fix some doc links)
 - #51890 (Fix inconsequential typo in GlobalAlloc doc example)
 - #51920 (use literal span for concrete type suggestion)
 - #51921 (improve the error message when `#[panic_implementation]` is missing)
 - #51922 (rename the llvm-tools component to llvm-tools-preview and tweak its image)
 - #51961 (Fix typo in /src/librustc_resolve/lib.rs)

Failed merges:

r? @ghost
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
9 participants