Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add impl to collect an iterator of Result<T,E> into (Collection<T>, Collection<E>) #87047

Closed

Conversation

Svetlitski
Copy link

Partitioning an iterator of Results into two collections containing the unwrapped values of the Ok and Err cases respectively requires a significant amount of boilerplate, as noted in Rust By Example.

The feature introduced by this commit provides a more ergonomic method of accomplishing this via a single call to Iterator::collect. As a concrete example, this commit enables the following:

let results = [Ok(1), Err(false), Ok(3), Ok(4), Err(true)];
let (oks, errs) = IntoIterator::into_iter(results).collect::<(Vec<_>, Vec<bool>)>();
assert_eq!(oks, [1, 3, 4]);
assert_eq!(errs, [false, true]);

…ollection<E>)

Partitioning an iterator of Results into two collections containing
the unwrapped values of the Ok and Err cases respectively requires a
significant amount of boilerplate [1]. The feature introduced by this
commit provides a more ergonomic method of accomplishing this via a
single call to Iterator::collect.

[1]: https://github.com/rust-lang/rust-by-example/blob/9a81505a26f199e7d8b0f21c6cd56a4211c2e6b1/src/error/iter_result.md#collect-all-valid-values-and-failures-with-partition
@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @scottmcm (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 10, 2021
@@ -1626,6 +1626,25 @@ impl<A, E, V: FromIterator<A>> FromIterator<Result<A, E>> for Result<V, E> {
}
}

//FIXME: Figure out how to mark this as unstable without x.py refusing to run any tests
#[stable(feature = "tuple_from_result_iter", since = "1.53.0")]
Copy link
Author

@Svetlitski Svetlitski Jul 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This obviously should be marked as unstable, but every time I did so, it would cause x.py to refuse any attempt I made to run the tests, instead exiting with the message:

error: an `#[unstable]` annotation here has no effect
    --> library/core/src/result.rs:1630:1
     |
1630 | #[unstable(feature = "tuple_from_result_iter", issue = "87047")]
     | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     |
     = note: `#[deny(ineffective_unstable_trait_impl)]` on by default
     = note: see issue #55436 <https://github.com/rust-lang/rust/issues/55436> for more information

Thus I temporarily marked it as stable as a kludge to get the tests to run. I realize there is probably just a mistake I'm making as a result of this being my first contribution here, so I'd appreciate some help resolving this small detail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impls can't be unstable, see the linked issue. They're always insta-stable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I read the issue before but was confused because there is an impl just a few lines below where I inserted my code (line 1649, impl<T, E> ops::TryV2 for Result<T, E>) which is marked as unstable, but upon re-reading the issue I realize the difference is that there the trait itself is unstable.

@jyn514
Copy link
Member

jyn514 commented Jul 11, 2021

Doing this through collect() seems very confusing to me, I would be very surprised to see that in code. Could you do this through a new function instead maybe, that's only implemented on Iterator<Item = Result<T, E>? That would also let you make it unstable to start.

@Svetlitski
Copy link
Author

Svetlitski commented Jul 11, 2021

Doing this through collect() seems very confusing to me, I would be very surprised to see that in code. Could you do this through a new function instead maybe, that's only implemented on Iterator<Item = Result<T, E>? That would also let you make it unstable to start.

The motivation behind using collect() was to be logically consistent with the FromIterator implementation on Result and its corresponding usage via collect(), since it fulfills a very similar role, with the only significant difference being that it short-circuits upon encountering the first Err instead of collecting all of them. I'd be happy to get more opinions on this though.

@jyn514 jyn514 added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jul 11, 2021
@marmeladema
Copy link
Contributor

Don't have a strong opinion about whether to re-use collect or introduce a new method for this but I definitely needed the functionality in the past.

{
fn from_iter<I: IntoIterator<Item = Result<T, E>>>(iter: I) -> (U, V) {
let (mut oks, mut errs) = (U::default(), V::default());
for result in iter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this use for_each?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering, why did this got a negative feedback? Internal iteration should always be either as fast as or faster than external iteration, so you should always try to use it if you can, hence why I would have used for_each instead of a normal for loop here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal iteration should always be either as fast as or faster than external iteration, so you should always try to use it if you can

I disagree with this. for_each introduces a lot more syntactic noise and does the same thing as a normal for loop. If this happens to be super slow for some reason it seems fine to use for_each, but I don't think it should be the default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you didn't mention at first that this was for performance reasons, so I thought it was just personal preference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this happens to be super slow for some reason it seems fine to use for_each,

It's already pretty slow for iterators like RangeInclusive and VecDeque, and probably many iterator adapters that need to do additional checks in each iteration.

but I don't think it should be the default.

But isn't it already the default in pretty much all the stdlib? Iterator's utility methods and any implementation of FromIterator, Extend, Sum and Product all use internal iteration methods. The only occurences of plain for loops I can find are where performance doesn't matter (for example tests) or when the iterator doesn't gain anything with internal iteration (for example exclusive ranges, slices and vecs).

Also, you didn't mention at first that this was for performance reasons, so I thought it was just personal preference.

Yeah my bad for that one, I was assuming that the difference between for and for_each is only performance.

@the8472
Copy link
Member

the8472 commented Jul 12, 2021

Note that we also have the unstable partition_in_place which could perform this task with fewer allocations. First you'd partition in place, then move the head and tail into separate vecs.

@joshtriplett
Copy link
Member

Since this is by necessity insta-stable, it'll need team consensus. Nominating for the libs-api team to consider. (I don't personally want to PFCP here.)

@camsteffen
Copy link
Contributor

It will be tempting to use this in the general case where I want to divide items into two groups.

let (evens, odds) = (0..10).map(|n| if n % 2 == 0 { Ok(n) } else { Err(n) }).collect();

But this is an abuse of Result. I fear that such misuse cases would be more commonly used in practice than the legitimate case of collecting actual errors. It's unusual to not fast fail on errors and also not care about where errors occurred in a sequence.

Another possibility is Output=Result<Collection<V>, Collection<E>>, FWIW.

@marmeladema
Copy link
Contributor

Don't have a strong opinion about whether to re-use collect or introduce a new method for this but I definitely needed the functionality in the past.

My (simplified) use-case was about reading a file line by line and parsing each of them. Parsing can fail and returns a Result<T,E> but for reporting/monitoring purposes, I need to know all the lines which have failed. There are obviously different ways of implementing this but I ended with something similar to this PR but specialized for Vec<Result<T, E>>.

@Svetlitski
Copy link
Author

It will be tempting to use this in the general case where I want to divide items into two groups.

let (evens, odds) = (0..10).map(|n| if n % 2 == 0 { Ok(n) } else { Err(n) }).collect();

But this is an abuse of Result. I fear that such misuse cases would be more commonly used in practice than the legitimate case of collecting actual errors. It's unusual to not fast fail on errors and also not care about where errors occurred in a sequence.

Another possibility is Output=Result<Collection<V>, Collection<E>>, FWIW.

I don't understand why anyone would abuse the new impl like this when you could achieve the same result more directly and with strictly less code by using Iterator::partition.

@camsteffen
Copy link
Contributor

I don't understand why anyone would abuse the new impl like this when you could achieve the same result more directly and with strictly less code by using Iterator::partition.

Good point. But there are also cases where partition does not work - if map includes some additional transformation of the Ok or Err value.

@jyn514
Copy link
Member

jyn514 commented Jul 13, 2021

Another possibility is Output=Result<Collection, Collection>, FWIW.

I don't see how this is possible without discarding items. Results can only be Ok or Err, not both.

@joshtriplett
Copy link
Member

joshtriplett commented Jul 14, 2021

We discussed this in today's @rust-lang/libs-api meeting. We'd want to have some very concrete use cases (not just an example from rust-by-example) that would be made substantially better by this method, and even then we're not sure it needs to be in std rather than itertools. (itertools already has a method partition_result for this.) Can we get an example from real-world code that doesn't just short-circuit on the first error?

Also, we were concerned about making this an implementation of Extend, because uses of collect can potentially be surprising. In particular, someone could have a collection of objects for which they forgot the ? to process Result, and then use .collect(), and end up with a confusing type.

@joshtriplett joshtriplett added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed I-nominated S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 14, 2021
@scottmcm
Copy link
Member

The general version of this is https://docs.rs/itertools/*/itertools/trait.Itertools.html#method.partition_map, which I'd rather have than anything result-specific, personally. Straight-up partition is rarely what I want, because it fails the "parse don't validate" philosophy.

The existing FromIterator impls on Result are also notoriously non-discoverable in practice, so I'm skeptical of adding more.

@scottmcm scottmcm added the needs-fcp This change is insta-stable, so needs a completed FCP to proceed. label Jul 14, 2021
@Svetlitski
Copy link
Author

Svetlitski commented Jul 14, 2021

@joshtriplett Here are a couple examples to get started with, I'll dig around further later today to find more.

Edit: Added another example (from TedDriggs/darling)

From BurntSushi/walkdir (in context):

Original

pub fn run_recursive<I>(&self, it: I) -> RecursiveResults
where
    I: IntoIterator<Item = result::Result<DirEntry, Error>>,
{
    let mut results = RecursiveResults { ents: vec![], errs: vec![] };
    for result in it {
        match result {
            Ok(ent) => results.ents.push(ent),
            Err(err) => results.errs.push(err),
        }
    }
    results
}

Rewritten

pub fn run_recursive<I>(&self, it: I) -> RecursiveResults
where
    I: IntoIterator<Item = result::Result<DirEntry, Error>>,
{
    let (ents, errs): (Vec<_>, Vec<_>) = it.collect();
    RecursiveResults { ents, errs }
}

From rust-lang/cargo (in context):

Original

let mut result = Vec::new();
for target in toml_targets {
    let path = target_path(
        &target,
        inferred,
        target_kind,
        package_root,
        edition,
        legacy_path,
    );
    let path = match path {
        Ok(path) => path,
        Err(e) => {
            errors.push(e);
            continue;
        }
    };
    result.push((path, target));
}
Ok(result)

Rewritten

This admittedly introduces additional allocation in this instance, but it's still a good example IMO.

let (result, errs): (Vec<_>, Vec<_>) = toml_targets
    .into_iter()
    .map(|target| {
        target_path(
            &target,
            inferred,
            target_kind,
            package_root,
            edition,
            legacy_path,
        )
        .map(|path| (path, target))
    })
    .collect();
errors.extend(errs);
Ok(result)

From TedDriggs/darling (in context)

Original

let mut items = Vec::with_capacity(data.variants.len());
let mut errors = Vec::new();
for v_result in data.variants.iter().map(FromVariant::from_variant) {
    match v_result {
        Ok(val) => items.push(val),
        Err(err) => errors.push(err),
    }
}

Rewritten

let (items, errors): (Vec<_>, Vec<_>) = data.variants.iter().map(FromVariant::from_variant).collect();

@scottmcm
Copy link
Member

This admittedly introduces additional allocation in this instance

This makes me think that this would be more useful as a lazy adapter, rather than a terminal like collect or partition.

Inspired by ResultShunt, it could be something like

let results = [Ok(1), Err(false), Ok(3), Ok(4), Err(true)];
let mut errs = vec![];
let it = IntoIterator::into_iter(results).error_collector(&mut errs);
let oks = it.collect::<Vec<_>>();
assert_eq!(oks, [1, 3, 4]);
assert_eq!(errs, [false, true]);

PoC implementation: https://play.rust-lang.org/?version=nightly&edition=2018&gist=819dd12076a2a7add7cfd8733b8b03bd

@Svetlitski
Copy link
Author

Svetlitski commented Jul 15, 2021

This admittedly introduces additional allocation in this instance

This makes me think that this would be more useful as a lazy adapter, rather than a terminal like collect or partition.

Inspired by ResultShunt, it could be something like

let results = [Ok(1), Err(false), Ok(3), Ok(4), Err(true)];
let mut errs = vec![];
let it = IntoIterator::into_iter(results).error_collector(&mut errs);
let oks = it.collect::<Vec<_>>();
assert_eq!(oks, [1, 3, 4]);
assert_eq!(errs, [false, true]);

PoC implementation: https://play.rust-lang.org/?version=nightly&edition=2018&gist=819dd12076a2a7add7cfd8733b8b03bd

This is really nice. While it's less succinct for the use-case of purely wanting to split the results into two collections, it seems to be more widely applicable. In my hunt for further examples, I've seen quite a lot of the pattern where self maintains a vector onto which any errors encountered during execution are .push'ed. The pattern is particularly pervasive in contexts where parsing (or something similar) is the primary objective (which makes a lot of sense).

@Mathspy
Copy link

Mathspy commented Jul 15, 2021

This makes me think that this would be more useful as a lazy adapter, rather than a terminal like collect or partition.

Inspired by ResultShunt, it could be something like

let results = [Ok(1), Err(false), Ok(3), Ok(4), Err(true)];
let mut errs = vec![];
let it = IntoIterator::into_iter(results).error_collector(&mut errs);
let oks = it.collect::<Vec<_>>();
assert_eq!(oks, [1, 3, 4]);
assert_eq!(errs, [false, true]);

PoC implementation: https://play.rust-lang.org/?version=nightly&edition=2018&gist=819dd12076a2a7add7cfd8733b8b03bd

This is actually amazing and exactly what I have been after for months if not years. Dealing with Iterators of Results have always been quite a bit of a pain and all I really ever wanted has always been “continue iteration over the Oks and short-circuit the Errs, preferably without the extra allocation from doing an intermediate collection” and this is so perfect for that. THANKS ❤️ ❤️

@camsteffen
Copy link
Contributor

FWIW you can do

let oks = iter
    .flat_map(|r| r.map_err(|e| errs.push(e)))
    .collect();

Another possible extension is Iterator::on_error.

let oks = iter
    .on_error(|e| errs.push(e))
    .collect();

@Svetlitski
Copy link
Author

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 19, 2021
@rustbot
Copy link
Collaborator

rustbot commented Jul 19, 2021

@scottmcm

@scottmcm
Copy link
Member

Since this is impls, I'll assign this over the a libs-api member who has commented here:

r? @joshtriplett

@camelid camelid added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 15, 2021
@joshtriplett
Copy link
Member

I'm not sure why this was marked as ready. Several comments suggested an alternate implementation, involving a "shunt" for Err values and then a collect that only gets the Ok values. That approach seems to have garnered more support. It doesn't look like this PR takes that approach; the PR appears unchanged since before that discussion.

@joshtriplett joshtriplett added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 4, 2021
@camelid camelid added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 5, 2021
@camelid camelid added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 10, 2021
@Dylan-DPC
Copy link
Member

@Svetlitski any updates on this?

scottmcm added a commit to scottmcm/rust-by-example that referenced this pull request Feb 25, 2022
Inspired by the comments from people looking for something like this in <rust-lang/rust#87047 (comment)>.
@Dylan-DPC
Copy link
Member

Closing this as it is inactive and needs a lot of further work

@Dylan-DPC Dylan-DPC closed this Jun 14, 2022
@Dylan-DPC Dylan-DPC added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-fcp This change is insta-stable, so needs a completed FCP to proceed. S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.