Override try_fold for StepBy #51435
r? @kennytm (rust_highfive has picked a reviewer for you, use r? to override) |
@scottmcm This PR is from your repo. I’m not good at reading assembly but your comment at #27741 (comment) suggests that this is an improvement. Should we land it? The diff looks good to me. |
I ran a simple benchmark: the test code from the comment improves from 800µs/iter to 600µs/iter.
Source code:
#![feature(try_trait, test)]
extern crate test;
use std::ops::Try;
use test::{Bencher, black_box};
#[no_mangle]
pub fn std_compute(a: u64, b: u64) -> u64 {
    StepByWithoutTryFold(StepBy {
        iter: a..b,
        step: 6,
        first_take: true,
    }).map(|x| x ^ (x - 1)).sum()
}

#[no_mangle]
pub fn pr_compute(a: u64, b: u64) -> u64 {
    StepBy {
        iter: a..b,
        step: 6,
        first_take: true,
    }.map(|x| x ^ (x - 1)).sum()
}

#[bench]
fn std_bench(bencher: &mut Bencher) {
    let a = black_box(1);
    let b = black_box(5000000);
    bencher.iter(|| {
        black_box(std_compute(a, b));
    });
}

#[bench]
fn pr_bench(bencher: &mut Bencher) {
    let a = black_box(1);
    let b = black_box(5000000);
    bencher.iter(|| {
        black_box(pr_compute(a, b));
    });
}

struct StepBy<I> {
    iter: I,
    step: usize,
    first_take: bool,
}

struct StepByWithoutTryFold<I>(StepBy<I>);

impl<I: Iterator> Iterator for StepBy<I> {
    type Item = I::Item;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        if self.first_take {
            self.first_take = false;
            self.iter.next()
        } else {
            self.iter.nth(self.step)
        }
    }

    #[inline]
    fn try_fold<B, F, R>(&mut self, init: B, mut f: F) -> R where
        Self: Sized, F: FnMut(B, Self::Item) -> R, R: Try<Ok=B>
    {
        let mut accum = init;
        if self.first_take {
            self.first_take = false;
            if let Some(x) = self.iter.next() {
                accum = f(accum, x)?;
            } else {
                return Try::from_ok(accum);
            }
        }
        while let Some(x) = self.iter.nth(self.step) {
            accum = f(accum, x)?;
        }
        Try::from_ok(accum)
    }
}

impl<I: Iterator> Iterator for StepByWithoutTryFold<I> {
    type Item = I::Item;

    #[inline]
    fn next(&mut self) -> Option<Self::Item> {
        if self.0.first_take {
            self.0.first_take = false;
            self.0.iter.next()
        } else {
            self.0.iter.nth(self.0.step)
        }
    }
} |
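For anyone who wants to try the same shape on stable Rust, where the `Try` trait cannot be named in a signature, a rough equivalent is to override `fold` instead of `try_fold`. This is only a sketch under that assumption: `MyStepBy` and `my_step_by` are hypothetical names, not part of the PR or of `std`. As in the benchmark above, `step` is the number of elements skipped between yields, so `step: 6` behaves like `step_by(7)`.

```rust
/// Hypothetical stand-in for the benchmark's `StepBy`; not the std type.
struct MyStepBy<I> {
    iter: I,
    /// Number of elements skipped between yields (i.e. `step_by(n)` stores `n - 1`).
    step: usize,
    first_take: bool,
}

/// Hypothetical constructor used only in this sketch.
fn my_step_by<I: Iterator>(iter: I, step: usize) -> MyStepBy<I> {
    MyStepBy { iter, step, first_take: true }
}

impl<I: Iterator> Iterator for MyStepBy<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        if self.first_take {
            self.first_take = false;
            self.iter.next()
        } else {
            self.iter.nth(self.step)
        }
    }

    // `fold` consumes the iterator, so the first-element branch is checked
    // once up front instead of on every call to `next()`.
    fn fold<B, F>(mut self, init: B, mut f: F) -> B
    where
        F: FnMut(B, Self::Item) -> B,
    {
        let mut accum = init;
        if self.first_take {
            match self.iter.next() {
                Some(x) => accum = f(accum, x),
                None => return accum,
            }
        }
        while let Some(x) = self.iter.nth(self.step) {
            accum = f(accum, x);
        }
        accum
    }
}
```

Unlike `try_fold`, this version cannot stop early, so it only helps consumers that run the iterator to completion.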
Hmm, I'd forgotten I'd started this 😅 I see two options:
Thoughts? |
I think we could do the refactoring later. @bors r+ |
📌 Commit 2d55d28 has been approved by |
🔒 Merge conflict |
I've rerun @kennytm's code but replaced the
|
Thanks for looking into this @Emerentius. Closing in favor of #51601. |
Specialize StepBy<Range(Inclusive)>

Part of #51557, related to #43064, #31155

As discussed in the above issues, `step_by` optimizes very badly on ranges which is related to

1. the special casing of the first `StepBy::next()` call
2. the need to do 2 additions of `n - 1` and `1` inside the range's `next()`

This PR eliminates both by overriding `next()` to always produce the current element and also step ahead by `n` elements in one go. The generated code is much better, even identical in the case of a `Range` with constant `start` and `end` where `start+step` can't overflow. Without constant bounds it's a bit longer than the manual loop. `RangeInclusive` doesn't optimize as nicely but is still much better than the original asm. Unsigned integers optimize better than signed ones for some reason.

See the following two links for a comparison.

[godbolt: specialization for ..](https://godbolt.org/g/haHLJr)
[godbolt: specialization for ..=](https://godbolt.org/g/ewyMu6)

`RangeFrom`, the only other range with an `Iterator` implementation can't be specialized like this without changing behaviour due to overflow. There is no way to save "finished-ness".

The approach can not be used in general, because it would produce side effects of the underlying iterator too early.

May obsolete #51435, haven't checked.
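
The idea described in that PR, yielding the current element and stepping ahead by `n` in a single addition, might look roughly like the following for a half-open `u64` range. This is a sketch, not the code from #51601: `SteppedRange` and `stepped` are hypothetical names, and treating an overflowing step as "exhausted" is an assumption made here to keep the example safe.

```rust
/// Hypothetical stand-in for a specialized `StepBy<Range<u64>>`.
struct SteppedRange {
    start: u64,
    end: u64,
    /// Full step size; must be non-zero.
    step: u64,
}

/// Hypothetical constructor used only in this sketch.
fn stepped(start: u64, end: u64, step: u64) -> SteppedRange {
    assert!(step != 0, "step must be non-zero");
    SteppedRange { start, end, step }
}

impl Iterator for SteppedRange {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        if self.start >= self.end {
            return None;
        }
        let current = self.start;
        // One addition per element, with no special case for the first call.
        // If the addition would overflow, jump to `end` so the next call
        // returns `None`.
        self.start = current.checked_add(self.step).unwrap_or(self.end);
        Some(current)
    }
}
```

Because a half-open range has an explicit `end`, "finished" can be encoded as `start >= end`. `RangeFrom` has no such bound, which matches the remark above that it cannot be specialized this way.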
Since #51601 was reverted, should this be considered again? |
… r=scottmcm

Override `StepBy::{try_fold, try_rfold}`

Previous PR: rust-lang#51435

The previous PR was closed in favor of rust-lang#51601, which was later reverted. I don't think these implementations will make it harder to specialize `StepBy<Range<_>>` later, so we should be able to land this without any consequences.

This should fix rust-lang#57517 – in my benchmarks `iter` and `iter.step_by(1)` now perform equally well, provided internal iteration is used.
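
To illustrate the "internal iteration" caveat: a `for` loop drives the iterator from outside by calling `next()` once per element, while consuming methods such as `sum`, `fold`, or `try_for_each` let the iterator run its own loop and so can use an adapter's overridden folding methods. The function names below are hypothetical and exist only for this comparison. Both functions return the same value; any difference is in the generated code, which this sketch does not measure.

```rust
/// External iteration: the `for` loop calls `next()` for every element.
fn sum_external(n: u64) -> u64 {
    let mut total = 0;
    for x in (0..n).step_by(1) {
        total += x;
    }
    total
}

/// Internal iteration: `sum()` hands the whole loop to the iterator.
fn sum_internal(n: u64) -> u64 {
    (0..n).step_by(1).sum()
}
```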