-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specialize StepBy<Range(Inclusive)> #51601
Conversation
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
#36262 strikes again. And here I thought I'd be safe 'cause it's always clear which range type we have and all integers implement |
Making Specialize StepBy<Range(Inclusive)> efficient has a high priority in system language design. |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
Pro-tip: if your local tests don't run against your own code, they are much more likely to succeed. |
src/libcore/iter/mod.rs
Outdated
#[inline] | ||
fn next(&mut self) -> Option<Self::Item> { | ||
self.first_take = false; | ||
if self.iter.start >= self.iter.end { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Step
only requires PartialOrd
, so using >=
is technically a behaviour change. It wouldn't change anything that's stable today, but I think the original way is better, with an NAN-like endpoint making the range empty. (See similar discussion for Range::contains
in #49130 (comment))
Picking someone from the libs team to review this. r? @sfackler |
6d6e976
to
2879174
Compare
I flipped the comparisons back around to keep the original behaviour for NaN-like values (for when they are possible) and squashed the commits together. |
src/libcore/iter/mod.rs
Outdated
@@ -737,6 +733,76 @@ impl<I> Iterator for StepBy<I> where I: Iterator { | |||
} | |||
} | |||
|
|||
// hidden trait for specializing iterator methods | |||
// current use is limited to next() on StepBy | |||
trait SpecIter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this trait mention StepBy
in its name just to make it a bit more clear that it's limited to those impls?
src/libcore/iter/mod.rs
Outdated
#[inline] | ||
fn next(&mut self) -> Option<Self::Item> { | ||
self.first_take = false; | ||
if !(self.iter.start < self.iter.end) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comparison is structured weirdly to handle NaN properly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it's if start < end { return stuff } else { None }
flipped around so the positive case has no indent. Should I change it?
Seems reasonable to me! |
the originally generated code was highly suboptimal this brings it close to the same code or even exactly the same as a manual while-loop by eliminating a branch and the double stepping of n-1 + 1 steps The intermediate trait lets us circumvent the specialization type inference bugs
2879174
to
000aff6
Compare
@sfackler Renamed |
@bors r+ |
📌 Commit 000aff6 has been approved by |
Specialize StepBy<Range(Inclusive)> Part of #51557, related to #43064, #31155 As discussed in the above issues, `step_by` optimizes very badly on ranges which is related to 1. the special casing of the first `StepBy::next()` call 2. the need to do 2 additions of `n - 1` and `1` inside the range's `next()` This PR eliminates both by overriding `next()` to always produce the current element and also step ahead by `n` elements in one go. The generated code is much better, even identical in the case of a `Range` with constant `start` and `end` where `start+step` can't overflow. Without constant bounds it's a bit longer than the manual loop. `RangeInclusive` doesn't optimize as nicely but is still much better than the original asm. Unsigned integers optimize better than signed ones for some reason. See the following two links for a comparison. [godbolt: specialization for ..](https://godbolt.org/g/haHLJr) [godbolt: specialization for ..=](https://godbolt.org/g/ewyMu6) `RangeFrom`, the only other range with an `Iterator` implementation can't be specialized like this without changing behaviour due to overflow. There is no way to save "finished-ness". The approach can not be used in general, because it would produce side effects of the underlying iterator too early. May obsolete #51435, haven't checked.
☀️ Test successful - status-appveyor, status-travis |
This is buggy: #55985 |
So should we temporary revert this PR? Or should we leave it as-is and wait for the proper fix? I personally prefer the first option. |
Let's revert until it's fixed, yeah. |
The performance of this patch was not good enough :-( |
beta backport rollup Backports of some beta-approved PRs - [x] #55385: NLL: cast causes failure to promote to static - [x] #56043: remove "approx env bounds" if we already know from trait - [x] #56003: do not propagate inferred bounds on trait objects if they involve `Self` - [x] #55852: Rewrite `...` as `..=` as a `MachineApplicable` 2018 idiom lint - [x] #55804: rustdoc: don't inline `pub use some_crate` unless directly asked to - [x] #56059: Increase `Duration` approximate equal threshold to 1us - [x] Keep resolved defs in path prefixes and emit them in save-analysis #54145 - [x] Adjust Ids of path segments in visibility modifiers #55487 - [x] save-analysis: bug fix and optimisation. #55521 - [x] save-analysis: be even more aggressive about ignorning macro-generated defs #55936 - [x] save-analysis: fallback to using path id #56060 - [x] save-analysis: Don't panic for macro-generated use globs #55879 - [x] Add temporary renames to manifests for rustfmt/clippy #56081 - [x] Revert #51601 #56049 - [x] Fix stability hole with `static _` #55983 - [x] #56077 - [x] Fix Rustdoc ICE when checking blanket impls #55258 - [x] Updated RELEASES.md for 1.31.0 #55678 - [x] ~~#56061~~ #56111 - [x] Stabilize `extern_crate_item_prelude` #56032 Still running tests locally, and I plan to backport @nrc's other PRs too (cc @petrochenkov -- thanks for the advice)
… r=scottmcm Override `StepBy::{try_fold, try_rfold}` Previous PR: rust-lang#51435 The previous PR was closed in favor of rust-lang#51601, which was later reverted. I don't think these implementations will make it harder to specialize `StepBy<Range<_>>` later, so we should be able to land this without any consequences. This should fix rust-lang#57517 – in my benchmarks `iter` and `iter.step_by(1)` now perform equally well, provided internal iteration is used.
Part of #51557, related to #43064, #31155
As discussed in the above issues,
step_by
optimizes very badly on ranges which is related toStepBy::next()
calln - 1
and1
inside the range'snext()
This PR eliminates both by overriding
next()
to always produce the current element and also step ahead byn
elements in one go. The generated code is much better, even identical in the case of aRange
with constantstart
andend
wherestart+step
can't overflow. Without constant bounds it's a bit longer than the manual loop.RangeInclusive
doesn't optimize as nicely but is still much better than the original asm.Unsigned integers optimize better than signed ones for some reason.
See the following two links for a comparison.
godbolt: specialization for ..
godbolt: specialization for ..=
RangeFrom
, the only other range with anIterator
implementation can't be specialized like this without changing behaviour due to overflow. There is no way to save "finished-ness".The approach can not be used in general, because it would produce side effects of the underlying iterator too early.
May obsolete #51435, haven't checked.