-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
slice::iter() does not preserve number of iterations information for optimizer causing unneeded bounds checks #75935
Comments
I should probably add that while this code is very contrived, it's based on real code that shows the same behaviour. |
Maybe a duplicate of #74938: An upgrade to LLVM 12 or so is needed to fix the issue. |
I'll try adding only that change to the LLVM version used by rustc. Let's see if that solves anything (or works at all :) ). |
No, does not work at all. Gives a segfault in exactly that function that is changed inside LLVM. |
Got it to work. It doesn't fix this issue here, but it fixes #75936 . This one here is still valid. |
My guess is that this is because the slice iterators don't go via a counter but instead via an end pointer, so it's not obvious anymore that it's just iterating exactly |
Yes, going with a simple iterator that counts instead gets rid of the bounds check. Code: pub fn foo1(x: &[u32], y: &[u32]) -> u32 {
let mut sum = 0;
let chunk_size = y.len();
for (c, y) in Iter::new(y).enumerate() {
for chunk in x.chunks_exact(chunk_size) {
sum += chunk[c] + y;
}
}
sum
}
struct Iter<'a> {
ptr: *const u32,
len: usize,
phantom: std::marker::PhantomData<&'a [u32]>,
}
impl<'a> Iter<'a> {
fn new(v: &'a [u32]) -> Iter<'a> {
Iter {
ptr: v.as_ptr(),
len: v.len(),
phantom: std::marker::PhantomData,
}
}
}
impl<'a> Iterator for Iter<'a> {
type Item = &'a u32;
fn next(&mut self) -> Option<&'a u32> {
unsafe {
if self.len == 0 {
return None;
}
let item = &*self.ptr;
self.ptr = self.ptr.add(1);
self.len -= 1;
Some(item)
}
}
} |
You could update the issue description for new information. |
Indeed, thanks. Done! |
Is there any particular reason why the std-implementation ( |
I don't know the history, but in theory one instruction less per iteration (one pointer addition vs. one pointer addition and one counter addition). And it might be taken advantage of in some specialized impls but I don't know. Might be worth looking at what |
The comment in code says that it's because of an optimization for ZST: rust/library/core/src/slice/mod.rs Lines 4009 to 4014 in 118860a
|
If you had a counter that would work basically the same way, you'd just check |
I can do an implementation of |
And another variant with specialization of the Check assembly here, the new one is The I'll create a PR for this later. pub fn foo2(x: &[u32], y: &[u32]) -> u32 {
let mut sum = 0;
let chunk_size = y.len();
for (c, y) in Enumerate::new(y.iter()) {
for chunk in x.chunks_exact(chunk_size) {
sum += chunk[c] + y;
}
}
sum
}
struct Enumerate<I> {
iter: I,
count: usize,
len: usize,
}
impl<I: Iterator> Enumerate<I> {
fn new(iter: I) -> Self {
EnumerateImpl::new(iter)
}
}
impl<I> Iterator for Enumerate<I>
where
I: Iterator,
{
type Item = (usize, <I as Iterator>::Item);
#[inline]
fn next(&mut self) -> Option<(usize, <I as Iterator>::Item)> {
EnumerateImpl::next(self)
}
}
// Enumerate specialization trait
#[doc(hidden)]
trait EnumerateImpl<I> {
type Item;
fn new(iter: I) -> Self;
fn next(&mut self) -> Option<(usize, Self::Item)>;
}
impl<I> EnumerateImpl<I> for Enumerate<I>
where
I: Iterator,
{
type Item = I::Item;
default fn new(iter: I) -> Self {
Enumerate {
iter,
count: 0,
len: 0, // unused
}
}
#[inline]
default fn next(&mut self) -> Option<(usize, I::Item)> {
let a = self.iter.next()?;
let i = self.count;
// Possible undefined overflow.
self.count += 1;
Some((i, a))
}
}
impl<I> EnumerateImpl<I> for Enumerate<I>
where
// FIXME: Should probably be TrustedRandomAccess because otherwise size_hint() might be expensive?
I: std::iter::TrustedLen + ExactSizeIterator + Iterator,
{
fn new(iter: I) -> Self {
let len = iter.size_hint().0;
Enumerate {
iter,
count: 0,
len,
}
}
#[inline]
fn next(&mut self) -> Option<(usize, I::Item)> {
let a = self.iter.next()?;
unsafe { std::intrinsics::assume(self.count < self.len); }
let i = self.count;
// Possible undefined overflow.
self.count += 1;
Some((i, a))
}
} |
I've updated the issue description with a minimal testcase in Rust, C/C++ and as discussed in #77822 will report this to LLVM. |
Reported to https://bugs.llvm.org/show_bug.cgi?id=48965 |
This allows to hint at the compiler via `intrinsics::assume()` that the returned index is smaller than the length of the underlying iterator and allows the compiler to optimize code better based on that. Fixes rust-lang#75935
Requesting a perf run on the PR would be a start, which runs an extensive performance test compiling (but not benchmarking) various crates. It might miss unusual number-crunching uses of slices though. |
Thanks, but considering that C++ iterators work the same way, it's probably safer to keep it with this pattern and wait for LLVM to solve that. I would assume that patterns (also) used by C++ are more likely to be optimized better. |
Because in cases where the iterator doesn't get SRoA'd away, When computing the length of a slice iterator we use both One thing we could to better is at least tell LLVM that the pointers are aligned even when not read, though. I wish we had a better core pointer type that could actually do that. |
Godbolt link to the code below: https://rust.godbolt.org/z/aKf3Wq
This code has a bounds check for
chunk[c]
althoughc < chunk_size
by construction.The same code a bit more convoluted gets rid of the bounds check
It seems like the information that
0 <= c < y.len()
gets lost for the optimizer when going viay.iter().enumerate()
. So this is unrelated tochunks_exact()
specifically but I can't come up with an equivalent example without it.edit: As noticed in #75935 (comment), this can be worked around by defining a custom slice iterator that does counting of elements instead of working with an end pointer.
The problem is that the
slice::iter()
works with an end pointer to know when the iteration can stop and keeps no information around for the optimizer that it's actually going to iterate exactly N times. Unclear to me how this information can be preserved without changing how the iterator works, which will probably have other negative effects.edit2:
As noticed in #77822 this happens with C++/C too and can also simplified a lot on the Rust side
edit3: This is now also reported to https://bugs.llvm.org/show_bug.cgi?id=48965
The text was updated successfully, but these errors were encountered: