[WIP] New implementation of the slice iterators using NonZero #46223

bluss · 2017-11-23T20:36:15Z

Reimplement the slice iterators using NonZero where possible.

Iter, IterMut now have a non-nullable field
a. The usual enum layout optimizations apply
b. Automatic and more principled nonnull metadata for that pointer
Big change for zero sized elements. The end field is a counter.
The produced pointer value is equivalent to slice.as_ptr() and
thus &slice[i] for each index i in this case.

Implementation Notes

Regular case, T is not zero-sized

(Nothing changes in this case, but this is the documentation in the code.)

The iterator represents a contiguous range of elements.
ptr is a pointer to the start of the range.
end is a pointer to one past the end of the range.

Initial values are ptr = slice.as_ptr() and end = ptr.offset(slice.len())

The iterator is empty when ptr == end.
The iterator is stepped at the front by incrementing ptr, the
yielded element is the old value of ptr ("post increment" semantic).
The iterator is stepped at the back by decrementing end, the
yielded element is the new value of end ("pre decrement" semantic).

When T is a Zero-sized type (ZST)

ptr is a pointer to the start of the range.
end is used as an integer: the count of elements in the range.

Initial values are ptr = slice.as_ptr() and end = self.len() as *const T

The iterator is empty when end == 0
The iterator is stepped at the front by decrementing end.
The iterator is stepped at the back by decrementing end.

For both front and back, the yielded element is ptr, which does not change
during the iteration.

NonZero

ptr is derived from slice.as_ptr() and just like it is is non-null.
end is allowed to be null (used by the ZST implementation); this is
required since for iterating a [(); usize::MAX] all values of usize (and
equivalently the raw pointer) are used, including zero.

bluss · 2017-11-23T20:37:55Z

I haven't run so many benchmarks yet, but the ones I do are unchanged. These ones below are of very simple loops.

from ndarray "cargo bench --bench higher-order":

 name                   xx ns/iter  yy ns/iter  diff ns/iter  diff %
 fold_axis              681         676                   -5  -0.73%
 map_axis_0             451         452                    1   0.22%
 map_axis_1             452         453                    1   0.22%
 map_regular            172         169                   -3  -1.74%
 map_stride_double_f64  504         506                    2   0.40%
 map_stride_f64         652         666                   14   2.15%
 map_stride_u32         690         681                   -9  -1.30%

Gankra · 2017-11-23T20:43:52Z

These differences seem small enough to just be noise; is the generated assembly actually identical?

eddyb · 2017-11-23T20:46:21Z

cc @rust-lang/libs

bluss · 2017-11-23T20:54:25Z

@gankro Sure, I intended to present them as exactly that, noise. However I'll have to find better benchmarks. I removed the zip benchmarks -- those are entirely in a specialization and don't go through Iterator::next().

The PR posted here so that the full test suite is ran and for comments.

bluss · 2017-11-24T19:53:15Z

Having looked through some benchmarks, nothing stood out. (Some examples a few pct better and apparently some a few pct worse, but it needs to be reduced to very small examples to tell us something.)

I found this code gen example that autovectorizes on nightly but not using a compiler with this PR:

#[no_mangle]
pub fn zip_and(a: &[i32], b: &[i32]) -> i32 {
    let mut result = 0;
    let mut iter_a = a.iter();
    let mut iter_b = b.iter();
    loop {
        if let Some(&elt_a) = iter_a.next() {
            if let Some(&elt_b) = iter_b.next() {
                result |= elt_a | elt_b;
                continue;
            }
        }
        break;
    }
    result
}

A "small" example of 450 lines of unoptimized IR without debug info.

This PR removes the assume(!end.is_null()) or equivalent, and maybe that causes this regression. Otherwise I am open to suggestions. (Edit: I checked, it doesn't seem to help to use a NonZero temprary around end in Iterator::next, for the sized case. 😕 )

eddyb · 2017-11-24T23:15:04Z

Yeah, LLVM is inconsistent with its metadata, keeping the assume would still be useful. The only reason we don't automatically introduce assumes is because we're assuming they're expensive.

Reimplement the slice iterators using `NonZero` where possible. 1. Iter, IterMut now have a non-nullable field a. The usual enum layout optimizations apply b. Automatic and more principled nonnull metadata for that pointer 2. Big change for zero sized elements. The end field is a counter. The produced pointer value is equivalent to `slice.as_ptr()` and thus `&slice[i]` for each index `i` in this case. * Implementation Notes ** Regular case, T is not zero-sized (Nothing changes in this case, but this is the documentation in the code.) The iterator represents a contiguous range of elements. `ptr` is a pointer to the start of the range. `end` is a pointer to one past the end of the range. Initial values are ptr = slice.as_ptr() and end = ptr.offset(slice.len()) The iterator is empty when ptr == end. The iterator is stepped at the front by incrementing ptr, the yielded element is the old value of ptr ("post increment" semantic). The iterator is stepped at the back by decrementing end, the yielded element is the new value of end ("pre decrement" semantic). ** When T is a Zero-sized type (ZST) `ptr` is a pointer to the start of the range. `end` is used as an integer: the count of elements in the range. Initial values are ptr = slice.as_ptr() and end = self.len() as *const T The iterator is empty when end == 0 The iterator is stepped at the front by decrementing end. The iterator is stepped at the back by decrementing end. For both front and back, the yielded element is `ptr`, which does not change during the iteration. ** NonZero `ptr` is derived from `slice.as_ptr()` and just like it is is non-null. `end` is allowed to be null (used by the ZST implementation); this is required since for iterating a [(); usize::MAX] all values of usize (and equivalently the raw pointer) are used, including zero.

bluss · 2017-11-26T19:05:46Z

(Rebased & updated with some cleanups, no major changes).

I've officially given up trying to fix the regression in #46223 (comment) I prototyped this out of tree, which was a bit quicker, and that "regression" does not care about assumes or other changes I thought were natural. The change to make_ref is literally the culprit for the "regression".

That's the macro that changes from:

macro_rules! make_ref {
    ($ptr:expr) => {{
        let ptr = $ptr;
        if size_from_ptr(ptr) == 0 {
            // Use a non-null pointer value
            &*(1 as *mut _)
        } else {
            &*ptr
        }
    }};
}

to this:

macro_rules! make_ref {
    ($ptr:expr) => { &*$ptr };
}

Even funnier, the macro can even be this (which is nonsense but correct in the new impl) and the vectorization does not regress:

macro_rules! make_ref {
    ($ptr:expr) => {{
        let ptr = $ptr;
        if false && size_from_ptr(ptr) == 0 {
            // Use a non-null pointer value
            &*(1 as *mut _)
        } else {
            &*ptr
        }
    }};
}

but we must refuse to insert nonsense of that degree; then I think there's no principle but some strange accident that is causing the original #46223 (comment) code to compile well.

eddyb · 2017-11-28T00:03:18Z

@bluss Oh, have you tried having a ptr variable, instead of applying &* directly on $ptr?

bluss · 2017-11-28T16:14:50Z

@eddyb oh I think so, I tried a lot of things. I have resolved one bounds check regression in the nom testsuite (related to the unrolling / try_fold code ping @arthurprs ; unregressing it involved just tweaking how the ptrdistance was computed in that loop. This WIP PR is not entirely up to date since I'm still trying to fix regressions locally.)

arthurprs · 2017-11-28T17:43:58Z

You mean this #45501? That's an awesome side effect 😃

bluss · 2017-11-28T17:45:43Z

Oh just maybe, it was actually just regress (due to this PR) and unregress (due to WIP on this PR). Just thinking of it since it's the same issue.

kennytm · 2017-12-06T12:45:59Z

Hi @bluss, is there anything supposed to happen for this PR to proceed?

bluss · 2017-12-07T20:12:19Z

@kennytm I've been trying to work on a no-performance-regressions version with no success. I'll have to come back to this with a different goal, probably demonstrating some code savings in some places and code gen regressions in other places.

I was using nom benchmarks for the no-performance-regressions goal. cc @Geal

aidanhs · 2017-12-14T10:39:58Z

Visiting for triage: just to be clear, it sounds like you're going to continue work in this PR rather than create a new one when ready?

bluss · 2017-12-21T18:37:57Z

Yes. I just haven't had time, sorry. Someone else can also pick it up.

bluss changed the title ~~New implementation of the slice iterators using NonZero~~ [WIP] New implementation of the slice iterators using NonZero Nov 23, 2017

kennytm added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Nov 23, 2017

bluss mentioned this pull request Nov 23, 2017

Add functions core::ptr::dangling/_mut<T>() -> *const/*mut T #45527

Closed

bluss force-pushed the new-slice-iter branch from 7aa749b to 48c2497 Compare November 26, 2017 19:00

bluss closed this Dec 21, 2017

bluss mentioned this pull request Jan 7, 2018

Iterators over slices of ZST behave strange #42789

Closed

RalfJung mentioned this pull request Jul 11, 2018

slices: fix ZST slice iterators making up pointers; debug_assert alignment in from_raw_parts #52206

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] New implementation of the slice iterators using NonZero #46223

[WIP] New implementation of the slice iterators using NonZero #46223

bluss commented Nov 23, 2017 •

edited

Loading

bluss commented Nov 23, 2017 •

edited

Loading

Gankra commented Nov 23, 2017

eddyb commented Nov 23, 2017

bluss commented Nov 23, 2017 •

edited

Loading

bluss commented Nov 24, 2017 •

edited

Loading

eddyb commented Nov 24, 2017

bluss commented Nov 26, 2017 •

edited

Loading

eddyb commented Nov 28, 2017

bluss commented Nov 28, 2017

arthurprs commented Nov 28, 2017

bluss commented Nov 28, 2017

kennytm commented Dec 6, 2017

bluss commented Dec 7, 2017 •

edited

Loading

aidanhs commented Dec 14, 2017

bluss commented Dec 21, 2017

[WIP] New implementation of the slice iterators using NonZero #46223

[WIP] New implementation of the slice iterators using NonZero #46223

Conversation

bluss commented Nov 23, 2017 • edited Loading

Implementation Notes

Regular case, T is not zero-sized

When T is a Zero-sized type (ZST)

NonZero

bluss commented Nov 23, 2017 • edited Loading

Gankra commented Nov 23, 2017

eddyb commented Nov 23, 2017

bluss commented Nov 23, 2017 • edited Loading

bluss commented Nov 24, 2017 • edited Loading

eddyb commented Nov 24, 2017

bluss commented Nov 26, 2017 • edited Loading

eddyb commented Nov 28, 2017

bluss commented Nov 28, 2017

arthurprs commented Nov 28, 2017

bluss commented Nov 28, 2017

kennytm commented Dec 6, 2017

bluss commented Dec 7, 2017 • edited Loading

aidanhs commented Dec 14, 2017

bluss commented Dec 21, 2017

bluss commented Nov 23, 2017 •

edited

Loading

bluss commented Nov 23, 2017 •

edited

Loading

bluss commented Nov 23, 2017 •

edited

Loading

bluss commented Nov 24, 2017 •

edited

Loading

bluss commented Nov 26, 2017 •

edited

Loading

bluss commented Dec 7, 2017 •

edited

Loading