Tracking issue for chunks_exact/_mut; slice chunks with exact size #47115
Example can be found here. The relevant part, with the differences in the assembly:

before:
.LBB4_24:
cmp r11, 4
mov eax, 4
cmovb rax, r11
test rbx, rbx
je .LBB4_18
cmp rbx, 4
mov edx, 4
cmovb rdx, rbx
test r13, r13
je .LBB4_18
mov qword ptr [rbp - 96], rax
mov qword ptr [rbp - 48], rsi
mov qword ptr [rbp - 56], r9
cmp r11, 3
jbe .LBB4_27
mov qword ptr [rbp - 96], rdx
mov qword ptr [rbp - 48], rsi
mov qword ptr [rbp - 56], r9
cmp rbx, 3
jbe .LBB4_29
cmp rax, 1
je .LBB4_39
lea r10, [r13 + rax]
sub r11, rax
lea r12, [r15 + rdx]
sub rbx, rdx
cmp rax, 3
jb .LBB4_41
je .LBB4_42
movzx r14d, byte ptr [r13]
movzx r8d, byte ptr [r13 + 1]
movzx eax, byte ptr [r13 + 2]
imul r13d, eax, 19595
imul edi, r8d, 38470
imul eax, r14d, 7471
add eax, edi
add eax, r13d
shr eax, 16
mov byte ptr [r15], al
cmp rdx, 1
je .LBB4_44
mov byte ptr [r15 + 1], al
cmp rdx, 3
jb .LBB4_45
mov byte ptr [r15 + 2], al
je .LBB4_46
mov byte ptr [r15 + 3], 0
test r11, r11
mov r15, r12
mov r13, r10
jne .LBB4_24

after:
.LBB5_18:
test rsi, rsi
je .LBB5_20
add rdx, -4
movzx r10d, byte ptr [rsi]
movzx eax, byte ptr [rsi + 1]
movzx ebx, byte ptr [rsi + 2]
lea rsi, [rsi + 4]
imul r13d, ebx, 19595
imul eax, eax, 38470
imul ebx, r10d, 7471
add ebx, eax
add ebx, r13d
shr ebx, 16
mov byte ptr [rcx], bl
mov byte ptr [rcx + 1], bl
mov byte ptr [rcx + 2], bl
mov byte ptr [rcx + 3], 0
lea rcx, [rcx + 4]
cmp rdx, 4
jae .LBB5_18
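For readers who cannot follow the link, the loop behind these two listings can be approximated as below. This is a reconstruction from the assembly, not the original example: the function names, the 4-byte pixel layout, and the weight order are assumptions read off the `imul` constants and the four byte stores per iteration.

```rust
// Fixed-point weights taken from the `imul` constants in the listing.
// They sum to 65536, which is why the result is shifted right by 16.
const W0: u32 = 7471;
const W1: u32 = 38470;
const W2: u32 = 19595;

/// Weighted sum of the first three bytes of a pixel (assumed layout).
fn gray(px: &[u8]) -> u8 {
    ((px[0] as u32 * W0 + px[1] as u32 * W1 + px[2] as u32 * W2) >> 16) as u8
}

/// "Before": `chunks` may yield a shorter final chunk, so the indexing
/// below has to stay bounds-checked (and panics on a short chunk).
pub fn to_gray_chunks(input: &[u8], output: &mut [u8]) {
    for (src, dst) in input.chunks(4).zip(output.chunks_mut(4)) {
        let g = gray(src);
        dst[0] = g;
        dst[1] = g;
        dst[2] = g;
        dst[3] = 0;
    }
}

/// "After": every chunk is exactly 4 bytes long, which gives the
/// optimizer the chance to drop the per-index length checks.
pub fn to_gray_chunks_exact(input: &[u8], output: &mut [u8]) {
    for (src, dst) in input.chunks_exact(4).zip(output.chunks_exact_mut(4)) {
        let g = gray(src);
        dst[0] = g;
        dst[1] = g;
        dst[2] = g;
        dst[3] = 0;
    }
}
```

Both functions compute the same result for inputs whose length is a multiple of 4; they differ only in what the iterator promises about the chunk length.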
I don't want to derail your discussion too much. Const generics and value-level chunks both have their uses. I'm reminded of this existing implementation of the "const" kind of chunking, in this case in an iterator that actually allows access to the whole blocks and then the uneven tail at the end: BlockedIter. Note that a |
Interesting, thanks for mentioning that. For my use case that would probably work more or less the same way, but it's slightly different indeed. |
BlockedIter was developed while looking at exactly the hand off between the blocks and the elementwise tail; the idea was to avoid some of the loss that otherwise shows up in code that converts between slices and slice iterators. In this case it's the same pointer being bumped through the whole iteration. |
The chunks iterators are a good candidate for zip specialization (TrustedRandomAccess trait) |
True. I'll add that in a bit, as a separate PR for the existing chunked iterators and as a separate commit for the new ones. |
I forgot to add some benchmark results earlier. This is with the code from #47115 (comment) and running on a 1920*1080*4 byte slice. Basically 2.46x as fast.
…s, r=kennytm Implement TrustedRandomAccess for slice::{Chunks, ChunksMut, Windows} As suggested by @bluss in rust-lang#47115 (comment)
These guarantee that the requested slice size will always be returned and that any leftover elements at the end will be ignored. This allows LLVM to get rid of bounds checks in the code using the iterator. This is inspired by the same iterators provided by ndarray. See rust-lang#47115
Thinking of this from another direction, if either Rust or LLVM were to magically figure out how to perform this optimization in the case of |
Add slice::ExactChunks and ::ExactChunksMut iterators
These guarantee that the requested slice size will always be returned and that any leftover elements at the end will be ignored. This allows LLVM to get rid of bounds checks in the code using the iterator. This is inspired by the same iterators provided by ndarray. Fixes rust-lang#47115 I'll add unit tests for all this if the general idea and behaviour make sense to everybody. Also see rust-lang#47115 (comment) for an example of what this improves.
This shouldn't have been closed by the merge, can someone reopen it? I don't have the permissions for that it seems |
The libs team discussed this and the consensus was to stabilize this with the methods panicking before returning an iterator if the slice’s length is not a multiple of the requested chunk size. This is consistent with e.g. @rfcbot fcp merge |
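To make the proposed panicking semantics concrete, here is a minimal sketch of a wrapper that panics up front when the length is not a multiple of the chunk size. The helper name `chunks_exact_strict` is hypothetical and exists only for illustration; it is layered on top of the non-panicking `chunks_exact` from the standard library.

```rust
use std::slice::ChunksExact;

/// Hypothetical helper: panics before returning the iterator if
/// `slice.len()` is not a multiple of `chunk_size`.
pub fn chunks_exact_strict<T>(slice: &[T], chunk_size: usize) -> ChunksExact<'_, T> {
    assert!(chunk_size != 0, "chunk size must be non-zero");
    assert!(
        slice.len() % chunk_size == 0,
        "slice length {} is not a multiple of chunk size {}",
        slice.len(),
        chunk_size
    );
    slice.chunks_exact(chunk_size)
}
```

A caller who wants this behaviour can get it with one extra assertion, whereas a panicking standard-library method could not easily be turned back into a non-panicking one.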
Rename slice::exact_chunks() to slice::chunks_exact() See rust-lang#47115 (comment) and rust-lang#47115 (comment)
With that done, how do we go from here to stabilization? |
The next step would be FCP, but this issue is already in FCP. It has a blocking concern that would need to be resolved by @SimonSapin to make progress (assuming there's consensus about what to do with panicking) |
Ok we discussed this a bit at libs triage, and the conclusion is that we'd like to recheck that the concerns with the original panicking API are resolved with today's implementation. To recap, today's implementation (on nightly) doesn't panic if the slice's length isn't an exact multiple of the chunk size; instead, the leftover elements are silently ignored. There are inherent methods on each iterator, though, to pull out the remainder. @shepmaster does this resolve your original concern? Or others as well, any opposition to having the current semantics be stabilized?
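As a short illustration of these semantics, the sketch below uses `chunks_exact` with `remainder()` and `chunks_exact_mut` with `into_remainder()`. The function names `sum_pairs` and `zero_complete_chunks` are made up for this example.

```rust
/// Sums each complete pair and hands back whatever did not fit.
pub fn sum_pairs(data: &[u32]) -> (Vec<u32>, &[u32]) {
    let mut chunks = data.chunks_exact(2);
    // `by_ref` keeps the iterator alive so the remainder can be read afterwards.
    let sums: Vec<u32> = chunks.by_ref().map(|pair| pair[0] + pair[1]).collect();
    (sums, chunks.remainder())
}

/// Zeroes every complete chunk in place and returns how many trailing
/// elements were left untouched.
pub fn zero_complete_chunks(data: &mut [u8], chunk_size: usize) -> usize {
    let mut chunks = data.chunks_exact_mut(chunk_size);
    for chunk in chunks.by_ref() {
        chunk.fill(0);
    }
    chunks.into_remainder().len()
}
```

Nothing panics on an uneven length; the leftover elements are simply not yielded by the iterator and stay reachable through the remainder accessors.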
I am happy with the current external behavior of the function. |
Ok great! @SimonSapin can you |
Are there any new concerns? @SimonSapin |
@rfcbot resolve panicking |
Ping checkbox @aturon, @sfackler, or @withoutboats |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
Ok! It's been quite a while here so I think it's ok to short circuit the FCP slightly, @sdroege want to send the stabilization PR?
Should we consider #54580 to be "trivially enough" similar to this to stabilize at the same time? |
There's now #55178 for the stabilization of this here (but not |
Add slice::rchunks(), rchunks_mut(), rchunks_exact() and rchunks_exact_mut()
These work exactly like the normal chunks iterators but start creating chunks from the end of the slice. ---- The new iterators were motivated by a [comment](#47115 (comment)) by @DutchGhost. ~~~This currently includes the commits from #54537 to not have to rename things twice or have merge conflicts. I'll force-push a new version of the branch once those are in master.~~~ Also the stabilization tracking issue is just some number right now. I'll create the corresponding issue once this is reviewed and otherwise mergeable. cc @DutchGhost
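A small sketch of how the reverse variants behave, assuming a chunk size of 2. The function name `from_the_end` is made up for this example; `rchunks`, `rchunks_exact` and `remainder` are the standard-library methods.

```rust
/// Collects the chunks produced by `rchunks` and `rchunks_exact`,
/// plus the remainder of the exact variant.
pub fn from_the_end(data: &[u8]) -> (Vec<&[u8]>, Vec<&[u8]>, &[u8]) {
    // `rchunks` starts at the end; the short chunk (if any) comes last
    // and holds the elements at the front of the slice.
    let loose: Vec<&[u8]> = data.rchunks(2).collect();

    // `rchunks_exact` never yields the short chunk; it is available
    // through `remainder()` instead.
    let mut exact_iter = data.rchunks_exact(2);
    let exact: Vec<&[u8]> = exact_iter.by_ref().collect();

    (loose, exact, exact_iter.remainder())
}
```

For the input `[1, 2, 3, 4, 5]`, the leftover element is `1` at the front of the slice, not `5` at the back as it would be with the forward iterators.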
…lexcrichton Stabilize slice::chunks_exact(), chunks_exact_mut(), rchunks(), rchunks_mut(), rchunks_exact(), rchunks_exact_mut() Fixes rust-lang#47115, rust-lang#55177
This is inspired by ndarray and generally seems to allow llvm to remove more bounds checks in the code using the iterator (because the slices will always be exactly the requested size), and doesn't require the caller to add additional checks.
A PR adding these for further discussion will come in a bit.
Open questions:
Should the methods panic if the slice length is not a multiple of chunk_size, or omit any leftover elements? The latter is implemented right now and is very similar to how zip works. @shepmaster even argues that without this, this iterator is kind of useless and the optimization should be implemented as part of the normal chunks iterator (which seems non-trivial, see ). Omission of leftover elements is also how this iterator is implemented in ndarray (but far more general).
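The comparison with zip can be shown in a few lines. This is only an illustration of the analogy; the function names are made up.

```rust
/// `zip` stops at the shorter input and silently drops the rest.
pub fn zipped_len(a: &[i32], b: &[i32]) -> usize {
    a.iter().zip(b.iter()).count()
}

/// `chunks_exact` likewise yields only complete chunks and drops the tail.
pub fn exact_chunk_count(data: &[i32], chunk_size: usize) -> usize {
    data.chunks_exact(chunk_size).count()
}
```

In both cases the surplus elements are skipped without a panic, which is the behaviour the open question weighs against panicking.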