Add a special case for align_offset /w stride != 1 #98866

nagisa · 2022-07-03T21:47:18Z

This generalizes the previous stride == 1 special case to apply to any
situation where the requested alignment is divisible by the stride. This
in turn allows the test case from #98809 produce ideal assembly, along
the lines of:

leaq 15(%rdi), %rax
andq $-16, %rax

This also produces pretty high quality code for situations where the
alignment of the input pointer isn’t known:

pub unsafe fn ptr_u32(slice: *const u32) -> *const u32 {
    slice.offset(slice.align_offset(16) as isize)
}

// =>

movl %edi, %eax
andl $3, %eax
leaq 15(%rdi), %rcx
andq $-16, %rcx
subq %rdi, %rcx
shrq $2, %rcx
negq %rax
sbbq %rax, %rax
orq  %rcx, %rax
leaq (%rdi,%rax,4), %rax

Here LLVM is smart enough to replace the usize::MAX special case with
a branch-less bitwise-OR approach, where the mask is constructed using
the neg and sbb instructions. This appears to work across various
architectures I’ve tried.

This change ends up introducing more branches and code in situations
where there is less knowledge of the arguments. For example when the
requested alignment is entirely unknown. This use-case was never really
a focus of this function, so I’m not particularly worried, especially
since llvm-mca is saying that the new code is still appreciably faster,
despite all the new branching.

Fixes #98809.
Sadly, this does not help with #72356.

rustbot · 2022-07-03T21:47:21Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

rust-highfive · 2022-07-03T21:47:21Z

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

nagisa · 2022-07-06T20:59:40Z

library/core/src/ptr/mod.rs

+            let byte_offset = wrapping_sub(aligned_address, addr);
+            // SAFETY: `stride` is non-zero. This is guaranteed to divide exactly as well, because
+            // addr has been verified to be aligned to the original type’s alignment requirements.
+            unsafe { exact_div(byte_offset, stride) }


Thinking about it again, there may be a better way to compute the element offset in the first place rather than dividing the byte offset. I believe this won’t regress stride == 1 case anyway and this division won’t actually appear in most of the code that immediately passes the result from this function to an offset, either so I wouldn’t consider it a blocker for the time being.

Mark-Simulacrum · 2022-07-14T00:51:31Z

r=me modulo perf:

@bors try @rust-timer queue

rust-timer · 2022-07-14T00:51:33Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-07-14T00:51:41Z

⌛ Trying commit 3d98adb558215e0f5078c33d60760bad1295481b with merge da5de824feffc6dd3116d84c8dbc8083dcd61358...

Mark-Simulacrum · 2022-07-14T00:51:59Z

It might also make sense to add an asm or LLVM test verifying the optimization here is working, so that we don't unintentionally regress in the future.

bors · 2022-07-14T02:45:51Z

☀️ Try build successful - checks-actions
Build commit: da5de824feffc6dd3116d84c8dbc8083dcd61358 (da5de824feffc6dd3116d84c8dbc8083dcd61358)

rust-timer · 2022-07-14T02:45:53Z

Queued da5de824feffc6dd3116d84c8dbc8083dcd61358 with parent 87588a2, future comparison URL.

rust-timer · 2022-07-14T05:25:59Z

Finished benchmarking commit (da5de824feffc6dd3116d84c8dbc8083dcd61358): comparison url.

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

Primary benchmarks: no relevant changes found
Secondary benchmarks: mixed results

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	1.1%	1.1%	1
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	-2.7%	-3.5%	3
All 😿🎉 (primary)	N/A	N/A	0

Cycles

Results

Primary benchmarks: no relevant changes found
Secondary benchmarks: mixed results

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	2.5%	2.5%	1
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	-4.1%	-4.1%	1
All 😿🎉 (primary)	N/A	N/A	0

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

the arithmetic mean of the percent change ↩ ↩²
number of relevant changes ↩ ↩²

Mark-Simulacrum · 2022-07-14T12:48:03Z

I'll hold off on approving in case you want to add the codegen test:

It might also make sense to add an asm or LLVM test verifying the optimization here is working, so that we don't unintentionally regress in the future.

but otherwise r=me, perf looks neutral (but not unexpected, this kind of thing is likely to have marginal impact at best for something as large as rustc).

nagisa · 2022-07-16T13:44:29Z

@bors r=Mark-Simulacrum

bors · 2022-07-16T13:44:31Z

📌 Commit 36a96bb6f67d2c48a36988d742cba02003eaab98 has been approved by Mark-Simulacrum

It is now in the queue for this repository.

bors · 2022-07-16T19:45:54Z

⌛ Testing commit 36a96bb6f67d2c48a36988d742cba02003eaab98 with merge fc0ddaf7ea2f0db718d1bdb7693eabf84d197611...

bors · 2022-07-16T20:07:09Z

💔 Test failed - checks-actions

nagisa · 2022-07-16T20:37:13Z

@bors r=Mark-Simulacrum

bors · 2022-07-16T20:37:14Z

📌 Commit fe2a05e53ef813151f6314a8bfa57c790fafec6e has been approved by Mark-Simulacrum

It is now in the queue for this repository.

This generalizes the previous `stride == 1` special case to apply to any situation where the requested alignment is divisible by the stride. This in turn allows the test case from rust-lang#98809 produce ideal assembly, along the lines of: leaq 15(%rdi), %rax andq $-16, %rax This also produces pretty high quality code for situations where the alignment of the input pointer isn’t known: pub unsafe fn ptr_u32(slice: *const u32) -> *const u32 { slice.offset(slice.align_offset(16) as isize) } // => movl %edi, %eax andl $3, %eax leaq 15(%rdi), %rcx andq $-16, %rcx subq %rdi, %rcx shrq $2, %rcx negq %rax sbbq %rax, %rax orq %rcx, %rax leaq (%rdi,%rax,4), %rax Here LLVM is smart enough to replace the `usize::MAX` special case with a branch-less bitwise-OR approach, where the mask is constructed using the neg and sbb instructions. This appears to work across various architectures I’ve tried. This change ends up introducing more branches and code in situations where there is less knowledge of the arguments. For example when the requested alignment is entirely unknown. This use-case was never really a focus of this function, so I’m not particularly worried, especially since llvm-mca is saying that the new code is still appreciably faster, despite all the new branching. Fixes rust-lang#98809. Sadly, this does not help with rust-lang#72356.

nagisa · 2022-07-16T22:28:21Z

@bors r=Mark-Simulacrum

The test is definitely turning out to be as finicky as I feared it would be.

bors · 2022-07-16T22:28:22Z

📌 Commit 62a182c has been approved by Mark-Simulacrum

It is now in the queue for this repository.

bors · 2022-07-16T23:28:31Z

⌛ Testing commit 62a182c with merge db41351...

bors · 2022-07-17T02:09:04Z

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing db41351 to master...

rust-timer · 2022-07-17T03:23:45Z

Finished benchmarking commit (db41351): comparison url.

Instruction count

Primary benchmarks: no relevant changes found
Secondary benchmarks: 😿 relevant regressions found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	2.3%	3.0%	6
Improvements 🎉 (primary)	N/A	N/A	0
Improvements 🎉 (secondary)	N/A	N/A	0
All 😿🎉 (primary)	N/A	N/A	0

Max RSS (memory usage)

Results

Primary benchmarks: 🎉 relevant improvement found
Secondary benchmarks: 🎉 relevant improvements found

	mean¹	max	count²
Regressions 😿 (primary)	N/A	N/A	0
Regressions 😿 (secondary)	N/A	N/A	0
Improvements 🎉 (primary)	-3.3%	-3.3%	1
Improvements 🎉 (secondary)	-4.5%	-9.4%	6
All 😿🎉 (primary)	-3.3%	-3.3%	1

Cycles

Results

Primary benchmarks: mixed results
Secondary benchmarks: mixed results

	mean¹	max	count²
Regressions 😿 (primary)	2.4%	2.4%	1
Regressions 😿 (secondary)	3.0%	3.8%	3
Improvements 🎉 (primary)	-2.5%	-2.5%	1
Improvements 🎉 (secondary)	-2.5%	-2.5%	1
All 😿🎉 (primary)	-0.1%	-2.5%	2

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression

the arithmetic mean of the percent change ↩ ↩² ↩³
number of relevant changes ↩ ↩² ↩³

rylev · 2022-07-19T16:15:08Z

Given this is a change in library code, and it only impacts one secondary benchmark (deeply-nested-multi) , I'm going to mark this as triaged.

@rustbot label: +perf-regression-triaged

Edit: it looks like it's just noise, corrected from the previous run.

rust-highfive assigned Mark-Simulacrum Jul 3, 2022

rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Jul 3, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 3, 2022

nagisa force-pushed the nagisa/align-offset-wroom branch from f495c37 to 3d98adb Compare July 3, 2022 22:18

nagisa commented Jul 6, 2022

View reviewed changes

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 14, 2022

Mark-Simulacrum removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 14, 2022

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 14, 2022

nagisa force-pushed the nagisa/align-offset-wroom branch from 3d98adb to 36a96bb Compare July 16, 2022 12:56

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 16, 2022

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jul 16, 2022

nagisa force-pushed the nagisa/align-offset-wroom branch from 36a96bb to fe2a05e Compare July 16, 2022 20:37

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 16, 2022

This comment has been minimized.

Sign in to view

nagisa force-pushed the nagisa/align-offset-wroom branch from fe2a05e to 62a182c Compare July 16, 2022 22:27

bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 17, 2022

bors merged commit db41351 into rust-lang:master Jul 17, 2022

rustbot added this to the 1.64.0 milestone Jul 17, 2022

rustbot added the perf-regression Performance regression. label Jul 17, 2022

rustbot added the perf-regression-triaged The performance regression has been triaged. label Jul 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a special case for align_offset /w stride != 1 #98866

Add a special case for align_offset /w stride != 1 #98866

nagisa commented Jul 3, 2022

rustbot commented Jul 3, 2022

rust-highfive commented Jul 3, 2022

nagisa Jul 6, 2022

Mark-Simulacrum commented Jul 14, 2022

rust-timer commented Jul 14, 2022

bors commented Jul 14, 2022

Mark-Simulacrum commented Jul 14, 2022

bors commented Jul 14, 2022

rust-timer commented Jul 14, 2022

rust-timer commented Jul 14, 2022

Mark-Simulacrum commented Jul 14, 2022

nagisa commented Jul 16, 2022

bors commented Jul 16, 2022

bors commented Jul 16, 2022

bors commented Jul 16, 2022

nagisa commented Jul 16, 2022

bors commented Jul 16, 2022

This comment has been minimized.

This comment has been minimized.

nagisa commented Jul 16, 2022

bors commented Jul 16, 2022

bors commented Jul 16, 2022

bors commented Jul 17, 2022

rust-timer commented Jul 17, 2022

rylev commented Jul 19, 2022 •

edited

Loading

Add a special case for align_offset /w stride != 1 #98866

Add a special case for align_offset /w stride != 1 #98866

Conversation

nagisa commented Jul 3, 2022

rustbot commented Jul 3, 2022

rust-highfive commented Jul 3, 2022

nagisa Jul 6, 2022

Choose a reason for hiding this comment

Mark-Simulacrum commented Jul 14, 2022

rust-timer commented Jul 14, 2022

bors commented Jul 14, 2022

Mark-Simulacrum commented Jul 14, 2022

bors commented Jul 14, 2022

rust-timer commented Jul 14, 2022

rust-timer commented Jul 14, 2022

Instruction count

Max RSS (memory usage)

Cycles

Footnotes

Mark-Simulacrum commented Jul 14, 2022

nagisa commented Jul 16, 2022

bors commented Jul 16, 2022

bors commented Jul 16, 2022

bors commented Jul 16, 2022

nagisa commented Jul 16, 2022

bors commented Jul 16, 2022

This comment has been minimized.

This comment has been minimized.

nagisa commented Jul 16, 2022

bors commented Jul 16, 2022

bors commented Jul 16, 2022

bors commented Jul 17, 2022

rust-timer commented Jul 17, 2022

Instruction count

Max RSS (memory usage)

Cycles

Footnotes

rylev commented Jul 19, 2022 • edited Loading

rylev commented Jul 19, 2022 •

edited

Loading