Don't use usub.with.overflow intrinsic #103299

nikic · 2022-10-20T10:49:48Z

In LLVM, the canonical form of a usub.with.overflow check is a pair of separate sub + icmp instructions, rather than the usub.with.overflow intrinsic. Using the intrinsic generally results in worse optimization potential.

The backend will attempt to form usub.with.overflow when it comes to actual instruction selection. This is not fully reliable, but I believe this is a better tradeoff than using the intrinsic in IR.

Fixes #103285.
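As a rough source-level illustration of the sub + icmp shape described above, here is a minimal sketch. The helper name `usub_with_overflow` is hypothetical and is not the compiler's actual codegen path; it only mirrors the idea of computing the wrapped difference and the overflow flag as two independent operations.

```rust
/// Hypothetical helper (not part of rustc or core): unsigned subtraction
/// that reports overflow, written as two independent operations.
pub fn usub_with_overflow(a: u32, b: u32) -> (u32, bool) {
    // Plain wrapping subtraction; roughly what a bare `sub` computes.
    let diff = a.wrapping_sub(b);
    // Unsigned "less than" comparison; roughly what `icmp ult` computes.
    let overflowed = a < b;
    (diff, overflowed)
}
```

For any inputs, this should agree with `u32::overflowing_sub`; for example, `usub_with_overflow(3, 5)` yields `(u32::MAX - 1, true)`.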

rust-highfive · 2022-10-20T10:49:52Z

r? @wesleywiser

(rust-highfive has picked a reviewer for you, use r? to override)

nikic · 2022-10-20T10:51:18Z

@bors try @rust-timer queue

rust-timer · 2022-10-20T10:51:20Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2022-10-20T10:51:27Z

⌛ Trying commit 7833012 with merge 9e5787ab4583e67d16150a014f6c118dfa47ab43...

bors · 2022-10-20T13:09:39Z

☀️ Try build successful - checks-actions
Build commit: 9e5787ab4583e67d16150a014f6c118dfa47ab43 (9e5787ab4583e67d16150a014f6c118dfa47ab43)

rust-timer · 2022-10-20T13:09:41Z

Queued 9e5787ab4583e67d16150a014f6c118dfa47ab43 with parent 4b3b731, future comparison URL.

rust-timer · 2022-10-20T17:03:20Z

Finished benchmarking commit (9e5787ab4583e67d16150a014f6c118dfa47ab43): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean¹ | range | count² |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.5% | [2.5%, 2.5%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean¹ | range | count² |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.3% | [-2.3%, -2.3%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

¹ the arithmetic mean of the percent change
² number of relevant changes

wesleywiser

🎉

wesleywiser · 2022-10-21T01:36:10Z

@bors r+

bors · 2022-10-21T01:36:12Z

📌 Commit 7833012 has been approved by wesleywiser

It is now in the queue for this repository.

bors · 2022-10-30T17:45:08Z

⌛ Testing commit 7833012 with merge f42b6fa...

bors · 2022-10-30T20:25:49Z

☀️ Test successful - checks-actions
Approved by: wesleywiser
Pushing f42b6fa to master...

rust-timer · 2022-10-30T22:42:02Z

Finished benchmarking commit (f42b6fa): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.3% | [1.2%, 1.4%] | 2 |
| Regressions ❌ (secondary) | 3.6% | [3.2%, 4.1%] | 6 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.3% | [1.2%, 1.4%] | 2 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.0% | [-3.2%, -2.8%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

nnethercote · 2022-10-30T23:03:09Z

This is benchmark noise.

@rustbot label: +perf-regression-triaged

Make `checked` ops emit *unchecked* LLVM operations where feasible

For things with easily pre-checked overflow conditions -- shifts and unsigned subtraction -- write the checked methods in such a way that we stop emitting wrapping versions of them.

For example, today <https://rust.godbolt.org/z/qM9YK8Txb> neither
```rust
a.checked_sub(b).unwrap()
```
nor
```rust
a.checked_sub(b).unwrap_unchecked()
```
actually optimizes to `sub nuw`. After this PR they do.

cc rust-lang#103299
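The "pre-check, then use the unchecked operation" idea described in that commit message can be sketched as follows. This is only an illustration under stated assumptions: `checked_sub_prechecked` is a hypothetical free function, not the standard library's implementation, and it assumes a toolchain where the inherent `u32::unchecked_sub` method is stable (Rust 1.79 or later).

```rust
/// Hypothetical stand-in for an unsigned `checked_sub`: test the overflow
/// condition up front, then subtract with the unchecked operation.
pub fn checked_sub_prechecked(a: u32, b: u32) -> Option<u32> {
    if a < b {
        // The subtraction would underflow, so report failure.
        None
    } else {
        // SAFETY: `a >= b` was just established, so `a - b` cannot underflow.
        Some(unsafe { a.unchecked_sub(b) })
    }
}
```

Because the subtraction is only reached when it cannot wrap, the optimizer is told so directly rather than having to rediscover it from a wrapping subtraction. Whether a particular call site ends up as `sub nuw` still depends on the LLVM version and the surrounding code.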

Invert comparison in `uN::checked_sub`

After rust-lang#124114, LLVM no longer combines the comparison and subtraction in `uN::checked_sub` when either operand is a constant (demo: https://rust.godbolt.org/z/MaeoYbsP1). The difference is more pronounced when the expression is slightly more complex (https://rust.godbolt.org/z/4rPavsYdc).

This is due to the use of `>=` here: https://github.com/rust-lang/rust/blob/ee97564e3a9f9ac8c65103abb37c6aa48d95bfa2/library/core/src/num/uint_macros.rs#L581-L593

For constant `C`, LLVM eagerly converts `a >= C` into `a > C - 1`, but the backend can only combine `a < C` with `a - C`, not `C - 1 < a` and `a - C`: https://github.com/llvm/llvm-project/blob/e586556e375fc5c4f7e76b5c299cb981f2016108/llvm/lib/CodeGen/CodeGenPrepare.cpp#L1697-L1742

This PR[^1] simply inverts the `>=` into `<` to restore the LLVM magic, and somewhat aligns this with the implementation of `uN::overflowing_sub` from rust-lang#103299.

When the result is stored as an `Option` (rather than being branched/cmoved on), the discriminant is `self >= rhs`. This PR doesn't affect the codegen (and relevant tests) of that since LLVM will negate `self < rhs` to `self >= rhs` when necessary.

[^1]: Note to `self`: My very first contribution to publicly-used code. Hopefully like what I should learn to always be, tiny and humble.
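To make the two comparison directions concrete, here is a minimal sketch of both forms. The function names are hypothetical, and `wrapping_sub` is used in place of the library's internal unchecked subtraction to keep the example in safe code. The two functions return identical results for all inputs; per the commit message, the difference lies only in which form the LLVM backend can fold together with the subtraction.

```rust
/// Hypothetical `>=` form, the comparison direction used before the change.
pub fn checked_sub_ge(a: u32, b: u32) -> Option<u32> {
    if a >= b { Some(a.wrapping_sub(b)) } else { None }
}

/// Hypothetical `<` form, the comparison direction used after the change.
pub fn checked_sub_lt(a: u32, b: u32) -> Option<u32> {
    if a < b { None } else { Some(a.wrapping_sub(b)) }
}

/// A constant right-hand side, the case the commit message says regressed.
pub fn sub_ten(a: u32) -> Option<u32> {
    checked_sub_lt(a, 10)
}
```

For example, `sub_ten(12)` returns `Some(2)` and `sub_ten(9)` returns `None`, regardless of which form is used internally.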

Rollup merge of rust-lang#125038 - ivan-shrimp:checked_sub, r=joboet

Invert comparison in `uN::checked_sub`

rust-highfive assigned wesleywiser Oct 20, 2022

rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 20, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 20, 2022

nikic mentioned this pull request Oct 20, 2022

Between opt-level=1 and opt-level=2, LLVM deoptimizes the output of ptr::addr, if it ptr::addr is #[inline] #103285

Closed

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 20, 2022

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 20, 2022

wesleywiser approved these changes Oct 21, 2022

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 21, 2022

bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 30, 2022

bors merged commit f42b6fa into rust-lang:master Oct 30, 2022

rustbot added this to the 1.67.0 milestone Oct 30, 2022

rustbot added the perf-regression Performance regression. label Oct 30, 2022

rustbot added the perf-regression-triaged The performance regression has been triaged. label Oct 30, 2022

scottmcm mentioned this pull request Apr 18, 2024

Make checked ops emit *unchecked* LLVM operations where feasible #124114

Merged

ivan-shrimp mentioned this pull request May 12, 2024

Invert comparison in uN::checked_sub #125038

Merged
