Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uint256: optimize div-related functions by reducing bounds check #168

Merged
merged 1 commit into from
May 16, 2024

Conversation

AaronChen0
Copy link
Contributor

@AaronChen0 AaronChen0 commented May 15, 2024

Since functions addTo and subMulTo are in a for loop inside udivremKnuth, reducing bounds check in them should result in an observable performance boost.

len(y) > 0 should be guaranteed by those respective non-zero checks for divisors inside Div, Mod, AddMod, etc.

Test

go test ./...
ok  	github.com/holiman/uint256	1.375s

Benchmark

go test -run - -bench 'Mod|Div' -count 10 -timeout 40m >/tmp/old
go test -run - -bench 'Mod|Div' -count 10 -timeout 40m >/tmp/new
benchstat old new | grep -v big
goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: AMD Ryzen 7 7735H with Radeon Graphics         
                                 │     old     │                 new                 │
                                 │   sec/op    │   sec/op     vs base                │
Div/small/uint256-16               3.121n ± 1%   3.087n ± 1%   -1.07% (p=0.020 n=10)
Div/mod64/uint256-16               21.66n ± 1%   21.41n ± 2%   -1.15% (p=0.001 n=10)
Div/mod128/uint256-16              40.81n ± 0%   37.78n ± 2%   -7.42% (p=0.000 n=10)
Div/mod192/uint256-16              36.58n ± 1%   34.38n ± 1%   -5.99% (p=0.000 n=10)
Div/mod256/uint256-16              28.12n ± 1%   26.75n ± 1%   -4.86% (p=0.000 n=10)
Mod/small/uint256-16               4.300n ± 3%   4.075n ± 2%   -5.23% (p=0.000 n=10)
Mod/mod64/uint256-16               27.47n ± 1%   27.69n ± 2%        ~ (p=0.085 n=10)
Mod/mod128/uint256-16              42.11n ± 2%   39.26n ± 1%   -6.77% (p=0.000 n=10)
Mod/mod192/uint256-16              37.63n ± 2%   35.48n ± 2%   -5.70% (p=0.000 n=10)
Mod/mod256/uint256-16              29.85n ± 1%   28.09n ± 1%   -5.91% (p=0.000 n=10)
AddMod/small/uint256-16            6.307n ± 1%   6.368n ± 0%   +0.97% (p=0.011 n=10)
AddMod/mod64/uint256-16            10.02n ± 2%   10.04n ± 0%        ~ (p=0.422 n=10)
AddMod/mod128/uint256-16           19.04n ± 1%   18.72n ± 1%   -1.68% (p=0.001 n=10)
AddMod/mod192/uint256-16           20.96n ± 1%   20.24n ± 1%   -3.41% (p=0.000 n=10)
AddMod/mod256/uint256-16           6.428n ± 1%   6.479n ± 1%   +0.80% (p=0.011 n=10)
MulMod/small/uint256-16            17.71n ± 1%   17.55n ± 2%        ~ (p=0.078 n=10)
MulMod/mod64/uint256-16            36.68n ± 2%   36.83n ± 1%        ~ (p=0.123 n=10)
MulMod/mod128/uint256-16           57.66n ± 1%   55.88n ± 1%   -3.08% (p=0.000 n=10)
MulMod/mod192/uint256-16           73.38n ± 2%   69.39n ± 1%   -5.44% (p=0.000 n=10)
MulMod/mod256/uint256-16           86.87n ± 1%   87.06n ± 0%        ~ (p=0.404 n=10)
MulMod/mod256/uint256r-16          38.22n ± 2%   37.80n ± 1%   -1.12% (p=0.023 n=10)
SDiv/large/uint256-16              37.59n ± 1%   34.93n ± 1%   -7.08% (p=0.000 n=10)
MulDivOverflow/small/uint256-16    22.60n ± 1%   22.38n ± 2%   -1.00% (p=0.001 n=10)
MulDivOverflow/div64/uint256-16    27.58n ± 1%   27.07n ± 2%   -1.83% (p=0.002 n=10)
MulDivOverflow/div128/uint256-16   43.95n ± 2%   40.23n ± 2%   -8.44% (p=0.000 n=10)
MulDivOverflow/div192/uint256-16   56.92n ± 2%   52.31n ± 2%   -8.09% (p=0.000 n=10)
MulDivOverflow/div256/uint256-16   74.00n ± 1%   65.19n ± 2%  -11.90% (p=0.000 n=10)
geomean                            48.88n        47.08n        -3.68%

Copy link

codecov bot commented May 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (66a4528) to head (7aa34b6).

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #168   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines         1626      1628    +2     
=========================================
+ Hits          1626      1628    +2     

Copy link
Owner

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether it makes any positive improvement or not is a but unclear, I think, because the benchmarks are inherently a bit flaky.
But I'm fairly certain that it can't hurt, and probably a slight improvement, so why not.

Thanks!

@AaronChen0
Copy link
Contributor Author

AaronChen0 commented May 16, 2024

Using godbolt(https://go.godbolt.org/z/4nGefdYrM), I do see the bounds check is moved outside the loop. And the loop is inside another loop, so this PR should help.

If you benchmark using an idle computer (don't benchmark with background jobs running), you definitely would see an upto 7% boost, several ns slashed off.

If the percent number in 37.59n ± 1% is large in benchmark result, that would suggest your computer is busy with other background tasks during benchmarking. Try killing them before benchmarking. You could get a more accurate result.

I have another PR doing loop unrolling which combined with this PR could have an upto 17% boost for div-related functions.

@AaronChen0
Copy link
Contributor Author

AaronChen0 commented May 16, 2024

Should I add loop unrolling in this PR to make the performance boost more obvious, or in another PR?

@holiman
Copy link
Owner

holiman commented May 16, 2024

Should I add loop unrolling in this PR to make the performance boost more obvious, or in another PR?

Let's do that in a separate PR.

@holiman holiman merged commit 11a325c into holiman:master May 16, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants