Improve performance of BigInteger.Multiply(large, small) #92208

Merged (2 commits) on Nov 6, 2023


kzrnm
Contributor

@kzrnm kzrnm commented Sep 18, 2023

BigInteger.Multiply is based on the Karatsuba algorithm. If implemented correctly, the computational complexity of multiplication is $\Theta(n^{\log_2 3})$, where $n$ is the number of digits.

However, the current implementation does not achieve this, because half of the smaller operand's length is used where half of the larger operand's length should be.
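
For reference, the bound quoted above follows from the standard Karatsuba recurrence, in which the product of two $n$-digit numbers is reduced to three products of roughly $n/2$-digit numbers plus linear-time additions and shifts. This is the textbook derivation, not something taken from the runtime source:

$$
T(n) = 3\,T\!\left(\lceil n/2 \rceil\right) + \Theta(n)
\quad\Longrightarrow\quad
T(n) = \Theta\!\left(n^{\log_2 3}\right) \approx \Theta\!\left(n^{1.585}\right)
$$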

Benchmark


BenchmarkDotNet v0.13.8, Windows 11 (10.0.22621.2283/22H2/2022Update/SunValley2)
13th Gen Intel Core i5-13500, 1 CPU, 20 logical and 14 physical cores
.NET SDK 8.0.100-rc.1.23415.11
  [Host]   : .NET 8.0.0 (8.0.23.41404), X64 RyuJIT AVX2
  ShortRun : .NET 8.0.0 (8.0.23.41404), X64 RyuJIT AVX2

Job=ShortRun  Toolchain=.NET 8.0  IterationCount=3  
LaunchCount=1  WarmupCount=3  

| Method | largeLength | smallLength | Mean | Error | StdDev | Ratio | RatioSD |
|---|---:|---:|---:|---:|---:|---:|---:|
| PR_Multiply | 1000 | 1000 | 0.0375 ms | 0.0442 ms | 0.0024 ms | 1.37 | 0.09 |
| Multiply | 1000 | 1000 | 0.0274 ms | 0.0002 ms | 0.0000 ms | 1.00 | 0.00 |
| PR_Multiply | 10000 | 1000 | 0.3143 ms | 0.0750 ms | 0.0041 ms | 0.58 | 0.01 |
| Multiply | 10000 | 1000 | 0.5409 ms | 0.0160 ms | 0.0009 ms | 1.00 | 0.00 |
| PR_Multiply | 10000 | 10000 | 1.1306 ms | 0.2769 ms | 0.0152 ms | 1.29 | 0.02 |
| Multiply | 10000 | 10000 | 0.8797 ms | 0.0351 ms | 0.0019 ms | 1.00 | 0.00 |
| PR_Multiply | 100000 | 1000 | 3.0304 ms | 0.1180 ms | 0.0065 ms | 0.52 | 0.00 |
| Multiply | 100000 | 1000 | 5.7730 ms | 0.6991 ms | 0.0383 ms | 1.00 | 0.00 |
| PR_Multiply | 100000 | 10000 | 11.0477 ms | 0.7075 ms | 0.0388 ms | 0.23 | 0.00 |
| Multiply | 100000 | 10000 | 48.9797 ms | 1.8604 ms | 0.1020 ms | 1.00 | 0.00 |
| PR_Multiply | 100000 | 100000 | 39.4357 ms | 0.2381 ms | 0.0131 ms | 1.25 | 0.00 |
| Multiply | 100000 | 100000 | 31.6607 ms | 0.2959 ms | 0.0162 ms | 1.00 | 0.00 |
| PR_Multiply | 1000000 | 1000 | 30.2207 ms | 1.4899 ms | 0.0817 ms | 0.48 | 0.00 |
| Multiply | 1000000 | 1000 | 63.0998 ms | 10.1509 ms | 0.5564 ms | 1.00 | 0.00 |
| PR_Multiply | 1000000 | 10000 | 120.0384 ms | 8.3355 ms | 0.4569 ms | 0.21 | 0.00 |
| Multiply | 1000000 | 10000 | 582.8313 ms | 39.3648 ms | 2.1577 ms | 1.00 | 0.00 |
| PR_Multiply | 1000000 | 100000 | 448.9208 ms | 3.7954 ms | 0.2080 ms | 0.09 | 0.00 |
| Multiply | 1000000 | 100000 | 5,091.4759 ms | 151.9431 ms | 8.3285 ms | 1.00 | 0.00 |
| PR_Multiply | 1000000 | 995000 | 1,608.9999 ms | 75.3231 ms | 4.1287 ms | 1.09 | 0.00 |
| Multiply | 1000000 | 995000 | 1,482.7404 ms | 58.9864 ms | 3.2332 ms | 1.00 | 0.00 |
| PR_Multiply | 1000000 | 1000000 | 1,627.7224 ms | 63.8822 ms | 3.5016 ms | 1.35 | 0.01 |
| Multiply | 1000000 | 1000000 | 1,209.0913 ms | 45.2731 ms | 2.4816 ms | 1.00 | 0.00 |

public class BigIntegerMultiplyBenchmark
{
    static ReadOnlySpan<byte> MakeBytes(int length)
    {
        var random = new Random(918);
        var bytes = new byte[length];
        random.NextBytes(bytes);
        return bytes;
    }

    public IEnumerable<object[]> LengthArguments()
    {
        var lengths = new int[] { 1000, 10000, 100000, 1000000 };
        for (int i = lengths.Length - 1; i >= 0; i--)
        {
            if (i == lengths.Length - 1)
            {
                yield return new object[] { lengths[i], (int)(0.995 * lengths[i]), };
            }
            for (int j = i; j >= 0; j--)
            {
                yield return new object[] { lengths[i], lengths[j], };
            }
        }
    }

    [Benchmark]
    [ArgumentsSource(nameof(LengthArguments))]
    public PrBigInteger PR_Multiply(int largeLength, int smallLength)
    {
        return (new PrBigInteger(MakeBytes(smallLength)) * new PrBigInteger(MakeBytes(largeLength)));
    }

    [Benchmark(Baseline = true)]
    [ArgumentsSource(nameof(LengthArguments))]
    public BigInteger Multiply(int largeLength, int smallLength)
    {
        return (new BigInteger(MakeBytes(smallLength)) * new BigInteger(MakeBytes(largeLength)));
    }
}

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Sep 18, 2023

ghost commented Sep 18, 2023

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

BigInteger.Multiply is based on the Karatsuba algorithm. If implemented correctly, the computational complexity of multiplication is $\Theta(n^{\log_2 3})$, where $n$ is the number of digits.

However, the current implementation does not achieve this, because half of the smaller operand's length is used where half of the larger operand's length should be.

In this PR, half of the larger operand's length is used. The ceiling of that half is taken to ensure that rightLow.Length is greater than or equal to rightHigh.Length.

https://github.com/dotnet/runtime/blob/ccc9ccfb51df6c914ae8e51f04e49e1aa8b41a16/src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.SquMul.cs#L214
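
As a rough illustration of the ceiling argument above, the sketch below computes a split point from the larger operand's length and reports the resulting low/high part lengths. `SplitPoint` and `SplitLengths` are hypothetical helper names introduced here, not identifiers from the runtime source, and the assumption that the low part takes the first `n` elements is inferred from the `rightLow`/`rightHigh` naming rather than from the actual implementation.

```csharp
using System;

public static class KaratsubaSplitSketch
{
    // Hypothetical helper: ceiling of half the larger operand's length.
    public static int SplitPoint(int largerLength) => (largerLength + 1) >> 1;

    // Hypothetical helper: lengths of the low and high parts when an operand
    // of the given length is cut at index n (the low part takes at most n elements).
    public static (int Low, int High) SplitLengths(int length, int n)
    {
        int low = Math.Min(length, n);
        return (low, length - low);
    }

    public static void Main()
    {
        int leftLength = 7, rightLength = 5;   // left is assumed to be the larger operand
        int n = SplitPoint(leftLength);        // ceiling of 7 / 2
        var (rightLow, rightHigh) = SplitLengths(rightLength, n);
        Console.WriteLine($"n={n}, rightLow={rightLow}, rightHigh={rightHigh}");
    }
}
```

With an odd length such as 7 on both sides, a floor split (3) would leave a high part of 4 elements, which is longer than the low part; the ceiling split (4) gives a low part of 4 and a high part of 3.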

Benchmark


BenchmarkDotNet v0.13.8, Windows 11 (10.0.22621.2283/22H2/2022Update/SunValley2)
13th Gen Intel Core i5-13500, 1 CPU, 20 logical and 14 physical cores
.NET SDK 8.0.100-rc.1.23415.11
  [Host]   : .NET 8.0.0 (8.0.23.41404), X64 RyuJIT AVX2
  ShortRun : .NET 8.0.0 (8.0.23.41404), X64 RyuJIT AVX2

Job=ShortRun  Toolchain=.NET 8.0  IterationCount=3  
LaunchCount=1  WarmupCount=3  

| Method | LargeLength | SmallLength | Mean | Error | StdDev |
|---|---:|---:|---:|---:|---:|
| PR_Multiply | 100000 | 1000 | 2.971 ms | 0.0429 ms | 0.0024 ms |
| Multiply | 100000 | 1000 | 5.473 ms | 0.0598 ms | 0.0033 ms |
| PR_Multiply | 100000 | 10000 | 11.227 ms | 0.6197 ms | 0.0340 ms |
| Multiply | 100000 | 10000 | 48.625 ms | 0.5162 ms | 0.0283 ms |
| PR_Multiply | 100000 | 100000 | 40.675 ms | 0.7527 ms | 0.0413 ms |
| Multiply | 100000 | 100000 | 31.146 ms | 0.9128 ms | 0.0500 ms |
| PR_Multiply | 500000 | 1000 | 15.168 ms | 0.3428 ms | 0.0188 ms |
| Multiply | 500000 | 1000 | 28.169 ms | 1.0945 ms | 0.0600 ms |
| PR_Multiply | 500000 | 10000 | 61.152 ms | 1.8457 ms | 0.1012 ms |
| Multiply | 500000 | 10000 | 265.648 ms | 9.8196 ms | 0.5382 ms |
| PR_Multiply | 500000 | 100000 | 232.004 ms | 7.7583 ms | 0.4253 ms |
| Multiply | 500000 | 100000 | 2,119.005 ms | 73.9840 ms | 4.0553 ms |
| PR_Multiply | 1000000 | 1000 | 30.607 ms | 2.0196 ms | 0.1107 ms |
| Multiply | 1000000 | 1000 | 60.214 ms | 3.2863 ms | 0.1801 ms |
| PR_Multiply | 1000000 | 10000 | 122.832 ms | 7.3378 ms | 0.4022 ms |
| Multiply | 1000000 | 10000 | 576.569 ms | 104.5699 ms | 5.7318 ms |
| PR_Multiply | 1000000 | 100000 | 462.791 ms | 27.2802 ms | 1.4953 ms |
| Multiply | 1000000 | 100000 | 5,094.210 ms | 349.3127 ms | 19.1470 ms |

public class Benchmark
{
    [Params(100000, 500000, 1000000)]
    public int LargeLength { get; set; }

    [Params(1000, 10000, 100000)]
    public int SmallLength { get; set; }

    byte[] bytes1, bytes2;

    [GlobalSetup]
    public void Setup()
    {
        var random = new Random(918);
        bytes1 = new byte[LargeLength];
        bytes2 = new byte[SmallLength];
        random.NextBytes(bytes1);
        random.NextBytes(bytes2);
    }

    [Benchmark]
    public PrBigInteger PR_Multiply()
    {
        return (new PrBigInteger(bytes1) * new PrBigInteger(bytes2));
    }

    [Benchmark]
    public BigInteger Multiply()
    {
        return (new BigInteger(bytes1) * new BigInteger(bytes2));
    }
}
Author: kzrnm
Assignees: -
Labels:

area-System.Numerics, community-contribution

Milestone: -

@kzrnm kzrnm force-pushed the fix/BigIntegerMultiply branch 2 times, most recently from 396fa2d to 20c5d77 Compare September 18, 2023 06:38
@tannergooding
Member

Could you add the benchmark to https://github.com/dotnet/performance/blob/main/src/benchmarks/micro/libraries/System.Runtime.Numerics/Perf.BigInteger.cs (or ensure the existing Multiply benchmark sufficiently covers the scenario)?

Changes in general LGTM, just want to ensure we have some perf numbers before we merge so it can be correctly tracked in our historical data.

kzrnm added a commit to kzrnm/performance that referenced this pull request Sep 21, 2023
kzrnm added a commit to kzrnm/performance that referenced this pull request Sep 21, 2023
kzrnm added a commit to kzrnm/performance that referenced this pull request Sep 21, 2023
cincuranet pushed a commit to dotnet/performance that referenced this pull request Sep 21, 2023
Member

@adamsitnik adamsitnik left a comment

The changes LGTM.

I've used the benchmarks provided by @kzrnm in dotnet/performance#3361 and run them on my PC. For large inputs where right is half the size of left, the gains are up to 60%. For the other test cases, the difference is within the margin of error.

BenchmarkDotNet v0.13.10-nightly.20231019.90, Windows 11 (10.0.22621.2428/22H2/2022Update/SunValley2)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.100-alpha.1.23531.2
  [Host]     : .NET 8.0.0 (8.0.23.47906), X64 RyuJIT AVX2
          PR : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
        main : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

| Method | Job | arguments | Mean | Ratio |
|---|---|---|---:|---:|
| Multiply | PR | 1024,1024 bits | 879.885 ns | 1.01 |
| Multiply | main | 1024,1024 bits | 868.360 ns | 1.00 |
| Multiply | PR | 1024,512 bits | 496.243 ns | 1.02 |
| Multiply | main | 1024,512 bits | 484.839 ns | 1.00 |
| Multiply | PR | 16,16 bits | 8.754 ns | 0.99 |
| Multiply | main | 16,16 bits | 9.107 ns | 1.00 |
| Multiply | PR | 16,8 bits | 8.457 ns | 0.88 |
| Multiply | main | 16,8 bits | 9.658 ns | 1.00 |
| Multiply | PR | 65536,32768 bits | 517,364.993 ns | 0.38 |
| Multiply | main | 65536,32768 bits | 1,364,740.916 ns | 1.00 |
| Multiply | PR | 65536,65536 bits | 776,079.607 ns | 1.01 |
| Multiply | main | 65536,65536 bits | 771,191.496 ns | 1.00 |

Thank you for your contribution @kzrnm !

Comment on lines +204 to +208
ulong carry = 0UL;
for (int j = 0; j < left.Length; j++)
{
    ref uint elementPtr = ref Unsafe.Add(ref resultPtr, i + j);
    ulong digits = elementPtr + carry + (ulong)left[j] * right[i];

nit: we could most likely get a minor improvement by hoisting the load of right[i] out of the inner loop (however, I am not 100% sure that the JIT does not perform this optimization already)

Suggested change
- ulong carry = 0UL;
- for (int j = 0; j < left.Length; j++)
- {
-     ref uint elementPtr = ref Unsafe.Add(ref resultPtr, i + j);
-     ulong digits = elementPtr + carry + (ulong)left[j] * right[i];
+ ulong carry = 0UL;
+ uint right_i = right[i];
+ for (int j = 0; j < left.Length; j++)
+ {
+     ref uint elementPtr = ref Unsafe.Add(ref resultPtr, i + j);
+     ulong digits = elementPtr + carry + (ulong)left[j] * right_i;
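
To show the hoisting idea in a form that can be run in isolation, here is a minimal sketch of a grade-school multiplication loop with the `right[i]` load moved out of the inner loop. It is an illustration only: `SchoolbookSketch` and `rightDigit` are names made up for this example, plain span indexers are used instead of the `Unsafe.Add` access in the PR, and whether the JIT already performs this hoisting is not something the sketch measures.

```csharp
using System;

public static class SchoolbookSketch
{
    // Grade-school multiplication of little-endian 32-bit digit sequences.
    // Assumes bits has left.Length + right.Length elements and is zeroed on entry.
    public static void Multiply(ReadOnlySpan<uint> left, ReadOnlySpan<uint> right, Span<uint> bits)
    {
        for (int i = 0; i < right.Length; i++)
        {
            ulong carry = 0UL;
            uint rightDigit = right[i]; // hoisted: read once per outer iteration
            for (int j = 0; j < left.Length; j++)
            {
                // uint + ulong + (ulong * uint) is evaluated in 64 bits and cannot overflow here.
                ulong digits = bits[i + j] + carry + (ulong)left[j] * rightDigit;
                bits[i + j] = unchecked((uint)digits);
                carry = digits >> 32;
            }
            bits[i + left.Length] = (uint)carry;
        }
    }

    public static void Main()
    {
        uint[] left = { 0xFFFFFFFF, 0xFFFFFFFF };
        uint[] right = { 0xFFFFFFFF };
        uint[] bits = new uint[left.Length + right.Length];
        Multiply(left, right, bits);
        Console.WriteLine(string.Join(", ", bits));
    }
}
```

A benchmark comparing this against the unhoisted form would be needed to tell whether the change makes any measurable difference.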

Comment on lines +277 to +279
upperRight.Clear();

Multiply(left, rightHigh, upperRight);

Multiply does not use upperRight as an input and it's going to overwrite all values starting from 0 to left.Length:

for ( ; i < left.Length; i++)
{
    ulong digits = (ulong)left[i] * right + carry;
    bits[i] = unchecked((uint)digits);
    carry = digits >> 32;
}
bits[i] = (uint)carry;

So we can reduce the clear to only the last element (this span has left.Length + 1 elements)

Suggested change
- upperRight.Clear();
- Multiply(left, rightHigh, upperRight);
+ // Multiply has set 0..left.Length elements, the size is left.Length+1
+ // We need to zero the last element to make sure it does not contain any garbage.
+ Multiply(left, rightHigh, upperRight);
+ upperRight[^1] = 0;

Labels
area-System.Numerics community-contribution Indicates that the PR has been added by a community member tenet-performance Performance related issue
3 participants