[resubmit] BigInteger parsing optimization for large decimal string #55121

Merged
merged 16 commits into from
Mar 23, 2022
349 changes: 271 additions & 78 deletions src/libraries/System.Runtime.Numerics/src/System/Numerics/BigNumber.cs
@@ -494,23 +494,57 @@ private static bool HexNumberToBigInteger(ref BigNumberBuffer number, out BigInt
}
}

//
// This threshold selects the parsing algorithm based on the number of digits.
//
// Let N be the number of digits. If N is less than or equal to the threshold, use a naive
// algorithm with a running time of O(N^2). If N is greater than the threshold, use
// a divide-and-conquer algorithm with a running time of O(NlogN).
//
private static int s_naiveThreshold = 20000;
private static bool NumberToBigInteger(ref BigNumberBuffer number, out BigInteger result)
{
Span<uint> stackBuffer = stackalloc uint[BigIntegerCalculator.StackAllocThreshold];
Span<uint> currentBuffer = stackBuffer;
int currentBufferSize = 0;
int[]? arrayFromPool = null;

uint partialValue = 0;
int partialDigitCount = 0;
int totalDigitCount = 0;
int numberScale = number.scale;

const int MaxPartialDigits = 9;
const uint TenPowMaxPartial = 1000000000;

int[]? arrayFromPoolForResultBuffer = null;
if (numberScale < 0)
{
result = default;
return false;
}

try
{
if (number.digits.Length <= s_naiveThreshold)
{
Member

Given how big each path is, I wonder if it would be better to break them into 2 helper methods. Basically leaving:

if (number.digits.Length <= s_naiveThreshold)
{
    AlgorithmA(...);
}
else
{
    AlgorithmB(...);
}

-- The method is getting pretty big, which means the JIT might give up on optimizing it otherwise (haven't confirmed if it actually does).
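
For readers following the thread, here is a minimal standalone sketch of the shape being suggested. It is not the PR's code: `ParseNaive`, `ParseLarge`, and `Parse` are hypothetical names, the sketch works on a plain digit string with `BigInteger` arithmetic instead of `BigNumberBuffer` and `uint` spans, and `ParseLarge` is only a placeholder that defers to `BigInteger.Parse`. The constant mirrors `s_naiveThreshold` from the diff.

```csharp
using System;
using System.Numerics;

const int NaiveThreshold = 20000; // mirrors s_naiveThreshold in the diff

// Quadratic path: fold the digits in up to 9 at a time with a multiply-add,
// the same idea as MultiplyAdd with TenPowMaxPartial in the diff.
BigInteger ParseNaive(string digits)
{
    BigInteger result = BigInteger.Zero;
    for (int i = 0; i < digits.Length; i += 9)
    {
        int len = Math.Min(9, digits.Length - i);
        uint partial = uint.Parse(digits.AsSpan(i, len));
        result = result * BigInteger.Pow(10, len) + partial;
    }
    return result;
}

// Placeholder (assumption) for the divide-and-conquer path.
BigInteger ParseLarge(string digits) => BigInteger.Parse(digits);

// The dispatcher itself stays tiny, which is the point of the suggestion.
BigInteger Parse(string digits) =>
    digits.Length <= NaiveThreshold ? ParseNaive(digits) : ParseLarge(digits);

Console.WriteLine(Parse("123456789012345678901234567890"));
```

With this split, each helper can be read and measured on its own, and the caller only carries the threshold check.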

Contributor Author

Sorry for the late reply. I think this is worth doing for the readability improvement. I will implement it as soon as possible.

Contributor Author

@key-moon key-moon Oct 9, 2021

I have implemented and benchmarked this. There is no significant difference in speed or memory use, but readability has definitely improved.

Benchmark result
  • Job-VEBSQX: before split to methods (a9942c5)
  • Job-HLAVXS: after split to methods (460664f)
BenchmarkDotNet=v0.13.1.1611-nightly, OS=Windows 10.0.22000
11th Gen Intel Core i7-1165G7 2.80GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.100-rc.1.21463.6
  [Host]     : .NET 5.0.9 (5.0.921.35908), X64 RyuJIT
  Job-VEBSQX : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT
  Job-HLAVXS : .NET 6.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  
| Method | Job | Toolchain | numberString | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Parse | Job-VEBSQX | `\artifacts-a9942c\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [50000] | 10.50 ms | 0.350 ms | 0.389 ms | 10.56 ms | 9.902 ms | 11.42 ms | 1.00 | 0.00 | 312.5000 | 93.7500 | - | 2 MB |
| Parse | Job-HLAVXS | `\artifacts-460664\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [50000] | 10.29 ms | 0.384 ms | 0.427 ms | 10.34 ms | 9.563 ms | 11.13 ms | 0.98 | 0.03 | 312.5000 | 93.7500 | - | 2 MB |
| Parse | Job-VEBSQX | `\artifacts-a9942c\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [100000] | 36.77 ms | 1.531 ms | 1.702 ms | 36.58 ms | 33.684 ms | 40.74 ms | 1.00 | 0.00 | 1125.0000 | 250.0000 | - | 7 MB |
| Parse | Job-HLAVXS | `\artifacts-460664\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [100000] | 36.64 ms | 1.445 ms | 1.606 ms | 36.23 ms | 33.401 ms | 39.95 ms | 1.00 | 0.07 | 1166.6667 | 166.6667 | - | 7 MB |
| Parse | Job-VEBSQX | `\artifacts-a9942c\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [150000] | 41.08 ms | 1.350 ms | 1.555 ms | 40.75 ms | 38.486 ms | 44.25 ms | 1.00 | 0.00 | 1000.0000 | 500.0000 | - | 7 MB |
| Parse | Job-HLAVXS | `\artifacts-460664\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [150000] | 40.97 ms | 0.742 ms | 0.762 ms | 41.26 ms | 39.155 ms | 42.20 ms | 1.01 | 0.04 | 1000.0000 | 500.0000 | - | 7 MB |
| Parse | Job-VEBSQX | `\artifacts-a9942c\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [200000] | 129.14 ms | 3.434 ms | 3.527 ms | 129.89 ms | 120.135 ms | 133.89 ms | 1.00 | 0.00 | 4500.0000 | 500.0000 | - | 28 MB |
| Parse | Job-HLAVXS | `\artifacts-460664\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [200000] | 130.05 ms | 2.571 ms | 2.279 ms | 130.45 ms | 125.832 ms | 134.43 ms | 1.01 | 0.04 | 4500.0000 | 500.0000 | - | 28 MB |
| Parse | Job-VEBSQX | `\artifacts-a9942c\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [250000] | 160.84 ms | 8.920 ms | 9.544 ms | 158.21 ms | 146.879 ms | 179.63 ms | 1.00 | 0.00 | 5500.0000 | 500.0000 | - | 34 MB |
| Parse | Job-HLAVXS | `\artifacts-460664\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [250000] | 162.85 ms | 7.046 ms | 8.114 ms | 161.09 ms | 145.608 ms | 178.37 ms | 1.01 | 0.08 | 5500.0000 | 500.0000 | - | 34 MB |
| Parse | Job-VEBSQX | `\artifacts-a9942c\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [300000] | 132.28 ms | 3.731 ms | 3.992 ms | 131.79 ms | 122.929 ms | 137.93 ms | 1.00 | 0.00 | 3500.0000 | 1500.0000 | 1000.0000 | 24 MB |
| Parse | Job-HLAVXS | `\artifacts-460664\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [300000] | 134.72 ms | 5.300 ms | 6.103 ms | 133.63 ms | 123.651 ms | 147.27 ms | 1.03 | 0.06 | 3500.0000 | 1500.0000 | 1000.0000 | 24 MB |
| Parse | Job-VEBSQX | `\artifacts-a9942c\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [350000] | 356.66 ms | 20.078 ms | 21.484 ms | 351.60 ms | 321.365 ms | 406.99 ms | 1.00 | 0.00 | 14000.0000 | 12000.0000 | 11000.0000 | 71 MB |
| Parse | Job-HLAVXS | `\artifacts-460664\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [350000] | 354.54 ms | 17.497 ms | 19.448 ms | 348.05 ms | 323.012 ms | 396.16 ms | 0.99 | 0.07 | 15000.0000 | 13000.0000 | 12000.0000 | 71 MB |
| Parse | Job-VEBSQX | `\artifacts-a9942c\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [400000] | 524.19 ms | 47.271 ms | 54.437 ms | 510.63 ms | 461.465 ms | 640.54 ms | 1.00 | 0.00 | 17000.0000 | 3000.0000 | 2000.0000 | 107 MB |
| Parse | Job-HLAVXS | `\artifacts-460664\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [400000] | 553.00 ms | 91.333 ms | 105.179 ms | 487.60 ms | 444.111 ms | 710.82 ms | 1.06 | 0.21 | 17000.0000 | 3000.0000 | 2000.0000 | 107 MB |
| Parse | Job-VEBSQX | `\artifacts-a9942c\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [450000] | 627.40 ms | 91.780 ms | 105.695 ms | 570.04 ms | 526.129 ms | 800.22 ms | 1.00 | 0.00 | 24000.0000 | 1000.0000 | - | 149 MB |
| Parse | Job-HLAVXS | `\artifacts-460664\bin\testhost\net6.0-windows-Release-x64\shared\Microsoft.NETCore.App\6.0.0\corerun.exe` | 12345678901(...)01234567890 [450000] | 554.17 ms | 49.428 ms | 56.921 ms | 542.68 ms | 472.918 ms | 657.23 ms | 0.90 | 0.14 | 24000.0000 | 1000.0000 | - | 149 MB |

return Naive(ref number, out result);
}
else
{
return DivideAndConquer(ref number, out result);
}
}
finally
{
if (arrayFromPoolForResultBuffer != null)
{
ArrayPool<int>.Shared.Return(arrayFromPoolForResultBuffer);
}
}

bool Naive(ref BigNumberBuffer number, out BigInteger result)
{
Span<uint> stackBuffer = stackalloc uint[BigIntegerCalculator.StackAllocThreshold];
Span<uint> currentBuffer = stackBuffer;
uint partialValue = 0;
int partialDigitCount = 0;

foreach (ReadOnlyMemory<char> digitsChunk in number.digits.GetChunks())
{
if (!ProcessChunk(digitsChunk.Span, ref currentBuffer))
@@ -525,6 +559,231 @@ private static bool NumberToBigInteger(ref BigNumberBuffer number, out BigIntege
MultiplyAdd(ref currentBuffer, s_uint32PowersOfTen[partialDigitCount], partialValue);
}

result = NumberBufferToBigInteger(currentBuffer, number.sign);
return true;

bool ProcessChunk(ReadOnlySpan<char> chunkDigits, ref Span<uint> currentBuffer)
{
int remainingIntDigitCount = Math.Max(numberScale - totalDigitCount, 0);
ReadOnlySpan<char> intDigitsSpan = chunkDigits.Slice(0, Math.Min(remainingIntDigitCount, chunkDigits.Length));

bool endReached = false;

// Storing these captured variables in locals for faster access in the loop.
uint _partialValue = partialValue;
int _partialDigitCount = partialDigitCount;
int _totalDigitCount = totalDigitCount;

for (int i = 0; i < intDigitsSpan.Length; i++)
{
char digitChar = chunkDigits[i];
if (digitChar == '\0')
{
endReached = true;
break;
}

_partialValue = _partialValue * 10 + (uint)(digitChar - '0');
_partialDigitCount++;
_totalDigitCount++;

// Update the buffer when enough partial digits have been accumulated.
if (_partialDigitCount == MaxPartialDigits)
{
MultiplyAdd(ref currentBuffer, TenPowMaxPartial, _partialValue);
_partialValue = 0;
_partialDigitCount = 0;
}
}

// Check for nonzero digits after the decimal point.
if (!endReached)
{
ReadOnlySpan<char> fracDigitsSpan = chunkDigits.Slice(intDigitsSpan.Length);
for (int i = 0; i < fracDigitsSpan.Length; i++)
{
char digitChar = fracDigitsSpan[i];
if (digitChar == '\0')
{
break;
}
if (digitChar != '0')
{
return false;
}
}
}

partialValue = _partialValue;
partialDigitCount = _partialDigitCount;
totalDigitCount = _totalDigitCount;

return true;
}
}

bool DivideAndConquer(ref BigNumberBuffer number, out BigInteger result)
{
Span<uint> currentBuffer;
int[]? arrayFromPoolForMultiplier = null;
try
{
totalDigitCount = Math.Min(number.digits.Length - 1, numberScale);
int bufferSize = (totalDigitCount + MaxPartialDigits - 1) / MaxPartialDigits;

Span<uint> buffer = new uint[bufferSize];
arrayFromPoolForResultBuffer = ArrayPool<int>.Shared.Rent(bufferSize);
Span<uint> newBuffer = MemoryMarshal.Cast<int, uint>(arrayFromPoolForResultBuffer).Slice(0, bufferSize);
newBuffer.Clear();

// Separate every MaxPartialDigits digits and store them in the buffer.
// Buffers are treated as little-endian. That means, the array { 234567890, 1 }
// represents the number 1234567890.
int bufferIndex = bufferSize - 1;
uint currentBlock = 0;
int shiftUntil = (totalDigitCount - 1) % MaxPartialDigits;
int remainingIntDigitCount = totalDigitCount;
foreach (ReadOnlyMemory<char> digitsChunk in number.digits.GetChunks())
{
ReadOnlySpan<char> digitsChunkSpan = digitsChunk.Span;
ReadOnlySpan<char> intDigitsSpan = digitsChunkSpan.Slice(0, Math.Min(remainingIntDigitCount, digitsChunkSpan.Length));

for (int i = 0; i < intDigitsSpan.Length; i++)
{
char digitChar = intDigitsSpan[i];
Debug.Assert(char.IsDigit(digitChar));
currentBlock *= 10;
currentBlock += unchecked((uint)(digitChar - '0'));
if (shiftUntil == 0)
{
buffer[bufferIndex] = currentBlock;
currentBlock = 0;
bufferIndex--;
shiftUntil = MaxPartialDigits;
}
shiftUntil--;
}
remainingIntDigitCount -= intDigitsSpan.Length;
Debug.Assert(0 <= remainingIntDigitCount);

ReadOnlySpan<char> fracDigitsSpan = digitsChunkSpan.Slice(intDigitsSpan.Length);
for (int i = 0; i < fracDigitsSpan.Length; i++)
{
char digitChar = fracDigitsSpan[i];
if (digitChar == '\0')
{
break;
}
if (digitChar != '0')
{
result = default;
return false;
}
}
}
Debug.Assert(currentBlock == 0);
Debug.Assert(bufferIndex == -1);

int blockSize = 1;
arrayFromPoolForMultiplier = ArrayPool<int>.Shared.Rent(blockSize);
Span<uint> multiplier = MemoryMarshal.Cast<int, uint>(arrayFromPoolForMultiplier).Slice(0, blockSize);
multiplier[0] = TenPowMaxPartial;

// This loop is executed ceil(log_2(bufferSize)) times.
while (true)
Contributor

If this loop is executed ceil(log_2(bufferSize)) times, why not use a for loop? I think for loops are better optimized by the JIT.

Contributor

Please benchmark before/after to make sure that this is really the case.
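
As a reference point for this thread, here is a simplified sketch of the merge loop written as a `for` loop with the round count computed up front. It is an illustration under stated assumptions, not the PR's implementation: `ParseDivideAndConquer` is a hypothetical name, blocks are held as `BigInteger` values rather than `uint` spans, the input is assumed to contain only ASCII digits, and no claim is made about which loop form the JIT handles better.

```csharp
using System;
using System.Numerics;

BigInteger ParseDivideAndConquer(string digits)
{
    const int BlockDigits = 9; // same role as MaxPartialDigits in the diff
    if (digits.Length == 0)
    {
        return BigInteger.Zero;
    }

    // Split into 9-digit blocks, least significant block first (little-endian).
    int blockCount = (digits.Length + BlockDigits - 1) / BlockDigits;
    var blocks = new BigInteger[blockCount];
    int end = digits.Length;
    for (int b = 0; b < blockCount; b++)
    {
        int start = Math.Max(end - BlockDigits, 0);
        blocks[b] = uint.Parse(digits.AsSpan(start, end - start));
        end = start;
    }

    // ceil(log2(blockCount)) merge rounds, known before the loop starts.
    int rounds = blockCount <= 1 ? 0 : BitOperations.Log2((uint)(blockCount - 1)) + 1;

    BigInteger multiplier = 1_000_000_000; // 10^9, squared after every round
    int blockSize = 1;
    for (int round = 0; round < rounds; round++)
    {
        // Merge neighbours: lower + upper * multiplier, stored in the lower slot.
        for (int i = 0; i + blockSize < blockCount; i += blockSize * 2)
        {
            blocks[i] += blocks[i + blockSize] * multiplier;
        }
        multiplier *= multiplier;
        blockSize *= 2;
    }
    return blocks[0];
}

string input = new string('7', 100);
Console.WriteLine(ParseDivideAndConquer(input) == BigInteger.Parse(input));
```

Both loop forms do the same work per round, so any difference would come from code generation only, which is why a before/after benchmark is the right way to settle it.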

{
// Merge each pair of blocks.
// When buffer represents:
// | A | B | C | D |
// Make newBuffer like:
// | A + B * multiplier | C + D * multiplier |
for (int i = 0; i < bufferSize; i += blockSize * 2)
{
Span<uint> curBufffer = buffer.Slice(i);
Span<uint> curNewBuffer = newBuffer.Slice(i);

int len = Math.Min(bufferSize - i, blockSize * 2);
int lowerLen = Math.Min(len, blockSize);
int upperLen = len - lowerLen;
if (upperLen != 0)
{
Debug.Assert(blockSize == lowerLen);
Debug.Assert(blockSize == multiplier.Length);
Debug.Assert(multiplier.Length == lowerLen);
BigIntegerCalculator.Multiply(multiplier, curBufffer.Slice(blockSize, upperLen), curNewBuffer.Slice(0, len));
}

long carry = 0;
int j = 0;
for (; j < lowerLen; j++)
{
long digit = (curBufffer[j] + carry) + curNewBuffer[j];
curNewBuffer[j] = unchecked((uint)digit);
carry = digit >> 32;
}
if (carry != 0)
{
while (true)
{
curNewBuffer[j]++;
if (curNewBuffer[j] != 0)
{
break;
}
j++;
}
}
}

Span<uint> tmp = buffer;
buffer = newBuffer;
newBuffer = tmp;
blockSize *= 2;

if (bufferSize <= blockSize)
{
break;
}
newBuffer.Clear();
int[]? arrayToReturn = arrayFromPoolForMultiplier;

arrayFromPoolForMultiplier = ArrayPool<int>.Shared.Rent(blockSize);
Span<uint> newMultiplier = MemoryMarshal.Cast<int, uint>(arrayFromPoolForMultiplier).Slice(0, blockSize);
newMultiplier.Clear();
BigIntegerCalculator.Square(multiplier, newMultiplier);
multiplier = newMultiplier;
if (arrayToReturn is not null)
{
ArrayPool<int>.Shared.Return(arrayToReturn);
}
}

// Shrink the buffer to the portion currently in use.
// First, estimate the size from the ratio of 32-bit words to 9-digit blocks.
// Then shrink the size until no unused (zero) words remain at the top.
// The ratio is log_{2^32}(10^9) = 9 * log2(10) / 32, which is about 0.9343.
const double digitRatio = 0.934292276687070661;
currentBufferSize = Math.Min((int)(bufferSize * digitRatio) + 1, bufferSize);
Debug.Assert(buffer.Length == currentBufferSize || buffer[currentBufferSize] == 0);
while (0 < currentBufferSize && buffer[currentBufferSize - 1] == 0)
{
currentBufferSize--;
}
currentBuffer = buffer.Slice(0, currentBufferSize);
result = NumberBufferToBigInteger(currentBuffer, number.sign);
}
finally
{
if (arrayFromPoolForMultiplier != null)
{
ArrayPool<int>.Shared.Return(arrayFromPoolForMultiplier);
}
}
return true;
}

BigInteger NumberBufferToBigInteger(Span<uint> currentBuffer, bool signa)
{
int trailingZeroCount = numberScale - totalDigitCount;

while (trailingZeroCount >= MaxPartialDigits)
@@ -548,85 +807,19 @@ private static bool NumberToBigInteger(ref BigNumberBuffer number, out BigIntege
}
else if (currentBufferSize == 1 && currentBuffer[0] <= int.MaxValue)
{
sign = (int)(number.sign ? -currentBuffer[0] : currentBuffer[0]);
sign = (int)(signa ? -currentBuffer[0] : currentBuffer[0]);
bits = null;
}
else
{
sign = number.sign ? -1 : 1;
sign = signa ? -1 : 1;
bits = currentBuffer.Slice(0, currentBufferSize).ToArray();
}

result = new BigInteger(sign, bits);
return true;
}
finally
{
if (arrayFromPool != null)
{
ArrayPool<int>.Shared.Return(arrayFromPool);
}
}

bool ProcessChunk(ReadOnlySpan<char> chunkDigits, ref Span<uint> currentBuffer)
{
int remainingIntDigitCount = Math.Max(numberScale - totalDigitCount, 0);
ReadOnlySpan<char> intDigitsSpan = chunkDigits.Slice(0, Math.Min(remainingIntDigitCount, chunkDigits.Length));

bool endReached = false;

// Storing these captured variables in locals for faster access in the loop.
uint _partialValue = partialValue;
int _partialDigitCount = partialDigitCount;
int _totalDigitCount = totalDigitCount;

for (int i = 0; i < intDigitsSpan.Length; i++)
{
char digitChar = chunkDigits[i];
if (digitChar == '\0')
{
endReached = true;
break;
}

_partialValue = _partialValue * 10 + (uint)(digitChar - '0');
_partialDigitCount++;
_totalDigitCount++;

// Update the buffer when enough partial digits have been accumulated.
if (_partialDigitCount == MaxPartialDigits)
{
MultiplyAdd(ref currentBuffer, TenPowMaxPartial, _partialValue);
_partialValue = 0;
_partialDigitCount = 0;
}
}

// Check for nonzero digits after the decimal point.
if (!endReached)
{
ReadOnlySpan<char> fracDigitsSpan = chunkDigits.Slice(intDigitsSpan.Length);
for (int i = 0; i < fracDigitsSpan.Length; i++)
{
char digitChar = fracDigitsSpan[i];
if (digitChar == '\0')
{
break;
}
if (digitChar != '0')
{
return false;
}
}
}

partialValue = _partialValue;
partialDigitCount = _partialDigitCount;
totalDigitCount = _totalDigitCount;

return true;
return new BigInteger(sign, bits);
}

// This function should only be used for result buffer.
void MultiplyAdd(ref Span<uint> currentBuffer, uint multiplier, uint addValue)
{
Span<uint> curBits = currentBuffer.Slice(0, currentBufferSize);
@@ -646,10 +839,10 @@ void MultiplyAdd(ref Span<uint> currentBuffer, uint multiplier, uint addValue)

if (currentBufferSize == currentBuffer.Length)
{
int[]? arrayToReturn = arrayFromPool;
int[]? arrayToReturn = arrayFromPoolForResultBuffer;

arrayFromPool = ArrayPool<int>.Shared.Rent(checked(currentBufferSize * 2));
Span<uint> newBuffer = MemoryMarshal.Cast<int, uint>(arrayFromPool);
arrayFromPoolForResultBuffer = ArrayPool<int>.Shared.Rent(checked(currentBufferSize * 2));
Span<uint> newBuffer = MemoryMarshal.Cast<int, uint>(arrayFromPoolForResultBuffer);
currentBuffer.CopyTo(newBuffer);
currentBuffer = newBuffer;
