Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format/Parse binary from/to BigInteger #85392

Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
3c127ea
Initial draft commit: add FormatBigIntegerToBin().
lateapexearlyspeed Apr 26, 2023
64a0ead
Fix comment: use '?:' to assign ValueStringBuilder variable to make i…
lateapexearlyspeed Apr 28, 2023
4216498
Refine FormatBigIntegerToBin(); and consider chars overflow scenario.
lateapexearlyspeed Apr 29, 2023
1bcf7f8
Update Format code for final binary format definition.
lateapexearlyspeed May 15, 2023
63923da
Refine FormatBigIntegerToBin().
lateapexearlyspeed May 23, 2023
19c701d
consider case where output is span
lateapexearlyspeed May 25, 2023
56e701f
Turn to use try..finally to return array pool.
lateapexearlyspeed May 27, 2023
817c58c
Initial add method BinNumberToBigInteger().
lateapexearlyspeed May 27, 2023
24d88c7
Update FormatProvider.Number.cs to support AllowBinarySpecifier.
lateapexearlyspeed May 30, 2023
5a90e15
Use BinNumberToBigInteger().
lateapexearlyspeed May 30, 2023
3987f71
Add tests of Format.
lateapexearlyspeed May 30, 2023
8b58eb7
Add tests of Parse().
lateapexearlyspeed Jun 1, 2023
6cea91a
Improve Format(): use ValueStringBuilder just as wrapper for destinat…
lateapexearlyspeed Jun 1, 2023
3007b48
Fix comment: use ch == '0' || ch == '1'
lateapexearlyspeed Jun 2, 2023
cd0a03d
Fix comment: refactor ParseNumber() to extract common abstract operat…
lateapexearlyspeed Aug 25, 2023
f22d2e9
Fix comment: refine naming; make BinNumberToBigInteger() general patt…
lateapexearlyspeed Sep 1, 2023
17434aa
Fix comment: use internal 'kcbitUint'.
lateapexearlyspeed Sep 15, 2023
bdfddf5
Fix comment: rename 'Bin' method names to 'Binary' ones; remove unnec…
lateapexearlyspeed Oct 5, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -440,11 +440,11 @@ private static unsafe bool ParseNumber(ref char* str, char* strEnd, NumberStyles
int digEnd = 0;
while (true)
{
if (char.IsAsciiDigit(ch) || (((options & NumberStyles.AllowHexSpecifier) != 0) && char.IsBetween((char)(ch | 0x20), 'a', 'f')))
if ((options & NumberStyles.AllowBinarySpecifier) == 0 ? char.IsAsciiDigit(ch) || ((options & NumberStyles.AllowHexSpecifier) != 0 && char.IsBetween((char)(ch | 0x20), 'a', 'f')) : char.IsBetween(ch, '0', '1'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would that last bit of the ternary be better as : (char == '0' || char == '1) instead of the char.IsBetween(ch, '0', '1')? Seems we're really driving toward optimal performance :)

Copy link
Contributor Author

@lateapexearlyspeed lateapexearlyspeed Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also struggled on these 2, but selected to use char.IsBetween() based on its implementation because it can always just use one check but need 2 arithmetic operation..

Np, changed to (char == '0' || char == '1) now, thanks.

{
state |= StateDigits;

if (ch != '0' || (state & StateNonZero) != 0 || (bigNumber && ((options & NumberStyles.AllowHexSpecifier) != 0)))
if (ch != '0' || (state & StateNonZero) != 0 || (bigNumber && (options & (NumberStyles.AllowHexSpecifier | NumberStyles.AllowBinarySpecifier)) != 0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we hoist this options test and the ones on line 443 to outside the loop so they are only bool flags by the time we're inside the loop?

Perhaps insert around line 440...

const bool allowBinary = (options & NumberStyles.AllowBinarySpecifier) != 0 ;
const bool allowHex = (options & NumberStyles.AllowHexSpecifier) != 0 ;
const bool allowLeadingZero = allowBinary || allowHex;

Then inside the loop, at line 443 looks like

if (allowBinary
    ? (char == '0' || char == '1)
    : char.IsAsciiDigit(ch) || (allowHex && char.IsBetween((char)(ch | 0x20), 'a', 'f')))

and line 447 becomes

if (ch != '0' || (state & StateNonZero) != 0 || (bigNumber && allowLeadingZero))

The same thing could be done for options tested elsewhere inside the loop if performance testing makes that wise...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also not sure what's going on with the bigNumber && here. This flag is set to indicate that we're backed by a StringBuilder, not that we're parsing a particular kind of number...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love all the new tests :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we hoist this options test

Got your point. As you found all the existing options tests code were inside whole loop, and during run time some option tests may happen only when data is met, so not sure whether we should hoist 'options' test which will turn to be called once but always happen. That is why I did not change this convention :)

I am also not sure what's going on with the bigNumber && here.

I am also not sure, so that I just added Binary stuff behind it like Hex :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting it apart for readability is generally helpful. The logic is getting quite verbose to have on a single line IMO.

For the primitive types, we have it split apart into some early checks that go to TryParseBinaryInteger*Style: https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/System/Number.Parsing.cs,437

The check for "IsValidChar" can then be specialized by use of the generics and made a lot cheaper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding Fixed. After introducing Hex/Bin parser types, here simplifies not only prev line 447, but also prev line 443 which involved hex/binary checks previously. Please check, thanks.

{
if (digCount < maxParseDigits)
{
Expand Down
202 changes: 198 additions & 4 deletions src/libraries/System.Runtime.Numerics/src/System/Numerics/BigNumber.cs
Original file line number Diff line number Diff line change
Expand Up @@ -285,7 +285,8 @@ internal static class BigNumber
| NumberStyles.AllowLeadingSign | NumberStyles.AllowTrailingSign
| NumberStyles.AllowParentheses | NumberStyles.AllowDecimalPoint
| NumberStyles.AllowThousands | NumberStyles.AllowExponent
| NumberStyles.AllowCurrencySymbol | NumberStyles.AllowHexSpecifier);
| NumberStyles.AllowCurrencySymbol | NumberStyles.AllowHexSpecifier
| NumberStyles.AllowBinarySpecifier);

private static ReadOnlySpan<uint> UInt32PowersOfTen => new uint[] { 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000 };

Expand Down Expand Up @@ -371,10 +372,13 @@ internal static ParsingStatus TryParseBigInteger(ReadOnlySpan<char> value, Numbe
{
return HexNumberToBigInteger(ref bigNumber, out result);
}
else

if ((style & NumberStyles.AllowBinarySpecifier) != 0)
{
return NumberToBigInteger(ref bigNumber, out result);
return BinNumberToBigInteger(ref bigNumber, out result);
}

return NumberToBigInteger(ref bigNumber, out result);
}

internal static BigInteger ParseBigInteger(string value, NumberStyles style, NumberFormatInfo info)
Expand Down Expand Up @@ -511,6 +515,96 @@ private static ParsingStatus HexNumberToBigInteger(ref BigNumberBuffer number, o
}
}

private static ParsingStatus BinNumberToBigInteger(ref BigNumberBuffer number, out BigInteger result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we expand "Bin" to "Binary" in the name? I realize this is likely an attempt to match the conciseness of "Hex", but "Bin" and "Big" are so close that this keeps making me do a double-take to know which it was.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed method names, also for other 'Bin' related ones.

{
if (number.digits is null || number.digits.Length < 2)
{
result = default;
return ParsingStatus.Failed;
}

const int DigitsPerBlock = 32;
tannergooding marked this conversation as resolved.
Show resolved Hide resolved

int totalDigitCount = number.digits.Length - 1; // Ignore trailing '\0'

int blockCount = Math.DivRem(totalDigitCount, DigitsPerBlock, out int remainingDigitsInBlock);
tannergooding marked this conversation as resolved.
Show resolved Hide resolved
if (remainingDigitsInBlock == 0)
{
remainingDigitsInBlock = DigitsPerBlock;
}
else
{
blockCount++;
}

Debug.Assert(number.digits[0] is '0' or '1');
bool isNegative = number.digits[0] == '1';

uint[]? arrayFromPool = null;
Span<uint> bufferSpan = blockCount <= BigIntegerCalculator.StackAllocThreshold ?
stackalloc uint[blockCount] : (arrayFromPool = ArrayPool<uint>.Shared.Rent(blockCount)).AsSpan(0, blockCount);
tannergooding marked this conversation as resolved.
Show resolved Hide resolved

try
{
uint currentBlock = isNegative ? 0xFF_FF_FF_FF : 0x0;
int bufferPos = blockCount;
foreach (ReadOnlyMemory<char> chunkMem in number.digits.GetChunks())
tannergooding marked this conversation as resolved.
Show resolved Hide resolved
{
ReadOnlySpan<char> chunk = chunkMem.Span;
foreach (char c in chunk)
{
if (c == '\0')
{
break;
}

Debug.Assert(c is '0' or '1');
currentBlock = (currentBlock << 1) | (uint)(c - '0');

if (--remainingDigitsInBlock == 0)
{
bufferSpan[--bufferPos] = currentBlock;
remainingDigitsInBlock = DigitsPerBlock;

// we do not need to reset currentBlock now, because it should always set all its bits by left shift in subsequent iterations
}
}

Debug.Assert(bufferPos > 0 || remainingDigitsInBlock == DigitsPerBlock);
}

Debug.Assert(bufferPos == 0 && remainingDigitsInBlock == DigitsPerBlock);

if (isNegative)
{
NumericsHelpers.DangerousMakeTwosComplement(bufferSpan);
}

bufferSpan = bufferSpan.TrimEnd(0u);
if (bufferSpan.IsEmpty)
{
result = BigInteger.Zero;
}
else if (bufferSpan.Length == 1 && bufferSpan[0] <= int.MaxValue)
{
result = new BigInteger((int)(isNegative ? -bufferSpan[0] : bufferSpan[0]), (uint[]?)null);
}
else
{
result = new BigInteger(isNegative ? -1 : 1, bufferSpan.ToArray());
}

return ParsingStatus.OK;
}
finally
{
if (arrayFromPool is not null)
{
ArrayPool<uint>.Shared.Return(arrayFromPool);
}
}
}

//
// This threshold is for choosing the algorithm to use based on the number of digits.
//
Expand Down Expand Up @@ -1002,6 +1096,103 @@ internal static char ParseFormatSpecifier(ReadOnlySpan<char> format, out int dig
}
}

private static string? FormatBigIntegerToBin(bool targetSpan, BigInteger value, int digits, Span<char> destination, out int charsWritten, out bool spanSuccess)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike existing FormatBigIntegerToHex(), this FormatBigIntegerToBin() is implemented by calculating required char length before format, this can:

  • avoids from extending ValueStringBuilder's capacity during append char;
  • for targetSpan flow, formatted chars can write to wanted destination span directly rather than allocate buffer in ValueStringBuilder and copy to destination at the end

Please give advice if not proper, thanks !

{
// Get the bytes that make up the BigInteger.
byte[]? arrayToReturnToPool = null;
Span<byte> bytes = stackalloc byte[64]; // arbitrary threshold
stephentoub marked this conversation as resolved.
Show resolved Hide resolved
if (!value.TryWriteOrCountBytes(bytes, out int bytesWrittenOrNeeded))
{
bytes = arrayToReturnToPool = ArrayPool<byte>.Shared.Rent(bytesWrittenOrNeeded);
bool success = value.TryWriteBytes(bytes, out _);
Debug.Assert(success);
stephentoub marked this conversation as resolved.
Show resolved Hide resolved
}
bytes = bytes.Slice(0, bytesWrittenOrNeeded);

Debug.Assert(!bytes.IsEmpty);

byte highByte = bytes[^1];

int charsInHighByte = 9 - byte.LeadingZeroCount(value._sign >= 0 ? highByte : (byte)~highByte);
long tmpCharCount = charsInHighByte + ((long)(bytes.Length - 1) << 3);

if (tmpCharCount > Array.MaxLength)
{
Debug.Assert(arrayToReturnToPool is not null);
ArrayPool<byte>.Shared.Return(arrayToReturnToPool);

throw new FormatException(SR.Format_TooLarge);
}

int charsForBits = (int)tmpCharCount;

Debug.Assert(digits < Array.MaxLength);
int charsIncludeDigits = Math.Max(digits, charsForBits);

try
{
scoped ValueStringBuilder sb;
if (targetSpan)
{
if (charsIncludeDigits > destination.Length)
{
charsWritten = 0;
spanSuccess = false;
return null;
}

// Because we have ensured destination can take actual char length, so now just use ValueStringBuilder as wrapper so that subsequent logic can be reused by 2 flows (targetSpan and non-targetSpan);
// meanwhile there is no need to copy to destination again after format data for targetSpan flow.
sb = new ValueStringBuilder(destination);
}
else
{
// each byte is typically eight chars
sb = charsIncludeDigits > 512 ? new ValueStringBuilder(charsIncludeDigits) : new ValueStringBuilder(stackalloc char[charsIncludeDigits]);
}

if (digits > charsForBits)
{
sb.Append(value._sign >= 0 ? '0' : '1', digits - charsForBits);
}

AppendByte(ref sb, highByte, charsInHighByte - 1);

for (int i = bytes.Length - 2; i >= 0; i--)
{
AppendByte(ref sb, bytes[i]);
}

Debug.Assert(sb.Length == charsIncludeDigits);

if (targetSpan)
{
charsWritten = charsIncludeDigits;
spanSuccess = true;
return null;
}

charsWritten = 0;
spanSuccess = false;
return sb.ToString();
}
finally
{
if (arrayToReturnToPool is not null)
{
ArrayPool<byte>.Shared.Return(arrayToReturnToPool);
}
}

static void AppendByte(ref ValueStringBuilder sb, byte b, int startHighBit = 7)
{
for (int i = startHighBit; i >= 0; i--)
{
sb.Append((char)('0' + ((b >> i) & 0x1)));
}
}
}

internal static string FormatBigInteger(BigInteger value, string? format, NumberFormatInfo info)
{
return FormatBigInteger(targetSpan: false, value, format, format, info, default, out _, out _)!;
Expand All @@ -1026,7 +1217,10 @@ internal static bool TryFormatBigInteger(BigInteger value, ReadOnlySpan<char> fo
{
return FormatBigIntegerToHex(targetSpan, value, fmt, digits, info, destination, out charsWritten, out spanSuccess);
}

if (fmt == 'b' || fmt == 'B')
{
return FormatBigIntegerToBin(targetSpan, value, digits, destination, out charsWritten, out spanSuccess);
}

if (value._bits == null)
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,7 @@ public static void RunProviderToStringTests()
RunSimpleProviderToStringTests(s_random, "N", nfi, nfi.NumberDecimalDigits, NumberFormatter);
RunSimpleProviderToStringTests(s_random, "P", nfi, nfi.PercentDecimalDigits, PercentFormatter);
RunSimpleProviderToStringTests(s_random, "X", nfi, 0, HexFormatter);
RunSimpleProviderToStringTests(s_random, "B", nfi, 0, BinFormatter);
RunSimpleProviderToStringTests(s_random, "R", nfi, 0, DecimalFormatter);
}

Expand Down Expand Up @@ -186,6 +187,17 @@ public static void RunStandardFormatToStringTests()
RunStandardFormatToStringTests_Helper(s_random, "x100", CultureInfo.CurrentCulture.NumberFormat.NegativeSign, -100, HexFormatter);
RunStandardFormatToStringTests_Helper(s_random, "x" + intMaxPlus1String, CultureInfo.CurrentCulture.NumberFormat.NegativeSign, -101, HexFormatter, true);

// Binary
RunStandardFormatToStringTests_Helper(s_random, "B", CultureInfo.CurrentCulture.NumberFormat.NegativeSign, 0, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "B0", CultureInfo.CurrentCulture.NumberFormat.NegativeSign, 0, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "b1", CultureInfo.CurrentCulture.NumberFormat.NegativeSign, 1, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "B2", CultureInfo.CurrentCulture.NumberFormat.NegativeSign, 2, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "b5", CultureInfo.CurrentCulture.NumberFormat.NegativeSign, 5, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "B33", CultureInfo.CurrentCulture.NumberFormat.NegativeSign, 33, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "b99", CultureInfo.CurrentCulture.NumberFormat.NegativeSign, 99, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "b100", CultureInfo.CurrentCulture.NumberFormat.NegativeSign, 100, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "b" + intMaxPlus1String, CultureInfo.CurrentCulture.NumberFormat.NegativeSign, 101, BinFormatter, true);

// RoundTrip
RunStandardFormatToStringTests_Helper(s_random, "R", CultureInfo.CurrentCulture.NumberFormat.NegativeSign, 0, DecimalFormatter);
RunStandardFormatToStringTests_Helper(s_random, "R0", CultureInfo.CurrentCulture.NumberFormat.NegativeSign, 0, DecimalFormatter);
Expand Down Expand Up @@ -308,6 +320,17 @@ public static void RunRegionSpecificStandardFormatToStringTests()
RunStandardFormatToStringTests_Helper(s_random, "x100", culture.NumberFormat.NegativeSign, -100, HexFormatter);
RunStandardFormatToStringTests_Helper(s_random, "x" + intMaxPlus1String, culture.NumberFormat.NegativeSign, -101, HexFormatter, true);

// Binary
RunStandardFormatToStringTests_Helper(s_random, "B", culture.NumberFormat.NegativeSign, 0, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "B0", culture.NumberFormat.NegativeSign, 0, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "b1", culture.NumberFormat.NegativeSign, 1, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "B2", culture.NumberFormat.NegativeSign, 2, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "b5", culture.NumberFormat.NegativeSign, 5, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "B33", culture.NumberFormat.NegativeSign, 33, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "b99", culture.NumberFormat.NegativeSign, 99, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "b100", culture.NumberFormat.NegativeSign, 100, BinFormatter);
RunStandardFormatToStringTests_Helper(s_random, "b" + intMaxPlus1String, culture.NumberFormat.NegativeSign, 101, BinFormatter, true);

// RoundTrip
RunStandardFormatToStringTests_Helper(s_random, "R", culture.NumberFormat.NegativeSign, 0, DecimalFormatter);
RunStandardFormatToStringTests_Helper(s_random, "R0", culture.NumberFormat.NegativeSign, 0, DecimalFormatter);
Expand Down Expand Up @@ -1023,6 +1046,23 @@ private static string PercentFormatter(string input, int precision, NumberFormat
return pre + GroupFormatDigits(input, nfi.PercentGroupSeparator, nfi.PercentGroupSizes, nfi.PercentDecimalSeparator, precision) + post;
}

private static string BinFormatter(string input, int precision, NumberFormatInfo nfi)
{
string output = ConvertDecimalToBin(input, nfi);

if (output[0] == '1')
{
output = OneString(precision - output.Length) + output;
}
else
{
Debug.Assert(output[0] == '0');
output = ZeroString(precision - output.Length) + output;
}

return output;
}

private static string HexFormatter(string input, int precision, NumberFormatInfo nfi)
{
bool upper = true;
Expand Down Expand Up @@ -1634,7 +1674,7 @@ private static string GetHexDigitSequence(int min, int max, Random random)
}
private static string GetRandomInvalidFormatChar(Random random)
{
char[] digits = new char[] { 'C', 'c', 'D', 'd', 'E', 'e', 'F', 'f', 'G', 'g', 'N', 'n', 'P', 'p', 'X', 'x', 'R', 'r' };
char[] digits = new char[] { 'C', 'c', 'D', 'd', 'E', 'e', 'F', 'f', 'G', 'g', 'N', 'n', 'P', 'p', 'X', 'x', 'B', 'b', 'R', 'r' };
char result = 'C';
while (result == 'C')
{
Expand Down Expand Up @@ -1763,6 +1803,34 @@ private static string ConvertDecimalToHex(string input, bool upper, NumberFormat
return output;
}

private static string ConvertDecimalToBin(string input, NumberFormatInfo nfi)
{
string output = string.Empty;
BigInteger bi = BigInteger.Parse(input, nfi);
byte[] bytes = bi.ToByteArray();
int[] chars = new int[bytes.Length * 8];
for (int i = 0; i < bytes.Length; i++)
{
chars[i * 8] = bytes[i] % 2;
chars[i * 8 + 1] = (bytes[i] / 2) % 2;
chars[i * 8 + 2] = (bytes[i] / 4) % 2;
chars[i * 8 + 3] = (bytes[i] / 8) % 2;
chars[i * 8 + 4] = (bytes[i] / 16) % 2;
chars[i * 8 + 5] = (bytes[i] / 32) % 2;
chars[i * 8 + 6] = (bytes[i] / 64) % 2;
chars[i * 8 + 7] = (bytes[i] / 128) % 2;
}

ReadOnlySpan<int> trimmedChars = chars.AsSpan(0, chars.AsSpan().LastIndexOf(chars[^1] == 0 ? 1 : 0) + 2);

for (int i = trimmedChars.Length - 1; i >= 0; i--)
{
output = $"{output}{trimmedChars[i]}";
}

return output;
}

private static string ConvertToExp(string input, int places)
{
char[] temp = input.Substring(0, places + 2).ToCharArray();
Expand Down Expand Up @@ -1938,6 +2006,11 @@ private static string ZeroString(int size)
return size >= 1 ? new string('0', size) : string.Empty;
}

private static string OneString(int size)
{
return size >= 1 ? new string('1', size) : string.Empty;
}

private static string FString(int size, bool upper)
{
string ret = string.Empty;
Expand Down
Loading