Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Improve performance of BigInteger.Add/Sub/Div/Rem #1618

Merged
merged 1 commit into from
May 21, 2015

Conversation

axelheer
Copy link
Contributor

@axelheer axelheer commented May 3, 2015

To introduce further performance tweaks, some minor improvements are added to BigIntegerCalculator, which cover all the basic operations. Having these algorithms based on raw uint[] objects will lead to more interesting tweaks of the more fancy stuff (ModPow).

A basic performance comparison based on this code unveils the following results:

Add

# of bits # of vals before ms after ms
16 1,000,000 40 37
64 1,000,000 97 94
256 1,000,000 110 107
1,024 1,000,000 163 152
4,096 100,000 40 37
16,384 100,000 147 122
65,536 100,000 515 440

Subtract

# of bits # of vals before ms after ms
16 1,000,000 44 38
64 1,000,000 106 95
256 1,000,000 128 111
1,024 1,000,000 182 164
4,096 100,000 42 38
16,384 100,000 143 131
65,536 100,000 511 444

Divide

# of bits # of vals before ms after ms
16 1,000,000 49 29
64 1,000,000 125 110
256 1,000,000 349 287
1,024 100,000 215 119
4,096 10,000 254 124
16,384 1,000 378 172
65,536 100 595 263

Note: during making this pull request I noted that all the Debug.Assert aren't doing anything... (?)

Attention: because of using stackalloc I had to reduce an unit test value, which is currently very high and leads to a stack overflow. Would it be better to switch to fixed or should the actual stack size be increased, if someone really needs that huge numbers?
#1307

@stephentoub
Copy link
Member

As can be seen by looking in the CI log, this change is causing multiple asserts when run on Linux:

13:53:27  System.Numerics.Tests.BigIntegerConstructorTest.RunCtorInt64Tests [FAIL]
13:53:28             at System.Diagnostics.Debug.Assert(Boolean condition)
13:53:28             at System.Numerics.BigIntegerCalculator.LeftShift(UInt32* value, Int32 valueLength, UInt32* target, Int32 targetLength, Int32 shift)
13:53:28             at System.Numerics.BigIntegerCalculator.Divide(UInt32* left, Int32 leftLength, UInt32* right, Int32 rightLength, UInt32* quotient, Int32 quotientLength, UInt32* remainder, Int32 remainderLength)
13:53:28             at System.Numerics.BigIntegerCalculator.Divide(UInt32[] left, UInt32[] right)
13:53:28             at System.Numerics.BigInteger.op_Division(BigInteger dividend, BigInteger divisor)
13:53:28             at System.Numerics.Tests.BigIntegerConstructorTest.VerifyBigIntegerUsingIdentities(BigInteger bigInteger, Boolean isZero)
13:53:28             at System.Numerics.Tests.BigIntegerConstructorTest.VerifyCtorInt64(Int64 value)
13:53:28             at System.Numerics.Tests.BigIntegerConstructorTest.RunCtorInt64Tests()
13:53:28   System.Numerics.Tests.BigIntegerConstructorTest.RunCtorUInt64Tests [FAIL]
13:53:28             at System.Diagnostics.Debug.Assert(Boolean condition)
13:53:28             at System.Numerics.BigIntegerCalculator.LeftShift(UInt32* value, Int32 valueLength, UInt32* target, Int32 targetLength, Int32 shift)
13:53:28             at System.Numerics.BigIntegerCalculator.Divide(UInt32* left, Int32 leftLength, UInt32* right, Int32 rightLength, UInt32* quotient, Int32 quotientLength, UInt32* remainder, Int32 remainderLength)
13:53:28             at System.Numerics.BigIntegerCalculator.Divide(UInt32[] left, UInt32[] right)
13:53:28             at System.Numerics.BigInteger.op_Division(BigInteger dividend, BigInteger divisor)
13:53:28             at System.Numerics.Tests.BigIntegerConstructorTest.VerifyBigIntegerUsingIdentities(BigInteger bigInteger, Boolean isZero)
13:53:28             at System.Numerics.Tests.BigIntegerConstructorTest.VerifyCtorUInt64(UInt64 value)
13:53:28             at System.Numerics.Tests.BigIntegerConstructorTest.RunCtorUInt64Tests()
13:53:28   System.Numerics.Tests.BigIntegerConstructorTest.RunCtorSingleTests [FAIL]
13:53:28             at System.Diagnostics.Debug.Assert(Boolean condition)
13:53:28             at System.Numerics.BigIntegerCalculator.LeftShift(UInt32* value, Int32 valueLength, UInt32* target, Int32 targetLength, Int32 shift)
13:53:28             at System.Numerics.BigIntegerCalculator.Divide(UInt32* left, Int32 leftLength, UInt32* right, Int32 rightLength, UInt32* quotient, Int32 quotientLength, UInt32* remainder, Int32 remainderLength)
13:53:28             at System.Numerics.BigIntegerCalculator.Divide(UInt32[] left, UInt32[] right)
13:53:28             at System.Numerics.BigInteger.op_Division(BigInteger dividend, BigInteger divisor)
13:53:28             at System.Numerics.Tests.BigIntegerConstructorTest.VerifyBigIntegerUsingIdentities(BigInteger bigInteger, Boolean isZero)
13:53:28             at System.Numerics.Tests.BigIntegerConstructorTest.VerifyCtorSingle(Single value)
13:53:28             at System.Numerics.Tests.BigIntegerConstructorTest.RunCtorSingleTests()
13:53:28   System.Numerics.Tests.BigIntegerConstructorTest.RunCtorDoubleTests [FAIL]
13:53:28             at System.Diagnostics.Debug.Assert(Boolean condition)
13:53:28             at System.Numerics.BigIntegerCalculator.LeftShift(UInt32* value, Int32 valueLength, UInt32* target, Int32 targetLength, Int32 shift)
13:53:28             at System.Numerics.BigIntegerCalculator.Divide(UInt32* left, Int32 leftLength, UInt32* right, Int32 rightLength, UInt32* quotient, Int32 quotientLength, UInt32* remainder, Int32 remainderLength)
13:53:28             at System.Numerics.BigIntegerCalculator.Divide(UInt32[] left, UInt32[] right)
13:53:28             at System.Numerics.BigInteger.op_Division(BigInteger dividend, BigInteger divisor)
13:53:28             at System.Numerics.Tests.BigIntegerConstructorTest.VerifyBigIntegerUsingIdentities(BigInteger bigInteger, Boolean isZero)
13:53:28             at System.Numerics.Tests.BigIntegerConstructorTest.VerifyCtorDouble(Double value)
13:53:28             at System.Numerics.Tests.BigIntegerConstructorTest.RunCtorDoubleTests()
13:53:28 Finished:    System.Runtime.Tests

@axelheer
Copy link
Contributor Author

axelheer commented May 4, 2015

@stephentoub thanks, I hope it's fixed now. Running msbuild /t:BuildAndTest locally (Windows) doesn't trigger any Debug.Assert statement -- any idea what's wrong with my system?

else
{
uint leftValue = NumericsHelpers.Abs(left._sign),
rightValue = NumericsHelpers.Abs(right._sign);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please declare each variable separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@stephentoub
Copy link
Member

Running msbuild /t:BuildAndTest locally (Windows) doesn't trigger any Debug.Assert statement -- any idea what's wrong with my system?

There was a build bug that was causing some of the Debug.Assert overloads on Windows to be empty, such that they were nops. I thought that was already fixed in the NuGet packages, but either that's not the case (@mellinoe?) or maybe you could try clearing your package cache and rebuilding.

@stephentoub
Copy link
Member

I hope it's fixed now

Yup, appears to be. The CI passed, including for Linux (where Debug.Assert is not broken).

// Since the caller is working with uint[] objects, we're
// doing him a favor with already providing it that way.

uint[] bits = new uint[2];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the caller require that this be two elements, or could left and right be small enough that only one element is needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result may have more than 32 bits. Since the caller does already some fancy compression (ctor of BigInteger), there would be no benefit in doing it here too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following why there'd be no benefit. There's an array being allocated here, and it's possible bigger than it needs to be. In general, reducing both the number of allocations and the amount of space allocated leads to better performing applications, as the impact of GC is reduced. Now, it's possible in this situation the cost of the extra check/branch to determine that a smaller array could be used would outweigh the benefit of allocating less, but I've not seen such an analysis,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only try to keep this simple and thought an additional compression before the actual compression doesn't really help. Since (nearly) every other operation (within BigIntegerCalculator and BigIntegerBuilder) allocates the space probably needed (it's not always that easy to know this beforehand), I haven't seen any point to spare a few bits.

I'll make a performance comparison (for the trivial addition and multiplication), and update the code/comments accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally found another approach to handle these trivial cases. Without any additional arrays the appropriate cast operator for long creates the the result -- it's even better than this creating/compressing workflow.

@ghost
Copy link

ghost commented May 10, 2015

👍 IMO, a millisecond save (with no side-effect) in core is worth it.

{
if (Int32.MinValue <= value && value <= Int32.MaxValue)
if (int.MinValue <= value && value <= int.MaxValue)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this section of conditionals be changed to:

if (int.MinValue < value && value <= int.MaxValue)
{
    _sign = (int)value;
    _bits = null;
}
else if (value == int.MinValue)
{
    this = s_bnMinInt;
}
else ...

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@stephentoub
Copy link
Member

I left a few comments, but otherwise LGTM.

Do we already have test cases that cover all of the various inputs that would hit all of the code paths being modified?

@KrzysztofCwalina
Copy link
Member

@axelheer, as to the stack overflow issues: would it make sense to have two internal implementations and switch between them depending on the size? or even provide an explicit API to select between the implementations.

@axelheer
Copy link
Contributor Author

@stephentoub thanks, I'll work on them. Since I added no new functionality I just assumed we've enough tests. Is there any way to do some kind of code coverage analysis?

@KrzysztofCwalina since the operations can be called within code already using much of the limited stack size, I don't know how to handle that automatically. Furthermore, I cannot measure any performance gain in using stackallock instead of fixed. I just thought using the stack is more efficient, but now I think I was wrong. And, if we work without pointers, we'd use ordinary arrays anyway. Hence, I don't see the benefit in using stackallock anymore. What do you think?

@stephentoub
Copy link
Member

@axelheer, you can get code coverage information by going into the tests directory:

cd corefx\src\System.Runtime.Numerics\tests

and running:

msbuild /t:rebuild /t:BuildAndTest /p:Coverage=true

Assuming that completes successfully, you can then view the index.htm page that's dropped into the corefx\bin\tests\coverage directory.

@KrzysztofCwalina
Copy link
Member

Let's just use fixed buffer than. Thanks!

@axelheer
Copy link
Contributor Author

@stephentoub that's a very nice coverage analysis. I've found a corner case of the integer division, which was not covered (added tests for that). We have still 12 not covered lines of code within BigIntegerCalculator, but those cannot be reached at the moment.

  1. the carry handling of AddSelf and SubtractSelf cannot be reached (yet), since they are only used by code, which should not lead to an extra carry
  2. one path of the Compare method for pointers is dead, but just removing it is too error prone for my taste (future code should assume that Compare is working...)
  3. the path for 32-bits of LeadingZeros is dead, but again, removing it is just not correct

@axelheer
Copy link
Contributor Author

@stephentoub is it time to squash my commits or is there anything else I should do?


// ... and finally merge the result! :-)
AddSelf(bits + n, bitsLength - n, core, coreLength);
fixed (uint* fold = new uint[foldLength],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't entirely follow the conversation between you and @KrzysztofCwalina... why did we change the existing stackallocs to instead be array allocations with fixed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are really large numbers (~ 1MB or more) we'd have a stack overflow here. Performance seems to be the same, not using pointer arithmetic would involve array allocations anyway.

And, there seems to be now way in telling how much we may use of the current stack, since the calling code may rely on it already more or less heavily.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are really large numbers (~ 1MB or more) we'd have a stack overflow here

That's an existing issue in .NET rather than something new that your performance changes have introduced, though, right?

Performance seems to be the same

How are you measuring this, though? Allocations like these often don't show up as having a measurable impact when doing single-threaded performance measurements; it's in the aggregate when you have lots of work happening on a box in parallel, especially server-side, that the impact of the GC from such allocations accumulates and leads to measurable slowdowns. It's possible these particular ones are not problematic, but I'm suspect of that.

not using pointer arithmetic would involve array allocations anyway

I'm not understanding this. We're talking about the difference between stackalloc vs allocating a buffer and using fixed, right? In both cases you end up with a pointer.

@KrzysztofCwalina, you're ok with the change to use allocations and fixed instead of using stackalloc? If so, I am, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an existing issue in .NET rather than something new that your performance changes have introduced, though, right?

Nope, I introduced this issue. Shame on me.

How are you measuring this, though? [...]

As you're pointing out, it's only a dumb single-threaded performance measurement. But since the code leads to stack overflow, we're choosing between putting some pressure on the heap or letting the stack plainly crash. Right?

I'm not understanding this. We're talking about the difference between stackalloc vs allocating a buffer and using fixed, right? In both cases you end up with a pointer.

I just tried to point out, that using ordinary "safe" code (which we don't) we wouldn't have this discussion. Thus, we're "only" not using a rather "exotic" possibility. Sadly, there's nothing in between stackalloc or fixed -- or am I overlooking something?

I'd prefer to ask less of the managed heap too, but since the stack size is only limited and there seems to be no way in telling how much the calling code can spare of the current stack, and since we want to support arbitrarily (!) large integers, I fear it's the only solution I can provide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I introduced this issue

Ah! Ok, then using fixed does seem like the right solution, at least for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few days ago, @KrzysztofCwalina suggested to do both and switch depending on the size. Problem: which size? The calling code may already use the current stack near to its limit or maybe we could nearly use all of it. Deciding based on a fixed size would be easy to implement, but I've no idea how to choose that size.

We could just say 4096 bits (or something like that) is the limit, we'll switch to the heap then. I've really no idea if that's the way to go, but if you recommend it and name a limit, I can push a further commit accordingly.

@stephentoub
Copy link
Member

LGTM, once squashed, and other than my one open question on stackalloc vs fixed. @KrzysztofCwalina?

To introduce further performance tweaks, some minor improvements are
added to `BigIntegerCalculator`, which cover all the basic operations.
Having these algorithms based on raw uint[] objects will lead to more
interesting tweaks of the more fancy stuff (`ModPow`).
@axelheer axelheer force-pushed the biginteger-performance branch from 1c8e621 to be16ac1 Compare May 19, 2015 17:18
@axelheer
Copy link
Contributor Author

@stephentoub the build of the squashed commit has failed, but I can't see why?

@Maxwe11
Copy link
Contributor

Maxwe11 commented May 19, 2015

@Maxwe11
Copy link
Contributor

Maxwe11 commented May 19, 2015

@stephentoub
Copy link
Member

@dotnet-bot test this please

@KrzysztofCwalina
Copy link
Member

Looks good to me too!

@KrzysztofCwalina
Copy link
Member

@dotnet-bot test this please

1 similar comment
@stephentoub
Copy link
Member

@dotnet-bot test this please

KrzysztofCwalina added a commit that referenced this pull request May 21, 2015
Improve performance of BigInteger.Add/Sub/Div/Rem
@KrzysztofCwalina KrzysztofCwalina merged commit 360607a into dotnet:master May 21, 2015
@KrzysztofCwalina
Copy link
Member

Thanks for the contribution! It's a very nice improvement.

@axelheer axelheer deleted the biginteger-performance branch May 22, 2015 12:48
@KrzysztofCwalina KrzysztofCwalina self-assigned this Jun 1, 2015
@KrzysztofCwalina KrzysztofCwalina added bug Product bug (most likely) and removed bug Product bug (most likely) labels Jun 1, 2015
@KrzysztofCwalina KrzysztofCwalina removed their assignment Jun 1, 2015
axelheer added a commit to axelheer/dotnet-corefx that referenced this pull request Jul 9, 2015
- An allocation for BigInteger.DivRem with small divisor is unnecessary,
  which gets fixed. BigIngeger handles these cases better now.
- The square and multiply code submitted with dotnet#1436 used stackalloc,
  which led to stack overflows for huge numbers. With dotnet#1618 these has
  been changed to ordinary array allocations, which put some pressure
  on the managed heap. Thus, a hybrid solution should be better.
axelheer added a commit to axelheer/dotnet-corefx that referenced this pull request Jul 9, 2015
- An allocation for BigInteger.DivRem with small divisor is unnecessary,
  which gets fixed. BigIngeger handles these cases better now.
- The square and multiply code submitted with dotnet#1436 used stackalloc,
  which led to stack overflows for huge numbers. With dotnet#1618 these has
  been changed to ordinary array allocations, which put some pressure
  on the managed heap. Thus, a hybrid solution should be better.
@karelz karelz modified the milestone: 1.0.0-rtm Dec 3, 2016
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…mance

Improve performance of BigInteger.Add/Sub/Div/Rem

Commit migrated from dotnet/corefx@360607a
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
- An allocation for BigInteger.DivRem with small divisor is unnecessary,
  which gets fixed. BigIngeger handles these cases better now.
- The square and multiply code submitted with dotnet/corefx#1436 used stackalloc,
  which led to stack overflows for huge numbers. With dotnet/corefx#1618 these has
  been changed to ordinary array allocations, which put some pressure
  on the managed heap. Thus, a hybrid solution should be better.


Commit migrated from dotnet/corefx@b79a60c
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants