Improve performance of BigInteger.Add/Sub/Div/Rem #1618
Conversation
As can be seen by looking in the CI log, this change is causing multiple asserts when run on Linux:
@stephentoub thanks, I hope it's fixed now. Running
```csharp
else
{
    uint leftValue = NumericsHelpers.Abs(left._sign),
        rightValue = NumericsHelpers.Abs(right._sign);
```
Please declare each variable separately.
ok
There was a build bug that was causing some of the Debug.Assert overloads on Windows to be empty, such that they were no-ops. I thought that was already fixed in the NuGet packages, but either that's not the case (@mellinoe?) or maybe you could try clearing your package cache and rebuilding.
Yup, appears to be. The CI passed, including for Linux (where Debug.Assert is not broken).
```csharp
// Since the caller is working with uint[] objects, we're
// doing the caller a favor by already providing it that way.

uint[] bits = new uint[2];
```
Does the caller require that this be two elements, or could left and right be small enough that only one element is needed?
The result may have more than 32 bits. Since the caller already does some fancy compression (in the `BigInteger` ctor), there would be no benefit in doing it here too.
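To make the sizing concrete, here is a small Python sketch (Python stands in for the C# internals; the function name is illustrative) of why the trivial case needs two 32-bit digits:

```python
def add_magnitudes_32(left: int, right: int) -> list[int]:
    """Sum of two 32-bit magnitudes as little-endian 32-bit digits."""
    total = left + right
    return [total & 0xFFFFFFFF, total >> 32]

# uint.MaxValue + 1 = 2^32 needs a second digit: { low = 0, high = 1 }
assert add_magnitudes_32(0xFFFFFFFF, 1) == [0, 1]
# ...while small sums leave the high digit zero
assert add_magnitudes_32(2, 3) == [5, 0]
```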
I'm not following why there'd be no benefit. There's an array being allocated here, and it's possibly bigger than it needs to be. In general, reducing both the number of allocations and the amount of space allocated leads to better-performing applications, as the impact of GC is reduced. Now, it's possible in this situation that the cost of the extra check/branch to determine that a smaller array could be used would outweigh the benefit of allocating less, but I've not seen such an analysis.
I only tried to keep this simple and thought an additional compression before the actual compression doesn't really help. Since (nearly) every other operation (within `BigIntegerCalculator` and `BigIntegerBuilder`) allocates the space probably needed (it's not always that easy to know this beforehand), I haven't seen any point in sparing a few bits.
I'll make a performance comparison (for the trivial addition and multiplication), and update the code/comments accordingly.
I finally found another approach to handle these trivial cases. Without any additional arrays, the appropriate cast operator for `long` creates the result -- it's even better than this creating/compressing workflow.
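A quick Python check of the reasoning behind that cast (illustrative, not the actual C# code): the sum or difference of any two 32-bit signed values always fits in 64 bits, so a single widening add can replace the allocate-then-compress workflow.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

# Even the worst cases stay comfortably within the 64-bit range, so
# casting both operands to long before adding can never overflow.
for a in (INT32_MIN, INT32_MAX):
    for b in (INT32_MIN, INT32_MAX):
        assert INT64_MIN <= a + b <= INT64_MAX
        assert INT64_MIN <= a - b <= INT64_MAX
```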
👍 IMO, a millisecond saved (with no side effects) in core is worth it.
```diff
 {
-    if (Int32.MinValue <= value && value <= Int32.MaxValue)
+    if (int.MinValue <= value && value <= int.MaxValue)
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this section of conditionals be changed to:

```csharp
if (int.MinValue < value && value <= int.MaxValue)
{
    _sign = (int)value;
    _bits = null;
}
else if (value == int.MinValue)
{
    this = s_bnMinInt;
}
else ...
```

?
ok
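The reason for the separate `int.MinValue` branch can be sketched as follows (Python stand-in; `abs_u32` is a hypothetical analogue of `NumericsHelpers.Abs`, not the real implementation):

```python
def abs_u32(x: int) -> int:
    """Hypothetical analogue of NumericsHelpers.Abs: |x| as unsigned 32-bit."""
    return (-x if x < 0 else x) & 0xFFFFFFFF

# int.MinValue is the one value whose magnitude (2^31) exceeds int.MaxValue,
# which is why the constructor routes it to a cached instance (s_bnMinInt)
# instead of storing it in the plain 32-bit _sign field.
assert abs_u32(-2**31) == 2**31      # fine as an *unsigned* magnitude...
assert 2**31 > 2**31 - 1             # ...but too large for a signed int
```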
I left a few comments, but otherwise LGTM. Do we already have test cases that cover all of the various inputs that would hit all of the code paths being modified?
@axelheer, as to the stack overflow issues: would it make sense to have two internal implementations and switch between them depending on the size? Or even provide an explicit API to select between the implementations.
@stephentoub thanks, I'll work on them. Since I added no new functionality, I just assumed we have enough tests. Is there any way to do some kind of code coverage analysis? @KrzysztofCwalina since the operations can be called within code already using much of the limited stack size, I don't know how to handle that automatically. Furthermore, I cannot measure any performance gain in using
@axelheer, you can get code coverage information by going into the tests directory:
and running:
Assuming that completes successfully, you can then view the
Let's just use a fixed buffer then. Thanks!
@stephentoub that's a very nice coverage analysis. I've found a corner case of the integer division, which was not covered (added tests for that). We still have 12 uncovered lines of code within
@stephentoub is it time to squash my commits, or is there anything else I should do?
```csharp
// ... and finally merge the result! :-)
AddSelf(bits + n, bitsLength - n, core, coreLength);

fixed (uint* fold = new uint[foldLength],
```
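The `AddSelf` call performs an in-place add with carry propagation into the larger buffer; here is a rough Python analogue (digit layout and names are assumptions for illustration, not the actual implementation):

```python
def add_self(bits: list[int], offset: int, core: list[int]) -> None:
    """In-place add of `core` into `bits[offset:]` using 32-bit digits,
    propagating the carry as far as it ripples."""
    carry = 0
    i = 0
    while i < len(core) or carry:
        digit = bits[offset + i] + carry + (core[i] if i < len(core) else 0)
        bits[offset + i] = digit & 0xFFFFFFFF
        carry = digit >> 32
        i += 1

bits = [0xFFFFFFFF, 0xFFFFFFFF, 0]
add_self(bits, 0, [1])
assert bits == [0, 0, 1]  # the carry ripples through both saturated digits
```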
I didn't entirely follow the conversation between you and @KrzysztofCwalina... why did we change the existing stackallocs to instead be array allocations with fixed?
If there are really large numbers (~1 MB or more) we'd have a stack overflow here. Performance seems to be the same; not using pointer arithmetic would involve array allocations anyway.
And there seems to be no way of telling how much of the current stack we may use, since the calling code may already rely on it more or less heavily.
> If there are really large numbers (~1 MB or more) we'd have a stack overflow here
That's an existing issue in .NET rather than something new that your performance changes have introduced, though, right?
> Performance seems to be the same
How are you measuring this, though? Allocations like these often don't show up as having a measurable impact when doing single-threaded performance measurements; it's in the aggregate, when you have lots of work happening on a box in parallel, especially server-side, that the impact of the GC from such allocations accumulates and leads to measurable slowdowns. It's possible these particular ones are not problematic, but I'm suspicious of that.
> not using pointer arithmetic would involve array allocations anyway
I'm not understanding this. We're talking about the difference between stackalloc vs allocating a buffer and using fixed, right? In both cases you end up with a pointer.
@KrzysztofCwalina, you're ok with the change to use allocations and fixed instead of using stackalloc? If so, I am, too.
> That's an existing issue in .NET rather than something new that your performance changes have introduced, though, right?
Nope, I introduced this issue. Shame on me.
> How are you measuring this, though? [...]
As you're pointing out, it's only a dumb single-threaded performance measurement. But since the code leads to stack overflow, we're choosing between putting some pressure on the heap or letting the stack plainly crash. Right?
> I'm not understanding this. We're talking about the difference between stackalloc vs allocating a buffer and using fixed, right? In both cases you end up with a pointer.
I just tried to point out that using ordinary "safe" code (which we don't) we wouldn't have this discussion. Thus, we're "only" not using a rather "exotic" possibility. Sadly, there's nothing in between `stackalloc` and `fixed` -- or am I overlooking something?
I'd prefer to ask less of the managed heap too, but since the stack size is limited and there seems to be no way of telling how much of the current stack the calling code can spare, and since we want to support arbitrarily (!) large integers, I fear it's the only solution I can provide.
> Nope, I introduced this issue
Ah! Ok, then using fixed does seem like the right solution, at least for now.
A few days ago, @KrzysztofCwalina suggested doing both and switching depending on the size. Problem: which size? The calling code may already use the current stack near to its limit, or maybe we could use nearly all of it. Deciding based on a fixed size would be easy to implement, but I've no idea how to choose that size.
We could just say 4096 bits (or something like that) is the limit, we'll switch to the heap then. I've really no idea if that's the way to go, but if you recommend it and name a limit, I can push a further commit accordingly.
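The hybrid policy being discussed could be sketched like this (the 4096-bit cutoff is the thread's example, not a measured value, and the function names are illustrative):

```python
STACK_LIMIT_BITS = 4096  # hypothetical cutoff from the thread, not measured

def digits_for(bits: int) -> int:
    """Number of 32-bit digits needed for a value of the given bit length."""
    return (bits + 31) // 32

def use_stack(bits: int) -> bool:
    """Hybrid policy: small temporaries on the stack (stackalloc),
    larger ones on the heap (array + fixed) to avoid stack overflow."""
    return bits <= STACK_LIMIT_BITS

assert use_stack(512)            # small temporary: stackalloc is safe
assert not use_stack(1_000_000)  # ~1 Mbit operand: heap avoids the overflow
assert digits_for(4096) == 128   # the cutoff is only 128 uint digits
```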
LGTM, once squashed, and other than my one open question on stackalloc vs fixed. @KrzysztofCwalina?
To introduce further performance tweaks, some minor improvements are added to `BigIntegerCalculator`, which cover all the basic operations. Having these algorithms based on raw uint[] objects will lead to more interesting tweaks of the more fancy stuff (`ModPow`).
(force-pushed from 1c8e621 to be16ac1)
@stephentoub the build of the squashed commit failed, but I can't see why?
Failed again with the same issue: http://dotnet-ci.cloudapp.net/job/dotnet_corefx_windows_release_prtest/1351/console
@dotnet-bot test this please
Looks good to me too!
@dotnet-bot test this please
Improve performance of BigInteger.Add/Sub/Div/Rem
Thanks for the contribution! It's a very nice improvement.
- An allocation for BigInteger.DivRem with a small divisor is unnecessary, which gets fixed. BigInteger handles these cases better now.
- The square and multiply code submitted with dotnet#1436 used stackalloc, which led to stack overflows for huge numbers. With dotnet#1618 these have been changed to ordinary array allocations, which put some pressure on the managed heap. Thus, a hybrid solution should be better.
…mance Improve performance of BigInteger.Add/Sub/Div/Rem Commit migrated from dotnet/corefx@360607a
- An allocation for BigInteger.DivRem with a small divisor is unnecessary, which gets fixed. BigInteger handles these cases better now.
- The square and multiply code submitted with dotnet/corefx#1436 used stackalloc, which led to stack overflows for huge numbers. With dotnet/corefx#1618 these have been changed to ordinary array allocations, which put some pressure on the managed heap. Thus, a hybrid solution should be better.

Commit migrated from dotnet/corefx@b79a60c
To introduce further performance tweaks, some minor improvements are added to `BigIntegerCalculator`, which covers all the basic operations. Having these algorithms based on raw uint[] objects will lead to more interesting tweaks of the more fancy stuff (`ModPow`).

A basic performance comparison based on this code unveils the following results for `Add`, `Subtract`, and `Divide`. (Benchmark tables not shown.)

Note: while making this pull request I noticed that all the `Debug.Assert` calls aren't doing anything... (?)

Attention: because of using `stackalloc` I had to reduce a unit test value, which is currently very high and leads to a stack overflow. Would it be better to switch to `fixed`, or should the actual stack size be increased, if someone really needs such huge numbers?

#1307
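The kind of comparison described (timing `Add`, `Subtract`, and `Divide`) can be sketched with Python's arbitrary-precision ints as a stand-in for `BigInteger` (operand sizes and repetition counts are arbitrary choices, not the PR's actual harness):

```python
import random
import timeit

def bench(op, bits: int, reps: int = 100) -> float:
    """Time `op` over two random operands of the given bit length."""
    a = random.getrandbits(bits) | (1 << (bits - 1))  # force full bit length
    b = random.getrandbits(bits) | (1 << (bits - 1))
    return timeit.timeit(lambda: op(a, b), number=reps)

for name, op in [("Add", lambda a, b: a + b),
                 ("Subtract", lambda a, b: a - b),
                 ("Divide", lambda a, b: a // b)]:
    elapsed = bench(op, 1 << 16)  # 65536-bit operands
    print(f"{name}: {elapsed:.4f}s")
```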