update BenchmarkDotNet to latest version to take advantage of improvements #511
Conversation
Thanks @adamsitnik! I have done a comparison between our current version of BDN and 0.11.5.1086, so very close to this diff. I have put the full results here: https://gist.github.com/billwert/9469538fb651b8b022e7bebdf8c7295f Of particular interest:
These seem to be impacted by the shift from Mean to Median. Here's what the raw results look like for MemoryMarshalCreateSpan:

Candidate
OverheadActual   1: 152565216 op, 242377100.00 ns, 1.5887 ns/op
WorkloadWarmup   1: 152565216 op, 243468100.00 ns, 1.5958 ns/op
// BeforeActualRun
// AfterActualRun

Baseline
OverheadActual   1: 124610960 op, 192520900.00 ns, 1.5450 ns/op
WorkloadWarmup   1: 124610960 op, 262845500.00 ns, 2.1093 ns/op
// BeforeActualRun
// AfterActualRun

Note the actual results are often zero with the new BDN. How should we handle this sort of test? Should we add inner iterations perhaps?
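A minimal sketch of the inner-iterations idea (not from this PR; the class name, loop count, and benchmark body are stand-ins for illustration): run the tiny operation many times per invocation and let BDN report a per-operation time via OperationsPerInvoke, so the measured interval stays well above the overhead and clock resolution.

```csharp
using System;
using System.Runtime.InteropServices;
using BenchmarkDotNet.Attributes;

public class MemoryMarshalCreateSpanWithInnerLoop // hypothetical name for illustration
{
    private const int InnerIterations = 1000; // arbitrary; large enough to dominate the overhead
    private readonly byte[] _array = new byte[512];

    [Benchmark(OperationsPerInvoke = InnerIterations)]
    public int CreateSpan()
    {
        int total = 0;
        for (int i = 0; i < InnerIterations; i++)
        {
            // the ~1-2 ns operation under test (stand-in for the real benchmark body)
            Span<byte> span = MemoryMarshal.CreateSpan(ref _array[0], _array.Length);
            total += span.Length;
        }
        return total; // consume the result so the JIT cannot remove the loop body
    }
}
```

Whether this is preferable to keeping the single-invocation shape depends on whether the inner loop changes what the JIT can do with the code under test, so it would be a per-benchmark judgment call.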
I think that this particular update has not introduced a new problem, but instead revealed an existing one. This update changes only the way we do the math, not the way we run the benchmarks. I have put the "Candidate" numbers into Excel to show the difference:

| Overhead(i) | Average | Median | Workload(i) | Old Result | New Result | New - Old |
|---|---|---|---|---|---|---|
| 1.5887 | 1.597433333 | 1.5922 | 1.5905 | -0.006933333 | -0.0017 | 0.005233333 |
| 1.5865 | | | 1.5864 | -0.011033333 | -0.0058 | 0.005233333 |
| 1.605 | | | 1.6099 | 0.012466667 | 0.0177 | 0.005233333 |
| 1.5817 | | | 1.5913 | -0.006133333 | -0.0009 | 0.005233333 |
| 1.5941 | | | 1.6014 | 0.003966667 | 0.0092 | 0.005233333 |
| 1.5899 | | | 1.5917 | -0.005733333 | -0.0005 | 0.005233333 |
| 1.6202 | | | 1.5908 | -0.006633333 | -0.0014 | 0.005233333 |
| 1.6053 | | | 1.5912 | -0.006233333 | -0.001 | 0.005233333 |
| 1.5881 | | | 1.5826 | -0.014833333 | -0.0096 | 0.005233333 |
| 1.5912 | | | 1.5853 | -0.012133333 | -0.0069 | 0.005233333 |
| 1.6187 | | | 1.6202 | 0.022766667 | 0.028 | 0.005233333 |
| 1.5922 | | | 1.6258 | 0.028366667 | 0.0336 | 0.005233333 |
| 1.5898 | | | 1.6213 | 0.023866667 | 0.0291 | 0.005233333 |
| 1.6019 | | | 1.6011 | 0.003666667 | 0.0089 | 0.005233333 |
| 1.6082 | | | 1.5923 | -0.005133333 | 1E-04 | 0.005233333 |

As you can see, there were no outliers here and the Average is almost the same as the Median. The difference is very small (0.005233333 ns), so the results are not affected. The same goes for the "Baseline" numbers:

| Overhead(i) | Average | Median | Workload(i) | Old Result | New Result | New - Old |
|---|---|---|---|---|---|---|
| 1.545 | 1.65923 | 1.62955 | 1.9847 | 0.32547 | 0.35515 | 0.02968 |
| 2.0494 | | | 2.1327 | 0.47347 | 0.50315 | 0.02968 |
| 1.6369 | | | 2.0264 | 0.36717 | 0.39685 | 0.02968 |
| 1.5491 | | | 2.1572 | 0.49797 | 0.52765 | 0.02968 |
| 1.5446 | | | 2.1798 | 0.52057 | 0.55025 | 0.02968 |
| 1.5489 | | | 2.1082 | 0.44897 | 0.47865 | 0.02968 |
| 1.7461 | | | 1.9267 | 0.26747 | 0.29715 | 0.02968 |
| 1.7406 | | | 2.0997 | 0.44047 | 0.47015 | 0.02968 |
| 1.5608 | | | 2.1995 | 0.54027 | 0.56995 | 0.02968 |
| 1.6222 | | | 1.9165 | 0.25727 | 0.28695 | 0.02968 |
| 1.6174 | | | 1.8243 | 0.16507 | 0.19475 | 0.02968 |
| 1.4912 | | | 2.063 | 0.40377 | 0.43345 | 0.02968 |
| 1.6952 | | | 2.1152 | 0.45597 | 0.48565 | 0.02968 |
| 1.5576 | | | 1.9169 | 0.25767 | 0.28735 | 0.02968 |
| 1.6544 | | | 1.9433 | 0.28407 | 0.31375 | 0.02968 |
| 1.795 | | | 2.0397 | 0.38047 | 0.41015 | 0.02968 |
| 1.5913 | | | 1.9115 | 0.25227 | 0.28195 | 0.02968 |
| 1.7122 | | | 1.8092 | 0.14997 | 0.17965 | 0.02968 |
| 1.7855 | | | 1.9437 | 0.28447 | 0.31415 | 0.02968 |
| 1.7412 | | | 2.03 | 0.37077 | 0.40045 | 0.02968 |

Here we had some outliers in the Overhead samples (for example the 2.0494 sample) and the difference between the Average and the Median is noticeable: 0.02968 ns (still a small difference). This is the case @AndyAyersMS has reported and @AndreyAkinshin has fixed by using the Median instead of the Average.

I will try to find some time this week to answer the most important question: why do we get zeros? Are the two mentioned benchmarks optimized by the JIT to a constant? Are they single instructions that BDN can't handle, so we should rewrite them? Are they very dependent on array alignment, so we should rewrite them? Is BDN generating the right code for the overhead method that returns a Span?
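To make the outlier effect concrete, here is a small illustrative snippet (not part of the PR; names are mine) that recomputes the Average and Median of the Baseline Overhead samples from the table above:

```csharp
using System;
using System.Linq;

class OverheadStats
{
    static void Main()
    {
        // Baseline "Overhead(i)" samples copied from the table above (ns/op)
        double[] overhead =
        {
            1.545, 2.0494, 1.6369, 1.5491, 1.5446, 1.5489, 1.7461, 1.7406, 1.5608, 1.6222,
            1.6174, 1.4912, 1.6952, 1.5576, 1.6544, 1.795, 1.5913, 1.7122, 1.7855, 1.7412
        };

        double average = overhead.Average();            // 1.65923, pulled up by the outliers
        double[] sorted = overhead.OrderBy(x => x).ToArray();
        double median = (sorted[9] + sorted[10]) / 2.0; // 1.62955 for 20 samples

        Console.WriteLine($"Average: {average:F5} ns, Median: {median:F5} ns, diff: {average - median:F5} ns");
    }
}
```

The printed difference is the 0.02968 ns shift that shows up in every row of the "New - Old" column.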
Thanks! I did later notice that the overheads were close and the difference is in the actual measurement. Still very interesting.
If you have a moment, please scan the whole table. I pulled out the biggest difference, but there are some others that seemed suspicious.
Thanks,
-Bill
@adamsitnik any updates?
@billwert the diff that you have uploaded contains a few different kinds of benchmarks:
I did not have the time to go through the list and look at each of them, but when I was doing the 2.2 vs 3.0 comparison, some of them became a real pain, so I redesigned them. The list should definitely be shorter now. Right now I think that the best thing to do would be:
@billwert does this sound like a good plan to you? I'll try to run all the benchmarks again this week and make some progress here.
@adamsitnik I think this is a good plan.
@adiaaida to review the yml change and make sure it's in line with the plan.
This is a short version of the report, to unblock the team. I ran the benchmarks twice using exactly the same IL, OS, hardware, and BDN version. Then I created a diff, which showed the list of unstable benchmarks. Some were expected (IO, multithreaded), some were not (like number parsing). The details are below. Summary:
We need to add netcoreapp5.0 to the private jobs too (I rearranged the yaml to make more sense to me, but that means that the private builds are further down -- line 192 or so) -- I made this change.
Then I ran the benchmarks using the latest version of BDN and did the diff again. The benchmarks in the diff:
I did not find any BDN bugs or other things that affected the results in a negative way. Summary:
The update contains a lot of minor improvements and bug fixes, but the most important are:

Use new .NET Core 3.0 API to get the total number of allocated bytes for all threads

dotnet/BenchmarkDotNet@f54055a

So far BDN was using GC.GetAllocatedBytesForCurrentThread, which returns the value for the current thread only. Recently a new API was added, GC.GetTotalAllocatedBytes, which returns the amount of allocated managed memory for all threads. By using it we get the proper numbers. For multi-threaded benchmarks (a few dozen in the entire portfolio) we are going to get new, bigger numbers. This is a change of reported values, but also a bug fix (cc @stephentoub).
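To illustrate the difference between the two counters, here is a small standalone sketch (not BDN code; the thread count and allocation sizes are arbitrary) showing why the per-thread API undercounts multi-threaded workloads:

```csharp
using System;
using System.Threading;

class AllocationCounters
{
    static void Main()
    {
        long perThreadBefore = GC.GetAllocatedBytesForCurrentThread();
        long totalBefore = GC.GetTotalAllocatedBytes(precise: true); // .NET Core 3.0+

        // allocate on worker threads; the main thread's per-thread counter will not see this
        var threads = new Thread[4];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(() => { var bytes = new byte[1024 * 1024]; bytes[0] = 1; });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();

        long perThreadDelta = GC.GetAllocatedBytesForCurrentThread() - perThreadBefore;
        long totalDelta = GC.GetTotalAllocatedBytes(precise: true) - totalBefore;

        // perThreadDelta misses the ~4 MB allocated by the workers; totalDelta includes it
        Console.WriteLine($"current thread: {perThreadDelta} B, all threads: {totalDelta} B");
    }
}
```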
Use Median instead of Mean for overhead calculations

dotnet/BenchmarkDotNet@d9901ba

One of the techniques BDN uses to allow for accurate nano-benchmarking is overhead deduction. BDN measures the overhead by benchmarking a method with the same return type and arguments as the benchmarked method. The difference is that the "overhead" method does nothing for void and returns default for methods returning anything. So far the actual result was:

Result(i) = Workload(i) - Average(Overhead)

But as @AndyAyersMS has spotted, using the Average for deducing the overhead is bad when we have some outliers that affect the Average. The implementation was changed; now BDN uses the Median:

Result(i) = Workload(i) - Median(Overhead)

How is this going to affect the reported results? The nano-benchmarks are going to get more stable results, and we are going to have fewer results affected by outliers.
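As a concrete check against the first Baseline row of the table earlier in this thread (illustrative only; the variable names are mine):

```csharp
using System;

class OverheadDeductionCheck
{
    static void Main()
    {
        double workload       = 1.9847;   // Workload(1) from the Baseline table, ns/op
        double overheadMean   = 1.65923;  // Average of the 20 Overhead samples
        double overheadMedian = 1.62955;  // Median of the 20 Overhead samples

        double oldResult = workload - overheadMean;    // 0.32547 ns ("Old Result" column)
        double newResult = workload - overheadMedian;  // 0.35515 ns ("New Result" column)

        Console.WriteLine($"old: {oldResult:F5} ns, new: {newResult:F5} ns");
    }
}
```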
Power Plans

By default, BDN now enforces the High Performance power plan when running benchmarks on Windows: dotnet/BenchmarkDotNet@0dfa37e

To make sure it does not affect the results reported in our lab, I have disabled this behaviour in the RecommendedConfig type (.DontEnforcePowerPlan()). So this new feature is not going to affect the results; we might consider enabling it in the future.
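A rough sketch of what that opt-out can look like in a BDN config (illustrative only; RecommendedConfig in this repo may wire it up differently, and the exact placement of the DontEnforcePowerPlan() extension is an assumption on my part):

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

public class LabConfig : ManualConfig // hypothetical name for illustration
{
    public LabConfig()
    {
        // keep whatever power plan the lab machine already uses,
        // instead of letting BDN switch Windows to High Performance
        AddJob(Job.Default
            .DontEnforcePowerPlan()); // assumed to chain off the Job, as described in the comment above
    }
}
```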
Other improvements

To see the diff for the update from 0.11.3.1003 to 0.11.5.1090, please click here.