Conversation
@dotnet-bot test Windows_NT x64 Checked corefx_baseline |
@dotnet-bot test Ubuntu x64 Checked corefx_baseline |
Do you have any numbers for this? Throughput, code size? |
Was looking for this https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/benchmarking.md should have them shortly |
Is there a way to run perf tests from CoreFX? There are tests for Dictionary and HashSet there. |
I've never worked out how to run the Microsoft.Xunit.Performance tests |
I haven't done it in a while, but this describes how to run the tests and of course you could build with |
Failures are in Regex tests |
While it's an improvement, it's a bit fragile (changing the code a little can lose the improvement), so I think it needs work.
The corefx tests produce odd results, with variations in the same tests at different sizes; feels like GC. Also there are no object-key tests (there are string keys, but that's a special case). I think this might be a longer journey... |
Did you use BDN? |
Yeah, |
Doing a proper evaluation of these kinds of changes is not easy. We don't have anything I know of that would be considered a representative perf test suite for dictionaries (covering different kinds of key types, comparers, load factors, loading history, collision patterns, hash functions, etc). Creating something like this would be very useful, but also a lot of work.

I have some estimates for the typical distribution of key types seen in internal Microsoft usage of dictionaries, but this is a static accounting (eg it looks at the memory image of a process, not at how often a dictionary is used and via what API). I'm not all that confident in the data yet, but it appears strings are the dominant key type, at least statically. We have been trying to look at comparer usage and typical load factors and collision chain distributions too, but haven't gotten very far with that yet. So maybe as this gets a bit further and we get some dynamic information fed in we can build a dictionary performance modelling framework.

Without that it's not obvious which cases to optimize for: there are almost always tradeoffs to be made, and we need guidance on which cases can become a bit slower without causing undue harm, what the worst-case impact is on any scenario, and whether the wins on some cases are sufficient to offset the losses in others. Also we don't have a cookbook way of assessing whether code size increases we might incur in trying to optimize are justified by the improved performance.

I'm not trying to discourage this sort of exploration. It is interesting to know what is possible and how the performance characteristics can be altered. But it is going to be hard to take changes here unless they are unconditionally better in all key aspects (cleaner code, smaller code, faster code, faster jitting). |
Agreed. The issue I'm having is that obvious "next step" improvements don't improve things, or regress (though not below baseline), so the changes feel fragile. So I think the next step is to have better performance tests to validate the changes and iterate from there. It's more complicated to test than what I was doing previously, as I also have to pick up the Jit changes, rather than have two sets of Dictionary implementations (e.g. for the devirtualization). |
@benaadams |
It's an interesting use of Dictionary, from the Benchmarks Game, rather than just testing raw function time, though it's not specifically testing just the dictionary. Example implementations are in the JIT Performance/CodeQuality directory, though I think I was using an easier iteration of the algorithm. |
If still needed, some perf tests:
using System;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
namespace BDN.Tests
{
[Config(typeof(InProcConfig))]
public class CollectionTestPerfBenchmarks
{
private Dictionary<int, int> _dict;
private int _key;
const int Iterations = 20000;
[Params(1000, 10_000, 100_000)]
public int Size_Common { get; set; }
#region Add
[IterationSetup(Target=nameof(Add))]
public void IterationSetup_Add()
{
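// Fresh dict each iteration; Add's keys (< 200000) can't collide with CreateDictionary's keys (>= 500000)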
_dict = new Dictionary<int, int>(CreateDictionary(Size_Common));
}
[Benchmark(OperationsPerInvoke = Iterations * 9)]
public void Add()
{
for (int i = 0; i < Iterations; i++)
{
_dict.Add(i * 10 + 1, 0);
_dict.Add(i * 10 + 2, 0);
_dict.Add(i * 10 + 3, 0);
_dict.Add(i * 10 + 4, 0);
_dict.Add(i * 10 + 5, 0);
_dict.Add(i * 10 + 6, 0);
_dict.Add(i * 10 + 7, 0);
_dict.Add(i * 10 + 8, 0);
_dict.Add(i * 10 + 9, 0);
}
}
#endregion
#region GetItem
[IterationSetup(Target=nameof(GetItem))]
public void IterationSetup_GetItem()
{
_dict = CreateDictionary(Size_Common);
for (int i = 1; i <= 9; i++)
{
_dict.Add(i, 0);
}
}
[Benchmark(OperationsPerInvoke = Iterations * 9)]
public void GetItem()
{
int retrieved;
for (int i = 0; i < Iterations; i++)
{
retrieved = _dict[1];
retrieved = _dict[2];
retrieved = _dict[3];
retrieved = _dict[4];
retrieved = _dict[5];
retrieved = _dict[6];
retrieved = _dict[7];
retrieved = _dict[8];
retrieved = _dict[9];
}
}
#endregion
#region SetItem
[IterationSetup(Target=nameof(SetItem))]
public void IterationSetup_SetItem()
{
_dict = CreateDictionary(Size_Common);
for (int i = 1; i <= 9; i++)
{
_dict.Add(i, 0);
}
}
[Benchmark(OperationsPerInvoke = Iterations * 9)]
public void SetItem()
{
for (int i = 0; i < Iterations; i++)
{
_dict[1] = 0;
_dict[2] = 0;
_dict[3] = 0;
_dict[4] = 0;
_dict[5] = 0;
_dict[6] = 0;
_dict[7] = 0;
_dict[8] = 0;
_dict[9] = 0;
}
}
#endregion
#region TryGetValue
[IterationSetup(Target = nameof(TryGetValue))]
public void IterationSetup_TryGetValue()
{
_dict = CreateDictionary(Size_Common);
// Setup - utils needs a specific seed to prevent key collision with TestData
Random rand = new Random(837322);
_key = rand.Next(0, 400000);
_dict.Add(_key, 12);
}
[Benchmark(OperationsPerInvoke = Iterations * 9)]
public void TryGetValue()
{
int value;
int key = _key;
for (int i = 0; i < Iterations; i++)
{
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
}
}
#endregion
#region ContainsKey_Int_True
[IterationSetup(Target = nameof(ContainsKey_Int_True))]
public void IterationSetup_ContainsKey_Int_True()
{
_dict = new Dictionary<int, int>();
for (int i = 0; i < Size_Common; i++)
{
_dict.Add(i, i);
}
}
[Benchmark]
public void ContainsKey_Int_True()
{
bool result = false;
int iterations = _dict.Count;
for (int i = 0; i < iterations; i++)
{
result = _dict.ContainsKey(i);
}
}
#endregion
private static Dictionary<int, int> CreateDictionary(int size)
{
Random rand = new Random(837322);
Dictionary<int, int> dict = new Dictionary<int, int>();
while (dict.Count < size)
{
int key = rand.Next(500000, int.MaxValue);
dict.TryAdd(key, 0);
}
return dict;
}
}
class InProcConfig : ManualConfig
{
public InProcConfig()
{
Add(Job.InProcess
//.WithLaunchCount(1)
.WithUnrollFactor(1)
.WithInvocationCount(1)
.WithId("InProcess"));
}
}
} |
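If it helps, a minimal entry point for running these with BenchmarkDotNet (the Program class below is mine, not part of the original snippet):
using BenchmarkDotNet.Running;
namespace BDN.Tests
{
    class Program
    {
        static void Main(string[] args) => BenchmarkRunner.Run<CollectionTestPerfBenchmarks>();
    }
}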
@omariom's tests, plus a straight Add to an empty dict, and a GC before each iteration for stability.
Before
After
|
Added per-iteration GC to clean up; results are more stable. Incorporates the tests above.
Before
After
|
@benaadams Thank you! |
@AndyAyersMS would love to see this in CoreCLR 2.1 and future .NET |
@HFadeel it should be. |
I am sure a lot of code will suddenly get faster |
We'd be quite interested in independent observations on the perf impact of this change, both good and (perhaps especially important) bad. Looks like it may take a few days to get these bits out into the nightly build packages. Once that happens we should perhaps do a bit of advertising.... They should also be in 2.1 preview 2. |
Perf infrastructure is finally back online. Data shows this change gave roughly 6% improvement on k-nucleotide-1 and 4.5% on k-nucleotide-9. It should help close the gap vs Java somewhat, from 1.32x to something more like 1.25x (assuming proportional speedup on those ancient machines). Probably should look at this test anew and see what else we might be able to do. cc @danmosemsft @tannergooding |
Good news! The Benchmarks Game tests also, I believe, include Jit and startup time; was the extra cost from ArrayPoolEventSource in 2.0? https://github.com/dotnet/coreclr/issues/15954 I assume so; and that has also gone. |
Looking back a bit further at -9, the overall speedup over 2.1 may be more like 1.125x (2720/2415), so vs Java maybe now only ~1.17x slower. Seems like enabling software write watch (#16516) may have helped here too. (Actually, forgot the official runs are on Ubuntu; the speedup over 2.0 looks similar though, 3400/3000 or about 1.13x.) We don't measure jit costs in our HW lab (we discard the initial iteration). Maybe Tanner can do some pseudo-official runs... |
The Benchmarks game measures the total time the process runs, including JIT, although the machine is warmed up first. I would love @tannergooding or @ViktorHofer to get a full set of CLBG numbers on the special machine this week. |
PreCondition: I'm not trying to be a smart alec or rude. This improvement for TryGetValue (33%) made me think there was some very clever algorithmic improvement, or a major bug like looping through twice. I checked the code change and there is no such change. If I read it correctly, I should create my own dictionary for the fastest perf: IntDictionary, LongDictionary, StringDictionary -- and I want @mikedn's implementation, which is even faster. |
There was a change to the Jit #14125 to turn EqualityComparer<T>.Default into an intrinsic the Jit can devirtualize. This code change was to take advantage of that Jit change in Dictionary when you don't specify your own custom comparer.
You will get this advantage just by using the regular dictionary constructor, for example:
var intDict = new Dictionary<int, MyValue>();
var longDict = new Dictionary<long, MyValue>();
If you have a specific scenario with object-type keys you want to optimize for, you can create a struct wrapper and use that as the key for a regular dictionary for the same behaviour; for example, something along the lines of the sketch below. |
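For illustration, a minimal sketch of such a wrapper (ReferenceKey is a name I made up, and it uses reference-equality semantics; adjust Equals/GetHashCode to suit your key type):
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
// Because the wrapper is a struct implementing IEquatable<T>,
// EqualityComparer<ReferenceKey>.Default can be devirtualized and inlined.
public readonly struct ReferenceKey : IEquatable<ReferenceKey>
{
    private readonly object _value;
    public ReferenceKey(object value) => _value = value;
    public bool Equals(ReferenceKey other) => ReferenceEquals(_value, other._value);
    public override bool Equals(object obj) => obj is ReferenceKey other && Equals(other);
    public override int GetHashCode() => RuntimeHelpers.GetHashCode(_value);
}
// Usage: var dict = new Dictionary<ReferenceKey, int>();
//        dict[new ReferenceKey(someObj)] = 1;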
Note that my implementation is specific to the k-nucleotide benchmark. Not to say that it cannot be used for other purposes, but it's certainly not a general purpose hashtable. For example, it takes advantage of the fact that the hashcode of an integer is fast to compute, so it doesn't store the hash code in the entries. |
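A minimal sketch of that idea (all names below are mine, not from the actual implementation): since hashing a long is essentially free, each entry holds only key, value and chain link, and lookups recompute the bucket rather than comparing a cached hash code.
public sealed class LongIntMap
{
    private struct Entry
    {
        public long Key;  // no cached hash code: hashing a long is ~free
        public int Value;
        public int Next;  // 0-based index of next entry in chain, -1 = end
    }
    private int[] _buckets = new int[16];     // holds entryIndex + 1, 0 = empty
    private Entry[] _entries = new Entry[16]; // sizes must stay powers of two
    private int _count;
    public bool TryGetValue(long key, out int value)
    {
        for (int i = _buckets[(int)key & (_buckets.Length - 1)] - 1; i >= 0; i = _entries[i].Next)
        {
            if (_entries[i].Key == key) { value = _entries[i].Value; return true; }
        }
        value = 0;
        return false;
    }
    public void Add(long key, int value) // sketch: assumes key not already present
    {
        if (_count == _entries.Length) Resize();
        int bucket = (int)key & (_buckets.Length - 1);
        _entries[_count] = new Entry { Key = key, Value = value, Next = _buckets[bucket] - 1 };
        _buckets[bucket] = ++_count; // store index + 1 so 0 means empty
    }
    private void Resize()
    {
        System.Array.Resize(ref _entries, _entries.Length * 2);
        var buckets = new int[_buckets.Length * 2];
        for (int i = 0; i < _count; i++)
        {
            int bucket = (int)_entries[i].Key & (buckets.Length - 1);
            _entries[i].Next = buckets[bucket] - 1;
            buckets[bucket] = i + 1;
        }
        _buckets = buckets;
    }
}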
Yes. Assuming that "best perf" is indeed relevant to your application. If the application spends 5% of its time in hashtable code then creating your own hashtable is quite possibly a waste of time. If the application spends 95% of its time in hashtable code then creating your own hashtable (or whatever suitable data structure is required) is something that you may want to consider. |
@mikedn did we ever make use of your implementation for k-nucleotide, i.e. in a nuget package similar to what Java does in their benchmark? |
@ViktorHofer Not that I know of. In fact I'm not even sure where I put that thing, need to search my drive for it. |
100% Real™ code, 100% Real™ data, very Dictionary heavy (mostly <string, object>, and <int, string>)
From profiling, it looks like this change is about half of the improvement between .Net 4.7 & Core 2.1 💯 |
It's like we need partial specialization in C# so we can have specialized implementations per key type. Maybe the way forward is for code gen to have special knowledge of Dictionary. It's critical enough that the ickiness may be worthwhile. |
From what I understand, shapes are like the Surface Phone: they are a panacea for all problems, and probably both will be released at the same time |
@ViktorHofer and everyone else who may be interested: My version is here: https://gist.github.com/mikedn/8dc555a867aba65b6e26f39cf3cb66eb On my machine it's about 1.4x faster than the current top C# entry on Benchmarks Game. If this factor translates to Benchmarks Game's environment then it will probably be almost as fast as the current top entry (a C++ version). That said, it goes a bit beyond the "just use another hashtable" approach; it has been customized specifically for knucleotide. And to be clear: I don't want to encourage people to write their own hashtables because "omg, this is faster". But it's something you can do in very specific cases. |
The rules of k-nucleotide are that it has to use a hash table, but you are not allowed to implement one yourself and can only use one from nuget. My increment method only just sneaks past this. |
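For reference, a common way to make an increment need only one hash lookup while staying on the public Dictionary API is to store a mutable reference type as the value (a sketch; not necessarily the exact trick used in the benchmark entry):
using System.Collections.Generic;
sealed class Counter { public int Value; }
static class Histogram
{
    public static void Increment(Dictionary<long, Counter> counts, long key)
    {
        if (counts.TryGetValue(key, out Counter c))
            c.Value++;                          // hit: mutate in place, no second lookup
        else
            counts.Add(key, new Counter { Value = 1 });
    }
}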
Yeah, I know the rules; that's why I'm not really bothering with it (that is, I'm not going to waste my time creating a nuget package for my custom hashtable).
Heh, it's odd that they accepted that because what you did can be equated with a custom hashtable implementation. Well, at least the hashtable lookup part. |
Hmm, it is tempting to create such a package if it would be the easiest way to improve perf on this benchmark. It's basically what Java does. Maybe we could put it in corefx lab... |
I believed custom packages were also not allowed? The Java one was an exception and was criticized in the forum. Am I wrong? |
It has to be a package with a reasonable number of downloads. The Java one is part of something else I think. |
Maybe it could be part of https://github.com/dotnet/corefx/issues/31191 (if it's binary). At least as a general purpose long->int map, not something hacked for the benchmark. |
Up to x1.5 speed up for finding value types, x1.2 for finding object types
Uses the EqualityComparer<TKey>.Default devirtualization (only from Improve Dictionary<K,T> CQ - Take 2 #15411)
Offsets the indices stored in _buckets by 1 so they don't need to be set to -1 on each resize
Adds 5kB (0.16% of base) to System.Private.CoreLib.ni.dll #15419 (comment)
Before
After
Performance-wise, multipath was faster but that increased code-gen by 27kB #15419 (comment)
Resolves: https://github.com/dotnet/coreclr/issues/16258
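For illustration, the _buckets trick above in miniature (simplified; names are mine):
static class BucketSketch
{
    // If -1 marks an empty bucket, every resize pays an extra O(n) init pass:
    static int[] ResizeWithMinusOne(int newSize)
    {
        int[] buckets = new int[newSize];
        for (int i = 0; i < buckets.Length; i++) buckets[i] = -1;
        return buckets;
    }
    // Storing entryIndex + 1 lets 0 (the CLR default for a new int[]) mean
    // "empty", so allocation alone yields a valid empty bucket array:
    static int[] ResizeWithOffset(int newSize) => new int[newSize];
    // write: buckets[b] = entryIndex + 1;   read: int i = buckets[b] - 1;
}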