Conversation
@dotnet-bot test Windows_NT x64 Checked corefx_baseline |
@dotnet-bot test Ubuntu x64 Checked corefx_baseline |
Do you have any numbers for this? Throughput, code size? |
Was looking for this https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/benchmarking.md should have them shortly |
Is there a way to run perf tests from CoreFX? There are tests for Dictionary and HashSet there. |
I've never worked out how to run the Microsoft.Xunit.Performance tests |
I haven't done it in a while, but this describes how to run the tests and of course you could build with |
Failures are in Regex tests |
While it's an improvement, it's a bit fragile (changing the code a little can lose the improvement), so I think it needs work.
The corefx tests produce odd results, with variations in the same tests at different sizes; feels like GC. Also there are no object-key tests (there are string keys, but that's a special case). I think this might be a longer journey... |
Did you use BDN? |
Yeah, |
Doing a proper evaluation of these kinds of changes is not easy. We don't have anything I know of that would be considered a representative perf test suite for dictionaries (covering different kinds of key types, comparers, load factors, loading history, collision patterns, hash functions, etc). Creating something like this would be very useful, but also a lot of work.

I have some estimates for the typical distribution of key types seen in internal Microsoft usage of dictionaries, but this is a static accounting (eg it looks at the memory image of a process, not at how often a dictionary is used and via what API). I'm not all that confident in the data yet, but it appears strings are the dominant key type, at least statically. We have been trying to look at comparer usage and typical load factors and collision chain distributions too, but haven't gotten very far with that yet. So maybe as this gets a bit further and we get some dynamic information fed in we can build a dictionary performance modelling framework.

Without that it's not obvious which cases to optimize for: there are almost always tradeoffs to be made, and we need guidance on which cases can become a bit slower without causing undue harm, what the worst-case impact is on any scenario, and whether the wins on some cases are sufficient to offset the losses in others. Also we don't have a cookbook way of assessing whether code size increases we might incur in trying to optimize are justified by the improved performance.

I'm not trying to discourage this sort of exploration. It is interesting to know what is possible and how the performance characteristics can be altered. But it is going to be hard to take changes here unless they are unconditionally better in all key aspects (cleaner code, smaller code, faster code, faster jitting). |
Agreed. The issue I'm having is that obvious "next step" improvements don't improve things, or regress (though not below baseline), so the changes feel fragile. So I think the next step is to have better performance tests to validate the changes and iterate from there. It's more complicated to test than what I was doing previously, as I also have to pick up the Jit changes, rather than have two sets of Dictionary implementations (e.g. for the devirtualization). |
@benaadams |
It's an interesting use of Dictionary, from the Benchmarks Game, rather than just testing raw function time, though it's not specifically testing just the dictionary. Example implementations are in the JIT Performance/CodeQuality directory, though I think I was using an easier iteration of the algorithm. |
If still needed, some perf tests:
using System;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
namespace BDN.Tests
{
[Config(typeof(InProcConfig))]
public class CollectionTestPerfBenchmarks
{
private Dictionary<int, int> _dict;
private int _key;
const int Iterations = 20000;
[Params(1000, 10_000, 100_000)]
public int Size_Common { get; set; }
#region Add
[IterationSetup(Target=nameof(Add))]
public void IterationSetup_Add()
{
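// Fresh dict each iteration; Add's keys (< 200000) can't collide with CreateDictionary's keys (>= 500000)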
_dict = new Dictionary<int, int>(CreateDictionary(Size_Common));
}
[Benchmark(OperationsPerInvoke = Iterations * 9)]
public void Add()
{
for (int i = 0; i < Iterations; i++)
{
_dict.Add(i * 10 + 1, 0);
_dict.Add(i * 10 + 2, 0);
_dict.Add(i * 10 + 3, 0);
_dict.Add(i * 10 + 4, 0);
_dict.Add(i * 10 + 5, 0);
_dict.Add(i * 10 + 6, 0);
_dict.Add(i * 10 + 7, 0);
_dict.Add(i * 10 + 8, 0);
_dict.Add(i * 10 + 9, 0);
}
}
#endregion
#region GetItem
[IterationSetup(Target=nameof(GetItem))]
public void IterationSetup_GetItem()
{
_dict = CreateDictionary(Size_Common);
for (int i = 1; i <= 9; i++)
{
_dict.Add(i, 0);
}
}
[Benchmark(OperationsPerInvoke = Iterations * 9)]
public void GetItem()
{
int retrieved;
for (int i = 0; i < Iterations; i++)
{
retrieved = _dict[1];
retrieved = _dict[2];
retrieved = _dict[3];
retrieved = _dict[4];
retrieved = _dict[5];
retrieved = _dict[6];
retrieved = _dict[7];
retrieved = _dict[8];
retrieved = _dict[9];
}
}
#endregion
#region SetItem
[IterationSetup(Target=nameof(SetItem))]
public void IterationSetup_SetItem()
{
_dict = CreateDictionary(Size_Common);
for (int i = 1; i <= 9; i++)
{
_dict.Add(i, 0);
}
}
[Benchmark(OperationsPerInvoke = Iterations * 9)]
public void SetItem()
{
for (int i = 0; i < Iterations; i++)
{
_dict[1] = 0;
_dict[2] = 0;
_dict[3] = 0;
_dict[4] = 0;
_dict[5] = 0;
_dict[6] = 0;
_dict[7] = 0;
_dict[8] = 0;
_dict[9] = 0;
}
}
#endregion
#region TryGetValue
[IterationSetup(Target = nameof(TryGetValue))]
public void IterationSetup_TryGetValue()
{
_dict = CreateDictionary(Size_Common);
// Setup - utils needs a specific seed to prevent key collision with TestData
Random rand = new Random(837322);
_key = rand.Next(0, 400000);
_dict.Add(_key, 12);
}
[Benchmark(OperationsPerInvoke = Iterations * 9)]
public void TryGetValue()
{
int value;
int key = _key;
for (int i = 0; i < Iterations; i++)
{
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
_dict.TryGetValue(key, out value);
}
}
#endregion
#region ContainsKey_Int_True
[IterationSetup(Target = nameof(ContainsKey_Int_True))]
public void IterationSetup_ContainsKey_Int_True()
{
_dict = new Dictionary<int, int>();
for (int i = 0; i < Size_Common; i++)
{
_dict.Add(i, i);
}
}
[Benchmark]
public void ContainsKey_Int_True()
{
bool result = false;
int iterations = _dict.Count;
for (int i = 0; i < iterations; i++)
{
result = _dict.ContainsKey(i);
}
}
#endregion
private static Dictionary<int, int> CreateDictionary(int size)
{
Random rand = new Random(837322);
Dictionary<int, int> dict = new Dictionary<int, int>();
while (dict.Count < size)
{
int key = rand.Next(500000, int.MaxValue);
dict.TryAdd(key, 0);
}
return dict;
}
}
class InProcConfig : ManualConfig
{
public InProcConfig()
{
Add(Job.InProcess
//.WithLaunchCount(1)
.WithUnrollFactor(1)
.WithInvocationCount(1)
.WithId("InProcess"));
}
}
} |
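If it helps, a minimal entry point for running these with BenchmarkDotNet (the Program class below is mine, not part of the original snippet):
using BenchmarkDotNet.Running;
namespace BDN.Tests
{
    class Program
    {
        static void Main(string[] args) => BenchmarkRunner.Run<CollectionTestPerfBenchmarks>();
    }
}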
@omariom's tests, plus a straight Add to an empty dict, and a GC before each iteration for stability.
Before
After
|
Added per-iteration GC to clean up; results are more stable. Incorporates the tests above.
Before
After
|
@benaadams Thank you! |
@AndyAyersMS would love to see this in CoreCLR 2.1 and future .NET |
@HFadeel it should be. |
I am sure a lot of code will suddenly get faster |
We'd be quite interested in independent observations on the perf impact of this change, both good and (perhaps especially important) bad. Looks like it may take a few days to get these bits out into the nightly build packages. Once that happens we should perhaps do a bit of advertising.... They should also be in 2.1 preview 2. |
Perf infrastructure is finally back online. Data shows this change gave roughly 6% improvement on k-nucleotide-1 and 4.5% on k-nucleotide-9. It should help close the gap vs Java somewhat, from 1.32x to something more like 1.25x (assuming proportional speedup on those ancient machines). Probably should look at this test anew and see what else we might be able to do. cc @danmosemsft @tannergooding |
Good news! The Benchmarks Game tests also, I believe, include Jit and startup time; was the extra cost from ArrayPoolEventSource in 2.0? https://github.com/dotnet/coreclr/issues/15954 I assume so; and that has also gone. |
Looking back a bit further at -9, the overall speedup over 2.1 may be more like 1.125x (2720/2415), so vs Java maybe now only ~1.17x slower. Seems like enabling software write watch (#16516) may have helped here too. (Actually, forgot the official runs are on Ubuntu; the speedup over 2.0 looks similar though, 3400/3000 or about 1.13x.) We don't measure jit costs in our HW lab (we discard the initial iteration). Maybe Tanner can do some pseudo-official runs... |
The Benchmarks game measures the total time the process runs, including JIT, although the machine is warmed up first. I would love @tannergooding or @ViktorHofer to get a full set of CLBG numbers on the special machine this week. |
PreCondition: I'm not trying to be a smart alec or rude. This improvement for TryGetValue (33%) made me think there was some very clever algorithmic improvement, or a major bug like looping through twice. I checked the code change and there is no such change. If I read it correctly, I should create my own dictionary for the fastest perf: IntDictionary, LongDictionary, StringDictionary -- and I want @mikedn's implementation, which is even faster. |
There was a change to the Jit #14125 to turn EqualityComparer<T>.Default into an intrinsic the Jit can devirtualize. This code change was to take advantage of that Jit change in Dictionary when you don't specify your own custom comparer.
You will get this advantage just by using the regular dictionary constructor, for example:
var intDict = new Dictionary<int, MyValue>();
var longDict = new Dictionary<long, MyValue>();
If you have a specific scenario with object-type keys you want to optimize for, you can create a struct wrapper and use that as the key for a regular dictionary for the same behaviour; for example, something along the lines of the sketch below. |
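For illustration, a minimal sketch of such a wrapper (ReferenceKey is a name I made up, and it uses reference-equality semantics; adjust Equals/GetHashCode to suit your key type):
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
// Because the wrapper is a struct implementing IEquatable<T>,
// EqualityComparer<ReferenceKey>.Default can be devirtualized and inlined.
public readonly struct ReferenceKey : IEquatable<ReferenceKey>
{
    private readonly object _value;
    public ReferenceKey(object value) => _value = value;
    public bool Equals(ReferenceKey other) => ReferenceEquals(_value, other._value);
    public override bool Equals(object obj) => obj is ReferenceKey other && Equals(other);
    public override int GetHashCode() => RuntimeHelpers.GetHashCode(_value);
}
// Usage: var dict = new Dictionary<ReferenceKey, int>();
//        dict[new ReferenceKey(someObj)] = 1;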
Note that my implementation is specific to the k-nucleotide benchmark. Not to say that it cannot be used for other purposes, but it's certainly not a general purpose hashtable. For example, it takes advantage of the fact that the hashcode of an integer is fast to compute, so it doesn't store the hash code in the entries. |
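A minimal sketch of that idea (all names below are mine, not from the actual implementation): since hashing a long is essentially free, each entry holds only key, value and chain link, and lookups recompute the bucket rather than comparing a cached hash code.
public sealed class LongIntMap
{
    private struct Entry
    {
        public long Key;  // no cached hash code: hashing a long is ~free
        public int Value;
        public int Next;  // 0-based index of next entry in chain, -1 = end
    }
    private int[] _buckets = new int[16];     // holds entryIndex + 1, 0 = empty
    private Entry[] _entries = new Entry[16]; // sizes must stay powers of two
    private int _count;
    public bool TryGetValue(long key, out int value)
    {
        for (int i = _buckets[(int)key & (_buckets.Length - 1)] - 1; i >= 0; i = _entries[i].Next)
        {
            if (_entries[i].Key == key) { value = _entries[i].Value; return true; }
        }
        value = 0;
        return false;
    }
    public void Add(long key, int value) // sketch: assumes key not already present
    {
        if (_count == _entries.Length) Resize();
        int bucket = (int)key & (_buckets.Length - 1);
        _entries[_count] = new Entry { Key = key, Value = value, Next = _buckets[bucket] - 1 };
        _buckets[bucket] = ++_count; // store index + 1 so 0 means empty
    }
    private void Resize()
    {
        System.Array.Resize(ref _entries, _entries.Length * 2);
        var buckets = new int[_buckets.Length * 2];
        for (int i = 0; i < _count; i++)
        {
            int bucket = (int)_entries[i].Key & (buckets.Length - 1);
            _entries[i].Next = buckets[bucket] - 1;
            buckets[bucket] = i + 1;
        }
        _buckets = buckets;
    }
}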
Yes. Assuming that "best perf" is indeed relevant to your application. If the application spends 5% of its time in hashtable code then creating your own hashtable is quite possibly a waste of time. If the application spends 95% of its time in hashtable code then creating your own hashtable (or whatever suitable data structure is required) is something that you may want to consider. |
@mikedn did we ever make use of your implementation for k-nucleotide, i.e. in a nuget package similar to what Java does in their benchmark? |
@ViktorHofer Not that I know of. In fact I'm not even sure where I put that thing, need to search my drive for it. |
100% Real™ code, 100% Real™ data, very Dictionary heavy (mostly <string, object>, and <int, string>)
From profiling, it looks like this change is about half of the improvement between .Net 4.7 & Core 2.1 💯 |
It's like we need partial specialization in C# so we can have specialized implementations per key type. Maybe the way forward is for code gen to have special knowledge of Dictionary. It's critical enough that the ickiness may be worthwhile. |
From what I understand, shapes are like the Surface Phone: they are a panacea for all problems, and probably both will be released at the same time |
@ViktorHofer and everyone else who may be interested: My version is here: https://gist.github.com/mikedn/8dc555a867aba65b6e26f39cf3cb66eb On my machine it's about 1.4x faster than the current top C# entry on Benchmarks Game. If this factor translates to Benchmarks Game's environment then it will probably be almost as fast as the current top entry (a C++ version). That said, it goes a bit beyond the "just use another hashtable" approach; it has been customized specifically for knucleotide. And to be clear: I don't want to encourage people to write their own hashtables because "omg, this is faster". But it's something you can do in very specific cases. |
The rules of k-nucleotide are that it has to use a hash table, but you are not allowed to implement one yourself and can only use one from nuget. My increment method only just sneaks past this. |
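For reference, a common way to make an increment need only one hash lookup while staying on the public Dictionary API is to store a mutable reference type as the value (a sketch; not necessarily the exact trick used in the benchmark entry):
using System.Collections.Generic;
sealed class Counter { public int Value; }
static class Histogram
{
    public static void Increment(Dictionary<long, Counter> counts, long key)
    {
        if (counts.TryGetValue(key, out Counter c))
            c.Value++;                          // hit: mutate in place, no second lookup
        else
            counts.Add(key, new Counter { Value = 1 });
    }
}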
Yeah, I know the rules; that's why I'm not really bothering with it (that is, I'm not going to waste my time creating a nuget package for my custom hashtable).
Heh, it's odd that they accepted that because what you did can be equated with a custom hashtable implementation. Well, at least the hashtable lookup part. |
Hmm, it is tempting to create such a package if it would be the easiest way to improve perf on this benchmark. It's basically what Java does. Maybe we could put it in corefx lab... |
I believed custom packages were also not allowed? The Java one was an exception and was criticized in the forum. Am I wrong? |
It has to be a package with a reasonable number of downloads. The Java one is part of something else I think. |
Maybe it could be part of https://github.com/dotnet/corefx/issues/31191 (if it's binary). At least as a general purpose long->int map, not something hacked for the benchmark. |
Up to x1.5 speed up for finding value types, x1.2 for finding object types
Uses the EqualityComparer<TKey>.Default devirtualization (only from Improve Dictionary<K,T> CQ - Take 2 #15411)
Offsets the indices stored in _buckets by 1 so they don't need to be set to -1 on each resize
Adds 5kB (0.16% of base) to System.Private.CoreLib.ni.dll #15419 (comment)
Before
After
Performance-wise, multipath was faster but that increased code-gen by 27kB #15419 (comment)
Resolves: https://github.com/dotnet/coreclr/issues/16258
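For illustration, the _buckets trick above in miniature (simplified; names are mine):
static class BucketSketch
{
    // If -1 marks an empty bucket, every resize pays an extra O(n) init pass:
    static int[] ResizeWithMinusOne(int newSize)
    {
        int[] buckets = new int[newSize];
        for (int i = 0; i < buckets.Length; i++) buckets[i] = -1;
        return buckets;
    }
    // Storing entryIndex + 1 lets 0 (the CLR default for a new int[]) mean
    // "empty", so allocation alone yields a valid empty bucket array:
    static int[] ResizeWithOffset(int newSize) => new int[newSize];
    // write: buckets[b] = entryIndex + 1;   read: int i = buckets[b] - 1;
}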