Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Dictionary CQ #15419

Merged
merged 4 commits into from
Mar 1, 2018
Merged

Dictionary CQ #15419

merged 4 commits into from
Mar 1, 2018

Conversation

benaadams
Copy link
Member

@benaadams benaadams commented Dec 7, 2017

Up to x 1.5 speed up for finding value types, x1.2 for finding object types

Adds 5kB (0.16% of base) to System.Private.CoreLib.ni.dll #15419 (comment)

Before

                                Method |
-------------------------------------- |
    ContainsKeySimplePositive_IntKey   | IEqualityComparer
    ContainsKeySimpleNegative_IntKey   | IEqualityComparer
 ContainsKeySimplePositive_StringKey   | IEqualityComparer
 ContainsKeySimpleNegative_StringKey   | IEqualityComparer
   ContainsKeySimplePositive_ObjectKey | IEqualityComparer
   ContainsKeySimpleNegative_ObjectKey | IEqualityComparer

After

                                Method |
-------------------------------------- |
    ContainsKeySimplePositive_IntKey   | devirtualized
    ContainsKeySimpleNegative_IntKey   | devirtualized
 ContainsKeySimplePositive_StringKey   | IEqualityComparer
 ContainsKeySimpleNegative_StringKey   | IEqualityComparer
   ContainsKeySimplePositive_ObjectKey | virtual
   ContainsKeySimpleNegative_ObjectKey | virtual
                          Method |Items |       Current |   Single func | Change |
-------------------------------- |----- |--------------:|--------------:|-------:|
   ContainsKey_Single_Yes_IntKey | 1000 |     11.231 ns |      7.110 ns | x 1.58 |
    ContainsKey_Single_No_IntKey | 1000 |     10.794 ns |      7.179 ns | x 1.50 |
          ContainsKey_All_IntKey | 1000 | 10,815.405 ns |  7,220.128 ns | x 1.50 |
ContainsKey_Single_Yes_StringKey | 1000 |     20.489 ns |     19.925 ns | x 1.03 |
 ContainsKey_Single_No_StringKey | 1000 |     12.747 ns |     13.414 ns | x 0.95 |
ContainsKey_Single_Yes_ObjectKey | 1000 |     11.233 ns |      9.119 ns | x 1.23 |
 ContainsKey_Single_No_ObjectKey | 1000 |     11.192 ns |      8.560 ns | x 1.31 |
                      Add_IntKey | 1000 | 27,573.689 ns | 22,503.652 ns | x 1.23 |
                   Add_StringKey | 1000 | 76,095.748 ns | 77,253.701 ns | x 0.99 |
                   Add_ObjectKey | 1000 | 68,845.402 ns | 67,529.550 ns | x 1.02 |
              PreSizedAdd_IntKey | 1000 | 11,558.450 ns |  9,658.953 ns | x 1.20 |
           PreSizedAdd_StringKey | 1000 | 57,360.936 ns | 57,710.254 ns | x 0.99 |
           PreSizedAdd_ObjectKey | 1000 | 54,780.942 ns | 50,300.463 ns | x 1.09 |

Performance-wise, multipath was faster but that increased code-gen by 27kB #15419 (comment)

Resolves: https://github.com/dotnet/coreclr/issues/16258

@benaadams benaadams changed the title Dictionary - use EqualityComparer<TKey>.Default directly Dictionary - use EqualityComparer<TKey>.Default devirtualization Dec 7, 2017
@benaadams benaadams changed the title Dictionary - use EqualityComparer<TKey>.Default devirtualization Dictionary - EqualityComparer<TKey>.Default devirtualization Dec 7, 2017
@jkotas
Copy link
Member

jkotas commented Dec 7, 2017

@dotnet-bot test Windows_NT x64 Checked corefx_baseline

@jkotas
Copy link
Member

jkotas commented Dec 7, 2017

@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@jkotas
Copy link
Member

jkotas commented Dec 7, 2017

Do you have any numbers for this? Throughput, code size?

@benaadams
Copy link
Member Author

Was looking for this https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/benchmarking.md should have them shortly

@omariom
Copy link

omariom commented Dec 7, 2017

Is there a way to run perf tests from CoreFX? There are tests for Dictionary and HashSet there.

@benaadams
Copy link
Member Author

benaadams commented Dec 7, 2017

I've never worked out how to run the Microsoft.Xunit.Performance tests

@danmoseley
Copy link
Member

I haven't done it in a while, but this describes how to run the tests and of course you could build with /p:CoreCLROverridePath before doing that.

@benaadams
Copy link
Member Author

Failures are in Regex tests

System.OutOfMemoryException : Exception of type 'System.OutOfMemoryException' was thrown.

@benaadams benaadams changed the title Dictionary - EqualityComparer<TKey>.Default devirtualization [WIP] Dictionary - EqualityComparer<TKey>.Default devirtualization Dec 8, 2017
@benaadams
Copy link
Member Author

While its an improvement; its a bit fragile (changing the code a little can loose the improvement) - so I think it needs work

             Method |     Mean |    Error |   StdDev |
------------------- |---------:|---------:|---------:|
 Knucleotide (Pre)  | 72.88 ms | 1.493 ms | 1.777 ms |
 Knucleotide (Post) | 67.64 ms | 1.349 ms | 1.847 ms |

The corefx tests produce odd results; with variations in the same tests at different sizes; feels like GC - also there are no object key tests (are string keys, but that's a special case)

I think this might be a longer journey...

@omariom
Copy link

omariom commented Dec 8, 2017

Did you use BDN?

@benaadams
Copy link
Member Author

Did you use BDN?

Yeah, [InProcess] publish self-contained; copy coreclr release tests bin dlls (corefx+coreclr) on top; duplicate folder for each variant; copy Product coreclr dll over for each variant - bit manual; but means don't have to rebuild corefx each time to produce a NetCoreApp package using the right coreclr; which is a very slow iteration step

@AndyAyersMS
Copy link
Member

Doing a proper evaluation of these kinds of changes is not easy.

We don't have anything I know of that would be considered a representative perf test suite for dictionaries (covering different kinds of key types, comparers, load factors, loading history, collision patterns, hash functions, etc). Creating something like this would be very useful, but also a lot of work.

I have some estimates for the typical distribution of key types seen in internal Microsoft usage of dictionaries, but this is a static accounting (eg it looks at the memory image of a process, not at how often a dictionary is used and via what API). I'm not all that confident in the data yet, but it appears strings are the dominant key type, at least statically. We have been trying to look at comparer usage and typical load factors and collision chain distributions too, but haven't gotten very far with that yet. So maybe as this gets a bit further and we get some dynamic information fed in we can build a dictionary performance modelling framework.

Without that it's not obvious which cases to optimize for: there are almost always tradeoffs to be made, and we need guidance on which cases can become a bit slower without causing undue harm, what the worst-case impact is on any scenario, and whether the wins on some cases are sufficient to offset the losses in others.

Also we don't have a cookbook way of assessing whether code size increases we might incur in trying to optimize are justified by the improved performance.

I'm not trying to discourage this sort of exploration. It is interesting to know what is possible and how the performance characteristics can be altered.

But it is going to be hard to take changes here unless they are unconditionally better in all key aspects (cleaner code, smaller code, faster code, faster jitting).

@benaadams
Copy link
Member Author

Agreed. The issue I'm having is obvious "next step" improvements, don't improve things or regress (though not below baseline) - so the changes feel fragile.

So I think the next step is to have better performance tests to validate the changes and iterate from there. Its more complicated to test than I was doing previously as also have pick up the Jit changes; rather than have two sets of Dictionary implementation (e.g. for the devirtualization)

@omariom
Copy link

omariom commented Dec 8, 2017

@benaadams
What is Knucleotide?
I started porting Perf.Dictionary.cs but stuck at set_Item - there was a mysterious degradation, though with manual nano-benchmarking there was improvement.
I will publish them as gist tomorrow.

@benaadams
Copy link
Member Author

What is Knucleotide?

Its interesting use of Dictionary from benchmarks game, rather than just testing raw function time; though not specifically just testing dictionary

Example implementations in the JIT Performance/CodeQuality directory; though I think I was using an easier iteration of the algorithm.

@omariom
Copy link

omariom commented Dec 11, 2017

@benaadams

If still needed

Some perf tests
using System;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

namespace BDN.Tests
{
    
    [Config(typeof(InProcConfig))]
    public class CollectiontestPefBenchmarks
    {
        private Dictionary<int, int> _dict;
        private int _key;
        
        const int Iterations = 20000;
        
        [Params(1000, 10_000, 100_000)]
        public int Size_Common { get; set; }
        
        #region Add

        [IterationSetup(Target=nameof(Add))]
        public void IterationSetup_Add()
        {
            _dict = new Dictionary<int, int>(CreateDictionary(Size_Common));
        }

        [Benchmark(OperationsPerInvoke = Iterations * 9)]
        public void Add()
        {
            for (int i = 0; i <= Iterations; i++)
            {
                _dict.Add(i * 10 + 1, 0);
                _dict.Add(i * 10 + 2, 0);
                _dict.Add(i * 10 + 3, 0);
                _dict.Add(i * 10 + 4, 0);
                _dict.Add(i * 10 + 5, 0);
                _dict.Add(i * 10 + 6, 0);
                _dict.Add(i * 10 + 7, 0);
                _dict.Add(i * 10 + 8, 0);
                _dict.Add(i * 10 + 9, 0);
            }
        }

        #endregion

        #region GetItem

        [IterationSetup(Target=nameof(GetItem))]
        public void IterationSetup_GetItem()
        {
            _dict = CreateDictionary(Size_Common);

            for (int i = 1; i <= 9; i++)
            {
                _dict.Add(i, 0);
            }
        }

        [Benchmark(OperationsPerInvoke = Iterations * 9)]
        public void GetItem()
        {
            int retrieved;

            for (int i = 0; i < Iterations; i++)
            {
                retrieved = _dict[1];
                retrieved = _dict[2];
                retrieved = _dict[3];
                retrieved = _dict[4];
                retrieved = _dict[5];
                retrieved = _dict[6];
                retrieved = _dict[7];
                retrieved = _dict[8];
                retrieved = _dict[9];
            }
        }

        #endregion

        #region SetItem

        [IterationSetup(Target=nameof(SetItem))]
        public void IterationSetup_SetItem()
        {
            _dict = CreateDictionary(Size_Common);

            for (int i = 1; i <= 9; i++)
            {
                _dict.Add(i, 0);
            }
        }

        [Benchmark(OperationsPerInvoke = Iterations * 9)]
        public void SetItem()
        {
            for (int i = 0; i < Iterations; i++)
            {
                _dict[1] = 0;
                _dict[2] = 0;
                _dict[3] = 0;
                _dict[4] = 0;
                _dict[5] = 0;
                _dict[6] = 0;
                _dict[7] = 0;
                _dict[8] = 0;
                _dict[9] = 0;
            }
        }

        #endregion

        #region TryGetValue

        [IterationSetup(Target = nameof(TryGetValue))]
        public void IterationSetup_TryGetValue()
        {
            _dict = CreateDictionary(Size_Common);
            // Setup - utils needs a specific seed to prevent key collision with TestData
            Random rand = new Random(837322);
            _key = rand.Next(0, 400000);
            _dict.Add(_key, 12);
        }

        [Benchmark(OperationsPerInvoke = Iterations * 9)]
        public void TryGetValue()
        {
            int value;
            int key = _key;

            for (int i = 0; i < Iterations; i++)
            {
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
                _dict.TryGetValue(key, out value);
            }
        }

        #endregion

        #region ContainsKey_Int_True

        [IterationSetup(Target = nameof(ContainsKey_Int_True))]
        public void IterationSetup_ContainsKey_Int_True()
        {
            _dict = new Dictionary<int, int>();

            for (int i = 0; i < Size_Common; i++)
            {
                _dict.Add(i, i);
            }
        }

        [Benchmark]
        public void ContainsKey_Int_True()
        {
            bool result = false;
            int iterations = _dict.Count;

            for (int i = 0; i < iterations; i++)
            {
                result = _dict.ContainsKey(i);
            }
        }

        #endregion

        private static Dictionary<int, int> CreateDictionary(int size)
        {
            Random rand = new Random(837322);
            Dictionary<int, int> dict = new Dictionary<int, int>();
            while (dict.Count < size)
            {
                int key = rand.Next(500000, int.MaxValue);
                dict.TryAdd(key, 0);
            }
            return dict;
        }
    }

    class InProcConfig : ManualConfig
    {
        public InProcConfig()
        {
            Add(Job.InProcess
                //.WithLaunchCount(1)
                .WithUnrollFactor(1)
                .WithInvocationCount(1)
                .WithId("InProcess"));
        }
    }
}

@benaadams benaadams force-pushed the dict-default-comparer branch from 4d8b7ac to b50667f Compare December 15, 2017 22:25
@benaadams benaadams changed the title [WIP] Dictionary - EqualityComparer<TKey>.Default devirtualization Dictionary - EqualityComparer<TKey>.Default devirtualization Dec 15, 2017
@benaadams benaadams closed this Dec 15, 2017
@benaadams benaadams reopened this Dec 15, 2017
@benaadams
Copy link
Member Author

@omariom tests, Plus a straight Add to empty dict and GC before each iteration for stability

Before

                     Method | Size_Common |            Mean |           Error |          Median |
--------------------------- |------------ |----------------:|----------------:|----------------:|
                        Add |        1000 |        40.08 ns |       4.3242 ns |        33.62 ns |
                    GetItem |        1000 |        10.96 ns |       0.3272 ns |        10.69 ns |
                    SetItem |        1000 |        11.62 ns |       0.1644 ns |        11.56 ns |
                TryGetValue |        1000 |        11.69 ns |       0.1441 ns |        11.67 ns |
 ContainsKey_Int_True_Total |        1000 |    10,826.17 ns |     631.9406 ns |    10,500.50 ns |
                  Add_Total |        1000 |    27,086.82 ns |   4,350.4026 ns |    25,560.43 ns |
                        Add |       10000 |        53.30 ns |       1.4740 ns |        54.26 ns |
                    GetItem |       10000 |        10.82 ns |       0.1663 ns |        10.81 ns |
                    SetItem |       10000 |        12.25 ns |       0.2434 ns |        12.26 ns |
                TryGetValue |       10000 |        12.09 ns |       0.2394 ns |        11.99 ns |
 ContainsKey_Int_True_Total |       10000 |   104,565.16 ns |   2,645.6341 ns |   103,465.48 ns |
                  Add_Total |       10000 |   307,192.85 ns |   5,521.5687 ns |   305,067.24 ns |
                        Add |      100000 |        60.51 ns |       1.3262 ns |        60.09 ns |
                    GetItem |      100000 |        10.74 ns |       0.1687 ns |        10.69 ns |
                    SetItem |      100000 |        11.72 ns |       0.2996 ns |        11.56 ns |
                TryGetValue |      100000 |        11.65 ns |       0.2297 ns |        11.58 ns |
 ContainsKey_Int_True_Total |      100000 | 1,037,571.45 ns |     457.3348 ns | 1,037,753.65 ns |
                  Add_Total |      100000 | 3,825,088.47 ns | 224,668.2170 ns | 3,367,917.89 ns |

After

                     Method | Size_Common |            Mean |           Error |          Median |
--------------------------- |------------ |----------------:|----------------:|----------------:|
                        Add |        1000 |        34.28 ns |       2.6295 ns |        32.36 ns |
                    GetItem |        1000 |         7.54 ns |       0.1193 ns |         7.56 ns |
                    SetItem |        1000 |         7.52 ns |       0.0662 ns |         7.49 ns |
                TryGetValue |        1000 |         8.10 ns |       0.1818 ns |         8.07 ns |
 ContainsKey_Int_True_Total |        1000 |     5,487.10 ns |     111.5480 ns |     5,388.42 ns |
                  Add_Total |        1000 |    23,512.64 ns |     513.0279 ns |    23,389.28 ns |
                        Add |       10000 |        67.47 ns |       5.9138 ns |        57.77 ns |
                    GetItem |       10000 |         7.52 ns |       0.1439 ns |         7.43 ns |
                    SetItem |       10000 |         7.67 ns |       0.0222 ns |         7.67 ns |
                TryGetValue |       10000 |         8.04 ns |       0.1263 ns |         7.98 ns |
 ContainsKey_Int_True_Total |       10000 |    53,739.41 ns |     658.6621 ns |    53,607.83 ns |
                  Add_Total |       10000 |   286,839.07 ns |   5,628.6049 ns |   282,980.66 ns |
                        Add |      100000 |        66.64 ns |       1.2882 ns |        66.88 ns |
                    GetItem |      100000 |         7.50 ns |       0.1166 ns |         7.44 ns |
                    SetItem |      100000 |         7.50 ns |       0.0293 ns |         7.49 ns |
                TryGetValue |      100000 |         8.10 ns |       0.2361 ns |         7.98 ns |
 ContainsKey_Int_True_Total |      100000 |   534,637.44 ns |     984.6944 ns |   534,440.07 ns |
                  Add_Total |      100000 | 3,590,623.28 ns | 191,663.7678 ns | 3,195,646.47 ns |

@benaadams
Copy link
Member Author

benaadams commented Dec 16, 2017

Added per iteration GC to cleanup; results more stable

Incorporates

Before

                           Method |      Mean |     Error |    StdDev |
--------------------------------- |----------:|----------:|----------:|
    ContainsKeySimplePositive_Int | 10.457 ns | 0.0080 ns | 0.0067 ns | IEqualityComparer
    ContainsKeySimpleNegative_Int |  7.387 ns | 0.0100 ns | 0.0093 ns | IEqualityComparer
 ContainsKeyCollisionPositive_Int | 21.294 ns | 0.1610 ns | 0.1257 ns | IEqualityComparer
 ContainsKeyCollisionNegative_Int | 20.041 ns | 0.0380 ns | 0.0337 ns | IEqualityComparer
 ContainsKeySimplePositive_String | 21.382 ns | 0.0796 ns | 0.0744 ns | IEqualityComparer
 ContainsKeySimpleNegative_String | 17.249 ns | 0.0302 ns | 0.0282 ns | IEqualityComparer
   ContainsKeySimplePositive_Type | 16.377 ns | 0.0159 ns | 0.0133 ns | IEqualityComparer
   ContainsKeySimpleNegative_Type | 10.821 ns | 0.0243 ns | 0.0227 ns | IEqualityComparer

After

                           Method |      Mean |     Error |    StdDev |
--------------------------------- |----------:|----------:|----------:|
    ContainsKeySimplePositive_Int |  5.980 ns | 0.0117 ns | 0.0109 ns | x 1.74 devirtualized
    ContainsKeySimpleNegative_Int |  4.846 ns | 0.0073 ns | 0.0069 ns | x 1.52 devirtualized
 ContainsKeyCollisionPositive_Int | 15.208 ns | 0.0289 ns | 0.0271 ns | x 1.40 devirtualized
 ContainsKeyCollisionNegative_Int | 15.247 ns | 0.0330 ns | 0.0292 ns | x 1.31 devirtualized
 ContainsKeySimplePositive_String | 20.636 ns | 0.0324 ns | 0.0270 ns | x 1.03 IEqualityComparer
 ContainsKeySimpleNegative_String | 17.536 ns | 0.0174 ns | 0.0163 ns | x 0.98 IEqualityComparer
   ContainsKeySimplePositive_Type | 16.330 ns | 0.0199 ns | 0.0176 ns | x 1.00 virtual
   ContainsKeySimpleNegative_Type |  7.676 ns | 0.0080 ns | 0.0074 ns | x 1.41 virtual

@jkotas
Copy link
Member

jkotas commented Mar 1, 2018

@benaadams Thank you!

@HFadeel
Copy link

HFadeel commented Mar 2, 2018

@AndyAyersMS would love to see this in CoreCLR 2.1 and future .NET

@danmoseley
Copy link
Member

@HFadeel it should be.

@omariom
Copy link

omariom commented Mar 2, 2018

I am sure a lot of code will suddenly get faster

@AndyAyersMS
Copy link
Member

We'd be quite interested in independent observations on the perf impact of this change, both good and (perhaps especially important) bad.

Looks like it may take a few days to get these bits out into the nightly build packages. Once that happens we should perhaps do a bit of advertising....

They should also be in 2.1 preview 2.

@benaadams benaadams deleted the dict-default-comparer branch March 2, 2018 21:15
@AndyAyersMS
Copy link
Member

Perf infrastructure is finally back on line. Data shows this change gave roughly 6% improvement on k-nuckeotide-1 and 4.5% on k-nuckeotide-9. It should help close the gap vs Java somewhat, from 1.32x to something more like 1.25x (assuming proportional speedup on those ancient machines).

Probably should look at this test anew and see what else we might be able to do.

cc @danmosemsft @tannergooding

image

@benaadams
Copy link
Member Author

Good news!

The Benchmarks Game tests also I believe include Jit and startup time; was the extra cost from ArrayPoolEventSource in 2.0? https://github.com/dotnet/coreclr/issues/15954

I assume so; and that has also gone.

@AndyAyersMS
Copy link
Member

Looking back a bit further at -9, the overall speedup over 2.1 may be more like 1.125x (2720/2415), so vs java maybe now only ~1.17x slower. Seems like enabling software write watch (#16516) may have helped here too.

(Actually forgot official runs are on Ubuntu -- speedup over 2.0 is looks similar though 3400/3000 or about 1.13x).

We don't measure jit costs in our HW lab (we discard the initial iteration). Maybe Tanner can do some pseudo-official runs...

@danmoseley
Copy link
Member

The Benchmarks game measures the total time the process runs, including JIT, although the machine is warmed up first.

I would love @tannergooding or @ViktorHofer to get a full set of CLBG numbers on the special machine this week.

@simplejackcoder
Copy link

PreCondition: I'm not trying to be a smart alec or rude.

This improvement for TryGetValue (33%) made me think some very clever algorithm improvement or major bug like looping through twice.

I check code change and there is no such change.

If I read it correctly I should create my own dictionary for best fastest perf. IntDictionary, LongDictionary, StringDictionary -- I want @mikedn implementation which is even faster.

@benaadams
Copy link
Member Author

This improvement for TryGetValue (33%) made me think some very clever algorithm improvement or major bug like looping through twice.

I check code change and there is no such change.

There was a change to the Jit #14125 to turn EqualityComparer<T>.Default.Equals into an intrinsic. So if you have a value type T (struct) that implements IEquatable<T> then it will devitualize it. For example for int the Equals then becomes a single cpu instruction.

This code change was to take advantage of that Jit change in Dictionary when you don't specify your own custom IEqualityComparer<TKey> comparer in the constructor.

If I read it correctly I should create my own dictionary for best fastest perf. IntDictionary, LongDictionary, StringDictionary

You will get this advantage just by using the regular dictionary constructor for int and long and not using the overload that specifies an IEqualityComparer<TKey>
e.g.

var intDict = new Dictionary<int, MyValue>();
var longDict = new Dictionary<long, MyValue>();

string is a bit different as it also has DoS protection that kicks in if you start getting too many key collisions by moving to a randomised HashCode generator which is more expensive but reduces the collisions for a net performance gain. This is doubly important as string Dictionary keys can be externally controlled (like entries in a forms, route or query string collection for a webserver)

If you have a specific scenario for object type keys you want to optimize for you can create a struct wrapper and use that as the key for a regular dictionary for the same behaviour. For example you could create a TypeKey and use that to create a TypeDictionary that is optimized for ReferenceEquality on the types.

@mikedn
Copy link

mikedn commented Jul 5, 2018

I want @mikedn implementation which is even faster.

Note that my implementation is specific to the k-nucleotide benchmark. Not to say that it cannot be used for other purposes but it's certainly not a general purpose hashtable. For example, it takes advantage of the fact that the hashcode of an integer is fast to compute so it doesn't store the hash code in the Entry object. That saves 8 bytes per entry and considering that the hashtable is pretty large that's pretty good. And AFAIR I customized the hash code computation as well.

If I read it correctly I should create my own dictionary for best fastest perf. IntDictionary, LongDictionary, StringDictionary

Yes. Assuming that "best perf" is indeed relevant to your application. If the application spends 5% of its time in hashtable code then creating your own hashtable is quite possible a waste of time. If the application spends 95% of its time in hashtable code then creating your own hashtable (or whatever suitable data structure is required) is something that you may want to consider.

@ViktorHofer
Copy link
Member

@mikedn did we ever make use of your implementation for k-nucleotide, i.e. in a nuget package similar to what Java does in their benchmark?

@mikedn
Copy link

mikedn commented Jul 5, 2018

@ViktorHofer Not that I know of. In fact I'm not even sure where I put that thing, need to search my drive for it.

@Zhentar
Copy link

Zhentar commented Jul 11, 2018

100% Real™ code, 100% Real™ data, very Dictionary heavy (mostly <string, object>, and <int, string>)

Method Runtime Toolchain Mean Error StdDev
ParseLargeFile Clr Default 6.679 s 0.3603 s 0.1600 s
ParseLargeFile Core .NET Core 2.0 6.208 s 0.2307 s 0.0599 s
ParseLargeFile Core .NET Core 2.1 5.657 s 0.3788 s 0.1682 s

From profiling, it looks like this change is about half of the improvement between .Net 4.7 & Core 2.1 💯

@danmoseley
Copy link
Member

It's like we need partial specialization in C# so we can have Dictionary<int, T>, but from reading this discussion it's not clear to me whether that's even possible for generics, or if it is, whether it would have non zero costs (shapes implemented with interfaces?)

Maybe the way forward is for code gen to have special knowledge of Dictionary. It's critical enough that the ickyness may be worthwhile.

@benaadams
Copy link
Member Author

Dictionary<int, T> is covered by this PR; is where TKey is an object type that its not.

From what I understand; shapes are like the Surface Phone, they are a panacea for all problems, and probably both will be released at the same time IEnumerable<T> devirualizes and becomes non-allocating 😉

@mikedn
Copy link

mikedn commented Jul 23, 2018

@ViktorHofer and everyone else who may be interested: My version is here: https://gist.github.com/mikedn/8dc555a867aba65b6e26f39cf3cb66eb

On my machine it's about 1.4x faster than the current top C# entry on Benchmarks Game. If this factor translates to Benchmarks Game's environment then it will probably be almost as fast as the current top entry (a C++ version).

That said, it goes a bit beyond "just use another hashtable" approach. It has been customized specifically for knucleotide (e.g. the hashBits = 18 thing). If you're trying to make a more generic hashtable then performance might suffer (maybe 1.2x instead of 1.4x, I don't remember).

And to be clear: I don't want to encourage people to write their own hashtables because "omg, this is faster". But it's something you can do in very specific cases.

@danmoseley
Copy link
Member

@AnthonyLloyd

@AnthonyLloyd
Copy link

Rules of the k-nukleotide are that it has to use a hash table but you are not allowed to implement one yourself and can only nuget.

My increment method sneaks past this just.

@mikedn
Copy link

mikedn commented Jul 23, 2018

Yeah, I know the rules, that's what's I'm not really bothering with it (that is, I'm not going to waste my time to create a nuget package for my custom hashtable).

My increment method sneaks past this just.

Heh, it's odd that they accepted that because what you did can be equated with a custom hashtable implementation. Well, at least the hashtable lookup part.

@danmoseley
Copy link
Member

Hmm it is tempting to create such a package if it would be the easiest way to improve perf on this benchmark. It's basically what Java does. Maybe we could put in in corefx lab..

@ViktorHofer
Copy link
Member

I believed that custom packages are also not allowed? The java one was an exception and was criticized in the forum. Am I wrong?

@AnthonyLloyd
Copy link

It has to be a package with a reasonable number of downloads. The Java one is part of something else I think.

@danmoseley
Copy link
Member

Maybe it could be part of https://github.com/dotnet/corefx/issues/31191 (if it's binary). At least as a general purpose long->int map not something hacked for the benchmark.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.