Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

DictionarySlim backport improvements: remove null check on insert #22599

Closed
wants to merge 1 commit into from
Closed

DictionarySlim backport improvements: remove null check on insert #22599

wants to merge 1 commit into from

Conversation

MarcoRossignoli
Copy link
Member

@MarcoRossignoli MarcoRossignoli commented Feb 14, 2019

contributes to https://github.com/dotnet/corefx/issues/33392

@danmosemsft sorry for delay I had to learn new funny tools.
I thought to tackle backporting one point at time.

Due diligence:
Asm before:

G_M9942_IG02:
       test     rdi, rdi
       je       G_M9942_IG31

G_M9942_IG03:
       cmp      gword ptr [rsi+8], 0
       jne      SHORT G_M9942_IG04
       mov      rcx, rsi
       xor      edx, edx
       call     Dictionary`2:Initialize(int):int:this

G_M9942_IG04:
       mov      r14, gword ptr [rsi+16]
       mov      r15, gword ptr [rsi+24]
       test     r15, r15
       je       SHORT G_M9942_IG06
       mov      rcx, qword ptr [rsi]
       mov      rdx, qword ptr [rcx+48]
       mov      rdx, qword ptr [rdx]
       mov      r11, qword ptr [rdx+48]
       test     r11, r11
       jne      SHORT G_M9942_IG05
...
     jmp      SHORT G_M9942_IG26

G_M9942_IG24:
       mov      r14d, dword ptr [rsi+48]
       mov      r9d, dword ptr [rsp+58H]
       cmp      r9d, r14d
       jne      SHORT G_M9942_IG25
       mov      ecx, dword ptr [rsi+48]
       call     HashHelpers:ExpandPrime(int):int
       mov      edx, eax
       mov      rcx, rsi
       xor      r8d, r8d
       call     Dictionary`2:Resize(int,bool):this
       mov      rcx, gword ptr [rsi+8]
; Total bytes of code 1069, prolog size 33 for method Dictionary`2:TryInsert(ref,ref,ubyte):bool:this

After

G_M9942_IG02:
       test     rdi, rdi
       je       G_M9942_IG31

G_M9942_IG03:   <-- NO MORE INIT CALL
       mov      r14, gword ptr [rsi+16]
       mov      r15, gword ptr [rsi+24]
       test     r15, r15
       je       SHORT G_M9942_IG05
       mov      rcx, qword ptr [rsi]
       mov      rdx, qword ptr [rcx+48]
       mov      rdx, qword ptr [rdx]
       mov      r11, qword ptr [rdx+64]
       test     r11, r11
       jne      SHORT G_M9942_IG04
...
  jmp      SHORT G_M9942_IG26

G_M9942_IG23:
       mov      r14d, dword ptr [rsi+48]
       mov      r9d, dword ptr [rsp+58H]
       cmp      r9d, r14d
       je       SHORT G_M9942_IG24
       cmp      r9d, 1 <-- NEW COMPARE I tried also with something like (count % entries.Length == 0) to avoid branching but it emits CDQ+IDIV 
       jne      SHORT G_M9942_IG25

G_M9942_IG24:
       mov      ecx, dword ptr [rsi+48]
       call     HashHelpers:ExpandPrime(int):int
       mov      edx, eax
       mov      rcx, rsi
       xor      r8d, r8d
       call     Dictionary`2:Resize(int,bool):this
       mov      rcx, gword ptr [rsi+8]
; Total bytes of code 1058, prolog size 33 for method Dictionary`2:TryInsert(ref,ref,ubyte):bool:this

I run some perf test and added one more to avoid ctor(not foundamental but one more test is better than nothing, thank's @benaadams for your help with tools and for test idea)
Perf before

BenchmarkDotNet=v0.11.3.1003-nightly, OS=Windows 10.0.17134.590 (1803/April2018Update/Redstone4)
Intel Core i7 CPU 860 2.80GHz (Nehalem), 1 CPU, 8 logical and 4 physical cores
Frequency=2727537 Hz, Resolution=366.6311 ns, Timer=TSC
.NET Core SDK=3.0.100-preview-010184
  [Host]     : .NET Core 3.0.0-preview-27324-5 (CoreCLR 4.6.27322.0, CoreFX 4.7.19.7311), 64bit RyuJIT
  Job-PVCEHZ : .NET Core c5d17b79-c72f-4766-bfda-9090f2cf214d (CoreCLR 4.6.27511.0, CoreFX 4.7.19.11201), 64bit RyuJIT

Runtime=Core  Toolchain=CoreRun  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  
Type Method Count Size Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
TryAddDefaultSize<Int32> Dictionary 512 ? 35.54 us 0.2546 us 0.2126 us 35.51 us 35.23 us 35.97 us 6.3631 3.1109 - 34496 B
TryAddDefaultSize<String> Dictionary 512 ? 55.15 us 0.6758 us 0.5643 us 55.01 us 54.56 us 56.80 us 7.6220 3.7021 - 48096 B
TryAddGiventSize<Int32> Dictionary 512 ? 40.25 us 0.6686 us 0.6254 us 40.14 us 39.39 us 41.52 us 8.2645 4.1322 - 44968 B
TryAddGiventSize<String> Dictionary 512 ? 65.10 us 2.4010 us 2.7650 us 65.55 us 59.30 us 69.94 us 9.8180 4.7893 - 62736 B
AddDefaultSize<Int32> Dictionary 512 ? 36.45 us 0.7291 us 0.7487 us 36.62 us 35.34 us 37.87 us 6.3685 3.1102 - 34496 B
AddDefaultSize<String> Dictionary 512 ? 55.15 us 0.9617 us 0.8996 us 55.03 us 54.14 us 56.78 us 7.6220 3.7021 - 48096 B
AddGivenSize<Int32> Dictionary ? 512 41.97 us 1.0489 us 1.2079 us 41.59 us 40.35 us 44.72 us 10.6348 0.1636 - 44968 B
AddGivenSize<String> Dictionary ? 512 61.57 us 2.3745 us 2.6393 us 60.97 us 58.90 us 67.27 us 9.8315 4.9157 - 62736 B
BackPorting<Int32> Dictionary 100000000 ? 2,243,635.83 us 35,500.5906 us 33,207.2745 us 2,266,143.41 us 2,196,831.43 us 2,272,582.19 us - - - -
BackPorting<String> Dictionary 100000000 ? 3,535,378.28 us 67,017.5894 us 59,409.3474 us 3,533,304.04 us 3,468,903.26 us 3,661,420.91 us - - - -

After

BenchmarkDotNet=v0.11.3.1003-nightly, OS=Windows 10.0.17134.590 (1803/April2018Update/Redstone4)
Intel Core i7 CPU 860 2.80GHz (Nehalem), 1 CPU, 8 logical and 4 physical cores
Frequency=2727537 Hz, Resolution=366.6311 ns, Timer=TSC
.NET Core SDK=3.0.100-preview-010184
  [Host]     : .NET Core 3.0.0-preview-27324-5 (CoreCLR 4.6.27322.0, CoreFX 4.7.19.7311), 64bit RyuJIT
  Job-WYQKEP : .NET Core 2d4ba7fe-43e8-4855-a9ca-12422f23e7e2 (CoreCLR 4.6.27511.0, CoreFX 4.7.19.11201), 64bit RyuJIT

Runtime=Core  Toolchain=CoreRun  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1  
Type Method Count Size Mean Error StdDev Median Min Max Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
TryAddDefaultSize<Int32> Dictionary 512 ? 36.01 us 0.6730 us 0.6911 us 35.81 us 35.18 us 37.66 us 6.3073 3.1537 - 34496 B
TryAddDefaultSize<String> Dictionary 512 ? 55.47 us 0.6674 us 0.6242 us 55.36 us 54.39 us 56.60 us 7.5088 3.7544 - 48096 B
TryAddGiventSize<Int32> Dictionary 512 ? 40.37 us 0.7075 us 0.6618 us 40.43 us 39.39 us 41.50 us 8.1941 4.0167 - 44968 B
TryAddGiventSize<String> Dictionary 512 ? 60.70 us 0.9041 us 0.8015 us 60.67 us 59.23 us 62.47 us 10.0402 5.0201 - 62736 B
AddDefaultSize<Int32> Dictionary 512 ? 35.94 us 0.2816 us 0.2496 us 35.84 us 35.60 us 36.42 us 6.2929 3.1465 - 34496 B
AddDefaultSize<String> Dictionary 512 ? 55.12 us 0.7530 us 0.7043 us 55.01 us 54.22 us 56.53 us 7.7025 3.7412 - 48096 B
AddGivenSize<Int32> Dictionary ? 512 40.73 us 0.5031 us 0.4460 us 40.57 us 40.18 us 41.71 us 8.3442 4.0903 - 44968 B
AddGivenSize<String> Dictionary ? 512 61.19 us 1.1967 us 1.2289 us 61.67 us 59.21 us 62.44 us 10.0098 4.8828 - 62736 B
BackPorting<Int32> Dictionary 100000000 ? 2,212,302.33 us 22,274.7043 us 18,600.3870 us 2,206,063.93 us 2,190,344.99 us 2,239,182.09 us - - - -
BackPorting<String> Dictionary 100000000 ? 3,522,100.13 us 61,572.7120 us 54,582.6054 us 3,502,981.81 us 3,466,822.63 us 3,665,386.02 us - - - -

Comparer result thank's to @adamsitnik
No Slower results for the provided threshold = 1% and noise filter = 0.3ns.

Faster base/diff Base Median (ns) Diff Median (ns) Modality
System.Collections.TryAddGiventSize.Dictionary(Count: 512) 1.08 65551.31 60667.73
System.Collections.AddGivenSize.Dictionary(Size: 512) 1.03 41593.31 40571.00

BTW it's not easy compare as expected, sometimes I get better result and other worste.

/cc @AnthonyLloyd @Zhentar

@jkotas
Copy link
Member

jkotas commented Feb 14, 2019

it's not easy compare as expected, sometimes I get better result and other worste.

It is explainable that you are not able to measure any throughput gains from this change. The one extra null check you are removing will be very well predicted when TryInsert is called in the loop because of it is rarely taken. It means that it will costs next to nothing, especially given the size and complexity of the method that it is part of.

On the other hand, this increases Dictionary static footprint (adds static constructor, adds a static field, adds extra or more checks to multiple other places).

Thanks for giving it a try, but this change does not look like an improvement to me.

@MarcoRossignoli
Copy link
Member Author

Thanks for giving it a try, but this change does not look like an improvement to me.

I agree with all your analysis but I have to show that I did my homework ☺️
@danmosemsft if also you agree, could you draw a straight line on third point on issue?I'll go on with other.


// Default initialization to avoid null check for every insert
// Reads will fail and first add cause a resize into a real array
if (_entries is null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't it always be null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If capacity passed is > 0 we call Initialize(capacity)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then wouldn't this be an else off of that? Or initialize these first and then do that check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

@@ -53,6 +52,7 @@ private struct Entry
private IEqualityComparer<TKey> _comparer;
private KeyCollection _keys;
private ValueCollection _values;
private static readonly Entry[] InitialEntries = new Entry[1] { new Entry() { hashCode = -1 } };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s_initialEntries

int currentCapacity = _entries == null ? 0 : _entries.Length;

// _entries.Length == 1 is dummy entries
int currentCapacity = _entries.Length == 1 ? 0 : _entries.Length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's removing a null check but adding in one or more _entries.Length == 1 checks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was remove check null for main method that adds so here we don't have null anymore but dummy len. BTW I agree with @jkotas and this improvement is not good for Dictionary, but as I said I published my thought/result

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#22599 (comment) do you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you agree?

Yes. Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm exploring if some improvement done on DictionartSlim(corefxlab) can improve Dictionary.

@stephentoub
Copy link
Member

Thanks for the attempt.

@stephentoub
Copy link
Member

could you draw a straight line on third point on issue?

Done

@MarcoRossignoli
Copy link
Member Author

Thank's for your time!

@danmoseley
Copy link
Member

Thanks for trying. BTW this was not to improve throughput but reduce footprint of empty dictionaries.

@MarcoRossignoli
Copy link
Member Author

Thanks for trying. BTW this was not to improve throughput but reduce footprint of empty dictionaries.

Ah ok from issue I misunderstood Eliminate _buckets == null check on every access anyway seems there isn't room for improve here.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants