Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jiterpreter performance issues around Dictionary and GenericEqualityComparer #82948

Closed
kg opened this issue Mar 3, 2023 · 7 comments · Fixed by #85528
Closed

Jiterpreter performance issues around Dictionary and GenericEqualityComparer #82948

kg opened this issue Mar 3, 2023 · 7 comments · Fixed by #85528
Labels
arch-wasm WebAssembly architecture area-Codegen-Interpreter-mono tenet-performance Performance related issue
Milestone

Comments

@kg
Copy link
Member

kg commented Mar 3, 2023

Description

The way Dictionary and GenericEqualityComparer interact causes performance issues in the Jiterpreter that would be hard to fix without changes to the interpreter and/or the BCL. For more details see dotnet/perf-autofiling-issues#12762 (comment) but to summarize:

The Dictionary<int, int>.ContainsValue benchmark regressed once the jiterpreter was enabled.

  • Dictionary.ContainsValue is written assuming multiple optimizations will happen, and very few (or none) of them happen in this configuration
  • GenericEqualityComparer<int> is used for the benchmark that regressed and it is also very inefficient in this configuration
  • The result is slow mono interpreter opcodes that also hit some of the jiterpreter's worst case scenarios
  • It is difficult (perhaps impossible) to fix this on the jiterpreter side, the regression can probably be made smaller but eliminating it would require blocking compilation for Dictionary and perhaps GenericEqualityComparer. I am unable to imagine a heuristic that would detect this reliably.

Configuration

This applies to WebAssembly non-AOT (mono interpreter) with the jiterpreter enabled.

Regression?

Performance regresses for this code with the jiterpreter enabled

Data

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu%2018.04_CompliationMode=wasm_RunKind=micro/System.Collections.Tests.Perf_Dictionary.ContainsValue(Items%3a%203000).html

Analysis

Candidate changes that might entirely make this go away, while also just making these code paths faster in the mono interpreter:

  • Turn typeof(T).IsValueType into a constant when generating code in the interpreter (does this happen already? I can't tell for sure), then do DCE based on it to reduce the amount of code in the method. This doesn't seem easy to me.
  • Move the comparison against null to only happen after we've checked whether T is a ValueType. The comparisons against null cause the value to be boxed.
  • Introduce a GenericValueTypeEqualityComparer. Its equals method will not contain boxing (much faster!) and will be smaller, which might cause it to get inlined. It does already have AggressiveInlining applied to it, but it's not inlined in this case.
  • Define a special comparer for int, like we already did for byte. This might also allow the comparison to get inlined here, and int is a pretty common type to use as a key or value in containers so it makes sense to have a dedicated comparer for it.
@kg kg added arch-wasm WebAssembly architecture tenet-performance Performance related issue labels Mar 3, 2023
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Mar 3, 2023
@ghost
Copy link

ghost commented Mar 3, 2023

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

The way Dictionary and GenericEqualityComparer interact causes performance issues in the Jiterpreter that would be hard to fix without changes to the interpreter and/or the BCL. For more details see dotnet/perf-autofiling-issues#12762 (comment) but to summarize:

The Dictionary<int, int>.ContainsValue benchmark regressed once the jiterpreter was enabled.

  • Dictionary.ContainsValue is written assuming multiple optimizations will happen, and very few (or none) of them happen in this configuration
  • GenericEqualityComparer<int> is used for the benchmark that regressed and it is also very inefficient in this configuration
  • The result is slow mono interpreter opcodes that also hit some of the jiterpreter's worst case scenarios
  • It is difficult (perhaps impossible) to fix this on the jiterpreter side, the regression can probably be made smaller but eliminating it would require blocking compilation for Dictionary and perhaps GenericEqualityComparer. I am unable to imagine a heuristic that would detect this reliably.

Configuration

This applies to WebAssembly non-AOT (mono interpreter) with the jiterpreter enabled.

Regression?

Performance regresses for this code with the jiterpreter enabled

Data

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu%2018.04_CompliationMode=wasm_RunKind=micro/System.Collections.Tests.Perf_Dictionary.ContainsValue(Items%3a%203000).html

Analysis

Candidate changes that might entirely make this go away, while also just making these code paths faster in the mono interpreter:

  • Turn typeof(T).IsValueType into a constant when generating code in the interpreter (does this happen already? I can't tell for sure), then do DCE based on it to reduce the amount of code in the method. This doesn't seem easy to me.
  • Move the comparison against null to only happen after we've checked whether T is a ValueType. The comparisons against null cause the value to be boxed.
  • Introduce a GenericValueTypeEqualityComparer. Its equals method will not contain boxing (much faster!) and will be smaller, which might cause it to get inlined. It does already have AggressiveInlining applied to it, but it's not inlined in this case.
  • Define a special comparer for int, like we already did for byte. This might also allow the comparison to get inlined here, and int is a pretty common type to use as a key or value in containers so it makes sense to have a dedicated comparer for it.
Author: kg
Assignees: -
Labels:

arch-wasm, tenet-performance

Milestone: -

@ghost
Copy link

ghost commented Mar 3, 2023

Tagging subscribers to this area: @BrzVlad, @kotlarmilos
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

The way Dictionary and GenericEqualityComparer interact causes performance issues in the Jiterpreter that would be hard to fix without changes to the interpreter and/or the BCL. For more details see dotnet/perf-autofiling-issues#12762 (comment) but to summarize:

The Dictionary<int, int>.ContainsValue benchmark regressed once the jiterpreter was enabled.

  • Dictionary.ContainsValue is written assuming multiple optimizations will happen, and very few (or none) of them happen in this configuration
  • GenericEqualityComparer<int> is used for the benchmark that regressed and it is also very inefficient in this configuration
  • The result is slow mono interpreter opcodes that also hit some of the jiterpreter's worst case scenarios
  • It is difficult (perhaps impossible) to fix this on the jiterpreter side, the regression can probably be made smaller but eliminating it would require blocking compilation for Dictionary and perhaps GenericEqualityComparer. I am unable to imagine a heuristic that would detect this reliably.

Configuration

This applies to WebAssembly non-AOT (mono interpreter) with the jiterpreter enabled.

Regression?

Performance regresses for this code with the jiterpreter enabled

Data

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_ubuntu%2018.04_CompliationMode=wasm_RunKind=micro/System.Collections.Tests.Perf_Dictionary.ContainsValue(Items%3a%203000).html

Analysis

Candidate changes that might entirely make this go away, while also just making these code paths faster in the mono interpreter:

  • Turn typeof(T).IsValueType into a constant when generating code in the interpreter (does this happen already? I can't tell for sure), then do DCE based on it to reduce the amount of code in the method. This doesn't seem easy to me.
  • Move the comparison against null to only happen after we've checked whether T is a ValueType. The comparisons against null cause the value to be boxed.
  • Introduce a GenericValueTypeEqualityComparer. Its equals method will not contain boxing (much faster!) and will be smaller, which might cause it to get inlined. It does already have AggressiveInlining applied to it, but it's not inlined in this case.
  • Define a special comparer for int, like we already did for byte. This might also allow the comparison to get inlined here, and int is a pretty common type to use as a key or value in containers so it makes sense to have a dedicated comparer for it.
Author: kg
Assignees: -
Labels:

arch-wasm, tenet-performance, untriaged, area-Codegen-Interpreter-mono

Milestone: -

@lambdageek
Copy link
Member

I think regardless of anything else we choose to do, we should make the interpreter recognize these patterns (sharplab.io sample)

        IL_0000: ldtoken !!T
        IL_0005: call class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
        IL_000a: call instance bool [System.Runtime]System.Type::get_IsValueType()

and

        IL_0001: box !!T
        IL_0006: ldnull
        IL_0007: ceq

@pavelsavara
Copy link
Member

System.Collections.Immutable.Tests are very slow test suite. Is that somehow related ?

Log

for example
System.Collections.Frozen.Tests.FrozenSet_Generic_Tests_string_NonDefault.ComparingWithOtherSets(size: 5000) takes 16 seconds on V8 version: 11.3.203
There are many slow tests in the suite.

@pavelsavara pavelsavara added this to the 8.0.0 milestone Apr 26, 2023
@pavelsavara pavelsavara removed the untriaged New issue has not been triaged by the area owner label Apr 26, 2023
@BrzVlad
Copy link
Member

BrzVlad commented Apr 26, 2023

I'll take a look to see if there are any easy optimizations to be done here

@kg
Copy link
Member Author

kg commented Apr 27, 2023

System.Collections.Immutable.Tests are very slow test suite. Is that somehow related ?

Log

for example System.Collections.Frozen.Tests.FrozenSet_Generic_Tests_string_NonDefault.ComparingWithOtherSets(size: 5000) takes 16 seconds on V8 version: 11.3.203 There are many slow tests in the suite.

We could do a test run with the jiterpreter disabled to see if it gets faster. There are corner cases where jiterpreter traces are slower than interp (due to branch prediction, I believe) but I wouldn't expect to see a large time penalty.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 28, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 23, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-wasm WebAssembly architecture area-Codegen-Interpreter-mono tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants