Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InlineArray in generic class #106129

Closed
iSazonov opened this issue Aug 7, 2024 · 9 comments · Fixed by #106185
Closed

InlineArray in generic class #106129

iSazonov opened this issue Aug 7, 2024 · 9 comments · Fixed by #106185
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@iSazonov
Copy link
Contributor

iSazonov commented Aug 7, 2024

Steps to Reproduce:

This example code does nothing with the parametric type value except to read and return as a result.
Therefore, it was surprising to find a strong difference in the code size and performance for value and reference types.
Is this expected?

See sharplab

Expected Behavior:

Code for value types and reference types is almost the same, performance is almost the same.

Actual Behavior:

Code for reference types is much larger and works 2-3 times slower.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Aug 7, 2024
@jaredpar
Copy link
Member

jaredpar commented Aug 8, 2024

Moving to runtime as this is a question about the ASM code generation.

@jaredpar jaredpar transferred this issue from dotnet/roslyn Aug 8, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Aug 8, 2024
@vcsjones vcsjones added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Aug 8, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jakobbotsch
Copy link
Member

The generic type is used for InlineArrayAsSpan calls generated under-the-hood by Roslyn. The JIT does not manage to remove all those runtime lookups as some of them feed control flow (in the end, there is no difference in taking the true and false path for these tests, but the JIT's DCE is not able to reason that far).
We could clean this up if we taught the early liveness passes how to DCE other things than stores, since then we can do it before we expand the runtime lookups into control flow here.

@jakobbotsch jakobbotsch added this to the 10.0.0 milestone Aug 8, 2024
@jakobbotsch jakobbotsch removed the untriaged New issue has not been triaged by the area owner label Aug 8, 2024
@iSazonov
Copy link
Contributor Author

iSazonov commented Aug 9, 2024

For current 9.0 milestone I'd ask to add a warning in docs "Don't use InlineArray with reference types for performance reason". This will save someone a lot of time searching for the reasons why their code is slow.

@jakobbotsch
Copy link
Member

Can you share a benchmark that shows a perf problem warranting such a statement? Your code here always throws NullReferenceException.

@iSazonov
Copy link
Contributor Author

iSazonov commented Aug 9, 2024

Yes, sharplab example is minimal asm demo.

I attached working benchmark project for you.
genericlist.zip

@jakobbotsch
Copy link
Member

jakobbotsch commented Aug 9, 2024

I get the following results:

Method Mean Error StdDev Ratio RatioSD Code Size
Array 3.570 ns 0.0861 ns 0.0763 ns 1.00 0.00 71 B
Array2 5.888 ns 0.0634 ns 0.0593 ns 1.65 0.05 43 B
IPAddress1 6.196 ns 0.1363 ns 0.1208 ns 1.74 0.06 145 B
IPAddress21 3.590 ns 0.0596 ns 0.0558 ns 1.01 0.03 53 B

In general we expect shared generic code to be slightly slower than unshared code. For example, you will also see a slowdown for List<string>.set_Item compared to List<int>.set_Item. We don't try to document this anywhere, so I'm not sure it's warranted to do it for this case either, especially considering that I expect we can improve on this (either by improving our DCE, or by implementing #105586).

Thoughts @dotnet/jit-contrib @jkotas?

jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue Aug 9, 2024
Currently only the backend liveness pass will DCE calls without side
effects when their result is unused. However, unused runtime lookups can
end up expanded to control flow that we will be unable to DCE. This
teaches the middle-end liveness pass to DCE dead calls as well.

Fix dotnet#106129
@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Aug 9, 2024
@iSazonov
Copy link
Contributor Author

iSazonov commented Aug 9, 2024

In general we expect shared generic code to be slightly slower than unshared code. For example, you will also see a slowdown for List.set_Item compared to List.set_Item.

While I was researching this problem, I measured this example too and the differences there are scanty, there is an understanding of this and it is acceptable.

If we compare 1 and 3 lines (InlineArray code - value vs referense) - diff is 2.47!

| Method      | Mean      | Error     | StdDev    | Ratio | RatioSD | Code Size |
|------------ |----------:|----------:|----------:|------:|--------:|----------:|
| Array       | 13.817 ns | 0.1581 ns | 0.1479 ns |  1.00 |    0.00 |      81 B |
| Array2      |  9.376 ns | 0.2055 ns | 0.1922 ns |  0.68 |    0.02 |      47 B |
| IPAddress1  | 34.252 ns | 0.4945 ns | 0.4129 ns |  2.47 |    0.04 |     184 B |
| IPAddress21 | 11.089 ns | 0.2526 ns | 0.2363 ns |  0.80 |    0.02 |      54 B |

I don't insist on documenting. But the problem is that there are intuitive expectations that the new InlineArray is designed for high-performance code, although it works worse than the usual language constructs.

jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue Aug 9, 2024
`gtNodeHasSideEffects` is meant to check if a node has side effects when
you exclude its children. However, it was checking arguments of calls
which is more conservative than expected.

The actual reason we were doing that seems to be `gtTreeHasSideEffects`
that sometimes wants to ignore `GTF_CALL` on pure helper calls. It was
relying on this check in `gtNodeHasSideEffect`; instead move it to
`gtTreeHasSideEffects` where it belongs.

This is an alternative fix for dotnet#106129; there we leave a
`COMMA(CORINFO_HELP_RUNTIMELOOKUP_METHOD, ...)` around because
extracting side effects from op1 does not end up getting rid of the
call.

Fix dotnet#106129
@jakobbotsch
Copy link
Member

I think we can get a fix for this into .NET 9. #106183 and #106185 both fix the performance disparity here.

@jakobbotsch jakobbotsch modified the milestones: 10.0.0, 9.0.0 Aug 9, 2024
@jakobbotsch jakobbotsch self-assigned this Aug 9, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Sep 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
4 participants