Investigate why crossgen works slower with TieredCompilation/PGO #83112

Open
EgorBo opened this issue Mar 7, 2023 · 7 comments

@EgorBo
Member

EgorBo commented Mar 7, 2023

I was measuring crossgen2.exe -O SPC.dll (actually, the exact command we use when building Clr.NativeCoreLib -c Release) and noticed a few problems:

Mode                     | Time to prejit SPC.dll, seconds
TC=1 (Default)           | 4.81
TC=0                     | 4.25
TC=1, CCDelayMS=0        | 3.78
TC=1, PGO=1              | 5.29
TC=1, PGO=1, CCDelayMS=0 | 3.93

Legend:

  • TC - DOTNET_TieredCompilation (1 by default)
  • PGO - DOTNET_TieredPGO (0 by default)
  • CCDelayMS - DOTNET_TC_CallCountingDelayMs (100 by default)
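
The modes in the table can be reproduced by running the same command under different environment variables. Below is a minimal Python sketch of such a timing harness, not the script used for the numbers above. The helper names (MODES, build_env, time_command) are hypothetical, CCDelayMS is assumed to mean DOTNET_TC_CallCountingDelayMs (the name used later in this thread), and the command line to time is left to the caller because the full crossgen2 invocation is not given here.

```python
import os
import subprocess
import sys
import time

# Environment overrides per mode. An empty dict means runtime defaults.
# Assumption: CCDelayMS corresponds to DOTNET_TC_CallCountingDelayMs.
MODES = {
    "TC=1 (Default)": {},
    "TC=0": {"DOTNET_TieredCompilation": "0"},
    "TC=1, CCDelayMS=0": {"DOTNET_TC_CallCountingDelayMs": "0"},
    "TC=1, PGO=1": {"DOTNET_TieredPGO": "1"},
    "TC=1, PGO=1, CCDelayMS=0": {
        "DOTNET_TieredPGO": "1",
        "DOTNET_TC_CallCountingDelayMs": "0",
    },
}

# Variables this harness controls; inherited values are dropped so that
# the "default" mode really runs with defaults.
MANAGED = (
    "DOTNET_TieredCompilation",
    "DOTNET_TieredPGO",
    "DOTNET_TC_CallCountingDelayMs",
)


def build_env(overrides, base=None):
    """Return a copy of the environment with managed variables reset."""
    source = os.environ if base is None else base
    managed = {name.upper() for name in MANAGED}
    # Compare case-insensitively because Windows upper-cases variable names.
    env = {k: v for k, v in source.items() if k.upper() not in managed}
    env.update(overrides)
    return env


def time_command(cmd, overrides, runs=3):
    """Run cmd `runs` times and return the best wall-clock time in seconds."""
    env = build_env(overrides)
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def measure_all(cmd, runs=3):
    """Time cmd under every mode and return {mode: seconds}."""
    return {mode: time_command(cmd, ov, runs) for mode, ov in MODES.items()}
```

Calling measure_all with the full crossgen2 command line as a list of arguments returns one timing per mode. Taking the best of several runs is a design choice made here to reduce noise; the original measurements may have been aggregated differently.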

The difference is noticeable and the numbers are stable across multiple runs, so this is worth investigating.
Judging by the effect of DOTNET_TC_CallCountingDelayMs, we seem to have contention around call counting stub installation/promotion to tier1.
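
To put the differences in the table above into relative terms, here is a small Python sketch that expresses each mode as a percentage change against TC=0. The times are copied from the table; the function name relative_to is only illustrative.

```python
# Times in seconds, copied from the table above.
TIMES = {
    "TC=1 (Default)": 4.81,
    "TC=0": 4.25,
    "TC=1, CCDelayMS=0": 3.78,
    "TC=1, PGO=1": 5.29,
    "TC=1, PGO=1, CCDelayMS=0": 3.93,
}


def relative_to(baseline, times):
    """Percentage change of every mode against the baseline mode."""
    base = times[baseline]
    return {mode: (t - base) / base * 100.0 for mode, t in times.items()}


deltas = relative_to("TC=0", TIMES)
for mode, pct in deltas.items():
    # Positive means slower than TC=0, negative means faster.
    print(f"{mode:<26} {TIMES[mode]:.2f}s  {pct:+.1f}%")
```

Relative to TC=0, the default configuration is about 13% slower and TC=1, PGO=1 about 24% slower, while the two CCDelayMS=0 rows are about 11% and 8% faster.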

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Mar 7, 2023
@EgorBo EgorBo self-assigned this Mar 7, 2023
@EgorBo EgorBo added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 7, 2023
@ghost

ghost commented Mar 7, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

I was measuring crossgen2.exe -O SPC.dll (actually, the exact command we use when building Clr.NativeCoreLib -c Release) and noticed a few problems:

Mode        | Time to prejit SPC.dll, seconds
TC=0        | 4.23
TC=1        | 4.71
TC=1, PGO=1 | 5.29

The difference is quite noticeable, so it is worth investigating.
A few observations so far: there seems to be a huge benefit from increasing the call counting threshold for R2R'd code to e.g. 1000. I will file a PR with that, since @davidwrighton made the IsReadyToRun(PCODE) VM API cheap now. It's needed because we don't want to re-jit R2R code to InstrumentedTier too early.

Investigating this in VTune now, e.g. here is a VTune comparison for TC=1,PGO=1 vs TC=1,PGO=0:

[image: VTune comparison of TC=1,PGO=1 vs TC=1,PGO=0]

Author: EgorBo
Assignees: EgorBo
Labels:

area-CodeGen-coreclr, untriaged

Milestone: -

@EgorBo EgorBo added this to the 8.0.0 milestone Mar 7, 2023
@EgorBo EgorBo removed the untriaged New issue has not been triaged by the area owner label Mar 7, 2023
@EgorBo
Member Author

EgorBo commented Mar 7, 2023

Fun fact: DOTNET_TC_CallCountingDelayMs=1 makes TC=1 (default) faster than TC=0, so apparently there is heavy contention when installing call counting stubs.

@EgorBo
Member Author

EgorBo commented Mar 8, 2023

cc @noahfalk @kouvel

@AndyAyersMS
Member

As mentioned offline we also ought to start measuring with the NAOT'd crossgen2.

@MichalStrehovsky
Member

#89489 disabled tiering to work around this (matching the workaround already used in ILC), so if we still do TP measurements in the non-shipping configuration of crossgen2, there's going to be an improvement.
