Investigate why crossgen works slower with TieredCompilation/PGO #83112
Comments
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
Issue Details: The difference is quite noticeable so worth investigating. Investigating this in VTune now, e.g. here is a VTune comparison for
Fun-fact:
The regression is seen after enabling dynamic PGO in #86225.
As mentioned offline, we also ought to start measuring with the NAOT'd crossgen2.
#89489 disabled tiering as a workaround (matching the workaround already used in ILC), so if we still do TP measurements in the non-shipping configuration of crossgen2, there's going to be an improvement.
I was measuring crossgen2.exe -O SPC.dll (actually, the exact command we use for build Clr.NativeCoreLib -c Release) and noticed a few problems:

Legend:
TC - DOTNET_TieredCompilation (1 by default)
PGO - DOTNET_TieredPGO (0 by default)
CCDelayMS - DOTNET_TC_CallCountThreshold (100 by default)

The difference is quite noticeable, so it is worth investigating; the numbers are quite stable across multiple runs.

Judging by the effect from DOTNET_TC_CallCountThreshold, we're having some contention for call counting stub installation/promotion to tier1.
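
For anyone who wants to repeat this kind of comparison, here is a minimal sketch of a timing harness that runs one command under several environment-variable configurations and reports the best wall-clock time for each. The environment variable names come from the legend above. The helper names (time_command, compare), the configuration labels, and the override values are assumptions made for illustration, not the setup used for the measurements in this issue.

```python
import os
import subprocess
import sys
import time

# Configurations to compare. The values are illustrative only.
# Note (assumption to verify): the runtime commonly parses numeric
# DOTNET_* values as hexadecimal, so check before relying on "1000".
CONFIGS = {
    "default": {},
    "TC=0": {"DOTNET_TieredCompilation": "0"},
    "PGO=1": {"DOTNET_TieredPGO": "1"},
    "CallCountThreshold=1000": {"DOTNET_TC_CallCountThreshold": "1000"},
}


def time_command(cmd, env_overrides, runs=3):
    """Run cmd `runs` times with the given env overrides.

    Returns the best (smallest) wall-clock time in seconds.
    """
    env = dict(os.environ)
    env.update(env_overrides)
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            env=env,
            check=True,  # fail loudly if the command itself fails
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        best = min(best, time.perf_counter() - start)
    return best


def compare(cmd, configs=CONFIGS, runs=3):
    """Time cmd under every configuration; returns {label: seconds}."""
    return {
        label: time_command(cmd, overrides, runs)
        for label, overrides in configs.items()
    }
```

Calling compare(["crossgen2.exe", "-O", "SPC.dll"]) would time the command from this issue, assuming crossgen2.exe and the input assembly are reachable from the working directory. Taking the best of several runs is one way to reduce noise; a median would work as well.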