Unexpected PGO schema divergence in a method that's both instrumented and optimized #85799

Closed
AndyAyersMS opened this issue May 4, 2023 · 2 comments · Fixed by #85805
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

AndyAyersMS commented May 4, 2023

With dynamic PGO we generally expect the schema the jit produces when instrumenting to exactly match the one it expects to see when optimizing. I happened across a method in the asp.net collection where the two don't match.

;; 98458,System.DateTime:get_Kind,"int32 get_Kind()",  DEBUG_INFO FROZEN_ALLOC_ALLOWED SKIP_VERIFICATION BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_DYNAMIC_PROFILE, 0

*************** Inline @[000001] Starting PHASE Profile incorporation
Have Dynamic PGO: 3 schema records (schema at 000001B79CAA5458, data at 000001B79CA7C178)
Profile summary: 1 runs, 0 block probes, 3 edge probes, 0 class profiles, 0 method profiles, 0 other records

Reconstructing block counts from sparse edge instrumentation
... adding known edge BB04 -> BB07: weight 0
... adding known edge BB06 -> BB07: weight 0
... adding known edge BB07 -> BB01: weight 237472

New BlockSet epoch 1, # of blocks (including unused BB00): 8, bitset array size: 1 (short)
 ... unknown edge BB01 -> BB04
 ... unknown edge BB01 -> BB02
 ... unknown edge BB02 -> BB05
 ... unknown edge BB02 -> BB03
 ... unknown edge BB03 -> BB06
Did not expect tree edge BB06 -> BB07 to be present in the schema (key 00000020, 00000022)
 ... pseudo  edge BB07 -> BB01
Schema is missing non-tree edge BB05 -> BB07, will presume zero
 ... known   edge BB05 -> BB07
 ... known   edge BB04 -> BB07
... not solving because of the mismatch
... discarding profile count data: PGO data available, but IL did not match
Computing inlinee profile scale:
   ... no callee profile data, will use non-pgo weight to scale
   call site count 100 callee entry count 100 scale 1
Scaling inlinee blocks
Writing out flow graph after phase Profile incorporation

Here the jit is surprised to see that the schema edges (non-tree edges) don't match its own non-tree edges, and so it throws away all the profile data.
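
The check behind those two messages can be pictured as a set comparison. The following is a minimal sketch, not the jit's actual code: `Edge`, `SchemaCheck` and `CompareSchema` are hypothetical names, and the edges in the usage note are the ones named in the dump above.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>; // (source block number, target block number)

struct SchemaCheck
{
    std::vector<Edge> unexpected; // in the schema, but a tree edge for this jit attempt
    std::vector<Edge> missing;    // non-tree edge for this attempt, absent from the schema

    bool Matches() const
    {
        return unexpected.empty() && missing.empty();
    }
};

// Compare the edges recorded in the schema against the non-tree edges of
// the spanning tree built for the current jit attempt.
SchemaCheck CompareSchema(const std::set<Edge>& schemaEdges, const std::set<Edge>& nonTreeEdges)
{
    SchemaCheck result;
    for (const Edge& e : schemaEdges)
    {
        if (nonTreeEdges.count(e) == 0)
        {
            result.unexpected.push_back(e);
        }
    }
    for (const Edge& e : nonTreeEdges)
    {
        if (schemaEdges.count(e) == 0)
        {
            result.missing.push_back(e);
        }
    }
    // Sanity check: the two sets match exactly when they are equal.
    assert(result.Matches() == (schemaEdges == nonTreeEdges));
    return result;
}
```

Called with the schema edges {BB04 -> BB07, BB06 -> BB07, BB07 -> BB01} and the expected non-tree edges {BB04 -> BB07, BB05 -> BB07, BB07 -> BB01}, this reports BB06 -> BB07 as unexpected and BB05 -> BB07 as missing, which is the same pair the dump complains about.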

Digging back in the SPMI collection I found the compilation that produced that schema. It read in an existing static schema and then created a divergent dynamic schema:

;;; 96562,System.DateTime:get_Kind,"int32 get_Kind()",  DEBUG_INFO FROZEN_ALLOC_ALLOWED SKIP_VERIFICATION BBINSTR BBOPT TIER1 HAS_PGO HAS_EDGE_PROFILE HAS_STATIC_PROFILE, 0

*************** Starting PHASE Profile incorporation
Have Static PGO: 3 schema records (schema at 000002C500F66578, data at 000002C500F3E360)
Profile summary: 1 runs, 0 block probes, 3 edge probes, 0 class profiles, 0 method profiles, 0 other records

Reconstructing block counts from sparse edge instrumentation
... adding known edge BB04 -> BB07: weight 8872
... adding known edge BB05 -> BB07: weight 8880
... adding known edge BB07 -> BB01: weight 17456

*************** Starting PHASE Profile instrumentation prep
Using edge profiling

EfficientEdgeCountInstrumentor: preparing for instrumentation
[0] New probe for BB07 -> BB01 [source]
[1] New probe for BB06 -> BB07 [source]
[2] New probe for BB04 -> BB07 [source]
7 blocks, 3 probes (0 on critical edges)
Writing out flow graph after phase Profile instrumentation prep

The dynamic schema takes priority and any further jitting of this method then fails to incorporate the data.

The issue seems to be that the spanning tree formation is impacted by the profile data that gets incorporated, in particular this bit of code:

if (block->isRunRarely() || !target->isRunRarely())
{
    continue;
}

is reacting to the fact that the static profile has marked certain blocks as run rarely.
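
To see how a rarity-dependent visit order can change which edges get probes, here is a simplified model, not the jit's implementation. It is a plain recursive depth-first walk in which a non-rare block visits its rare successors last. The edges are the ones listed in the dumps above. Everything else is an assumption made for illustration: the names `FlowGraph`, `Visit`, `NonTreeEdges` and `GetKindGraph`, the default successor order, and the choice of BB03 and BB06 as the rarely run blocks.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>; // (source block, target block)

struct FlowGraph
{
    std::map<int, std::vector<int>> succs; // successors, in the assumed default visit order
    std::set<int>                   rare;  // blocks a profile has marked as run rarely
};

// Depth-first walk. An edge that reaches an already visited block is a
// non-tree edge, which is where an edge-count probe would go.
static void Visit(const FlowGraph& g, int block, bool useProfile, std::set<int>& visited, std::vector<Edge>& probes)
{
    assert(g.succs.count(block) != 0); // every block needs a successor list
    visited.insert(block);

    const bool blockRare = useProfile && (g.rare.count(block) != 0);

    // Pass 0 walks the ordinary successors; pass 1 walks the rare successors
    // of a non-rare block, so those are reached last.
    for (int pass = 0; pass < 2; pass++)
    {
        for (int target : g.succs.at(block))
        {
            const bool targetRare = useProfile && (g.rare.count(target) != 0);
            const bool deferred   = !blockRare && targetRare;
            if (deferred != (pass == 1))
            {
                continue;
            }
            if (visited.count(target) != 0)
            {
                probes.push_back({block, target});
            }
            else
            {
                Visit(g, target, useProfile, visited, probes);
            }
        }
    }
}

std::vector<Edge> NonTreeEdges(const FlowGraph& g, int entry, bool useProfile)
{
    std::set<int>     visited;
    std::vector<Edge> probes;
    Visit(g, entry, useProfile, visited, probes);
    return probes;
}

// Edges from the dumps; BB07 -> BB01 is the pseudo edge back to the entry.
FlowGraph GetKindGraph()
{
    FlowGraph g;
    g.succs = {{1, {2, 4}}, {2, {3, 5}}, {3, {6}}, {4, {7}}, {5, {7}}, {6, {7}}, {7, {1}}};
    g.rare  = {3, 6}; // assumption: the blocks the static profile left at zero
    return g;
}
```

Under these assumptions, `NonTreeEdges(GetKindGraph(), 1, false)` yields BB07 -> BB01, BB05 -> BB07, BB04 -> BB07, while `NonTreeEdges(GetKindGraph(), 1, true)` yields BB07 -> BB01, BB06 -> BB07, BB04 -> BB07. The second set is the one the instrumenting compilation chose for its probes, and the first is the one the later compilation expected to find.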

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 4, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label May 4, 2023
ghost commented May 4, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS AndyAyersMS self-assigned this May 4, 2023
@AndyAyersMS AndyAyersMS added this to the 8.0.0 milestone May 4, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label May 4, 2023
@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label May 4, 2023
@AndyAyersMS AndyAyersMS changed the title from "Unexpected PGO schema divergence in a method that's both instrumented an optimized" to "Unexpected PGO schema divergence in a method that's both instrumented and optimized" May 4, 2023
@AndyAyersMS

Conceptually the fix is simple: reorder the instrumentation prep and profile incorporation phases. However, both phases use the bbSparseCountInfo field of blocks, so simply reordering them doesn't work as expected.

We could have two fields, one for reading and one for writing, or try to remove the dependence on profile data when building a schema (that could break reading in current static profile data, but might also be the right fix).

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue May 5, 2023
Otherwise the spanning tree we generate may be biased by the profile data
and not match the spanning tree we generated in Tier0.

Fixes dotnet#85799.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 5, 2023
AndyAyersMS added a commit that referenced this issue May 5, 2023
…85805)

Otherwise the spanning tree we generate may be biased by the profile data
and not match the spanning tree we generated in Tier0.

Fixes #85799.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 5, 2023
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue May 8, 2023
When the profile data comes from dynamic PGO, the spanning tree encoded in the
schema produced by an earlier tier should exactly match the spanning tree for
the current jit attempt, since the JIT and method IL are identical.

(This is not the case for static PGO; that schema may have come from a different
JIT and/or different version of IL).

Note in release modes we won't assert; instead, we will silently throw the PGO
data away.

Follow-on change to dotnet#85805 to catch more issues like dotnet#85799.
AndyAyersMS added a commit that referenced this issue May 8, 2023
When the profile data comes from dynamic PGO, the spanning tree encoded in the
schema produced by an earlier tier should exactly match the spanning tree for
the current jit attempt, since the JIT and method IL are identical.

(This is not the case for static PGO; that schema may have come from a different
JIT and/or different version of IL).

Note in release modes we won't assert; instead, we will silently throw the PGO
data away.

Follow-on change to #85805 to catch more issues like #85799.
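
The policy described in that commit message could be modelled roughly as below. This is only a sketch under assumptions: `ProfileSource` and `AcceptProfile` are hypothetical names, and the standard `assert` macro stands in for whatever assertion mechanism the jit uses.

```cpp
#include <cassert>

enum class ProfileSource { Static, Dynamic };

// Returns true if the profile data should be used for this jit attempt.
bool AcceptProfile(bool schemaMatches, ProfileSource source)
{
    if (schemaMatches)
    {
        return true;
    }
    // Only dynamic PGO promises an exact match, so only that case is
    // treated as a bug. With NDEBUG defined the assert compiles away and
    // the data is just dropped.
    assert(source != ProfileSource::Dynamic && "dynamic PGO schema diverged");
    return false;
}
```

A mismatching static schema returns false without complaint, since that schema may come from a different JIT or a different version of the IL.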
@ghost ghost locked as resolved and limited conversation to collaborators Jun 4, 2023