Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: loop inversion doing some questionable profile maintenance #84319

Closed
AndyAyersMS opened this issue Apr 4, 2023 · 3 comments · Fixed by #85265
Closed

JIT: loop inversion doing some questionable profile maintenance #84319

AndyAyersMS opened this issue Apr 4, 2023 · 3 comments · Fixed by #85265
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@AndyAyersMS
Copy link
Member

From the PerfLabTests.CastingPerf:CheckArrayIsInterfaceNo benchmark, running with TieredPGO, during an OSR recompile:

Note how BB04's profile data gets trashed

****** START compiling PerfLabTests.CastingPerf:CheckArrayIsInterfaceNo():bool:this (MethodHash=a46d9cd6)
Generating code for Windows x64
OPTIONS: Tier-1 compilation
OPTIONS: OSR variant with entry point 0x18
...

*************** Starting PHASE Invert loops
-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight       IBC  lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0004]  1                             1      48701    [???..???)-> BB04 (always)                     i internal IBC 
BB02 [0000]  0                             0.00       1    [000..006)-> BB04 (always)                     i IBC 
BB03 [0001]  1       BB04                100.00 4870143    [006..018)                                     i hascall bwd bwd-target IBC 
BB04 [0002]  3       BB01,BB02,BB03      100    4870144    [018..020)-> BB03 ( cond )                     i bwd bwd-src IBC  osr-entry
BB05 [0003]  1       BB04                  0          0    [020..022)        (return)                     i rare IBC 
-----------------------------------------------------------------------------------------------------------------------------------------

***************  Exception Handling table is empty
=============== No blocks renumbered!

Duplication of loop condition [000010] is performed, because the cost of duplication (17) is less or equal than 136,
   loopIterations = 4870143.000, optInvertTotalInfo.sharedStaticHelperCount >= 0, validProfileWeights = true
New Basic Block BB06 [0005] created.
Setting edge weights for BB02 -> BB06 to [0 .. 1]
Setting edge weights for BB06 -> BB03 to [0 .. 1]
Setting edge weights for BB06 -> BB05 to [0 .. 0]
Redirecting non-loop BB01 -> BB04 to BB01 -> BB06
Setting edge weights for BB01 -> BB06 to [0 .. 1]
Reducing profile weight of BB04 from 4870144 to 4870143
Setting weight of BB04 -> BB03 to 4870142 (iterate loop)
Setting weight of BB04 -> BB05 to 0.9999998 (exit loop)
Setting edge weights for BB04 -> BB03 to [4870142 .. 4870142]
Setting edge weights for BB04 -> BB05 to [0.9999998 .. 0.9999998]
Profile weight of BB02 remains unchanged at 1
Setting weight of BB06 -> BB03 to 0.9999998 (enter loop)
Setting weight of BB06 -> BB05 to 2.053327e-07 (avoid loop)
Setting edge weights for BB06 -> BB03 to [0.9999998 .. 0.9999998]
Setting edge weights for BB06 -> BB05 to [2.053327e-07 .. 2.053327e-07]

Duplicated loop exit block at BB06 for loop (BB03 - BB04)
Estimated code size expansion is 17

------------ BB06 [???..???) -> BB05 (cond), preds={BB01,BB02} succs={BB03,BB05}

***** BB06
STMT00006 ( 0x018[E-] ... ??? )
     ( 11, 17) [000028] ----G------                         *  JTRUE     void  
     (  9, 15) [000029] J---G--N---                         \--*  GE        int   
     (  3,  2) [000030] -----------                            +--*  LCL_VAR   int    V02 loc1         
     (  5, 12) [000031] n---G------                            \--*  IND       int   
     (  3, 10) [000032] H----------                               \--*  CNS_INT(h) long   0x7ffefe7b4cb0 static Fseq[InnerIterationCount]

------------ BB04 [018..020) -> BB03 (cond), preds={BB03} succs={BB05,BB03}

***** BB04
STMT00002 ( 0x018[E-] ... 0x01E )
     ( 11, 17) [000010] ----G------                         *  JTRUE     void  
     (  9, 15) [000009] J---G--N---                         \--*  LT        int   
     (  3,  2) [000006] -----------                            +--*  LCL_VAR   int    V02 loc1         
     (  5, 12) [000008] n---G------                            \--*  IND       int   
     (  3, 10) [000007] H----------                               \--*  CNS_INT(h) long   0x7ffefe7b4cb0 static Fseq[InnerIterationCount]

*************** Finishing PHASE Invert loops
Trees after Invert loops

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight       IBC  lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0004]  1                             1      48701    [???..???)-> BB06 (always)                     i internal IBC 
BB02 [0000]  0                             0.00       1    [000..006)                                     i IBC 
BB06 [0005]  2       BB01,BB02             0.00       1    [???..???)-> BB05 ( cond )                     internal IBC 
BB03 [0001]  2       BB04,BB06           100.00 4870143    [006..018)                                     i hascall bwd bwd-target IBC 
BB04 [0002]  1       BB03                  0.00       1    [018..020)-> BB03 ( cond )                     i bwd bwd-src IBC  osr-entry
BB05 [0003]  2       BB04,BB06             0          0    [020..022)        (return)                     i rare IBC 
-----------------------------------------------------------------------------------------------------------------------------------------
@AndyAyersMS AndyAyersMS added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 4, 2023
@AndyAyersMS AndyAyersMS added this to the 8.0.0 milestone Apr 4, 2023
@AndyAyersMS AndyAyersMS self-assigned this Apr 4, 2023
@ghost
Copy link

ghost commented Apr 4, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

From the PerfLabTests.CastingPerf:CheckArrayIsInterfaceNo benchmark, running with TieredPGO, during an OSR recompile:

Note how BB04's profile data gets trashed

****** START compiling PerfLabTests.CastingPerf:CheckArrayIsInterfaceNo():bool:this (MethodHash=a46d9cd6)
Generating code for Windows x64
OPTIONS: Tier-1 compilation
OPTIONS: OSR variant with entry point 0x18
...

*************** Starting PHASE Invert loops
-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight       IBC  lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0004]  1                             1      48701    [???..???)-> BB04 (always)                     i internal IBC 
BB02 [0000]  0                             0.00       1    [000..006)-> BB04 (always)                     i IBC 
BB03 [0001]  1       BB04                100.00 4870143    [006..018)                                     i hascall bwd bwd-target IBC 
BB04 [0002]  3       BB01,BB02,BB03      100    4870144    [018..020)-> BB03 ( cond )                     i bwd bwd-src IBC  osr-entry
BB05 [0003]  1       BB04                  0          0    [020..022)        (return)                     i rare IBC 
-----------------------------------------------------------------------------------------------------------------------------------------

***************  Exception Handling table is empty
=============== No blocks renumbered!

Duplication of loop condition [000010] is performed, because the cost of duplication (17) is less or equal than 136,
   loopIterations = 4870143.000, optInvertTotalInfo.sharedStaticHelperCount >= 0, validProfileWeights = true
New Basic Block BB06 [0005] created.
Setting edge weights for BB02 -> BB06 to [0 .. 1]
Setting edge weights for BB06 -> BB03 to [0 .. 1]
Setting edge weights for BB06 -> BB05 to [0 .. 0]
Redirecting non-loop BB01 -> BB04 to BB01 -> BB06
Setting edge weights for BB01 -> BB06 to [0 .. 1]
Reducing profile weight of BB04 from 4870144 to 4870143
Setting weight of BB04 -> BB03 to 4870142 (iterate loop)
Setting weight of BB04 -> BB05 to 0.9999998 (exit loop)
Setting edge weights for BB04 -> BB03 to [4870142 .. 4870142]
Setting edge weights for BB04 -> BB05 to [0.9999998 .. 0.9999998]
Profile weight of BB02 remains unchanged at 1
Setting weight of BB06 -> BB03 to 0.9999998 (enter loop)
Setting weight of BB06 -> BB05 to 2.053327e-07 (avoid loop)
Setting edge weights for BB06 -> BB03 to [0.9999998 .. 0.9999998]
Setting edge weights for BB06 -> BB05 to [2.053327e-07 .. 2.053327e-07]

Duplicated loop exit block at BB06 for loop (BB03 - BB04)
Estimated code size expansion is 17

------------ BB06 [???..???) -> BB05 (cond), preds={BB01,BB02} succs={BB03,BB05}

***** BB06
STMT00006 ( 0x018[E-] ... ??? )
     ( 11, 17) [000028] ----G------                         *  JTRUE     void  
     (  9, 15) [000029] J---G--N---                         \--*  GE        int   
     (  3,  2) [000030] -----------                            +--*  LCL_VAR   int    V02 loc1         
     (  5, 12) [000031] n---G------                            \--*  IND       int   
     (  3, 10) [000032] H----------                               \--*  CNS_INT(h) long   0x7ffefe7b4cb0 static Fseq[InnerIterationCount]

------------ BB04 [018..020) -> BB03 (cond), preds={BB03} succs={BB05,BB03}

***** BB04
STMT00002 ( 0x018[E-] ... 0x01E )
     ( 11, 17) [000010] ----G------                         *  JTRUE     void  
     (  9, 15) [000009] J---G--N---                         \--*  LT        int   
     (  3,  2) [000006] -----------                            +--*  LCL_VAR   int    V02 loc1         
     (  5, 12) [000008] n---G------                            \--*  IND       int   
     (  3, 10) [000007] H----------                               \--*  CNS_INT(h) long   0x7ffefe7b4cb0 static Fseq[InnerIterationCount]

*************** Finishing PHASE Invert loops
Trees after Invert loops

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight       IBC  lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0004]  1                             1      48701    [???..???)-> BB06 (always)                     i internal IBC 
BB02 [0000]  0                             0.00       1    [000..006)                                     i IBC 
BB06 [0005]  2       BB01,BB02             0.00       1    [???..???)-> BB05 ( cond )                     internal IBC 
BB03 [0001]  2       BB04,BB06           100.00 4870143    [006..018)                                     i hascall bwd bwd-target IBC 
BB04 [0002]  1       BB03                  0.00       1    [018..020)-> BB03 ( cond )                     i bwd bwd-src IBC  osr-entry
BB05 [0003]  2       BB04,BB06             0          0    [020..022)        (return)                     i rare IBC 
-----------------------------------------------------------------------------------------------------------------------------------------


<table>
  <tr>
    <th align="left">Author:</th>
    <td>AndyAyersMS</td>
  </tr>
  <tr>
    <th align="left">Assignees:</th>
    <td>AndyAyersMS</td>
  </tr>
  <tr>
    <th align="left">Labels:</th>
    <td>

`area-CodeGen-coreclr`

</td>
  </tr>
  <tr>
    <th align="left">Milestone:</th>
    <td>8.0.0</td>
  </tr>
</table>
</details>

@AndyAyersMS
Copy link
Member Author

Seems like optInvertWhileLoop is not doing sufficiently rigorous screening. It matches on the BB02 -> BB04 -> BB03 pattern thinking this is the hallmark of an un-rotated loop, but in actuality the loop is BB01 -> BB04 -> BB03 and BB02 is irrelevant.

Simple fix is to not rotate when block is unreferenced; or possibly better, that bTest has only one predecessor that is above the evident top of the loop.

@AndyAyersMS
Copy link
Member Author

The problem more generally is that profile updates to the bTest block are just wrong if it has more than two predecessors. Here's a different example. On the left we have the flowgraph before inversion; on the right; after (unfortunately BBs get renumbered, but it's not hard to figure out how they match up).

Inversion is going duplicate the code in BB14 on left into a new block (also called BB14, on right) to handle the path that might enter the loop. That new block has weight 1.0 as it should. But the original BB14 (now BB12)'s profile is too small, it should be 1.99 and not 1.00. The backedge weight is also off.

image image

The fix I have been proposing is for bTest to inherit the profile weight of the top block in the loop. This fixes this issue but seemingly leads to others. One challenge is that doing profile math when profiles are inconsistent is tricky.

I'm going to set this aside for now as I don't know of any specific perf problems this causes, though I expect there are sone.

Provisional changes here: main...AndyAyersMS:runtime:FixLoopInversion

@AndyAyersMS AndyAyersMS modified the milestones: 8.0.0, Future Apr 5, 2023
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Apr 24, 2023
If the loop test block has multiple predecessors we will not do proper
profile updates. This can lead to downstream problems with block layout
(say leaving a cold block in a loop).

Fix by changing how we compute the amount of profile that should remain
in the test block.

Fixes dotnet#84319.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 24, 2023
AndyAyersMS added a commit that referenced this issue Apr 25, 2023
If the loop test block has multiple predecessors we will not do proper
profile updates. This can lead to downstream problems with block layout
(say leaving a cold block in a loop).

Fix by changing how we compute the amount of profile that should remain
in the test block.

Fixes #84319.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 25, 2023
@ghost ghost locked as resolved and limited conversation to collaborators May 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant