Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: change loop inversion edge weight updates and add phase #48364

Merged
merged 7 commits into from
Feb 18, 2021

Conversation

AndyAyersMS
Copy link
Member

Rename fgOptWhileLoop to optInvertWhileLoop (using terminology from
Muchnick). Split off this transformation into a new phase so it is easier
to see its impact. Make the block updates / reorderings that follow into
a proper phase as well.

Use the test block exit likelihoods to update the profile weights for the
edges involved in loop inversion. Because edge weight updates are now
consistent we no longer need to recompute edge weights afterwards.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 16, 2021
@AndyAyersMS
Copy link
Member Author

@BruceForstall PTAL
cc @dotnet/jit-contrib

SPMI seems to be missing collections for latest hash. Believe this should be no diff w/o profile data. I see a handful of diffs on a local collection with tiered PGO that I'll post shortly.

@AndyAyersMS
Copy link
Member Author

Local diffs (on a collection with tiered PGO enabled)

Total bytes of base: 9428
Total bytes of diff: 9217
Total bytes of delta: -211 (-2.24% of base)
    diff is an improvement.


Top file regressions (bytes):
          17 : 74628.dasm (1.66% of base)
          13 : 74645.dasm (0.98% of base)
          11 : 74629.dasm (1.37% of base)
           3 : 74646.dasm (0.34% of base)
           3 : 74651.dasm (0.68% of base)
           3 : 74779.dasm (1.00% of base)
           2 : 4794.dasm (0.68% of base)

Top file improvements (bytes):
        -236 : 74789.dasm (-30.81% of base)
         -12 : 74606.dasm (-1.37% of base)
         -12 : 74580.dasm (-1.41% of base)
          -3 : 4852.dasm (-0.74% of base)

11 total files with Code Size differences (4 improved, 7 regressed), 2 unchanged.

Top method regressions (bytes):
          17 ( 1.66% of base) : 74628.dasm - AssignJagged:first_assignments(System.Int32[][],System.Int16[][]):int
          13 ( 0.98% of base) : 74645.dasm - AssignRect:first_assignments(System.Int32[,],System.Int16[,]):int
          11 ( 1.37% of base) : 74629.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])
           3 ( 0.34% of base) : 74646.dasm - AssignRect:second_assignments(System.Int32[,],System.Int16[,])
           3 ( 0.68% of base) : 74651.dasm - AssignRect:calc_minimum_costs(System.Int32[,])
           3 ( 1.00% of base) : 74779.dasm - Neural:DoNNetIteration(long):long
           2 ( 0.68% of base) : 4794.dasm - System.Collections.Concurrent.ConcurrentDictionary`2[__Canon,Boolean][System.__Canon,System.Boolean]:TryGetValueInternal(System.__Canon,int,byref):bool:this

Top method improvements (bytes):
        -236 (-30.81% of base) : 74789.dasm - LUDecomp:lubksb(System.Double[][],int,System.Int32[],System.Double[])
         -12 (-1.37% of base) : 74606.dasm - EMFloatClass:DivideInternalFPF(InternalFPF,InternalFPF,InternalFPF)
         -12 (-1.41% of base) : 74580.dasm - EMFloat:DivideInternalFPF(byref,byref,byref)
          -3 (-0.74% of base) : 4852.dasm - Internal.TypeSystem.MethodSignature:Equals(Internal.TypeSystem.MethodSignature,bool):bool:this

Top method regressions (percentages):
          17 ( 1.66% of base) : 74628.dasm - AssignJagged:first_assignments(System.Int32[][],System.Int16[][]):int
          11 ( 1.37% of base) : 74629.dasm - AssignJagged:second_assignments(System.Int32[][],System.Int16[][])
           3 ( 1.00% of base) : 74779.dasm - Neural:DoNNetIteration(long):long
          13 ( 0.98% of base) : 74645.dasm - AssignRect:first_assignments(System.Int32[,],System.Int16[,]):int
           3 ( 0.68% of base) : 74651.dasm - AssignRect:calc_minimum_costs(System.Int32[,])
           2 ( 0.68% of base) : 4794.dasm - System.Collections.Concurrent.ConcurrentDictionary`2[__Canon,Boolean][System.__Canon,System.Boolean]:TryGetValueInternal(System.__Canon,int,byref):bool:this
           3 ( 0.34% of base) : 74646.dasm - AssignRect:second_assignments(System.Int32[,],System.Int16[,])

Top method improvements (percentages):
        -236 (-30.81% of base) : 74789.dasm - LUDecomp:lubksb(System.Double[][],int,System.Int32[],System.Double[])
         -12 (-1.41% of base) : 74580.dasm - EMFloat:DivideInternalFPF(byref,byref,byref)
         -12 (-1.37% of base) : 74606.dasm - EMFloatClass:DivideInternalFPF(InternalFPF,InternalFPF,InternalFPF)
          -3 (-0.74% of base) : 4852.dasm - Internal.TypeSystem.MethodSignature:Equals(Internal.TypeSystem.MethodSignature,bool):bool:this

11 total methods with Code Size differences (4 improved, 7 regressed), 2 unchanged.

Suspect the diffs in lubksb are not ideal -- downstream of the loop inversion profile data gets compromised, and we lose track of one loop. I plan to fix that in a subsequent change, but can look into fixing it in this PR too.

@AndyAyersMS
Copy link
Member Author

Impact of the transformation on lubksb. A copy of what's in BB10 was fused into BB01, and the edge weights on BB01 reflect the likelihood of entering the loop or zero-tripping.

;; note these are raw counts; charts below show normalized counts

Reducing profile weight of BB10 from 11934.000000 to 11817.000000
Setting weight of BB10 -> BB02 to 11701.146484 (iterate loop)
Setting weight of BB10 -> BB11 to 115.853287 (exit loop)
Profile weight of BB01 remains unchanged at 117.000000
Setting weight of BB01 -> BB02 to 115.852936 (enter loop)
Setting weight of BB01 -> BB11 to 1.147062 (avoid loop)

3 other loops get inverted in this example...

BeforeAfter

@AndyAyersMS AndyAyersMS mentioned this pull request Feb 16, 2021
54 tasks
@AndyAyersMS
Copy link
Member Author

After the subsequent optOptimizeLayout there are some obvious edge weights wrong (eg BB03's outgoing edges):

image - 2021-02-16T132807 350

Rename `fgOptWhileLoop` to `optInvertWhileLoop` (using terminology from
Muchnick). Split off this transformation into a new phase so it is easier
to see its impact. Make the block updates / reorderings that follow into
a proper phase as well.

Use the test block exit likelihoods to update the profile weights for the
edges involved in loop inversion. Because edge weight updates are now
consistent we no longer need to recompute edge weights afterwards.
@AndyAyersMS
Copy link
Member Author

Ran SPMI diffs, found 157 methods with diffs; at a glance they all seem to be from assemblies with IBC.

Am seeing some failures in CI that will require investigation.

@AndyAyersMS AndyAyersMS added this to the 6.0.0 milestone Feb 17, 2021
src/coreclr/jit/compiler.h Outdated Show resolved Hide resolved
@AndyAyersMS
Copy link
Member Author

optOptimizeLoops no longer optimizes anything. I can't resist fixing this. One more push in the works.

@AndyAyersMS AndyAyersMS merged commit 841fc98 into dotnet:master Feb 18, 2021
@AndyAyersMS AndyAyersMS deleted the FixFgOptWhileLoopProfileUpdate branch February 18, 2021 20:34
@ghost ghost locked as resolved and limited conversation to collaborators Mar 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants