Generalize Arm64 ldr/str to ldp/stp optimization #81278

BruceForstall · 2023-01-27T17:25:39Z

#77540 introduced a JIT peephole optimization to convert consecutive ldr/str instructions to ldp/stp. The optimization is skipped when either of the two ldr/str instructions references a lclvar, because that case requires more work to properly handle the GC effects.

See #77540 (review) for more discussion.

This issue tracks generalizing the optimization to handle the lclvar cases.
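
For readers unfamiliar with the peephole, the pairing condition can be sketched roughly as follows. This is a minimal illustration and not the JIT's code: the `Load` and `LoadPair` structs and `tryMergeLoads` are hypothetical names invented for this example, and the sketch ignores the lclvar/GC tracking and the prolog/epilog restrictions discussed in this issue. It assumes two loads of the same size from the same base register at adjacent offsets, and the signed, scaled 7-bit immediate range of ldp.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified model of an emitted "ldr dst, [base, #offset]".
// This is NOT the JIT's instrDesc.
struct Load
{
    int     dstReg;  // destination register number
    int     baseReg; // base address register number
    int32_t offset;  // immediate byte offset from the base register
    int     size;    // access size in bytes: 4 (w registers) or 8 (x registers)
};

// Hypothetical model of the merged "ldp dst1, dst2, [base, #offset]".
struct LoadPair
{
    int     dstReg1;
    int     dstReg2;
    int     baseReg;
    int32_t offset;
    int     size;
};

// Returns the merged pair if "prev; cur" can be replaced by a single ldp,
// otherwise std::nullopt.
std::optional<LoadPair> tryMergeLoads(const Load& prev, const Load& cur)
{
    assert(prev.size == 4 || prev.size == 8);
    assert(cur.size == 4 || cur.size == 8);

    // Both loads must have the same width and use the same base register.
    if (prev.size != cur.size || prev.baseReg != cur.baseReg)
        return std::nullopt;

    // ldp requires two distinct destination registers.
    if (prev.dstReg == cur.dstReg)
        return std::nullopt;

    // If the first load overwrites the base register, the second load
    // reads from a different address, so the pair is not equivalent.
    if (prev.dstReg == prev.baseReg)
        return std::nullopt;

    const int  size       = prev.size;
    const bool ascending  = (cur.offset == prev.offset + size);
    const bool descending = (cur.offset == prev.offset - size);
    if (!ascending && !descending)
        return std::nullopt;

    // ldp encodes a signed 7-bit immediate scaled by the access size.
    const int32_t low = ascending ? prev.offset : cur.offset;
    if (low % size != 0 || low < -64 * size || low > 63 * size)
        return std::nullopt;

    // The register loaded from the lower address comes first.
    if (ascending)
        return LoadPair{prev.dstReg, cur.dstReg, prev.baseReg, low, size};
    return LoadPair{cur.dstReg, prev.dstReg, prev.baseReg, low, size};
}
```

Under this model, `ldr x1, [x0, #16]` followed by `ldr x2, [x0, #24]` merges into `ldp x1, x2, [x0, #16]`, while loads from different base registers or non-adjacent offsets are left alone.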

Also, the optimization was restricted to not work in prologs or epilogs, to avoid affecting unwind codes. There are two potential improvements here:

Make sure the unwind codes work with the optimization.
Allow the optimization in the non-OS (not unwindable) part of the prolog/epilog.
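
The second improvement amounts to narrowing the gate from "anywhere in the prolog/epilog" to "only the unwindable part". A rough sketch of the two policies, assuming a hypothetical `CodeRegion` classification that does not exist in the JIT (the real check uses `emitIGisInProlog` / `emitIGisInEpilog` on the current instruction group):

```cpp
// Hypothetical classification of where an instruction is being emitted.
// The JIT does not have this enum; it is for illustration only.
enum class CodeRegion
{
    Body,
    PrologUnwindable,    // part of the prolog described by unwind codes
    PrologNonUnwindable, // remainder of the prolog
    EpilogUnwindable,    // part of the epilog described by unwind codes
    EpilogNonUnwindable, // remainder of the epilog
};

// Conservative policy: never apply the peephole in a prolog or epilog.
constexpr bool canPeepholeConservative(CodeRegion region)
{
    return region == CodeRegion::Body;
}

// Relaxed policy: skip only the portions whose instructions must match
// the unwind codes reported to the OS.
constexpr bool canPeepholeRelaxed(CodeRegion region)
{
    return region != CodeRegion::PrologUnwindable &&
           region != CodeRegion::EpilogUnwindable;
}
```

The relaxed policy keeps the one-to-one correspondence between unwind codes and instructions in the unwindable part, while still allowing pairs to be merged in the rest of the prolog and epilog.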

ghost · 2023-01-27T17:25:48Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	BruceForstall
Assignees:	-
Labels:	`arch-arm64`, `area-CodeGen-coreclr`
Milestone:	8.0.0

BruceForstall · 2023-02-15T00:23:12Z

Regarding the lclvar case, emitIns_S_S_R_R() might be a place to look.

kunalspathak · 2023-04-27T06:14:02Z

#85032 handles the general case, but not the prolog/epilog and unwind code cases. I will leave this issue open.

kunalspathak · 2023-05-01T13:08:28Z

As an experiment, I commented out the lines below and saw a lot of methods improve in the benchmarks collection.

runtime/src/coreclr/jit/emitarm64.cpp, lines 16648 to 16653 in 049acec:

    // Don't remove instructions whilst in prologs or epilogs, as these contain "unwindable"
    // parts, where we need to report unwind codes to the OS,
    if (emitIGisInProlog(emitCurIG) || emitIGisInEpilog(emitCurIG))
    {
        return eRO_none;
    }

2,387 contexts with diffs (2,387 improvements, 0 regressions, 0 same size)
  -12,172 bytes

BruceForstall added the arch-arm64 and area-CodeGen-coreclr labels Jan 27, 2023

BruceForstall added this to the 8.0.0 milestone Jan 27, 2023

BruceForstall mentioned this issue Jan 27, 2023

Replace successive "ldr" and "str" instructions with "ldp" and "stp" #77540

Merged

kunalspathak self-assigned this Feb 15, 2023

This was referenced Mar 27, 2023

[Arm64] Peephole optimization opportunities #55365

Closed

Use ldp/stp with SIMD registers on Arm64 #84135

Merged

ghost added the in-pr label Mar 30, 2023

kunalspathak closed this as completed in #84135 Mar 31, 2023

ghost removed the in-pr label Mar 31, 2023

kunalspathak reopened this Apr 3, 2023

kunalspathak mentioned this issue Apr 6, 2023

Perform ldr to ldp peephole optimization #84399

Merged

kunalspathak mentioned this issue Apr 19, 2023

[Arm64] Replace pairs of str with stp #85032

Merged

kunalspathak mentioned this issue May 2, 2023

Skip stp/ldp only for unwind portion of prolog/epilog #85657

Merged

ghost added the in-pr label May 2, 2023

kunalspathak closed this as completed in #85657 May 3, 2023

ghost removed the in-pr label May 3, 2023

ghost locked as resolved and limited conversation to collaborators Jun 2, 2023
