R2R: Cache target of indirect cell address to optimize redundant cell address loading #38890

kunalspathak · 2020-07-07T19:35:46Z

Today we don't CSE the load of an indirection cell's target, because that information is not in the IR during CSE. The load is only introduced in a later phase such as lowering.

Consider the following code pattern:

        ...
        9000000B          adrp    x11, [RELOC #0x1de756537c0]
        9100016B          add     x11, x11, #0
        F9400160          ldr     x0, [x11]
        D63F0000          blr     x0
        ...
        9000000B          adrp    x11, [RELOC #0x1de756537c0]
        9100016B          add     x11, x11, #0
        F9400160          ldr     x0, [x11]
        D63F0000          blr     x0
        AA0003EF          mov     x15, x0
        ...

If we could optimize this with a peephole, or a more ambitious scan over the final instructions, into something like this:

        ...
        9000000B          adrp    x11, [RELOC #0x1de756537c0]
        9100016B          add     x11, x11, #0
        F9400160          ldr     x0, [x11]
                          mov     xR, x0     ; store x0 in some register xR
        D63F0000          blr     x0
        ...
                          mov     x0, xR     ; retrieve xR into x0
        D63F0000          blr     x0
        AA0003EF          mov     x15, x0
        ...

With this, we save 8 bytes of code and eliminate one memory access.
I wrote an analyzer over the asm to count how many such addresses are CSE candidates, and the number is huge. From what I noticed, it would give a little over 2 MB of size reduction.
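
As a rough illustration of what such an analysis could look like (this is not the actual analyzer, whose implementation is not shown here), the sketch below counts relocation addresses that are materialized by `adrp` more than once within a single method's disassembly. The regular expression and the function name `count_redundant_cell_loads` are hypothetical and assume the listing format shown above. It ignores control flow, so it only gives an upper bound on real candidates.

```python
import re
from collections import Counter

# Hypothetical matcher for the listing format used in this issue, e.g.
#   9000000B          adrp    x11, [RELOC #0x1de756537c0]
ADRP_RELOC = re.compile(r"\badrp\s+x\d+,\s*\[RELOC #(0x[0-9a-fA-F]+)\]")


def count_redundant_cell_loads(method_asm: str) -> dict:
    """Map each RELOC address loaded more than once to its occurrence count."""
    seen = Counter(m.group(1) for m in ADRP_RELOC.finditer(method_asm))
    return {addr: n for addr, n in seen.items() if n > 1}


sample = """
        9000000B          adrp    x11, [RELOC #0x1de756537c0]
        9100016B          add     x11, x11, #0
        F9400160          ldr     x0, [x11]
        D63F0000          blr     x0
        9000000B          adrp    x11, [RELOC #0x1de756537c0]
        9100016B          add     x11, x11, #0
        F9400160          ldr     x0, [x11]
        D63F0000          blr     x0
"""

groups = count_redundant_cell_loads(sample)
print(groups)  # {'0x1de756537c0': 2}
```

Each key in the returned dictionary corresponds to one "group" in the sense used here: a set of loads in the same method that target the same cell address.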

Processed 191816 methods. Found 29246 methods containing 259123 groups.
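
The size estimate can be reproduced with a quick back-of-the-envelope calculation. This sketch assumes, as a simplification, that every group saves exactly the 8 bytes quoted above; groups with more than two loads, or any extra `mov` needed to preserve the target, would shift the real figure.

```python
GROUPS = 259_123           # groups reported by the analyzer
BYTES_SAVED_PER_GROUP = 8  # assumption: one redundant load sequence per group


def estimated_savings_bytes(groups: int, bytes_per_group: int) -> int:
    """Estimate the total code size reduction in bytes."""
    return groups * bytes_per_group


total = estimated_savings_bytes(GROUPS, BYTES_SAVED_PER_GROUP)
print(total)                        # 2072984
print(round(total / 1_000_000, 2))  # 2.07 (decimal MB)
```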

Details: cse-candidates.txt

category:cq
theme:cse
skill-level:expert
cost:large
impact:large

jakobbotsch · 2021-09-28T13:38:37Z

Note that the contents of the indirection cell are not invariant. In the example here, by the time of the second call the indirection cell will point to the R2R code address, and later to tier-1 code, while at the first call it might point to an import thunk.

With that said, it might still be OK to do this optimization: the import thunk/delay load helper should be idempotent, so calling it a second time in rare cases should be fine (though with loops we should consider avoiding it).
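
To make the concern concrete, here is a toy Python model of an indirection cell. It is purely illustrative and not how the runtime is implemented: the class `IndirectionCell`, its `resolve_count` field, and `callee` are all hypothetical names. The cell starts out pointing at an import thunk that resolves the real target, patches the cell, and forwards the call.

```python
class IndirectionCell:
    """Toy model of an indirection cell (hypothetical, not runtime code)."""

    def __init__(self, real_target):
        self.resolve_count = 0
        self._real_target = real_target
        self.target = self._import_thunk  # initially points at the thunk

    def _import_thunk(self, *args):
        # Resolve the callee, patch the cell, then forward the call.
        self.resolve_count += 1
        self.target = self._real_target
        return self._real_target(*args)


def callee(x):
    return x + 1


# Reloading the cell before every call: the thunk runs only once.
cell = IndirectionCell(callee)
results = [cell.target(1), cell.target(2)]

# Caching the loaded target in a "register": the stale thunk runs twice.
cached_cell = IndirectionCell(callee)
cached = cached_cell.target
cached_results = [cached(1), cached(2)]

print(results, cell.resolve_count)                 # [2, 3] 1
print(cached_results, cached_cell.resolve_count)   # [2, 3] 2
```

Both variants return the same results because the thunk in this model is idempotent, but the cached variant goes through the slow resolution path on every call. Inside a loop that cost would be paid on each iteration, which is why loops deserve extra care.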

EgorBo · 2023-07-20T12:09:36Z

Together with @jakobbotsch we've just checked, and it seems that we already do the right thing here, e.g.:

The address is hoisted, the ldr is not (as expected)

kunalspathak added arch-arm64 tenet-performance Performance related issue labels Jul 7, 2020

Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Jul 7, 2020

kunalspathak mentioned this issue Jul 7, 2020

Improving ARM64 Performance in .NET 5.0 – Closing the gap with x64 #35853

Closed

BruceForstall removed the untriaged New issue has not been triaged by the area owner label Jul 8, 2020

BruceForstall added this to the Future milestone Jul 8, 2020

BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020

BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 7, 2020

EgorBo closed this as completed Jul 20, 2023

ghost locked as resolved and limited conversation to collaborators Aug 19, 2023
