R2R: Cache target of indirect cell address to optimize redundant cell address loading #38890

Closed
kunalspathak opened this issue Jul 7, 2020 · 2 comments
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue


@kunalspathak
Member

kunalspathak commented Jul 7, 2020

Today we don't CSE the load of the indirection cell's target, because that information is not in the IR when CSE runs; it is only introduced in a later phase such as lowering.

Consider the following code pattern:

        ...
        9000000B          adrp    x11, [RELOC #0x1de756537c0]
        9100016B          add     x11, x11, #0
        F9400160          ldr     x0, [x11]
        D63F0000          blr     x0
        ...
        9000000B          adrp    x11, [RELOC #0x1de756537c0]
        9100016B          add     x11, x11, #0
        F9400160          ldr     x0, [x11]
        D63F0000          blr     x0
        AA0003EF          mov     x15, x0
       ...

We could optimize this, using a peephole pass or a more ambitious final instruction-scanning phase, into something like:

        ...
        9000000B          adrp    x11, [RELOC #0x1de756537c0]
        9100016B          add     x11, x11, #0
        F9400160          ldr     x0, [x11]
                          mov     xR, x0     ; store x0 in some register xR
        D63F0000          blr     x0
        ...
                          mov     x0, xR     ; retrieve xR into x0
        D63F0000          blr     x0
        AA0003EF          mov     x15, x0
       ...
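A minimal sketch of that peephole, written as a toy pass over (opcode, operands) tuples rather than anything in the JIT. Assumptions, all hypothetical: x19 plays the role of xR (it is callee-saved on arm64, so it survives the blr), nothing else in the method writes it, and the pass only remembers the most recently loaded cell. It also does not check whether x11 is still needed once the second adrp/add is removed.

```python
# Toy peephole pass (hypothetical sketch, not the JIT's implementation):
# reuse an indirection-cell target that was already loaded earlier.
# Instructions are (opcode, operands) tuples mirroring the listings above.

SAVE_REG = "x19"  # ASSUMED stand-in for "xR": callee-saved, unused elsewhere


def reuse_cell_loads(instrs):
    out = []
    saved_reloc = None  # relocation whose target SAVE_REG currently holds
    i = 0
    while i < len(instrs):
        window = instrs[i:i + 3]
        if [op for op, _ in window] == ["adrp", "add", "ldr"]:
            reloc = window[0][1][1]  # e.g. "[RELOC #0x1de756537c0]"
            dest = window[2][1][0]   # register that receives the call target
            if reloc == saved_reloc:
                # Redundant cell load: copy the saved target instead.
                out.append(("mov", (dest, SAVE_REG)))
            else:
                # First load of this cell: keep it and stash the target.
                out.extend(window)
                out.append(("mov", (SAVE_REG, dest)))
                saved_reloc = reloc
            i += 3
        else:
            out.append(instrs[i])
            i += 1
    return out


CELL = "[RELOC #0x1de756537c0]"
FIRST_LISTING = [
    ("adrp", ("x11", CELL)),
    ("add", ("x11", "x11", "#0")),
    ("ldr", ("x0", "[x11]")),
    ("blr", ("x0",)),
    ("adrp", ("x11", CELL)),
    ("add", ("x11", "x11", "#0")),
    ("ldr", ("x0", "[x11]")),
    ("blr", ("x0",)),
    ("mov", ("x15", "x0")),
]

if __name__ == "__main__":
    for op, operands in reuse_cell_loads(FIRST_LISTING):
        print(f"{op:8}{', '.join(operands)}")
```

Run on the first listing, this yields the eight-instruction shape of the second listing with xR = x19.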

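With this, we eliminate one memory access and shrink the code: the second adrp/add/ldr (12 bytes) goes away in exchange for the added movs, saving 8 bytes if only one extra mov is needed, or 4 bytes with both movs as listed above.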
I wrote an asm analyzer to find out how many addresses are CSE candidates, and the number is large: by my estimate it would be a little over 2 MB of size reduction.

Processed 191816 methods. Found 29246 methods containing 259123 groups.

Details: cse-candidates.txt
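The analyzer itself is not part of the issue. A hypothetical sketch of that kind of scan, assuming the disassembly format shown above and assuming a "group" means one RELOC address materialized at least twice within a method, could look like this. A real tool would also need to confirm that the loaded value actually feeds a blr, since adrp with a RELOC is used for data addresses too.

```python
import re
from collections import Counter

# Matches adrp lines in the listing format above, e.g.
#   9000000B          adrp    x11, [RELOC #0x1de756537c0]
ADRP_RE = re.compile(r"\badrp\s+\w+,\s*\[RELOC\s+#(0x[0-9a-fA-F]+)\]")


def count_groups(method_listings):
    """Return (methods_with_groups, total_groups) over per-method listings.

    ASSUMPTION: a "group" is one RELOC address that is materialized two or
    more times within the same method.
    """
    methods_with_groups = 0
    total_groups = 0
    for listing in method_listings:
        hits = Counter(m.group(1) for m in ADRP_RE.finditer(listing))
        groups = sum(1 for n in hits.values() if n >= 2)
        if groups:
            methods_with_groups += 1
            total_groups += groups
    return methods_with_groups, total_groups


def estimated_savings(total_groups, bytes_per_group=8):
    """Code-size estimate using the issue's 8-bytes-per-group figure."""
    return total_groups * bytes_per_group


SAMPLE = """
9000000B          adrp    x11, [RELOC #0x1de756537c0]
9100016B          add     x11, x11, #0
F9400160          ldr     x0, [x11]
D63F0000          blr     x0
9000000B          adrp    x11, [RELOC #0x1de756537c0]
9100016B          add     x11, x11, #0
F9400160          ldr     x0, [x11]
D63F0000          blr     x0
"""

if __name__ == "__main__":
    print(count_groups([SAMPLE]))
    print(estimated_savings(259123))
```

With the reported 259123 groups at 8 bytes each, estimated_savings returns 2,072,984 bytes, which matches the "a little over 2 MB" estimate.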

category:cq
theme:cse
skill-level:expert
cost:large
impact:large

@kunalspathak kunalspathak added arch-arm64 tenet-performance Performance related issue labels Jul 7, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Jul 7, 2020
@BruceForstall BruceForstall removed the untriaged New issue has not been triaged by the area owner label Jul 8, 2020
@BruceForstall BruceForstall added this to the Future milestone Jul 8, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 7, 2020
@jakobbotsch
Member

Note that the contents of the indirection cell are not invariant. In the example here, by the second call the indirection cell will be pointing to the R2R code address (and later to tier-1 code), whereas on the first call it might be pointing to an import thunk.

That said, it might still be OK to do this optimization: the import thunk/delay-load helper should be idempotent, so calling it a second time in rare cases should be fine (though in loops we should consider avoiding it).
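A toy model of this point, under loudly hypothetical assumptions: the cell starts out pointing at an import thunk, and the thunk resolves the real target, patches the cell, and forwards the call. Reloading the cell before each call goes through the thunk once; reusing the first loaded value (the proposed optimization) goes through it twice, which is harmless only because the thunk is idempotent. Cell, import_thunk and r2r_code are illustrative names, not runtime APIs, and tier-1 repatching is not modeled.

```python
class Cell:
    """Hypothetical indirection cell whose contents change at runtime."""

    def __init__(self, real_target):
        self.real_target = real_target
        self.thunk_calls = 0
        self.value = self.import_thunk  # initially points at the thunk

    def import_thunk(self, *args):
        self.thunk_calls += 1
        # Idempotent: every invocation patches the cell with the same target.
        self.value = self.real_target
        return self.real_target(*args)


def r2r_code(x):
    return x * 2


def reload_each_time(cell):
    first = cell.value(1)   # ldr from the cell, then blr
    second = cell.value(2)  # ldr again: sees the patched target
    return first, second


def reuse_first_load(cell):
    target = cell.value     # single ldr, cached in "xR"
    first = target(1)
    second = target(2)      # still the thunk: resolves a second time
    return first, second


if __name__ == "__main__":
    a, b = Cell(r2r_code), Cell(r2r_code)
    print(reload_each_time(a), a.thunk_calls)
    print(reuse_first_load(b), b.thunk_calls)
```

Both strategies return the same results; only the number of thunk invocations differs.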

@EgorBo
Member

EgorBo commented Jul 20, 2023

Together with @jakobbotsch, we've just checked, and it seems we already do the right thing here, e.g.:

[image]

The address is hoisted; the ldr is not (as expected).
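A toy model of why this is the expected shape inside a loop, with hypothetical names (make_cell, import_thunk) and a thunk that patches the cell on first use: keeping the load in the loop resolves through the thunk once, while hoisting the load as well would send every iteration through the thunk.

```python
def make_cell(real_target, counter):
    """Hypothetical cell that initially points at a patching import thunk."""
    cell = {}

    def import_thunk(x):
        counter["thunk"] += 1
        cell["target"] = real_target  # patch the cell with the resolved target
        return real_target(x)

    cell["target"] = import_thunk
    return cell


def loop_address_hoisted(cell, n):
    # Only the cell address (here: the dict reference) is loop-invariant;
    # the load of cell["target"] stays inside the loop.
    return [cell["target"](i) for i in range(n)]


def loop_load_hoisted(cell, n):
    target = cell["target"]  # load hoisted too: stuck with the thunk
    return [target(i) for i in range(n)]


if __name__ == "__main__":
    for loop in (loop_address_hoisted, loop_load_hoisted):
        counter = {"thunk": 0}
        cell = make_cell(lambda x: x + 1, counter)
        print(loop.__name__, loop(cell, 5), counter["thunk"])
```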

@EgorBo EgorBo closed this as completed Jul 20, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Aug 19, 2023

5 participants