ARM64: Virtual stub call produces redundant address load for R2R and JIT #36700

Closed
kunalspathak opened this issue May 19, 2020 · 2 comments · Fixed by #36817
Labels: arch-arm32, arch-arm64, area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), tenet-performance (Performance related issue)

Comments

@kunalspathak
Member

While working on #35675, I noticed that in some scenarios a virtual stub call generates redundant code to load the stub address. This happens in both R2R and JIT code.

public class B : I
{
    int I.F() => 33;
}

public class D : B, I
{
    int I.F() => 44;
}

public class E : B, I
{
    int I.F() => 55;
}

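public long Call(TestInput testInput)
{
    // NOTE: `input` is not declared in this snippet; it is presumably the
    // array of objects carried by testInput.
    long sum = 0;
    for (int i = 0; i < input.Length; i++)
    {
        // Interface call on each element: this is the virtual stub call site.
        sum += ((I)input[i]).F();
    }
    return sum;
}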

For this code, we still generate a duplicate adrp/add pair (the same address is loaded into both x11 and x1), which can be optimized in a similar way to what was done in #35675.

        9000000B          adrp    x11, [RELOC #0x231e87739b0]
        9100016B          add     x11, x11, #0
        90000001          adrp    x1, [RELOC #0x231e87739b0]
        91000021          add     x1, x1, #0
        F9400021          ldr     x1, [x1]
        D63F0020          blr     x1
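To make the redundancy in the listing above easy to spot mechanically, here is a minimal Python sketch that reports every relocation target loaded by more than one adrp. The function name `find_redundant_adrp` and the regular expression are my own assumptions for illustration; this is not the tool used to gather the numbers in this issue, and it ignores whether a register is overwritten between the two loads.

```python
import re
from collections import defaultdict

# Matches lines such as "9000000B  adrp  x11, [RELOC #0x231e87739b0]".
# The pattern is an assumption based on the listing format shown in this issue.
ADRP_RE = re.compile(r"adrp\s+(x\d+),\s*\[RELOC\s+#(0x[0-9a-fA-F]+)\]")


def find_redundant_adrp(listing):
    """Return {reloc target: [registers]} for targets loaded by more than one adrp."""
    regs_by_target = defaultdict(list)
    for line in listing.splitlines():
        match = ADRP_RE.search(line)
        if match:
            reg, target = match.groups()
            regs_by_target[target].append(reg)
    # Keep only the targets that are materialized more than once.
    return {t: regs for t, regs in regs_by_target.items() if len(regs) > 1}


# The R2R sequence quoted above.
R2R_LISTING = """\
        9000000B          adrp    x11, [RELOC #0x231e87739b0]
        9100016B          add     x11, x11, #0
        90000001          adrp    x1, [RELOC #0x231e87739b0]
        91000021          add     x1, x1, #0
        F9400021          ldr     x1, [x1]
        D63F0020          blr     x1
"""

print(find_redundant_adrp(R2R_LISTING))
```

For the listing above this reports the single target 0x231e87739b0 as loaded into both x11 and x1, which is the duplicate this issue is about.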

Here is the JIT code that we generate today:

        D2800A0B          movz    x11, #80
        F2BB538B          movk    x11, #0xda9c LSL #16
        F2CFFF6B          movk    x11, #0x7ffb LSL #32
        D2800A01          movz    x1, #80
        F2BB5381          movk    x1, #0xda9c LSL #16
        F2CFFF61          movk    x1, #0x7ffb LSL #32
        F9400021          ldr     x1, [x1]
        D63F0020          blr     x1
        93407C00          sxtw    x0, w0
        8B140014          add     x20, x0, x20
        110006F7          add     w23, w23, #1
        6B17031F          cmp     w24, w23
        54FFFE0C          bgt     G_M49262_IG04
                                                ;; bbWeight=16    PerfScore 240.00
G_M49262_IG05:
        110006B5          add     w21, w21, #1
        5290D400          movz    w0, #0x86a0
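In the JIT listing the duplication is less obvious because the address is built from immediates. The following Python sketch replays the movz/movk sequences to show that x11 and x1 end up holding the same 64-bit value. The helper name `materialized_constants` and the parsing pattern are assumptions made for this illustration, and only the movz/movk forms that appear in the listing are handled.

```python
import re

# Matches "movz x11, #80" and "movk x11, #0xda9c LSL #16" as printed in this issue.
MOV_RE = re.compile(
    r"(movz|movk)\s+(x\d+),\s*#(0x[0-9a-fA-F]+|\d+)(?:\s+LSL\s+#(\d+))?"
)


def materialized_constants(listing):
    """Replay movz/movk instructions and return {register: final value}."""
    values = {}
    for line in listing.splitlines():
        match = MOV_RE.search(line)
        if not match:
            continue
        op, reg, imm_text, shift_text = match.groups()
        imm = int(imm_text, 0)  # accepts both decimal and 0x-prefixed immediates
        shift = int(shift_text) if shift_text else 0
        if op == "movz":
            # movz writes the shifted immediate and zeroes the other bits.
            values[reg] = imm << shift
        else:
            # movk replaces one 16-bit field and keeps the remaining bits.
            old = values.get(reg, 0)
            values[reg] = (old & ~(0xFFFF << shift)) | (imm << shift)
    return values


# The address-building part of the JIT sequence quoted above.
JIT_LISTING = """\
        D2800A0B          movz    x11, #80
        F2BB538B          movk    x11, #0xda9c LSL #16
        F2CFFF6B          movk    x11, #0x7ffb LSL #32
        D2800A01          movz    x1, #80
        F2BB5381          movk    x1, #0xda9c LSL #16
        F2CFFF61          movk    x1, #0x7ffb LSL #32
"""

for reg, value in materialized_constants(JIT_LISTING).items():
    print(reg, hex(value))
```

Both registers decode to 0x7ffbda9c0050, so three of the six instructions exist only to rebuild an address that is already available in x11.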

A similar issue, but for a different scenario, is tracked in #35108.

@kunalspathak kunalspathak added arch-arm32 arch-arm64 tenet-performance Performance related issue area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels May 19, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label May 19, 2020
@kunalspathak
Member Author

cc @BruceForstall, @CarolEidt

@kunalspathak
Member Author

After we fix #35675, these are the remaining adrp/add groups that we should optimize. They correspond to ~20% of the adrp/add groups that we saw originally, as mentioned in #35108 (comment). I am not sure whether all of them correspond to virtual stub calls, but we can double-check once we address this issue.

Processed 191779 methods. Found 44508 methods containing 132193 groups.

@BruceForstall BruceForstall added this to the Future milestone May 19, 2020
@BruceForstall BruceForstall removed the untriaged New issue has not been triaged by the area owner label May 19, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020