[Arm64] HFA register arguments pushed to stack #35635

Closed
BruceForstall opened this issue Apr 29, 2020 · 2 comments
Labels: arch-arm64, area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), optimization

BruceForstall commented Apr 29, 2020

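It appears that HFA (homogeneous floating-point aggregate) arguments passed in registers are always stored to the stack in the callee's prolog, even when they could stay in registers.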

Example:

using System;

namespace hfaargs
{
    struct hfa1
    {
        public float f1, f2;
    }

    class Program
    {
        static float f(hfa1 a)
        {
            return a.f1 + a.f2;
        }

        static void Main(string[] args)
        {
            hfa1 h = new hfa1();
            h.f1 = 1.0F;
            h.f2 = 2.0F;
            float ret = f(h);

            Console.WriteLine("{0}", ret);
        }
    }
}

On Arm64, the JIT generates the following code for f:

;  V00 arg0         [V00    ] (  4,  4   )  struct ( 8) [fp+0x18]   HFA(float)  do-not-enreg[XSFA] multireg-arg addr-exposed
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 16

G_M4231_IG01:
        A9BE7BFD          stp     fp, lr, [sp,#-32]!
        910003FD          mov     fp, sp
        BD001BA0          str     s0, [fp,#24]
        BD001FA1          str     s1, [fp,#28]
G_M4231_IG02:
        BD401BA0          ldr     s0, [fp,#24]
G_M4231_IG03:
        BD401FB0          ldr     s16, [fp,#28]
G_M4231_IG04:
        1E302800          fadd    s0, s0, s16
G_M4231_IG05:
        A8C27BFD          ldp     fp, lr, [sp],#32
        D65F03C0          ret     lr

This particular case could just be:

        stp     fp, lr, [sp,#-16]!
        mov     fp, sp
        fadd    s0, s0, s1          ; only one non-prolog/epilog instruction; no stack space needed for the args
        ldp     fp, lr, [sp],#16
        ret     lr

Related: #35631

category:cq
theme:register-allocator
skill-level:expert
cost:medium

BruceForstall added the arch-arm64, area-CodeGen-coreclr, and optimization labels on Apr 29, 2020
BruceForstall added this to the Future milestone on Apr 29, 2020
Dotnet-GitSync-Bot added the untriaged label (new issue has not been triaged by the area owner) on Apr 29, 2020

BruceForstall (Member, Author) commented:

cc @kunalspathak @CarolEidt

BruceForstall (Member, Author) commented:

This appears to be fixed. The HFA is now struct promoted, so its two fields are enregistered independently instead of being stored to the stack (repro with PMI and altjit):

; Assembly listing for method hfaargs.Program:f(hfaargs.hfa1):float
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; optimized code
; fp based frame
; partially interruptible
; invoked as altjit
; Final local variable assignments
;
;* V00 arg0         [V00    ] (  0,  0   )  struct ( 8) zero-ref    HFA(float)  multireg-arg
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
;  V02 tmp1         [V02,T00] (  2,  2   )   float  ->   d0         V00.f1(offs=0x00) P-INDEP "field V00.f1 (fldOffset=0x0)"
;  V03 tmp2         [V03,T01] (  2,  2   )   float  ->   d1         V00.f2(offs=0x04) P-INDEP "field V00.f2 (fldOffset=0x4)"
;
; Lcl frame size = 0

G_M4231_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
                        ;; bbWeight=1    PerfScore 1.50
G_M4231_IG02:              ;; offset=0008H
        1E212800          fadd    s0, s0, s1
                        ;; bbWeight=1    PerfScore 3.00
G_M4231_IG03:              ;; offset=000CH
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
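
The repro above covers only a two-float HFA. To check other shapes, a minimal sketch of additional cases follows. It assumes the Arm64 calling convention's definition of an HFA as a struct of one to four fields of the same floating-point type. The struct and method names (Hfa3F, Hfa4D, HfaShapes) are hypothetical and not part of this issue, and the code the JIT generates for them has not been verified here.

```csharp
using System;
using System.Runtime.CompilerServices;

namespace hfaargs
{
    // Hypothetical HFA shapes for illustration: three floats and four doubles.
    struct Hfa3F { public float f1, f2, f3; }
    struct Hfa4D { public double d1, d2, d3, d4; }

    static class HfaShapes
    {
        // NoInlining keeps each callee a separate method, so the struct
        // is actually passed as an argument and the callee gets its own listing.
        [MethodImpl(MethodImplOptions.NoInlining)]
        public static float Sum3F(Hfa3F a) => a.f1 + a.f2 + a.f3;

        [MethodImpl(MethodImplOptions.NoInlining)]
        public static double Sum4D(Hfa4D a) => a.d1 + a.d2 + a.d3 + a.d4;

        static void Main()
        {
            var h3 = new Hfa3F { f1 = 1.0F, f2 = 2.0F, f3 = 3.0F };
            var h4 = new Hfa4D { d1 = 1.0, d2 = 2.0, d3 = 3.0, d4 = 4.0 };
            Console.WriteLine("{0} {1}", Sum3F(h3), Sum4D(h4));
        }
    }
}
```

If struct promotion applies to these shapes as well, the listings for Sum3F and Sum4D should show no stores of the incoming argument registers in the prolog.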

BruceForstall removed the JitUntriaged label (CLR JIT issues needing additional triage) on Nov 18, 2020
ghost locked the conversation as resolved and limited it to collaborators on Dec 18, 2020