Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISC-V] Put scalar stack args with sign extension #97662

Merged
merged 2 commits into from
Jan 30, 2024

Conversation

t-mustafin
Copy link
Contributor

According to Integer calling convention arguments on stack narrower than XLEN bits are stored on stack with sign extension to XLEN according to theirs type.

Part of #84834
cc @clamp03 @tomeksowi @gbalykov @denis-paranichev @ashaurtaev

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 29, 2024
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jan 29, 2024
@ghost
Copy link

ghost commented Jan 29, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

According to Integer calling convention arguments on stack narrower than XLEN bits are stored on stack with sign extension to XLEN according to theirs type.

Part of #84834
cc @clamp03 @tomeksowi @gbalykov @denis-paranichev @ashaurtaev

Author: t-mustafin
Assignees: -
Labels:

area-CodeGen-coreclr, community-contribution

Milestone: -

@am11 am11 added the arch-riscv Related to the RISC-V architecture label Jan 29, 2024
Formatting patch
@ryujit-bot
Copy link

Diff results for #97662

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch -0.01%

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.osx.arm64.checked.mch -0.01%

Details here


@HJLeee
Copy link
Contributor

HJLeee commented Jan 29, 2024

Reproduction info.

  • Sample Code, runtime need to be built with Clang-16+
using System.IO;
using System.Xml.Serialization;

namespace Oobe.Language.Model
{
    public class LanguageInfo
    {
        [XmlAttribute("code")]
        public string? Code { get; set; }
    }

    public class LanguageManger
    {
        public static void Main(string[] args)
        {
            var xs = new XmlSerializer(typeof(LanguageInfo));
            System.Console.WriteLine(xs);
        }
    }
}
  • OOM callstack
#0  __GI___libc_malloc (bytes=6493990551552) at malloc.c:3270
#1  0x0000003fa00fab2a in ClrMalloc (size=6493990551552) at /usr/src/debug/coreclr-8.0.0-0.riscv64/src/coreclr/utilcode/clrhost_nodependencies.cpp:249
#2  operator new[] (n=6493990551552) at /usr/src/debug/coreclr-8.0.0-0.riscv64/src/coreclr/utilcode/clrhost_nodependencies.cpp:329
#3  0x0000003f9ffa1adc in NSQuickBytesHelper::_AllocBytes<1>::Invoke (iItems=6493990551552) at /usr/src/debug/coreclr-8.0.0-0.riscv64/src/coreclr/inc/corhlprpriv.h:50
#4  CQuickMemoryBase<512ul, 128ul>::_Alloc<0, 1> (this=0x3fe1f458e0, iItems=6493990551552) at /usr/src/debug/coreclr-8.0.0-0.riscv64/src/coreclr/inc/corhlprpriv.h:120
#5  CQuickMemoryBase<512ul, 128ul>::AllocThrows (this=0x3fe1f458e0, iItems=6493990551552) at /usr/src/debug/coreclr-8.0.0-0.riscv64/src/coreclr/inc/corhlprpriv.h:172
#6  CQuickArrayBase<tagCOR_ILMETHOD_SECT_EH_CLAUSE_FAT>::AllocThrows (this=0x3fe1f458e0, iItems=270582939648) at /usr/src/debug/coreclr-8.0.0-0.riscv64/src/coreclr/inc/corhlprpriv.h:389
#7  TypeBuilder_SetMethodIL (pModule=..., tk=100663297, fIsInitLocal=<optimized out>, pBody=0x37c6430360 "\002(\002", cbBody=<optimized out>, pLocalSig=<optimized out>, sigLength=2, maxStackSize=<optimized out>, pExceptions=0x0, numExceptions=0, pTokenFixups=0x37c64303c0, numTokenFixups=<optimized out>)
    at /usr/src/debug/coreclr-8.0.0-0.riscv64/src/coreclr/vm/comdynamic.cpp:357
#8  0x0000003f214ce1b8 in ?? ()
-O3 assembler of `TypeBuilder_SetMethodIL`
(gdb) bt 4
#0  CQuickArrayBase<tagCOR_ILMETHOD_SECT_EH_CLAUSE_FAT>::AllocThrows (this=0x3fffffcbe0, iItems=0x3f00000000) at /usr/src/debug/coreclr-8.0.0-0.riscv64/src/coreclr/inc/corhlprpriv.h:389
#1  TypeBuilder_SetMethodIL (pModule=..., tk=0x6000001, fIsInitLocal=0x1, pBody=0xaad433f40 "\002(\002", cbBody=<optimized out>, pLocalSig=<optimized out>, sigLength=0x2, maxStackSize=<optimized out>, pExceptions=0x0, numExceptions=0x0, pTokenFixups=0xaad433fa0, numTokenFixups=<optimized out>) at /usr/src/debug/coreclr-8.0.0-0.riscv64/src/coreclr/vm/comdynamic.cpp:357
#2  0x0000003f78cf31e8 in ?? ()
(gdb) i r ra
ra             0x3ff777fab0     0x3ff777fab0 <TypeBuilder_SetMethodIL(QCall::ModuleHandle, INT32, BOOL, LPCBYTE, INT32, LPCBYTE, INT32, UINT16, ExceptionInstance*, INT32, INT32*, INT32)+264>
(gdb) disassemble 
Dump of assembler code for function TypeBuilder_SetMethodIL(QCall::ModuleHandle, INT32, BOOL, LPCBYTE, INT32, LPCBYTE, INT32, UINT16, ExceptionInstance*, INT32, INT32*, INT32):
   0x0000003ff777f9a8 <+0>:     addi    sp,sp,-1328
   0x0000003ff777f9ac <+4>:     sd      ra,1320(sp)
   0x0000003ff777f9b0 <+8>:     sd      s0,1312(sp)
   0x0000003ff777f9b4 <+12>:    sd      s1,1304(sp)
   0x0000003ff777f9b8 <+16>:    sd      s2,1296(sp)
   0x0000003ff777f9bc <+20>:    sd      s3,1288(sp)
   0x0000003ff777f9c0 <+24>:    sd      s4,1280(sp)
   0x0000003ff777f9c4 <+28>:    sd      s5,1272(sp)
   0x0000003ff777f9c8 <+32>:    sd      s6,1264(sp)
   0x0000003ff777f9cc <+36>:    sd      s7,1256(sp)
   0x0000003ff777f9d0 <+40>:    sd      s8,1248(sp)
   0x0000003ff777f9d4 <+44>:    sd      s9,1240(sp)
   0x0000003ff777f9d8 <+48>:    sd      s10,1232(sp)
   0x0000003ff777f9dc <+52>:    sd      s11,1224(sp)
   0x0000003ff777f9e0 <+56>:    addi    s0,sp,1328
   0x0000003ff777f9e4 <+60>:    auipc   a0,0x46b
   0x0000003ff777f9e8 <+64>:    ld      s9,532(a0) # 0x3ff7beabf8
   0x0000003ff777f9ec <+68>:    ld      a0,0(s9)
   0x0000003ff777f9f0 <+72>:    mv      s1,a7
   0x0000003ff777f9f2 <+74>:    mv      s6,a6
   0x0000003ff777f9f4 <+76>:    mv      s5,a5
   0x0000003ff777f9f6 <+78>:    mv      s11,a4
   0x0000003ff777f9f8 <+80>:    mv      s2,a3
   0x0000003ff777f9fa <+82>:    mv      s4,a2
   0x0000003ff777f9fc <+84>:    mv      s10,a1
   0x0000003ff777f9fe <+86>:    sd      a0,-120(s0)
   0x0000003ff777fa02 <+90>:    sd      zero,-1248(s0)
   0x0000003ff777fa06 <+94>:    sd      zero,-1256(s0)
   0x0000003ff777fa0a <+98>:    li      a0,-1
   0x0000003ff777fa0c <+100>:   sd      a0,-1240(s0)
   0x0000003ff777fa10 <+104>:   sd      zero,-1232(s0)
   0x0000003ff777fa14 <+108>:   sh      zero,-1216(s0)
   0x0000003ff777fa18 <+112>:   sd      zero,-1208(s0)
   0x0000003ff777fa1c <+116>:   sd      zero,-1200(s0)
   0x0000003ff777fa20 <+120>:   auipc   a0,0x46b
   0x0000003ff777fa24 <+124>:   addi    a0,a0,-576 # 0x3ff7bea7e0
   0x0000003ff777fa28 <+128>:   jal     ra,0x3ff76813f0 <__tls_get_addr@plt>
   0x0000003ff777fa2c <+132>:   ld      a0,0(a0)
   0x0000003ff777fa2e <+134>:   ld      a0,16(a0)
   0x0000003ff777fa30 <+136>:   sd      a0,-1328(s0)
   0x0000003ff777fa34 <+140>:   ld      s3,1536(s10)
   0x0000003ff777fa38 <+144>:   li      a0,2
   0x0000003ff777fa3a <+146>:   sd      s4,-1312(s0)
   0x0000003ff777fa3e <+150>:   bne     s1,a0,0x3ff777fa50 <TypeBuilder_SetMethodIL(QCall::ModuleHandle, INT32, BOOL, LPCBYTE, INT32, LPCBYTE, INT32, UINT16, ExceptionInstance*, INT32, INT32*, INT32)+168>
   0x0000003ff777fa42 <+154>:   lbu     a0,0(s6)
   0x0000003ff777fa46 <+158>:   bnez    a0,0x3ff777fa50 <TypeBuilder_SetMethodIL(QCall::ModuleHandle, INT32, BOOL, LPCBYTE, INT32, LPCBYTE, INT32, UINT16, ExceptionInstance*, INT32, INT32*, INT32)+168>
   0x0000003ff777fa48 <+160>:   lbu     a0,1(s6)
   0x0000003ff777fa4c <+164>:   beqz    a0,0x3ff777ff04 <TypeBuilder_SetMethodIL(QCall::ModuleHandle, INT32, BOOL, LPCBYTE, INT32, LPCBYTE, INT32, UINT16, ExceptionInstance*, INT32, INT32*, INT32)+1372>
   0x0000003ff777fa50 <+168>:   ld      a0,0(s3)
   0x0000003ff777fa54 <+172>:   ld      a1,0(a0)
   0x0000003ff777fa56 <+174>:   ld      a4,184(a1)
   0x0000003ff777fa58 <+176>:   addi    a3,s0,-1260
   0x0000003ff777fa5c <+180>:   mv      a1,s6
   0x0000003ff777fa5e <+182>:   mv      a2,s1
   0x0000003ff777fa60 <+184>:   jalr    a4
   0x0000003ff777fa62 <+186>:   bltz    a0,0x3ff777ff32 <TypeBuilder_SetMethodIL(QCall::ModuleHandle, INT32, BOOL, LPCBYTE, INT32, LPCBYTE, INT32, UINT16, ExceptionInstance*, INT32, INT32*, INT32)+1418>
   0x0000003ff777fa66 <+190>:   lbu     a1,-1271(s0)
   0x0000003ff777fa6a <+194>:   lw      a0,-1260(s0)
   0x0000003ff777fa6e <+198>:   ld      s8,16(s0)
   0x0000003ff777fa72 <+202>:   ld      a2,0(s0)
   0x0000003ff777fa74 <+204>:   snez    a3,s2
   0x0000003ff777fa78 <+208>:   slliw   a3,a3,0x4
   0x0000003ff777fa7c <+212>:   sb      a3,-1272(s0)
   0x0000003ff777fa80 <+216>:   andi    a1,a1,240
   0x0000003ff777fa84 <+220>:   sb      a1,-1271(s0)
   0x0000003ff777fa88 <+224>:   sh      a2,-1270(s0)
   0x0000003ff777fa8c <+228>:   sw      a0,-1264(s0)
   0x0000003ff777fa90 <+232>:   sw      s5,-1268(s0)
   0x0000003ff777fa94 <+236>:   snez    s7,s8
   0x0000003ff777fa98 <+240>:   beqz    s8,0x3ff777faa2 <TypeBuilder_SetMethodIL(QCall::ModuleHandle, INT32, BOOL, LPCBYTE, INT32, LPCBYTE, INT32, UINT16, ExceptionInstance*, INT32, INT32*, INT32)+250>
   0x0000003ff777fa9c <+244>:   addiw   s5,s5,3
   0x0000003ff777fa9e <+246>:   andi    s5,s5,-4
   0x0000003ff777faa2 <+250>:   addi    a0,s0,-1272
   0x0000003ff777faa6 <+254>:   mv      a1,s7
   0x0000003ff777faa8 <+256>:   auipc   ra,0x161
   0x0000003ff777faac <+260>:   jalr    -1404(ra) # 0x3ff78e052c <IlmethodSize(COR_ILMETHOD_FAT*, BOOL)>
   0x0000003ff777fab0 <+264>:   mv      s6,a0
   0x0000003ff777fab2 <+266>:   sd      zero,-648(s0)
   0x0000003ff777fab6 <+270>:   sd      zero,-656(s0)
   0x0000003ff777faba <+274>:   li      a0,512
   0x0000003ff777fabe <+278>:   sd      a0,-640(s0)
   0x0000003ff777fac2 <+282>:   blez    s8,0x3ff777fc62 <TypeBuilder_SetMethodIL(QCall::ModuleHandle, INT32, BOOL, LPCBYTE, INT32, LPCBYTE, INT32, UINT16, ExceptionInstance*, INT32, INT32*, INT32)+698>
   0x0000003ff777fac6 <+286>:   li      a0,24
   0x0000003ff777fac8 <+288>:   li      a1,22
=> 0x0000003ff777faca <+290>:   mul     s2,s8,a0
   0x0000003ff777face <+294>:   bltu    s8,a1,0x3ff777fae4 <TypeBuilder_SetMethodIL(QCall::ModuleHandle, INT32, BOOL, LPCBYTE, INT32, LPCBYTE, INT32, UINT16, ExceptionInstance*, INT32, INT32*, INT32)+316>
   0x0000003ff777fad2 <+298>:   mv      a0,s2
   0x0000003ff777fad4 <+300>:   auipc   ra,0x159
   0x0000003ff777fad8 <+304>:   jalr    70(ra) # 0x3ff78d8b1a <_Znam>
   0x0000003ff777fadc <+308>:   sd      a0,-656(s0)
   0x0000003ff777fae0 <+312>:   sd      s2,-640(s0)
   0x0000003ff777fae4 <+316>:   li      t4,0
   0x0000003ff777fae6 <+318>:   li      t3,0
   0x0000003ff777fae8 <+320>:   ld      t0,8(s0)
   0x0000003ff777faec <+324>:   sd      s2,-648(s0)
   0x0000003ff777faf0 <+328>:   addi    t2,s0,-632
   0x0000003ff777faf4 <+332>:   li      t1,2
...
(gdb) p &iItems
Address requested for identifier "iItems" which is in register $s8
(gdb) i r s8
s8             0x3f00000000     0x3f00000000

Copy link
Contributor

@HJLeee HJLeee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified.

@clamp03 clamp03 requested a review from jakobbotsch January 30, 2024 01:44
@jakobbotsch jakobbotsch merged commit 558cde6 into dotnet:main Jan 30, 2024
129 checks passed
@t-mustafin t-mustafin deleted the riscv_stack_argx_sign_extension branch January 30, 2024 12:38
@brucehoult
Copy link

brucehoult commented Feb 29, 2024

It should not be necessary to do anything to support this requirement, because all values in the program should follow these rules and be in canonical form at all times.

There are only 64 bit compare-and-branch operations, and they depend on the values they compare being correctly in canonical form.

C rules on all platforms require narrower values to be extended to int size according to their sign, and lb, lbu, lh, lhu automatically do this (and in fact widen correctly to the full 64 bits). Nothing strange there.

The only RISC-V specific thing is that lw should be used to load both signed and unsigned 32 bit values (using lwu only if the value is loaded and immediately cast to uint64).

All arithmetic operations automatically maintain the "32-bit values are sign-extended to 64 bits" property -- the *W operations for ADDI, ADD, SUB, MUL, DIV, REM, and the shifts. Normal 64 bit AND, OR, and XOR naturally retain the correct property in the result if the operands are correct.

If the correct load instructions and correct arithmetic instructions are used, then no special action is required to put function arguments into the correct form -- they already will be.

@tomeksowi
Copy link
Contributor

It should not be necessary to do anything to support this requirement, because all values in the program should follow these rules and be in canonical form at all times.

This is about call site construction, we need sd to pass ints smaller than 64 bits on the stack, otherwise the stack slot will be partly undefined.

If the correct load instructions and correct arithmetic instructions are used, then no special action is required to put function arguments into the correct form -- they already will be.

The callee can and does load int arguments shorter than 64 bits with ld, since the int argument on the stack needs to be sign-extended to 64 bits, it's legal to do so. This bug came up with making interop calls, the callee being native.

@brucehoult
Copy link

Perhaps it is the PR title and the comment that confused me. Certainly you need to store the appropriately-extended value. So you need to use sd, yes. But the title and comment should just be "Store all 64 bits of shorter arguments passed on the stack". Nothing to do with sign-extending them, as they already will have been by whatever generated them

e.g. see function foo() in https://godbolt.org/z/78h8a3rnM

Code change LGTM, just the comment is confusing.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-riscv Related to the RISC-V architecture area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants