Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SystemZ stack temporary overflow #48666

Closed
cuviper opened this issue Feb 22, 2021 · 11 comments
Closed

SystemZ stack temporary overflow #48666

cuviper opened this issue Feb 22, 2021 · 11 comments
Assignees
Labels
backend:SystemZ bugzilla Issues migrated from bugzilla

Comments

@cuviper
Copy link
Member

cuviper commented Feb 22, 2021

Bugzilla Link 49322
Resolution FIXED
Resolved on Jun 21, 2021 11:00
Version trunk
OS Linux
Blocks #48661 #50032
CC @alex,@asb,@JonPsson,@tstellar,@uweigand
Fixed by commit(s) 52bbbf4 0193a7d

Extended Description

This bug was reduced from one of the failures in Rust #​80810:
rust-lang/rust#80810

When a large integer argument on s390x is converted to indirect, but the type is not a multiple of 64 bits, the writes to the stack are all still in 64-bit chunks and may clobber neighboring values on the stack.

arg-i96.ll

target datalayout = "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64"
target triple = "s390x-unknown-linux-gnu"

declare hidden void @​fn1(i96) unnamed_addr

define hidden i32 @​fn2() unnamed_addr {
start:
%0 = alloca i32, align 4
store i32 -1, i32* %0, align 4
call void @​fn1(i96 0)
%1 = load i32, i32* %0, align 4
ret i32 %1
}

llc -O0

.text
.file	"arg-i96.ll"
.hidden	fn2                             # -- Begin function fn2
.globl	fn2
.p2align	4
.type	fn2,@function

fn2: # @​fn2
.cfi_startproc

%bb.0: # %start

stmg	%r14, %r15, 112(%r15)
.cfi_offset %r14, -48
.cfi_offset %r15, -40
aghi	%r15, -176
.cfi_def_cfa_offset 336
mvhi	172(%r15), -1
mvghi	168(%r15), 0
mvghi	160(%r15), 0
la	%r2, 160(%r15)
brasl	%r14, fn1@PLT
l	%r2, 172(%r15)
lmg	%r14, %r15, 288(%r15)
br	%r14

.Lfunc_end0:
.size fn2, .Lfunc_end0-fn2
.cfi_endproc
# -- End function
.hidden fn1
.section ".note.GNU-stack","",@progbits

In this reproducer, the 32-bit store to %0 -- mvhi 172(%r15), -1 -- is immediately overwritten by the overflowing 64-bit store to the end of %1 -- mvghi 168(%r15), 0.

With --print-after-all, you can also see the 12-byte (96-bit) frame allocation with two 8-byte writes.

*** IR Dump After Finalize ISel and expand pseudo-instructions ***:

Machine code for function fn2: IsSSA, TracksLiveness

Frame Objects:
fi#0: size=4, align=4, at location [SP]
fi#1: size=12, align=8, at location [SP]

bb.0.start:
MVHI %stack.0, 0, -1 :: (store 4 into %ir.0)
ADJCALLSTACKDOWN 0, 0
MVGHI %stack.1, 8, 0 :: (store 8 into %stack.1)
MVGHI %stack.1, 0, 0 :: (store 8 into %stack.1)
%0:gr64bit = LA %stack.1, 0, $noreg
$r2d = COPY %0:gr64bit
CallBRASL @​fn1, $r2d, <regmask $f8d $f9d $f10d $f11d $f12d $f13d $f14d $f15d $f8q $f9q $f12q $f13q $f8s $f9s $f10s $f11s $f12s $f13s $f14s $f15s $r6d $r7d $r8d $r9d $r10d $r11d $r12d $r13d $r14d $r15d $r6h $r7h $r8h and 22 more...>, implicit-def dead $r14d, implicit-def dead $cc, implicit $fpc
ADJCALLSTACKUP 0, 0
%1:gr32bit = L %stack.0, 0, $noreg :: (dereferenceable load 4 from %ir.0)
$r2l = COPY %1:gr32bit
Return implicit $r2l

End machine code for function fn2.

@cuviper
Copy link
Member Author

cuviper commented Feb 22, 2021

assigned to @JonPsson

@cuviper
Copy link
Member Author

cuviper commented Feb 22, 2021

In this reproducer, the 32-bit store to %0 -- mvhi 172(%r15), -1 -- is immediately overwritten by the overflowing 64-bit store to the end of %1 -- mvghi 168(%r15), 0.

Sorry, I misspoke -- the latter move is part of the spilled argument for the call, not %1.

@cuviper
Copy link
Member Author

cuviper commented Feb 24, 2021

The 12-byte allocation comes from SystemZTargetLowering::LowerCall here:

if (VA.getLocInfo() == CCValAssign::Indirect) {
  // Store the argument in a stack slot and pass its address.
  SDValue SpillSlot = DAG.CreateStackTemporary(Outs[I].ArgVT);

I looked around for other targets that do this for Indirect args. X86 does create stack temporaries for !isByVal args, but using VA.getValVT(), which I think means it's using a separate temporary for each piece of the split arg. However, RISCVTargetLowering::LowerCall is pretty much the same as SystemZ:

// Promote the value if needed.
// For now, only handle fully promoted and indirect arguments.
if (VA.getLocInfo() == CCValAssign::Indirect) {
  // Store the argument in a stack slot and pass its address.
  SDValue SpillSlot = DAG.CreateStackTemporary(Outs[i].ArgVT);

RISCV appears fine with the i96 test, as it just passes that in two registers, but it does have a problem if I bump that to i160:
(-mtriple riscv64-unknown-linux-gnu)

    .text
    .attribute      4, 16
    .attribute      5, "rv64i2p0"
    .file   "<stdin>"
    .hidden fn2                             # -- Begin function fn2
    .globl  fn2
    .p2align        2
    .type   fn2,@function

fn2: # @​fn2
.cfi_startproc

%bb.0: # %start

    addi    sp, sp, -32
    .cfi_def_cfa_offset 32
    sd      ra, 24(sp)                      # 8-byte Folded Spill
    .cfi_offset ra, -8
    addi    a0, zero, 1
    slli    a0, a0, 32
    addi    a0, a0, -1
    sw      a0, 20(sp)
    mv      a0, zero
    sd      a0, 16(sp)
    sd      a0, 8(sp)
    sd      a0, 0(sp)
    mv      a0, sp
    call    fn1
    lw      a0, 20(sp)
    ld      ra, 24(sp)                      # 8-byte Folded Reload
    addi    sp, sp, 32
    ret

.Lfunc_end0:
.size fn2, .Lfunc_end0-fn2
.cfi_endproc
# -- End function
.hidden fn1
.section ".note.GNU-stack","",@progbits

So the -1 "sw a0, 20(sp)" is clobbered by the 0 "sd a0, 16(sp)".

@uweigand
Copy link
Member

This does indeed look like a SystemZ back-end bug.

The problem is the way common code handles large integer types: they are split into "register-sized" pieces. If the size of the type is not a multiple of the register size, then common code implicitly extends the type to the next-larger type that is.

In this case, this means the i96 input type is implicitly extended to i128, which is then split into two i64 pieces.

However, according to the SystemZ ABI integer types larger than a register are actually supposed to be passed via implicit reference, so the back-end attempts to undo the split into pieces done by common code. The current back-end code for this works correctly only if no such extension has taken place. This used to be OK since the only type this code path was really ever intended to be used for was the i128 case (which needs to be passed via implicit reference for compatibility with GCC).

So in the end this is now a question of defining and then implementing a proper ABI for these other large integer types. I guess there's two "obvious" possibilities:

  1. Pass by implicit reference to the actual type (i96 in this case)
  2. Pass by implicit reference to the extended type (i128 in this case)

Variant 1) might be considered more natural and uses less stack space, but is more difficult and inefficient to implement (on a big-endian system we'd have to load/store parts of the first piece, and then load/store the remaining pieces at "odd" adjusted offsets).

Variant 2) seems straightforward to implement, the only change to existing code would be to allocate a larger stack slot. So I think this is what I'd prefer to do.

Jonas, can you have a try at implementing this?

@JonPsson
Copy link
Contributor

Thanks for the reduced test case!

Suggested patch at https://reviews.llvm.org/D97514.

@JonPsson
Copy link
Contributor

JonPsson commented Mar 2, 2021

52bbbf4

@cuviper
Copy link
Member Author

cuviper commented Mar 9, 2021

Thank you both! I have confirmed that this does fix the issues seen in Rust #​80810. I also opened bug 49500 for the similar RISCV bug.

@tstellar
Copy link
Collaborator

Hi Jonas,

What is your opinion on backporting this?

https://reviews.llvm.org/rG52bbbf4d4459239e0f461bc302ada89e2c5d07fc

@JonPsson
Copy link
Contributor

Hi Jonas,

What is your opinion on backporting this?

https://reviews.llvm.org/rG52bbbf4d4459239e0f461bc302ada89e2c5d07fc

That seems like a good idea...

@tstellar
Copy link
Collaborator

Merged: 0193a7d

@cuviper
Copy link
Member Author

cuviper commented Nov 27, 2021

mentioned in issue #50032

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:SystemZ bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

4 participants