SystemZ stack temporary overflow #48666

cuviper · 2021-02-22T22:56:40Z


Bugzilla Link	49322
Resolution	FIXED
Resolved on	Jun 21, 2021 11:00
Version	trunk
OS	Linux
Blocks	#48661 #50032
CC	@alex,@asb,@JonPsson,@tstellar,@uweigand
Fixed by commit(s)	52bbbf4 `0193a7d`

Extended Description

This bug was reduced from one of the failures in Rust #80810:
rust-lang/rust#80810

When a large integer argument on s390x is converted to indirect, but the type is not a multiple of 64 bits, the writes to the stack are all still in 64-bit chunks and may clobber neighboring values on the stack.

arg-i96.ll

target datalayout = "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64"
target triple = "s390x-unknown-linux-gnu"

declare hidden void @fn1(i96) unnamed_addr

define hidden i32 @fn2() unnamed_addr {
start:
%0 = alloca i32, align 4
store i32 -1, i32* %0, align 4
call void @fn1(i96 0)
%1 = load i32, i32* %0, align 4
ret i32 %1
}

llc -O0

.text
.file	"arg-i96.ll"
.hidden	fn2                             # -- Begin function fn2
.globl	fn2
.p2align	4
.type	fn2,@function

fn2: # @fn2
.cfi_startproc

%bb.0: # %start

stmg	%r14, %r15, 112(%r15)
.cfi_offset %r14, -48
.cfi_offset %r15, -40
aghi	%r15, -176
.cfi_def_cfa_offset 336
mvhi	172(%r15), -1
mvghi	168(%r15), 0
mvghi	160(%r15), 0
la	%r2, 160(%r15)
brasl	%r14, fn1@PLT
l	%r2, 172(%r15)
lmg	%r14, %r15, 288(%r15)
br	%r14

.Lfunc_end0:
.size fn2, .Lfunc_end0-fn2
.cfi_endproc
# -- End function
.hidden fn1
.section ".note.GNU-stack","",@progbits

In this reproducer, the 32-bit store to %0 -- mvhi 172(%r15), -1 -- is immediately overwritten by the overflowing 64-bit store to the end of %1 -- mvghi 168(%r15), 0.

With --print-after-all, you can also see the 12-byte (96-bit) frame allocation with two 8-byte writes.

* IR Dump After Finalize ISel and expand pseudo-instructions *:

Machine code for function fn2: IsSSA, TracksLiveness

Frame Objects:
fi#0: size=4, align=4, at location [SP]
fi#1: size=12, align=8, at location [SP]

bb.0.start:
MVHI %stack.0, 0, -1 :: (store 4 into %ir.0)
ADJCALLSTACKDOWN 0, 0
MVGHI %stack.1, 8, 0 :: (store 8 into %stack.1)
MVGHI %stack.1, 0, 0 :: (store 8 into %stack.1)
%0:gr64bit = LA %stack.1, 0, $noreg
$r2d = COPY %0:gr64bit
CallBRASL @fn1, $r2d, <regmask $f8d $f9d $f10d $f11d $f12d $f13d $f14d $f15d $f8q $f9q $f12q $f13q $f8s $f9s $f10s $f11s $f12s $f13s $f14s $f15s $r6d $r7d $r8d $r9d $r10d $r11d $r12d $r13d $r14d $r15d $r6h $r7h $r8h and 22 more...>, implicit-def dead $r14d, implicit-def dead $cc, implicit $fpc
ADJCALLSTACKUP 0, 0
%1:gr32bit = L %stack.0, 0, $noreg :: (dereferenceable load 4 from %ir.0)
$r2l = COPY %1:gr32bit
Return implicit $r2l

End machine code for function fn2.

The text was updated successfully, but these errors were encountered:

cuviper · 2021-02-22T22:56:40Z

assigned to @JonPsson

cuviper · 2021-02-22T23:20:11Z

In this reproducer, the 32-bit store to %0 -- mvhi 172(%r15), -1 -- is immediately overwritten by the overflowing 64-bit store to the end of %1 -- mvghi 168(%r15), 0.

Sorry, I misspoke -- the latter move is part of the spilled argument for the call, not %1.

cuviper · 2021-02-24T17:53:10Z

The 12-byte allocation comes from SystemZTargetLowering::LowerCall here:

if (VA.getLocInfo() == CCValAssign::Indirect) {
  // Store the argument in a stack slot and pass its address.
  SDValue SpillSlot = DAG.CreateStackTemporary(Outs[I].ArgVT);

I looked around for other targets that do this for Indirect args. X86 does create stack temporaries for !isByVal args, but using VA.getValVT(), which I think means it's using a separate temporary for each piece of the split arg. However, RISCVTargetLowering::LowerCall is pretty much the same as SystemZ:

// Promote the value if needed.
// For now, only handle fully promoted and indirect arguments.
if (VA.getLocInfo() == CCValAssign::Indirect) {
  // Store the argument in a stack slot and pass its address.
  SDValue SpillSlot = DAG.CreateStackTemporary(Outs[i].ArgVT);

RISCV appears fine with the i96 test, as it just passes that in two registers, but it does have a problem if I bump that to i160:
(-mtriple riscv64-unknown-linux-gnu)

    .text
    .attribute      4, 16
    .attribute      5, "rv64i2p0"
    .file   "<stdin>"
    .hidden fn2                             # -- Begin function fn2
    .globl  fn2
    .p2align        2
    .type   fn2,@function

fn2: # @fn2
.cfi_startproc

%bb.0: # %start

    addi    sp, sp, -32
    .cfi_def_cfa_offset 32
    sd      ra, 24(sp)                      # 8-byte Folded Spill
    .cfi_offset ra, -8
    addi    a0, zero, 1
    slli    a0, a0, 32
    addi    a0, a0, -1
    sw      a0, 20(sp)
    mv      a0, zero
    sd      a0, 16(sp)
    sd      a0, 8(sp)
    sd      a0, 0(sp)
    mv      a0, sp
    call    fn1
    lw      a0, 20(sp)
    ld      ra, 24(sp)                      # 8-byte Folded Reload
    addi    sp, sp, 32
    ret

.Lfunc_end0:
.size fn2, .Lfunc_end0-fn2
.cfi_endproc
# -- End function
.hidden fn1
.section ".note.GNU-stack","",@progbits

So the -1 "sw a0, 20(sp)" is clobbered by the 0 "sd a0, 16(sp)".

uweigand · 2021-02-25T18:43:41Z

This does indeed look like a SystemZ back-end bug.

The problem is the way common code handles large integer types: they are split into "register-sized" pieces. If the size of the type is not a multiple of the register size, then common code implicitly extends the type to the next-larger type that is.

In this case, this means the i96 input type is implicitly extended to i128, which is then split into two i64 pieces.

However, according to the SystemZ ABI integer types larger than a register are actually supposed to be passed via implicit reference, so the back-end attempts to undo the split into pieces done by common code. The current back-end code for this works correctly only if no such extension has taken place. This used to be OK since the only type this code path was really ever intended to be used for was the i128 case (which needs to be passed via implicit reference for compatibility with GCC).

So in the end this is now a question of defining and then implementing a proper ABI for these other large integer types. I guess there's two "obvious" possibilities:

Pass by implicit reference to the actual type (i96 in this case)
Pass by implicit reference to the extended type (i128 in this case)

Variant 1) might be considered more natural and uses less stack space, but is more difficult and inefficient to implement (on a big-endian system we'd have to load/store parts of the first piece, and then load/store the remaining pieces at "odd" adjusted offsets).

Variant 2) seems straightforward to implement, the only change to existing code would be to allocate a larger stack slot. So I think this is what I'd prefer to do.

Jonas, can you have a try at implementing this?

JonPsson · 2021-02-26T01:25:11Z

Thanks for the reduced test case!

Suggested patch at https://reviews.llvm.org/D97514.

JonPsson · 2021-03-02T19:06:37Z

52bbbf4

cuviper · 2021-03-09T21:56:01Z

Thank you both! I have confirmed that this does fix the issues seen in Rust #80810. I also opened bug 49500 for the similar RISCV bug.

tstellar · 2021-06-16T00:30:10Z

Hi Jonas,

What is your opinion on backporting this?

https://reviews.llvm.org/rG52bbbf4d4459239e0f461bc302ada89e2c5d07fc

JonPsson · 2021-06-18T00:22:30Z

Hi Jonas,

What is your opinion on backporting this?

https://reviews.llvm.org/rG52bbbf4d4459239e0f461bc302ada89e2c5d07fc

That seems like a good idea...

tstellar · 2021-06-21T18:00:59Z

Merged: 0193a7d

cuviper · 2021-11-27T04:29:04Z

mentioned in issue #50032

tstellar mentioned this issue Feb 22, 2021

[meta] 12.0.1 Release Blockers #48661

Closed

cuviper mentioned this issue Jun 12, 2021

LLVM 12 backports in rust-lang/llvm-project #50032

Closed

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SystemZ stack temporary overflow #48666

SystemZ stack temporary overflow #48666

cuviper commented Feb 22, 2021

cuviper commented Feb 22, 2021

cuviper commented Feb 22, 2021

cuviper commented Feb 24, 2021

uweigand commented Feb 25, 2021

JonPsson commented Feb 26, 2021

JonPsson commented Mar 2, 2021

cuviper commented Mar 9, 2021

tstellar commented Jun 16, 2021

JonPsson commented Jun 18, 2021

tstellar commented Jun 21, 2021

cuviper commented Nov 27, 2021

SystemZ stack temporary overflow #48666

SystemZ stack temporary overflow #48666

Comments

cuviper commented Feb 22, 2021

Extended Description

arg-i96.ll

llc -O0

%bb.0: # %start

*** IR Dump After Finalize ISel and expand pseudo-instructions ***:

Machine code for function fn2: IsSSA, TracksLiveness

End machine code for function fn2.

cuviper commented Feb 22, 2021

cuviper commented Feb 22, 2021

cuviper commented Feb 24, 2021

%bb.0: # %start

uweigand commented Feb 25, 2021

JonPsson commented Feb 26, 2021

JonPsson commented Mar 2, 2021

cuviper commented Mar 9, 2021

tstellar commented Jun 16, 2021

JonPsson commented Jun 18, 2021

tstellar commented Jun 21, 2021

cuviper commented Nov 27, 2021

* IR Dump After Finalize ISel and expand pseudo-instructions *: