SystemZ stack temporary overflow #48666
Assigned to @JonPsson
Sorry, I misspoke -- the latter move (mvghi 168(%r15), 0) is part of the spilled argument for the call, not %1.
The 12-byte allocation comes from SystemZTargetLowering::LowerCall here:
I looked around for other targets that do this for Indirect args. X86 does create stack temporaries for !isByVal args, but using VA.getValVT(), which I think means it's using a separate temporary for each piece of the split arg. However, RISCVTargetLowering::LowerCall is pretty much the same as SystemZ:
RISCV appears fine with the i96 test, as it just passes that in two registers, but it does have a problem if I bump that to i160:
fn2: # @fn2
%bb.0: # %start
.Lfunc_end0:
So the -1 stored by "sw a0, 20(sp)" is clobbered by the 0 stored by "sd a0, 16(sp)".
This does indeed look like a SystemZ back-end bug. The problem is the way common code handles large integer types: they are split into register-sized pieces. If the size of the type is not a multiple of the register size, common code implicitly extends the type to the next-larger type that is. In this case, the i96 input type is implicitly extended to i128, which is then split into two i64 pieces. However, according to the SystemZ ABI, integer types larger than a register are supposed to be passed via implicit reference, so the back end attempts to undo the split done by common code. The current back-end code for this works correctly only if no such extension has taken place. That used to be OK, since the only type this code path was ever really intended for was i128 (which needs to be passed via implicit reference for compatibility with GCC).
So in the end, this is a question of defining, and then implementing, a proper ABI for these other large integer types. I guess there are two "obvious" possibilities:
1) pass a pointer to a buffer whose size is that of the original type (12 bytes for i96), or
2) pass a pointer to a buffer sized for the extended type (16 bytes for i96, holding the value extended to i128).
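A rough Python model of the legalization arithmetic described above may make the overflow concrete. This is not LLVM code: `REG_BITS`, `split_integer_arg` and `slot_overflow` are hypothetical names, and the model assumes the back end sizes the temporary by the original type's store size (as the 12-byte frame object in the report suggests) while storing every piece as a full 64-bit register.

```python
REG_BITS = 64  # assumed register width (s390x, and RV64 for the RISC-V case)


def split_integer_arg(bits: int) -> tuple[int, int]:
    """Model common code: round iN up to a multiple of the register width,
    then split it into register-sized pieces.
    Returns (extended_bits, num_pieces)."""
    num_pieces = -(-bits // REG_BITS)  # ceiling division
    return num_pieces * REG_BITS, num_pieces


def slot_overflow(bits: int) -> int:
    """Bytes written past the end of a slot sized for the original type
    when each piece is stored as a full register (hypothetical model)."""
    slot_bytes = (bits + 7) // 8  # store size of iN, e.g. 12 for i96
    extended_bits, _ = split_integer_arg(bits)
    return extended_bits // 8 - slot_bytes


for bits in (96, 128, 160):
    ext, n = split_integer_arg(bits)
    print(f"i{bits}: extended to i{ext}, {n} pieces, "
          f"overflow {slot_overflow(bits)} bytes")
```

In this model i128, the case the code path was written for, never overflows, while i96 (two pieces, 16 bytes into a 12-byte slot) and the i160 RISC-V case (three pieces, 24 bytes into a 20-byte slot) both write 4 bytes past the end of the temporary.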
Variant 1) might be considered more natural and uses less stack space, but is more difficult and less efficient to implement (on a big-endian system we'd have to load/store parts of the first piece, and then load/store the remaining pieces at "odd" adjusted offsets). Variant 2) seems straightforward to implement; the only change to existing code would be to allocate a larger stack slot. So I think this is what I'd prefer to do. Jonas, can you have a try at implementing this?
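To illustrate the difference between the two variants on a big-endian target, here is a hypothetical Python model of the bytes each variant would place in the argument buffer. None of these helpers exist in LLVM, and the model assumes zero extension of a non-negative value for simplicity.

```python
REG_BYTES = 8  # size of one register-sized piece (i64)


def pieces_big_endian(value: int, bits: int) -> list[bytes]:
    """Zero-extend a non-negative iN value to a multiple of 64 bits and
    split it into 8-byte pieces, most significant piece first."""
    n = -(-bits // 64)  # ceiling division: number of pieces
    ext = value.to_bytes(n * REG_BYTES, "big")
    return [ext[i:i + REG_BYTES] for i in range(0, len(ext), REG_BYTES)]


def variant1_slot(value: int, bits: int) -> bytes:
    """Variant 1 (model): buffer of exactly the type's store size.
    Only the low-order bytes of the first piece are stored, and every
    later piece lands at an offset shifted back by the padding."""
    size = (bits + 7) // 8
    pieces = pieces_big_endian(value, bits)
    pad = len(pieces) * REG_BYTES - size  # extension bytes to drop
    slot = bytearray(size)
    slot[0:REG_BYTES - pad] = pieces[0][pad:]  # partial first piece
    for i, piece in enumerate(pieces[1:], start=1):
        off = i * REG_BYTES - pad  # "odd" adjusted offset
        slot[off:off + REG_BYTES] = piece
    return bytes(slot)


def variant2_slot(value: int, bits: int) -> bytes:
    """Variant 2 (model): buffer sized for the extended type; each piece
    is stored whole at a multiple of 8 bytes."""
    pieces = pieces_big_endian(value, bits)
    slot = bytearray(len(pieces) * REG_BYTES)
    for i, piece in enumerate(pieces):
        slot[i * REG_BYTES:(i + 1) * REG_BYTES] = piece
    return bytes(slot)


v = 0x0102030405060708090A0B0C  # an arbitrary i96 value
print(variant1_slot(v, 96).hex())
print(variant2_slot(v, 96).hex())
```

For i96, variant 1 yields the 12 big-endian bytes of the value but needs a 4-byte partial store plus a shifted offset (4 instead of 8) for the second piece. Variant 2 simply stores both pieces whole at offsets 0 and 8 into a 16-byte slot, which is why it only requires a larger stack slot.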
Thanks for the reduced test case! Suggested patch at https://reviews.llvm.org/D97514.
52bbbf4
Thank you both! I have confirmed that this does fix the issues seen in Rust #80810. I also opened bug 49500 for the similar RISCV bug.
Hi Jonas, what is your opinion on backporting this? https://reviews.llvm.org/rG52bbbf4d4459239e0f461bc302ada89e2c5d07fc
That seems like a good idea...
Merged: 0193a7d
Mentioned in issue #50032
Extended Description
This bug was reduced from one of the failures in Rust #80810:
rust-lang/rust#80810
When a large integer argument on s390x is converted to indirect, but the type is not a multiple of 64 bits, the writes to the stack are all still in 64-bit chunks and may clobber neighboring values on the stack.
arg-i96.ll
target datalayout = "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64"
target triple = "s390x-unknown-linux-gnu"
declare hidden void @fn1(i96) unnamed_addr
define hidden i32 @fn2() unnamed_addr {
start:
%0 = alloca i32, align 4
store i32 -1, i32* %0, align 4
call void @fn1(i96 0)
%1 = load i32, i32* %0, align 4
ret i32 %1
}
llc -O0
fn2: # @fn2
.cfi_startproc
%bb.0: # %start
.Lfunc_end0:
.size fn2, .Lfunc_end0-fn2
.cfi_endproc
# -- End function
.hidden fn1
.section ".note.GNU-stack","",@progbits
In this reproducer, the 32-bit store to %0 -- mvhi 172(%r15), -1 -- is immediately overwritten by the overflowing 64-bit store to the end of %1 -- mvghi 168(%r15), 0.
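The clobbering can be checked with simple byte-range arithmetic. The sketch below is plain Python with a hypothetical helper name; it computes the bytes shared by two stores, using the s390x offsets quoted above and the RISC-V offsets from the i160 case discussed in the thread.

```python
def overlap_range(a_off: int, a_size: int, b_off: int, b_size: int):
    """Return the [start, end) byte range written by both stores, or None."""
    start = max(a_off, b_off)
    end = min(a_off + a_size, b_off + b_size)
    return (start, end) if start < end else None


# s390x reproducer: mvhi 172(%r15), -1 (4 bytes) vs. mvghi 168(%r15), 0 (8 bytes)
print("s390x:", overlap_range(172, 4, 168, 8))
# RISC-V i160 case: sw a0, 20(sp) (4 bytes) vs. sd a0, 16(sp) (8 bytes)
print("riscv:", overlap_range(20, 4, 16, 8))
```

In both cases the shared range covers every byte of the 4-byte store, so the later 8-byte store of 0 completely replaces the -1 before it is reloaded.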
With --print-after-all, you can also see the 12-byte (96-bit) frame allocation with two 8-byte writes.
*** IR Dump After Finalize ISel and expand pseudo-instructions ***:
Machine code for function fn2: IsSSA, TracksLiveness
Frame Objects:
fi#0: size=4, align=4, at location [SP]
fi#1: size=12, align=8, at location [SP]
bb.0.start:
MVHI %stack.0, 0, -1 :: (store 4 into %ir.0)
ADJCALLSTACKDOWN 0, 0
MVGHI %stack.1, 8, 0 :: (store 8 into %stack.1)
MVGHI %stack.1, 0, 0 :: (store 8 into %stack.1)
%0:gr64bit = LA %stack.1, 0, $noreg
$r2d = COPY %0:gr64bit
CallBRASL @fn1, $r2d, <regmask $f8d $f9d $f10d $f11d $f12d $f13d $f14d $f15d $f8q $f9q $f12q $f13q $f8s $f9s $f10s $f11s $f12s $f13s $f14s $f15s $r6d $r7d $r8d $r9d $r10d $r11d $r12d $r13d $r14d $r15d $r6h $r7h $r8h and 22 more...>, implicit-def dead $r14d, implicit-def dead $cc, implicit $fpc
ADJCALLSTACKUP 0, 0
%1:gr32bit = L %stack.0, 0, $noreg :: (dereferenceable load 4 from %ir.0)
$r2l = COPY %1:gr32bit
Return implicit $r2l
End machine code for function fn2.