builtin trap placed after frame pop on Arm 32-bit #113154

DavidSpickett · 2024-10-21T11:37:12Z

Since #109628, we have had an lldb test failure on 32 bit Arm:
https://lab.llvm.org/buildbot/#/builders/18/builds/5545

# CHECK: frame #{{.*}}`g() at verbose_trap-in-stl.cpp
         ^
<stdin>:11:88: note: scanning from here
* thread #1, name = 'verbose_trap-in', stop reason = Bounds error: out-of-bounds access
                                                                                       ^
<stdin>:12:44: note: possible intended match here
 frame #2: 0x00410694 verbose_trap-in-stl.test.tmp.out`main at verbose_trap-in-stl.cpp:15:3
                                           ^
Input file: <stdin>
Check file: /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/Shell/Recognizer/verbose_trap-in-stl.test

-dump-input=help explains the following input dump.

This test checks that lldb can show the first frame of user written code when there's an error in the STL. So in this case we expect to see g() in the backtrace, rather than the operator[], or main as we now get.

This appears to be because the trap instruction generated by __builtin_trap is put after the frame information is reset:

(lldb) dis
verbose_trap-in-stl.test.tmp.out`std::__1::vector<int>::operator[]:
    0x400558 <+0>:  push   {r11, lr}
    0x40055c <+4>:  mov    r11, sp
    0x400560 <+8>:  sub    sp, sp, #8
    0x400564 <+12>: str    r0, [sp, #0x4]
    0x400568 <+16>: str    r1, [sp]
    0x40056c <+20>: mov    sp, r11
    0x400570 <+24>: pop    {r11, lr}
->  0x400574 <+28>: trap
    0x400578 <+32>: bx     lr

I can show this with another example: https://godbolt.org/z/jnGWh1Wqh

void fn() {
    __builtin_trap();
}

Produces:

fn():
  push {r11, lr}
  mov r11, sp
  pop {r11, lr}
  .inst 0xe7ffdefe
  bx lr

fn has the trap after the frame pointer and link register is reset. This is not the case on RISC-V:

fn():
  addi sp, sp, -16
  sw ra, 12(sp)
  sw s0, 8(sp)
  addi s0, sp, 16
  unimp                          <<<<<<<<< trap is here
  lw ra, 12(sp)
  lw s0, 8(sp)
  addi sp, sp, 16
  ret

And I think changing the frame information before the trap is misleading because a debugger will see a PC within fn but the unwind information will look as if it is already in the caller of fn.

I don't know right now whether the linked change has caused the problem or merely exposed the problem by changing how leaf functions are handled.

The text was updated successfully, but these errors were encountered:

llvmbot · 2024-10-21T11:37:30Z

@llvm/issue-subscribers-backend-arm

Author: David Spickett (DavidSpickett)

Since https://github.com//pull/109628, we have had an lldb test failure on 32 bit Arm: https://lab.llvm.org/buildbot/#/builders/18/builds/5545

# CHECK: frame #{{.*}}`g() at verbose_trap-in-stl.cpp
         ^
&lt;stdin&gt;:11:88: note: scanning from here
* thread #<!-- -->1, name = 'verbose_trap-in', stop reason = Bounds error: out-of-bounds access
                                                                                       ^
&lt;stdin&gt;:12:44: note: possible intended match here
 frame #<!-- -->2: 0x00410694 verbose_trap-in-stl.test.tmp.out`main at verbose_trap-in-stl.cpp:15:3
                                           ^
Input file: &lt;stdin&gt;
Check file: /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/Shell/Recognizer/verbose_trap-in-stl.test

-dump-input=help explains the following input dump.

This test checks that lldb can show the first frame of user written code when there's an error in the STL. So in this case we expect to see g() in the backtrace, rather than the operator[], or main as we now get.

This appears to be because the trap instruction generated by __builtin_trap is put after the frame information is reset:

(lldb) dis
verbose_trap-in-stl.test.tmp.out`std::__1::vector&lt;int&gt;::operator[]:
    0x400558 &lt;+0&gt;:  push   {r11, lr}
    0x40055c &lt;+4&gt;:  mov    r11, sp
    0x400560 &lt;+8&gt;:  sub    sp, sp, #<!-- -->8
    0x400564 &lt;+12&gt;: str    r0, [sp, #<!-- -->0x4]
    0x400568 &lt;+16&gt;: str    r1, [sp]
    0x40056c &lt;+20&gt;: mov    sp, r11
    0x400570 &lt;+24&gt;: pop    {r11, lr}
-&gt;  0x400574 &lt;+28&gt;: trap
    0x400578 &lt;+32&gt;: bx     lr

I can show this with another example: https://godbolt.org/z/jnGWh1Wqh

void fn() {
    __builtin_trap();
}

Produces:

fn():
  push {r11, lr}
  mov r11, sp
  pop {r11, lr}
  .inst 0xe7ffdefe
  bx lr

fn has the trap after the frame pointer and link register is reset. This is not the case on RISC-V:

fn():
  addi sp, sp, -16
  sw ra, 12(sp)
  sw s0, 8(sp)
  addi s0, sp, 16
  unimp                          &lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt; trap is here
  lw ra, 12(sp)
  lw s0, 8(sp)
  addi sp, sp, 16
  ret

And I think changing the frame information before the trap is misleading because a debugger will see a PC within fn but the unwind information will look as if it is already in the caller of fn.

I don't know right now whether the linked change has caused the problem or merely exposed the problem by changing how leaf functions are handled.

DavidSpickett · 2024-10-21T11:38:38Z

You can work around the issue by putting something after the trap:

void fn() {
    __builtin_trap();
    asm ("nop");
}

Then you get the expected order:

fn():
  push {r11, lr}
  mov r11, sp
  .inst 0xe7ffdefe
  mov r0, r0
  pop {r11, lr}
  bx lr

DavidSpickett · 2024-10-21T11:45:21Z

prologepilog for other architectures is inserting around the code of the function, but Arm is inserting before it:
https://godbolt.org/z/E3rorvnPG

ARM:

bb.0.entry:
  liveins: $lr
  $sp = frame-setup STMDB_UPD $sp(tied-def 0), 14, $noreg, killed $r11, killed $lr
  frame-setup CFI_INSTRUCTION def_cfa_offset 8
  frame-setup CFI_INSTRUCTION offset $lr, -4
  frame-setup CFI_INSTRUCTION offset $r11, -8
  $r11 = frame-setup MOVr killed $sp, 14, $noreg, $noreg
  frame-setup CFI_INSTRUCTION def_cfa_register $r11
  $sp = frame-destroy LDMIA_UPD $sp(tied-def 0), 14, $noreg, def $r11, def $lr; example.cpp:2:5

  TRAP; example.cpp:2:5

  BX_RET 14, $noreg; example.cpp:3:1

# End machine code for function fn().

RISC-V:

# Machine code for function fn(): NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten
Frame Objects:
  fi#0: size=4, align=4, at location [SP-4]
  fi#1: size=4, align=4, at location [SP-8]

bb.0.entry:
  liveins: $x1
  $x2 = frame-setup ADDI $x2, -16
  frame-setup CFI_INSTRUCTION def_cfa_offset 16
  SW killed $x1, $x2, 12 :: (store (s32) into %stack.0)
  SW killed $x8, $x2, 8 :: (store (s32) into %stack.1)
  frame-setup CFI_INSTRUCTION offset $x1, -4
  frame-setup CFI_INSTRUCTION offset $x8, -8
  $x8 = frame-setup ADDI $x2, 16
  frame-setup CFI_INSTRUCTION def_cfa $x8, 0

  UNIMP; example.cpp:2:5

  $x1 = LW $x2, 12 :: (load (s32) from %stack.0)
  $x8 = LW $x2, 8 :: (load (s32) from %stack.1)
  $x2 = frame-destroy ADDI $x2, 16; example.cpp:3:1
  PseudoRET; example.cpp:3:1

However this does not happen if the only code in the body is asm ("nop"); or some other thing that is not builtin_trap. So perhaps there is someone special about that?

…en issue Since #109628 landed, this test has been failing on 32-bit Arm. This is due to a codegen problem (whether added or uncovered by the change, not known) where the trap instruction is placed after the frame pointer and link register are restored. #113154 So the code was: ``` std::__1::vector<int>::operator[](unsigned int): sub sp, sp, #8 str r0, [sp, #4] str r1, [sp] add sp, sp, #8 .inst 0xe7ffdefe bx lr ``` When lldb saw the trap, the PC was inside operator[] but the frame information actually pointed to g. This bug only happens for leaf functions so adding a return type works around it: ``` std::__1::vector<int>::operator[](unsigned int): push {r11, lr} mov r11, sp sub sp, sp, #8 str r0, [sp, #4] str r1, [sp] mov sp, r11 pop {r11, lr} .inst 0xe7ffdefe bx lr ``` (and operator[] should return T& anyway) Now the PC location and frame information should match and the test passes.

guoxin049 · 2024-10-21T12:10:59Z

I'm not sure how you're building it, but if it's a leaf function, you could try eliminating the frame pointer (FP) by using the -momit-leaf-frame-pointer option. If you want to remove the frame pointer entirely, the -fomit-frame-pointer option can be used to eliminate it across the board. like this: https://godbolt.org/z/4vKnKvbj6

DavidSpickett · 2024-10-21T12:30:36Z

I'm building with: clang <file> -O0 -g

No explicit options related to frame pointers. Yes adding that flag helps, but in the case of the lldb test case I was able to add a return type to make it non-leaf and avoid this.

guoxin049 · 2024-10-21T13:35:31Z

#109628 The behavior for non-leaf functions remains consistent before and after the patch modification, so I don't believe this modification introduced a bug. like this : https://godbolt.org/z/vrqn71qKa

DavidSpickett · 2024-10-21T13:52:37Z

Right, this has just revealed what was already there by adding frame information to the function in question. As long as no one backtraces it, it would work fine it's only because lldb inspected it that we found this.

Without frame pointers I can still see the trap being placed after stack pointer restore code:

void gn() {
    volatile int i = 5;
    // As long as the trap is last, it'll be placed
    // after the pop.
    __builtin_trap();
}

clang -O0 -g -fomit-frame-pointer:

gn:
        sub     sp, sp, #4
        mov     r0, #5
        str     r0, [sp]
        add     sp, sp, #4         << destroy frame
        .inst   0xe7ffdefe         << trap is after that
        bx      lr

From CE's pipeline viewer:

bb.0.entry:
  $sp = frame-setup SUBri killed $sp, 4, 14, $noreg, $noreg
  frame-setup CFI_INSTRUCTION def_cfa_offset 4
  renamable $r0 = MOVi 5, 14, $noreg, $noreg
  STRi12 killed renamable $r0, $sp, 0, 14, $noreg :: (volatile store (s32) into %ir.i); example.c:7:18
  $sp = frame-destroy ADDri killed $sp, 4, 14, $noreg, $noreg; example.c:10:5
  TRAP; example.c:10:5
  BX_RET 14, $noreg; example.c:11:1

# End machine code for function gn.

Again it's inserting the frame destroy before the trap. Does not do this for the RISC-V equivalent options:

gn:
        addi    sp, sp, -16
        li      a0, 5
        sw      a0, 12(sp)
        unimp                       << trap
        addi    sp, sp, 16       << destroy frame
        ret

And this happens prior to your PR.

efriedma-quic · 2024-10-21T21:50:29Z

I think the problem is that the Arm "trap" instruction is marked "isTerminator = 1" in ARMInstrInfo.td. No other target does this, as far as I can tell, precisely because it leads to this sort of weird result.

void fn() {
    __builtin_trap();
    asm ("nop");
}

This actually fails MachineVerifier: "Bad machine code: Non-terminator instruction after the first terminator".

Fixes llvm#113154 The encodings used for llvm.trap() on ARM were all marked as barriers and terminators. This lead to stack frame destroy code being inserted before the trap if the trap was the last thing in the function and it had no return statement. ``` void fn() { volatile int i = 0; __builtin_trap(); } ``` Produced: ``` fn: push {r11, lr} << stack frame create <...> pop {r11, lr} << stack frame destroy .inst 0xe7ffdefe << trap bx lr ``` All the other targets don't mark them this way, instead they mark them with isTrap. I've changed ARM to do this, which fixes the code generation: ``` fn: push {r11, lr} << stack frame create <...> .inst 0xe7ffdefe << trap pop {r11, lr} << stack frame destroy bx lr ``` I've updated the existing trap test to force the need for a stack frame, then check that the instruction immediately after the trap is resetting the stack pointer.

Fixes llvm#113154 The encodings used for llvm.trap() on ARM were all marked as barriers and terminators. This lead to stack frame destroy code being inserted before the trap if the trap was the last thing in the function and it had no return statement. ``` void fn() { volatile int i = 0; __builtin_trap(); } ``` Produced: ``` fn: push {r11, lr} << stack frame create <...> mov sp, r11 pop {r11, lr} << stack frame destroy .inst 0xe7ffdefe << trap bx lr ``` All the other targets don't mark them this way, instead they mark them with isTrap. I've changed ARM to do this, which fixes the code generation: ``` fn: push {r11, lr} << stack frame create <...> .inst 0xe7ffdefe << trap mov sp, r11 pop {r11, lr} << stack frame destroy bx lr ``` I've updated the existing trap test to force the need for a stack frame, then check that the instruction immediately after the trap is resetting the stack pointer.

…en issue Since llvm#109628 landed, this test has been failing on 32-bit Arm. This is due to a codegen problem (whether added or uncovered by the change, not known) where the trap instruction is placed after the frame pointer and link register are restored. llvm#113154 So the code was: ``` std::__1::vector<int>::operator[](unsigned int): sub sp, sp, llvm#8 str r0, [sp, #4] str r1, [sp] add sp, sp, llvm#8 .inst 0xe7ffdefe bx lr ``` When lldb saw the trap, the PC was inside operator[] but the frame information actually pointed to g. This bug only happens for leaf functions so adding a return type works around it: ``` std::__1::vector<int>::operator[](unsigned int): push {r11, lr} mov r11, sp sub sp, sp, llvm#8 str r0, [sp, #4] str r1, [sp] mov sp, r11 pop {r11, lr} .inst 0xe7ffdefe bx lr ``` (and operator[] should return T& anyway) Now the PC location and frame information should match and the test passes.

Fixes #113154 The encodings used for llvm.trap() on ARM were all marked as barriers and terminators. This lead to stack frame destroy code being inserted before the trap if the trap was the last thing in the function and it had no return statement. ``` void fn() { volatile int i = 0; __builtin_trap(); } ``` Produced: ``` fn: push {r11, lr} << stack frame create <...> mov sp, r11 pop {r11, lr} << stack frame destroy .inst 0xe7ffdefe << trap bx lr ``` All the other targets don't mark them this way, instead they mark them with isTrap. I've changed ARM to do this, which fixes the code generation: ``` fn: push {r11, lr} << stack frame create <...> .inst 0xe7ffdefe << trap mov sp, r11 pop {r11, lr} << stack frame destroy bx lr ``` I've updated the existing trap test to force the need for a stack frame, then check that the instruction immediately after the trap is resetting the stack pointer. debugtrap was already working but I've added the same checks for it anyway.

DavidSpickett added backend:ARM miscompilation labels Oct 21, 2024

DavidSpickett changed the title ~~builtin trap placed after frame pop on Arm 32 bit~~ builtin trap placed after frame pop on Arm 32-bit Oct 21, 2024

DavidSpickett mentioned this issue Oct 21, 2024

[ARM] Fix -mno-omit-leaf-frame-pointer flag doesn't works on 32-bit ARM #109628

Merged

DavidSpickett mentioned this issue Oct 22, 2024

[llvm][ARM] Correct the properties of trap instructions #113287

Merged

DavidSpickett closed this as completed in #113287 Oct 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

builtin trap placed after frame pop on Arm 32-bit #113154

builtin trap placed after frame pop on Arm 32-bit #113154

DavidSpickett commented Oct 21, 2024

llvmbot commented Oct 21, 2024

DavidSpickett commented Oct 21, 2024

DavidSpickett commented Oct 21, 2024

guoxin049 commented Oct 21, 2024

DavidSpickett commented Oct 21, 2024

guoxin049 commented Oct 21, 2024

DavidSpickett commented Oct 21, 2024 •

edited

Loading

efriedma-quic commented Oct 21, 2024

builtin trap placed after frame pop on Arm 32-bit #113154

builtin trap placed after frame pop on Arm 32-bit #113154

Comments

DavidSpickett commented Oct 21, 2024

llvmbot commented Oct 21, 2024

DavidSpickett commented Oct 21, 2024

DavidSpickett commented Oct 21, 2024

guoxin049 commented Oct 21, 2024

DavidSpickett commented Oct 21, 2024

guoxin049 commented Oct 21, 2024

DavidSpickett commented Oct 21, 2024 • edited Loading

efriedma-quic commented Oct 21, 2024

DavidSpickett commented Oct 21, 2024 •

edited

Loading