
[Arm64] Implement stack probing using helper #13519

Open
BruceForstall opened this issue Oct 2, 2019 · 6 comments
Labels: arch-arm64, area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)

Comments

@BruceForstall
Member

x86/x64 was implemented with dotnet/coreclr#26807. This issue tracks doing the work to implement it for arm32/arm64.

This would provide consistency and simplicity in the JIT stack probing implementation, as well as the benefit of stack overflow exception stack traces on arm32/arm64.

Related: https://github.com/dotnet/coreclr/issues/21061

Update: The Arm32 part of this issue is addressed in dotnet/coreclr#27184.
For Arm64 we decided only to fix the stack probing loop (it currently under-probes by one page) without implementing the helper.

In the future, we can implement the helper by following the approach suggested in dotnet/coreclr#27184 (comment), but it is out of scope for the near term.

category:implementation
theme:prolog-epilog
skill-level:intermediate
cost:medium

@echesakov
Contributor

echesakov commented Oct 18, 2019

I was testing my implementation of stack probing using helpers on linux-arm and comparing its behavior with the current implementation of stack probing using inlined loops. I believe the current implementation under-probes by one page.

For example, below I have a disassembly of a funclet with large outgoing argument space (32712 bytes).

(gdb) disassemble 0xaa0140d0,+50
Dump of assembler code from 0xaa0140d0 to 0xaa014102:
   0xaa0140d0:  stmdb   sp!, {r4, r10, r11, lr}
   0xaa0140d4:  movw    r3, #61440      ; 0xf000
   0xaa0140d8:  sxth    r3, r3
   0xaa0140da:  movw    r2, #32824      ; 0x8038
   0xaa0140de:  sxth    r2, r2
   0xaa0140e0:  ldr.w   r1, [sp, r3]
   0xaa0140e4:  sub.w   r3, r3, #4096   ; 0x1000
   0xaa0140e8:  cmp     r2, r3
   0xaa0140ea:  bls.n   0xaa0140e0
=> 0xaa0140ec:  add     sp, r2
   0xaa0140ee:  add.w   r3, r11, #8
   0xaa0140f2:  movw    r10, #32708     ; 0x7fc4
   0xaa0140f6:  str.w   r3, [sp, r10]
   0xaa0140fa:  movs    r2, #0
   0xaa0140fc:  movs    r3, #0
   0xaa0140fe:  vmov    d4, r2, r3
End of assembler dump.
(gdb) info reg r2 r3 sp
r2             0xffff8038       4294934584
r3             0xffff8000       4294934528
sp             0xbea07398       0xbea07398

The thread stack ends at 0xBEA00000.

1121:   /mnt/ssd/git/coreclr/BinDir_Linux_arm_debug/corerun GitHub_21061_StackOverflowInFuncletProlog.exe
Address   Kbytes Mode  Offset           Device    Mapping
00400000      16 r-x-- 0000000000000000 008:00001 corerun
00413000       4 r---- 0000000000003000 008:00001 corerun
00414000       4 rw--- 0000000000004000 008:00001 corerun
.
.
b6fff000       4 rw--- 0000000000019000 0b3:00002 ld-2.27.so
bea00000    6144 rw--- 0000000000000000 000:00000   [ stack ]
ffff0000       4 r-x-- 0000000000000000 000:00000   [ anon ]
mapped: 222664K    writeable/private: 50228K    shared: 74224K

Below is a summary of what happens in the loop:

Initial SP at the beginning of the funclet prolog: 0xBEA07398
Funclet frame size: 0x7FC8 (32712 bytes)
Last probed address: 0xBEA00398 = 0xBEA07398 - 0x7000
First address on the last probed page: 0xBEA00000
First address on the first unprobed page: 0xBE9FF000. Note that this address doesn't belong to the stack.
First address accessed after the funclet prolog: 0xBE9FF3D0

The funclet then segfaults in its body:

Thread 1 "corerun" received signal SIGSEGV, Segmentation fault.
0xaa014102 in ?? ()
(gdb) bt
#0  0xaa014102 in ?? ()
#1  0xb66653a8 in CallEHFunclet () at /__w/3/s/src/vm/arm/ehhelpers.S:100
#2  0xb6649f52 in ExceptionTracker::CallHandler (this=0x44e760, uHandlerStartPC=2852208848, sf=..., pEHClause=0x44e7b4, pMD=0xb25fb91c, funcletType=Catch, pContextRecord=0xbea07da0)
    at /__w/3/s/src/vm/exceptionhandling.cpp:3405
#3  0xb6649ce2 in ExceptionTracker::CallCatchHandler (this=0x44e760, pContextRecord=0xbea07da0, pfAborting=0xbea077bf) at /__w/3/s/src/vm/exceptionhandling.cpp:656
#4  0xb664b0f2 in ProcessCLRException (pExceptionRecord=0x482218, MemoryStackFp=3204440864, pContextRecord=0xbea07da0, pDispatcherContext=0xbea079fc) at /__w/3/s/src/vm/exceptionhandling.cpp:1192
#5  0xb6650af0 in UnwindManagedExceptionPass2 (ex=..., unwindStartContext=0xbea07da0) at /__w/3/s/src/vm/exceptionhandling.cpp:4489
#6  0xb6650f10 in UnwindManagedExceptionPass1 (ex=..., frameContext=0xbea07fe8) at /__w/3/s/src/vm/exceptionhandling.cpp:4651
#7  0xb6651466 in DispatchManagedException (ex=..., isHardwareException=false) at /__w/3/s/src/vm/exceptionhandling.cpp:4777
#8  0xb6566a50 in __FCThrow (__me=0x0, reKind=kDivideByZeroException, resID=0, arg1=0x0, arg2=0x0, arg3=0x0) at /__w/3/s/src/vm/fcall.cpp:56
#9  0xb6577f6c in JIT_Div (dividend=1, divisor=0) at /__w/3/s/src/vm/jithelpers.cpp:277
#10 0xaa0140ae in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) disassemble 0xaa0140d0,0xaa014106
Dump of assembler code from 0xaa0140d0 to 0xaa014106:
   0xaa0140d0:  stmdb   sp!, {r4, r10, r11, lr}
   0xaa0140d4:  movw    r3, #61440      ; 0xf000
   0xaa0140d8:  sxth    r3, r3
   0xaa0140da:  movw    r2, #32824      ; 0x8038
   0xaa0140de:  sxth    r2, r2
   0xaa0140e0:  ldr.w   r1, [sp, r3]
   0xaa0140e4:  sub.w   r3, r3, #4096   ; 0x1000
   0xaa0140e8:  cmp     r2, r3
   0xaa0140ea:  bls.n   0xaa0140e0
   0xaa0140ec:  add     sp, r2
   0xaa0140ee:  add.w   r3, r11, #8
   0xaa0140f2:  movw    r10, #32708     ; 0x7fc4
   0xaa0140f6:  str.w   r3, [sp, r10]
   0xaa0140fa:  movs    r2, #0
   0xaa0140fc:  movs    r3, #0
   0xaa0140fe:  vmov    d4, r2, r3
=> 0xaa014102:  vstr    d4, [sp]
(gdb) info reg sp
sp             0xbe9ff3d0       0xbe9ff3d0

The analysis was done on top of commit ef3180c.

@echesakov echesakov changed the title Implement stack probing using helper for arm32/arm64 [Arm64] Implement stack probing using helper Oct 31, 2019
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@janvorli
Member

While implementing printing of the stack trace at stack overflow, I found that the fact that we don't move SP during probing on Linux on ARM64 (as we haven't implemented the helper-based probing for arm64) causes a problem. Printing the stack overflow stack trace requires about 28kB of stack space, so when a stack overflow is detected in the SIGSEGV handler, we switch to a special preallocated stack of that size and run the exception handling on it. But when we hit the SIGSEGV, we cannot get the actual stack limits, because calling the function that retrieves them is not allowed from an async signal handler when you don't know what code triggered it. So we treat the fault as a stack overflow based on whether the accessed address is within +/- a page of SP.
That means that without moving SP during stack probing, we don't recognize the fault as a stack overflow at this point (where we run on a per-thread alternate stack for handling SIGSEGV), so instead of switching to the special stack overflow stack, we switch back to the thread's original stack.

Then in both cases we run common_signal_handler, which is common to all hardware exceptions. At this point we may have only a little over one memory page of stack left if we are executing on the original stack. That is enough to check whether we are running in managed code; if we are, we can read the actual stack limits and detect the stack overflow even for probing without the helper.
But the remaining stack is not sufficient for printing the stack trace. Previously we just printed a "Stack overflow" message and aborted the process; now we actually call the runtime's hardware exception handler, which checks for stack overflow and ends up calling EEPolicy::HandleFatalStackOverflow to dump the stack trace.

If the probing helper that moves SP while probing were implemented for arm64, this problem would go away, as we would never hit this code path.

@BruceForstall BruceForstall modified the milestones: Future, 5.0 Mar 12, 2020
@AndyAyersMS
Member

@BruceForstall I take it you think this should stay in 5.0?

@BruceForstall
Member Author

Yes, @echesakovMSFT plans to get to it in 5.0.

@echesakov
Contributor

This should be moved to 6.0 - I won't have time to work on this

@echesakov echesakov modified the milestones: 5.0.0, 6.0.0 Jun 23, 2020
@BruceForstall BruceForstall added JitUntriaged CLR JIT issues needing additional triage and removed JitUntriaged CLR JIT issues needing additional triage labels Oct 28, 2020
@echesakov echesakov added the in-pr There is an active PR which will close this issue when it is merged label Jan 29, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Feb 26, 2021
@echesakov
Contributor

I implemented stack probing with helpers on arm64 and posted the changes in https://github.com/echesakovMSFT/runtime/tree/Arm64-Implement-Jit-StackProbe-Helper.
However, further work is blocked by #47810, and given that that issue will not be addressed in 6.0, I am moving this issue to Future.
