-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix nonvolatile context restoration #101709
Fix nonvolatile context restoration #101709
Conversation
There is a possibility of a race between the ClrRestoreNonVolatileContext and an async signal handling (like the one we use for runtime suspension). If the signal kicks in after we've loaded Rsp, but before we jumped to the target address, the context we are loading the registers from could get overwritten by the signal handler stack. So the ClrRestoreNonVolatileContext would end up jumping into a wrong target address. The fix is to load the target address into a register before loading the Rsp and then jumping using the register.
Tagging subscribers to this area: @mangod9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! This seems like the most plausible explanation.
Is only x64 affected?
Actually, the other targets use RtlRestoreContext in PAL and it also has this issue on some targets (arm, x86, most likely riscv64 and loongarch64, not sure about s390x and ppc64le). Arm64 is ok. Let me add fixes for more architectures here. |
@t-mustafin, @shushanhf, @vikasgupta8, @uweigand can you please check if the architectures you have added support for also need a fix like this? It seems to me that RISCV64 and LoongArch64 do need to be fixed, I have no idea about the S390x and PPC64LE. |
@jakobbotsch can you please take a look again? I have added arm and x86 fixes. Arm64 was already ok. I will leave possible fixes for the other architectures on the external people who have added support for those. |
The CI failure is a known issue. |
str r2, [r1, #-4] | ||
ldr r2, [r0, #(CONTEXT_R12)] | ||
str r2, [r1, #-8] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any possibility that these stores accidentally overwrite some of the registers in the context that we are going to use later, if the context happens to be at this location in the stack? Or does the context not end with the GPR registers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context contains GPR registers first, then all the NEON ones, debug registers and two DWORDS of padding at the end. So worst case scenario, if we ever overwritten anything from the context, it could only be the padding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here:
runtime/src/coreclr/pal/inc/pal.h
Lines 1739 to 1797 in 017593d
typedef struct DECLSPEC_ALIGN(8) _CONTEXT { | |
// | |
// Control flags. | |
// | |
DWORD ContextFlags; | |
// | |
// Integer registers | |
// | |
DWORD R0; | |
DWORD R1; | |
DWORD R2; | |
DWORD R3; | |
DWORD R4; | |
DWORD R5; | |
DWORD R6; | |
DWORD R7; | |
DWORD R8; | |
DWORD R9; | |
DWORD R10; | |
DWORD R11; | |
DWORD R12; | |
// | |
// Control Registers | |
// | |
DWORD Sp; | |
DWORD Lr; | |
DWORD Pc; | |
DWORD Cpsr; | |
// | |
// Floating Point/NEON Registers | |
// | |
DWORD Fpscr; | |
DWORD Padding; | |
union { | |
NEON128 Q[16]; | |
ULONGLONG D[32]; | |
DWORD S[32]; | |
}; | |
// | |
// Debug registers | |
// | |
DWORD Bvr[ARM_MAX_BREAKPOINTS]; | |
DWORD Bcr[ARM_MAX_BREAKPOINTS]; | |
DWORD Wvr[ARM_MAX_WATCHPOINTS]; | |
DWORD Wcr[ARM_MAX_WATCHPOINTS]; | |
DWORD Padding2[2]; | |
} CONTEXT, *PCONTEXT, *LPCONTEXT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me, just one hypothetical wondering.
@janvorli thanks for notification. Are this changes needed on release/8.0 branch? cc @dotnet/samsung |
Fix s390x context restoration along the lines of dotnet#101709
Thanks for the heads-up! The s390x patch is here: #101854 |
Fix s390x context restoration along the lines of #101709
* Fix nonvolatile context restoration There is a possibility of a race between the ClrRestoreNonVolatileContext and an async signal handling (like the one we use for runtime suspension). If the signal kicks in after we've loaded Rsp, but before we jumped to the target address, the context we are loading the registers from could get overwritten by the signal handler stack. So the ClrRestoreNonVolatileContext would end up jumping into a wrong target address. The fix is to load the target address into a register before loading the Rsp and then jumping using the register. * Fix arm and x86
Fix s390x context restoration along the lines of dotnet#101709
* Fix nonvolatile context restoration There is a possibility of a race between the ClrRestoreNonVolatileContext and an async signal handling (like the one we use for runtime suspension). If the signal kicks in after we've loaded Rsp, but before we jumped to the target address, the context we are loading the registers from could get overwritten by the signal handler stack. So the ClrRestoreNonVolatileContext would end up jumping into a wrong target address. The fix is to load the target address into a register before loading the Rsp and then jumping using the register. * Fix arm and x86
Fix s390x context restoration along the lines of dotnet#101709
There is a possibility of a race between the
ClrRestoreNonVolatileContext and an async signal handling (like the one we use for runtime suspension). If the signal kicks in after we've loaded Rsp, but before we jumped to the target address, the context we are loading the registers from could get overwritten by the signal handler stack. So the ClrRestoreNonVolatileContext would end up jumping into a wrong target address.
The fix is to load the target address into a register before loading the Rsp and then jumping using the register.
Close #101060, #98292