Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix nonvolatile context restoration #101709

Merged
merged 2 commits into from
Apr 30, 2024

Conversation

janvorli
Copy link
Member

@janvorli janvorli commented Apr 30, 2024

There is a possibility of a race between the
ClrRestoreNonVolatileContext and an async signal handling (like the one we use for runtime suspension). If the signal kicks in after we've loaded Rsp, but before we jumped to the target address, the context we are loading the registers from could get overwritten by the signal handler stack. So the ClrRestoreNonVolatileContext would end up jumping into a wrong target address.

The fix is to load the target address into a register before loading the Rsp and then jumping using the register.

Close #101060, #98292

There is a possibility of a race between the
ClrRestoreNonVolatileContext and an async signal handling (like
the one we use for runtime suspension). If the signal kicks in after
we've loaded Rsp, but before we jumped to the target address, the
context we are loading the registers from could get overwritten by the
signal handler stack. So the ClrRestoreNonVolatileContext would end up
jumping into a wrong target address.

The fix is to load the target address into a register before loading the
Rsp and then jumping using the register.
@janvorli janvorli added this to the 9.0.0 milestone Apr 30, 2024
@janvorli janvorli requested a review from jakobbotsch April 30, 2024 07:04
@janvorli janvorli self-assigned this Apr 30, 2024
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! This seems like the most plausible explanation.
Is only x64 affected?

@janvorli
Copy link
Member Author

Actually, the other targets use RtlRestoreContext in PAL and it also has this issue on some targets (arm, x86, most likely riscv64 and loongarch64, not sure about s390x and ppc64le). Arm64 is ok. Let me add fixes for more architectures here.

@AndyAyersMS
Copy link
Member

We expect this to fix #98292 and #101060, right?

Seems like a very small race window, which jives with the fact that these are very hard to repro.

@janvorli
Copy link
Member Author

We expect this to fix #98292 and #101060, right?

I think so as they both seem to be OSR related.

@janvorli
Copy link
Member Author

@t-mustafin, @shushanhf, @vikasgupta8, @uweigand can you please check if the architectures you have added support for also need a fix like this? It seems to me that RISCV64 and LoongArch64 do need to be fixed, I have no idea about the S390x and PPC64LE.

@janvorli
Copy link
Member Author

@jakobbotsch can you please take a look again? I have added arm and x86 fixes. Arm64 was already ok. I will leave possible fixes for the other architectures on the external people who have added support for those.

@janvorli
Copy link
Member Author

The CI failure is a known issue.

Comment on lines +165 to +167
str r2, [r1, #-4]
ldr r2, [r0, #(CONTEXT_R12)]
str r2, [r1, #-8]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any possibility that these stores accidentally overwrite some of the registers in the context that we are going to use later, if the context happens to be at this location in the stack? Or does the context not end with the GPR registers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context contains GPR registers first, then all the NEON ones, debug registers and two DWORDS of padding at the end. So worst case scenario, if we ever overwritten anything from the context, it could only be the padding.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here:

typedef struct DECLSPEC_ALIGN(8) _CONTEXT {
//
// Control flags.
//
DWORD ContextFlags;
//
// Integer registers
//
DWORD R0;
DWORD R1;
DWORD R2;
DWORD R3;
DWORD R4;
DWORD R5;
DWORD R6;
DWORD R7;
DWORD R8;
DWORD R9;
DWORD R10;
DWORD R11;
DWORD R12;
//
// Control Registers
//
DWORD Sp;
DWORD Lr;
DWORD Pc;
DWORD Cpsr;
//
// Floating Point/NEON Registers
//
DWORD Fpscr;
DWORD Padding;
union {
NEON128 Q[16];
ULONGLONG D[32];
DWORD S[32];
};
//
// Debug registers
//
DWORD Bvr[ARM_MAX_BREAKPOINTS];
DWORD Bcr[ARM_MAX_BREAKPOINTS];
DWORD Wvr[ARM_MAX_WATCHPOINTS];
DWORD Wcr[ARM_MAX_WATCHPOINTS];
DWORD Padding2[2];
} CONTEXT, *PCONTEXT, *LPCONTEXT;

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me, just one hypothetical wondering.

@janvorli janvorli merged commit 7aefd27 into dotnet:main Apr 30, 2024
87 of 89 checks passed
@janvorli janvorli deleted the fix-nonvolatile-context-restoration branch April 30, 2024 22:47
@t-mustafin
Copy link
Contributor

@t-mustafin, @shushanhf, @vikasgupta8, @uweigand can you please check if the architectures you have added support for also need a fix like this? It seems to me that RISCV64 and LoongArch64 do need to be fixed, I have no idea about the S390x and PPC64LE.

@janvorli thanks for notification. Are this changes needed on release/8.0 branch?

cc @dotnet/samsung

uweigand added a commit to uweigand/runtime that referenced this pull request May 3, 2024
Fix s390x context restoration along the lines of
dotnet#101709
@uweigand
Copy link
Contributor

uweigand commented May 3, 2024

@t-mustafin, @shushanhf, @vikasgupta8, @uweigand can you please check if the architectures you have added support for also need a fix like this? It seems to me that RISCV64 and LoongArch64 do need to be fixed, I have no idea about the S390x and PPC64LE.

Thanks for the heads-up! The s390x patch is here: #101854

michaelgsharp pushed a commit to michaelgsharp/runtime that referenced this pull request May 9, 2024
* Fix nonvolatile context restoration

There is a possibility of a race between the
ClrRestoreNonVolatileContext and an async signal handling (like
the one we use for runtime suspension). If the signal kicks in after
we've loaded Rsp, but before we jumped to the target address, the
context we are loading the registers from could get overwritten by the
signal handler stack. So the ClrRestoreNonVolatileContext would end up
jumping into a wrong target address.

The fix is to load the target address into a register before loading the
Rsp and then jumping using the register.

* Fix arm and x86
michaelgsharp pushed a commit to michaelgsharp/runtime that referenced this pull request May 9, 2024
Fix s390x context restoration along the lines of
dotnet#101709
Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024
* Fix nonvolatile context restoration

There is a possibility of a race between the
ClrRestoreNonVolatileContext and an async signal handling (like
the one we use for runtime suspension). If the signal kicks in after
we've loaded Rsp, but before we jumped to the target address, the
context we are loading the registers from could get overwritten by the
signal handler stack. So the ClrRestoreNonVolatileContext would end up
jumping into a wrong target address.

The fix is to load the target address into a register before loading the
Rsp and then jumping using the register.

* Fix arm and x86
Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024
Fix s390x context restoration along the lines of
dotnet#101709
@github-actions github-actions bot locked and limited conversation to collaborators Jun 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
5 participants