Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Control flow guard support in the JIT #59190

Closed
MichalStrehovsky opened this issue Sep 16, 2021 · 14 comments · Fixed by #63763
Closed

Control flow guard support in the JIT #59190

MichalStrehovsky opened this issue Sep 16, 2021 · 14 comments · Fixed by #63763
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@MichalStrehovsky
Copy link
Member

MichalStrehovsky commented Sep 16, 2021

For NativeAOT we would like RyuJIT to be able to optionally generate code that performs control flow guard checks. The requirement is that whenever we do an indirect method call (or an equivalent crossmethod jump), we do it in a way that gives the OS a chance to terminate the process if the destination is invalid.

We already did the work in NativeAOT to tell to the OS all valid call targets. We now need to ensure the code generated by RyuJIT checks with the OS before doing indirect calls.

We basically need to transform this:

movl  $_target_func, %rax
calll *%rax

Into this:

movl  $_target_func, %rcx
calll *___guard_check_icall_fptr
calll *%rcx

This would involve:

  • Virtual calls through vtable
  • Virtual stub dispatch
  • Calli
  • Delegate invoke
  • NativeAOT fat function pointer calls

The OS provides two kinds of helpers:

  • X64 only: A dispatch helper that expects the destination address in RAX and all arguments in their appropriate registers. It validates the pointer in RAX and dispatches to the address if it’s valid. It reserves the right to clobber R10 and R11. It fails fast if the target is not valid. (LdrpDispatchUserCallTarget for those with Windows source access)
  • All architectures: A validation helper that expects the destination in the first argument register. If the address is invalid, it fails fast (LdrpValidateUserCallTarget for those with Windows source access)

We could provide these helpers directly as CORINFO_HELP_FUNC_XYZ or provide a wrapper of those that might be more convenient (I think CRT wraps the validation helper in something that preserves argument registers for the purposes for C++ vtable calls, for example).

The validation helper has a custom calling convention that varies greatly depending on the platform. At this point, we only care about CFG support in x64 and ARM64. For ARM64: helper expects target address in x15; apart from the usual regs, also preserves x0-x8, q0-q7. For x64: helper expects target address in RCX; apart from the usual regs, also preserves RCX and R10.

Initially, I wanted to focus on x64 and use the dispatch helper. It feels like the X64 only dispatch helper will be a nice perf improvement, but as a baseline, I think we’ll need support for the validation helper first (the dispatch helper looks incompatible with VSD because it can clobber the secret parameter so we can’t use it for everything, even on x64).

I’m ready to do the implementation of this, but I’ll need some guidance as to where to architecturally fit this in within RyuJIT.

Cc @dotnet/jit-contrib

@MichalStrehovsky MichalStrehovsky added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 16, 2021
@ghost
Copy link

ghost commented Sep 16, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

For NativeAOT we would like RyuJIT to be able to optionally generate code that performs control flow guard checks. The requirement is that whenever we do an indirect method call (or an equivalent crossmethod jump), we do it in a way that gives the OS a chance to terminate the process if the destination is invalid.

We already did the work in NativeAOT to tell to the OS all valid call targets. We now need to ensure the code generated by RyuJIT checks with the OS before doing indirect calls.

We basically need to transform this:

movl  $_target_func, %rax
calll *%rax

Into this:

movl  $_target_func, %rcx
calll *___guard_check_icall_fptr
calll *%rcx

This would involve:

  • Virtual calls through vtable
  • Virtual stub dispatch
  • Calli
  • Delegate invoke
  • NativeAOT fat function pointer calls

The OS provides two kinds of helpers:

  • X64 only: A dispatch helper that expects the destination address in RAX and all arguments in their appropriate registers. It validates the pointer in RAX and dispatches to the address if it’s valid. It reserves the right to clobber R10 and R11. It fails fast if the target is not valid. (LdrpDispatchUserCallTarget for those with Windows source access)
  • All architectures: A validation helper that expects the destination in the first argument register. (LdrpValidateUserCallTarget for those with Windows source access)

We could provide these helpers directly as CORINFO_HELP_FUNC_XYZ or provide a wrapper of those that might be more convenient (I think CRT wraps the validation helper in something that preserves argument registers for the purposes for C++ vtable calls, for example).

The validation helper has a custom calling convention that varies greatly depending on the platform. At this point, we only care about CFG support in x64 and ARM64. For ARM64: helper expects target address in x15; apart from the usual regs, also preserves x0-x8, q0-q7. For x64: helper expects target address in RCX; apart from the usual regs, also preserves RCX and R10.

Initially, I wanted to focus on x64 and use the dispatch helper. It feels like the X64 only dispatch helper will be a nice perf improvement, but as a baseline, I think we’ll need support for the validation helper first (the dispatch helper looks incompatible with VSD because it can clobber the secret parameter so we can’t use it for everything, even on x64).

I’m ready to do the implementation of this, but I’ll need some guidance as to where to architecturally fit this in within RyuJIT.

Cc @dotnet/jit-contrib

Author: MichalStrehovsky
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Sep 16, 2021
@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Sep 16, 2021
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Sep 16, 2021
@jakobbotsch
Copy link
Member

Just to be clear, is the transformation we need to do on x64 (in Intel syntax):

call <any addressing mode/reg>

to

mov rcx, <any addressing mode/reg>
call [__guard_check_icall_fptr]
call rcx

? How does that work out when rcx contains the first register argument?

as a baseline, I think we’ll need support for the validation helper first (the dispatch helper looks incompatible with VSD because it can clobber the secret parameter so we can’t use it for everything, even on x64).

I don't think I understand what work is needed in the JIT for the VSD case. Won't it be the VSD stub that handles verifying the function pointer anyway? I guess if the call to the VSD itself is indirect it needs to be checked, but maybe it is possible to guarantee that the call to the VSD itself is always direct (for simple cases) or was otherwise verified before-hand (for runtime lookup cases) in NativeAOT?

X64 only: A dispatch helper that expects the destination address in RAX

FWIW, in the Windows source I do see a variant of the dispatch helper for ARM64. Is it not available?

I’m ready to do the implementation of this, but I’ll need some guidance as to where to architecturally fit this in within RyuJIT.

I think for the dispatch helper it is simple enough: move the target to the normal arg list and add it as a non-standard arg that is put in x9/rax.

If we need to use the validation helper then if I understand correctly we will need to set up args to the actual call after validation has succeeded, so this should probably be expanded into an additional call IR node.

@sylveon
Copy link
Contributor

sylveon commented Sep 16, 2021

FWIW, in the Windows source I do see a variant of the dispatch helper for ARM64. Is it not available?

MSVC doesn't seem to use it: https://godbolt.org/z/GjGoac1s5. I'd generally follow what MSVC does here.

@jakobbotsch
Copy link
Member

I'd generally follow what MSVC does here.

If the dispatch helper is available and results in smaller and more efficient code then we should use it.

@sylveon
Copy link
Contributor

sylveon commented Sep 16, 2021

Sure, but this is also what MSVC strives to achieve yet it doesn't use the dispatch helper, so there must be some reason.

@jkotas
Copy link
Member

jkotas commented Sep 16, 2021

The dispatch helper reduces effectiveness of indirect branch predictor. Indirect branch predictor tries to remember the indirection that the given callsite went through last time or last few times (this information is attached to the callsite address). If all indirect calls go through the same callsite, the indirect branch predictor will often get lost.

@BruceForstall
Copy link
Member

Make sure to add a detailed description to https://github.com/dotnet/runtime/blob/main/docs/design/coreclr/botr/clr-abi.md about the ABI requirements (or options). Maybe even do this first, before the implementation.

@jakobbotsch jakobbotsch self-assigned this Jan 8, 2022
@jakobbotsch
Copy link
Member

The choice of taking the call target in rcx on x64 for the validator is quite unfortunate, since it often forces the JIT to save and restore the first argument to the actual call somewhere. Typically this results in diffs like the following:

+       mov      gword ptr [rsp+20H], rcx
+       mov      rcx, qword ptr [(reloc)]
+       ; gcrRegs -[rcx]
+       call     CORINFO_HELP_VALIDATE_INDIRECT_CALL
+       ; gcr arg pop 0
+       mov      rax, rcx
+       mov      rcx, gword ptr [rsp+20H]
+       ; gcrRegs +[rcx]
        cmp      dword ptr [rcx], ecx
-       call     [System.Data.DataTable:get_ParentRelations():System.Data.DataRelationCollection:this]
+       call     rax ; System.Data.DataTable:get_ParentRelations():System.Data.DataRelationCollection:this
        ; gcrRegs -[rcx] +[rax]
        ; gcr arg pop 0
-       mov      rcx, rax
+       mov      rbp, rax
+       ; gcrRegs +[rbp]
+       mov      rcx, qword ptr [(reloc)]
+       call     CORINFO_HELP_VALIDATE_INDIRECT_CALL
+       ; gcrRegs -[rax]
+       ; gcr arg pop 0
+       mov      rax, rcx
+       mov      rcx, rbp
        ; gcrRegs +[rcx]
        mov      rdx, rdi
        ; gcrRegs +[rdx]
        lea      r11, [(reloc)]
        cmp      dword ptr [rcx], ecx
-       call     [r11]hackishModuleName:hackishMethodName()
-       ; gcrRegs -[rcx rdx rdi]
+       call     rax ; hackishModuleName:hackishMethodName()
+       ; gcrRegs -[rcx rdx rbp rdi] +[rax]
        ; gcr arg pop 0
        mov      rdx, rax

As mentioned in the original post we will only need this scheme for VSD calls, but the impact still seems quite significant.
It would be ideal for us if there was a validate helper that took the call target in rax and preserved rax and rcx instead of taking the call target in rcx and preserving rcx and r10. We could write such an adapter but this adds an extra level of indirection to every validation to shuffle the registers around. Not totally sure what the best way to proceed here is.

@jkotas
Copy link
Member

jkotas commented Jan 12, 2022

We could write such an adapter

Is it possible to write the helper that does it directly, that is not an adapter? I understand that these helpers are special and so it may not be an ordinary piece of assembly code.

@sylveon
Copy link
Contributor

sylveon commented Jan 12, 2022

As mentioned in the original post we will only need this scheme for VSD calls, but the impact still seems quite significant.
It would be ideal for us if there was a validate helper that took the call target in rax and preserved rax and rcx instead of taking the call target in rcx and preserving rcx and r10. We could write such an adapter but this adds an extra level of indirection to every validation to shuffle the registers around. Not totally sure what the best way to proceed here is.

On x64 there is LdrpDispatchUserCallTarget/__guard_dispatch_icall_fptr. It takes the target function in rax and will call the function by itself if valid, so most of the time it ends up replacing indirect calls by simply the following code, which isn't as big of a diff.

mov rax, target
rex_jmp QWORD PTR [__guard_dispatch_icall_fptr]

@jakobbotsch
Copy link
Member

@sylveon The dispatch helper clobbers r11 containing the indirection cell so we cannot use it for VSD calls. Indeed for non-VSD calls the situation is not as bad.

@MichalStrehovsky
Copy link
Member Author

MSVC has an adapter like this, but I couldn't figure out a way to trigger MSVC to use it (I though /D2guardcfgdispatch- would do the trick, but it doesn't): https://devdiv.visualstudio.com/DevDiv/_git/msvc?path=/src/vctools/crt/vcstartup/src/misc/amd64/cfgcheckthunk.asm

It's not particularly efficient, but at least helps with code size.

@jakobbotsch
Copy link
Member

There is still a question whether we even need any CFG check for VSD calls. It seems to me it would be better to have indirection cells go under the W^X umbrella and be mapped as read-only with the necessary double mapping when they need to be changed.

@jkotas
Copy link
Member

jkotas commented Jan 13, 2022

I would start with the simplest approach. I think it means doing the spilling/reloading in the JITed code. The security super hardened mode is not a high-performance mode. It is fine for the code to be bigger when it is turned on.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 13, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Feb 8, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Mar 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants