AVX-512 debugger support: breakpoints #87843

Closed
Tracked by #77034
BruceForstall opened this issue Jun 20, 2023 · 6 comments · Fixed by #89705
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), avx512 (Related to the AVX-512 architecture)

Comments

@BruceForstall (Member)

BruceForstall commented Jun 20, 2023

As part of implementing AVX-512 support (link), we should support breakpoints on AVX-512 instructions, especially the newly supported EVEX-encoded instructions.

Consider the following test case:

using System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

namespace avx512
{
    internal class Program
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        static void Print(float f)
        {
            Console.WriteLine(f);
        }

        static void Main(string[] args)
        {
            Vector512<float> v2 = Vector512.Create(17.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f, 10.0f, 11.0f, 12.0f, 13.0f, 14.0f, 15.0f, 16.0f);  // ***** set BP here
            Print(v2.GetElement(15));
        }
    }
}

If you build it (either Debug or Release in Visual Studio) and run it, it prints 16. If you set a breakpoint on the indicated line, run it in VS, hit the breakpoint, and then continue, it prints 0. Merely setting, hitting, and continuing past a breakpoint in the debugger changes the program's behavior.

Setting the breakpoint at that location actually places it on this instruction:

00007FFCD6977091 62 F1 7C 48 10 05 25 00 00 00 vmovups     zmm0,zmmword ptr [avx512.Program.Main(System.String[])+040h (07FFCD69770C0h)]  

This is a RIP-relative read instruction.
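The instruction bytes can be decoded by hand to confirm that the operand is RIP-relative and to check the address shown in the disassembly. The C++ below is a worked check of that arithmetic; the constants come from the disassembly line above, the helper name is illustrative, and the byte annotations follow the EVEX layout in the Intel SDM. Note that EVEX compresses 8-bit displacements (disp8*N scaling), but this instruction uses a full disp32, which is not scaled.

```cpp
#include <cstdint>

// Bytes of the instruction above, annotated:
//   62           EVEX escape
//   F1 7C 48     EVEX payload P0 P1 P2: map 0F, W0, no SIMD prefix, L'L = 10 (512-bit)
//   10           opcode (vmovups, load form)
//   05           ModRM: mod = 00, reg = 000 (zmm0), rm = 101 -> [RIP + disp32]
//   25 00 00 00  disp32 = 0x25, little-endian
constexpr uint64_t kInstrAddr = 0x00007FFCD6977091ULL;
constexpr uint64_t kInstrLen = 10;
constexpr int32_t kDisp32 = 0x25;

// RIP-relative operands are relative to the address of the *next* instruction.
constexpr uint64_t RipRelativeTarget(uint64_t instrAddr, uint64_t instrLen, int32_t disp32) {
    return instrAddr + instrLen + static_cast<uint64_t>(static_cast<int64_t>(disp32));
}

// 0x7FFCD6977091 + 10 + 0x25 == 0x7FFCD69770C0, the address in the disassembly.
static_assert(RipRelativeTarget(kInstrAddr, kInstrLen, kDisp32) == 0x00007FFCD69770C0ULL,
              "target matches the disassembler's annotation");
```

Because the displacement is relative to the end of the instruction, the same ten bytes executed from a different address would read from a different location.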

When the debugger sets a breakpoint, it copies the original instruction to a "patch" location. When execution continues past the breakpoint, the debugger executes the instruction from the patch location rather than from its original address. This requires special handling for RIP-relative addressing: RIP when the copy executes differs from RIP at the instruction's original location, so an unmodified displacement would reference the wrong address. The debugger therefore disassembles the instruction to determine whether it uses RIP-relative addressing. If it does, the debugger rewrites the displacement in the patched copy to point at extra data space in the patch buffer, and copies the data from the original location into that space. For write operations, the data is copied back to the original location after the instruction executes.
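The fixup steps above can be sketched as follows. This is a minimal model under stated assumptions, not the debugger's implementation: memory is a flat byte vector, addresses are offsets into it, the instruction has already been identified as RIP-relative with a known displacement offset and operand size, and all names (PreparePatch, FinishPatch, Fixup, and so on) are hypothetical. It does not emulate the instruction itself; it only shows how the displacement is retargeted and how data moves between the original location and the bypass area.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical model of the RIP-relative patch fixup. "Addresses" are offsets
// into a flat byte vector standing in for memory; none of these names or
// layouts are the runtime's.

int32_t ReadDisp32(const uint8_t* p) {  // little-endian, as on x64
    return static_cast<int32_t>(uint32_t(p[0]) | uint32_t(p[1]) << 8 |
                                uint32_t(p[2]) << 16 | uint32_t(p[3]) << 24);
}

void WriteDisp32(uint8_t* p, int32_t value) {
    const uint32_t u = static_cast<uint32_t>(value);
    for (int i = 0; i < 4; ++i)
        p[i] = static_cast<uint8_t>(u >> (8 * i));
}

// Displacement that makes an instruction at instrAddr (length len) reach target.
int32_t RetargetDisp32(uint64_t instrAddr, size_t len, uint64_t target) {
    const int64_t delta = static_cast<int64_t>(target - (instrAddr + len));
    if (delta < INT32_MIN || delta > INT32_MAX)
        throw std::out_of_range("bypass area not reachable with a disp32");
    return static_cast<int32_t>(delta);
}

struct Fixup {
    uint64_t originalData;  // what the instruction addressed at its original location
    uint64_t bypassData;    // copy of that data inside the patch buffer
    size_t size;            // operand size in bytes (64 for a ZMM operand)
};

// Copy the instruction to patchAddr, seed the bypass area with the operand's
// current bytes, and retarget the copy's disp32 at the bypass area.
Fixup PreparePatch(std::vector<uint8_t>& mem, uint64_t instrAddr, size_t len,
                   size_t dispOffset, size_t operandSize,
                   uint64_t patchAddr, uint64_t bypassAddr) {
    std::memcpy(&mem[patchAddr], &mem[instrAddr], len);
    const int32_t disp = ReadDisp32(&mem[instrAddr + dispOffset]);
    const uint64_t originalData = instrAddr + len + static_cast<int64_t>(disp);
    std::memcpy(&mem[bypassAddr], &mem[originalData], operandSize);
    WriteDisp32(&mem[patchAddr + dispOffset], RetargetDisp32(patchAddr, len, bypassAddr));
    return {originalData, bypassAddr, operandSize};
}

// After the patched copy has executed: a write must be propagated back.
void FinishPatch(std::vector<uint8_t>& mem, const Fixup& f, bool isWrite) {
    if (isWrite)
        std::memcpy(&mem[f.originalData], &mem[f.bypassData], f.size);
}
```

For the vmovups instruction above, the disp32 starts at offset 6 and the operand is 64 bytes, so the bypass area must hold at least 64 bytes. If the decoder does not recognize the instruction as RIP-relative, none of this happens and the unmodified displacement is applied relative to the patch address, which is consistent with the wrong value observed in the repro.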

The debugger figures out characteristics of the instruction, such as whether it contains RIP-relative addressing, using NativeWalker::DecodeInstructionForPatchSkip and the tables that were introduced with dotnet/coreclr#25958: see https://github.com/dotnet/runtime/tree/main/src/coreclr/debug/ee/amd64/gen_amd64InstrDecode.

These tables understand encodings up to and including VEX, but not EVEX. Either the tables need to be updated, or some other mechanism for EVEX instructions needs to be added to NativeWalker::DecodeInstructionForPatchSkip.
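One possible "other mechanism" is manual EVEX parsing. A minimal sketch of what that involves, assuming the pointer is already positioned at the 0x62 escape byte (legacy prefixes skipped) and using the AVX-512 definition of the EVEX payload bits; DecodeEvex and EvexInfo are hypothetical names, not the runtime's API. Vector length alone does not give the memory operand size: broadcast, scalar, and tuple forms access less than a full vector, so a real decoder would still need per-opcode information.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical decoder output; not a runtime type.
struct EvexInfo {
    uint8_t map;          // opcode map from EVEX.mmm: 1 = 0F, 2 = 0F38, 3 = 0F3A
    uint8_t pp;           // implied SIMD prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2
    bool w;               // EVEX.W
    bool broadcast;       // EVEX.b (broadcast for memory forms)
    unsigned vectorBits;  // from EVEX.L'L: 128, 256 or 512; 0 if L'L == 11
    uint8_t opcode;
    bool ripRelative;     // ModRM.mod == 00 && ModRM.rm == 101 in 64-bit mode
    size_t dispOffset;    // offset of the disp32 field (meaningful when ripRelative)
};

// p must point at the 0x62 escape byte (any legacy prefixes already skipped).
std::optional<EvexInfo> DecodeEvex(const uint8_t* p, size_t avail) {
    // In 64-bit mode 0x62 always starts an EVEX prefix (BOUND is invalid there).
    if (avail < 6 || p[0] != 0x62)
        return std::nullopt;
    const uint8_t p0 = p[1], p1 = p[2], p2 = p[3];
    // Fixed bits as defined for AVX-512: P0 bit 3 is 0, P1 bit 2 is 1.
    if ((p0 & 0x08) != 0 || (p1 & 0x04) == 0)
        return std::nullopt;

    EvexInfo info{};
    info.map = p0 & 0x07;
    info.pp = p1 & 0x03;
    info.w = (p1 & 0x80) != 0;
    info.broadcast = (p2 & 0x10) != 0;
    // L'L gives the vector length for memory forms; for register forms with
    // EVEX.b set it encodes rounding control instead.
    const unsigned ll = (p2 >> 5) & 0x03;
    info.vectorBits = (ll == 3) ? 0 : (128u << ll);
    info.opcode = p[4];
    const uint8_t modrm = p[5];
    info.ripRelative = (modrm & 0xC7) == 0x05;
    info.dispOffset = 6;  // escape + 3 payload bytes + opcode + ModRM
    return info;
}
```

Applied to the vmovups bytes quoted in the issue, this reports map 0F, no SIMD prefix, W0, a 512-bit vector length, opcode 0x10, and a RIP-relative operand whose disp32 starts at offset 6.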

Note that the JIT currently generates RIP-relative reads for 512-bit vector constants in the data section. It generally does not generate RIP-relative reads or writes of class static variables (where RIP-relative addressing would otherwise be expected), because 512-bit vector statics are placed in the Frozen (non-GC) heap so that they don't move, and that heap is normally too far from the code heap to be reachable with RIP-relative addressing. However, that could change (and changing it is a goal, as with #78292, which was reverted). If the statics are placed closer to the JIT-generated code, RIP-relative addressing will automatically be used.
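The distance constraint above comes from the encoding: a RIP-relative operand carries a signed 32-bit displacement, so the target must lie within about +/-2 GiB of the next instruction. The check below is a hypothetical helper illustrating that rule, not the JIT's actual logic; the addresses in the usage are made up except for the code address from the disassembly above.

```cpp
#include <cstdint>

// RIP-relative addressing encodes a signed 32-bit displacement from the end
// of the instruction, so the target must be within [-2^31, 2^31 - 1] bytes of it.
constexpr bool IsRipRelativeReachable(uint64_t nextInstrAddr, uint64_t target) {
    // Unsigned subtraction wraps; reinterpreting as signed gives the distance.
    const int64_t distance = static_cast<int64_t>(target - nextInstrAddr);
    return distance >= INT32_MIN && distance <= INT32_MAX;
}
```

A data-section constant placed right after the method body is trivially reachable, while a static in a heap reserved terabytes away is not, which is why placing the Frozen heap near the code heap would let the JIT switch to RIP-relative addressing for statics.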

Thus, when a breakpoint is set on a RIP-relative read, executing it under the debugger currently loads data from the wrong address, leading to incorrect program behavior. For RIP-relative writes (which probably don't occur today), arbitrary memory corruption could occur.

BruceForstall added the area-CodeGen-coreclr and avx512 labels Jun 20, 2023
BruceForstall added this to the 8.0.0 milestone Jun 20, 2023
@ghost

ghost commented Jun 20, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@tommcdon (Member)

fyi @hoyosjs

@BruceForstall (Member, Author)

cc @dotnet/avx512-contrib

@BruceForstall (Member, Author)

Some places that need to change:

#if defined(TARGET_AMD64)
    // If you update this value, make sure that it fits in the data payload of a
    // DebuggerHeapExecutableMemoryChunk. This will need to be bumped to 0x40 for AVX 512 support.
    const static int cbBufferBypass = 0x20;
    BYTE BypassBuffer[cbBufferBypass];
    // ...
#endif

    case 16:
    case 32:
        memcpy(reinterpret_cast<void*>(targetFixup), bufferBypass, fixupSize);
        break;

(This switch also needs a "case 64:" for 512-bit operands.)
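The two changes listed above can be sketched together. This is a hypothetical, self-contained stand-in (kBufferBypassSize and CopyBackVectorFixup are illustrative names, and BYTE is replaced by uint8_t), not a patch to the runtime: it sizes the bypass buffer for a 64-byte ZMM operand and adds the 64-byte case to the copy-back switch. Per the quoted comment, the real buffer also has to fit in the data payload of a DebuggerHeapExecutableMemoryChunk, which this sketch does not model.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for cbBufferBypass, bumped from 0x20 to 0x40 as the
// quoted comment anticipates: 0x20 bytes hold a 256-bit YMM operand, while a
// 512-bit ZMM operand needs 0x40.
constexpr size_t kBufferBypassSize = 0x40;
static_assert(kBufferBypassSize >= 64, "bypass buffer must hold a 512-bit operand");

// Hypothetical copy-back helper modeled on the quoted switch: after the
// patched instruction writes to the bypass buffer, copy the result to the
// real target. Returns false for sizes this sketch does not handle.
bool CopyBackVectorFixup(void* targetFixup, const uint8_t* bufferBypass, size_t fixupSize) {
    switch (fixupSize) {
        case 16:  // XMM
        case 32:  // YMM
        case 64:  // ZMM: the added case
            std::memcpy(targetFixup, bufferBypass, fixupSize);
            return true;
        default:
            return false;  // scalar sizes are not modeled here
    }
}
```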

@hoyosjs (Member)

hoyosjs commented Jun 21, 2023

Do the instruction tables need the update too?

@BruceForstall (Member, Author)

> Do the instruction tables need the update too?

The decoder needs to learn about EVEX encodings. Whether that is done by replacing the instruction tables, creating a new set of EVEX-specific tables, or writing manual EVEX parsing code (bypassing the table code path) is up for design discussion.

ghost locked as resolved and limited conversation to collaborators Sep 10, 2023