AVX-512 debugger support: breakpoints #87843

BruceForstall · 2023-06-20T21:35:47Z

As part of implementing AVX-512 support (link), we should support breakpoints on AVX-512 instructions, especially the newly supported EVEX-encoded instructions.

Currently, consider the following test case:

using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

namespace avx512
{
    internal class Program
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        static void Print(float f)
        {
            Console.WriteLine(f);
        }

        static void Main(string[] args)
        {
            Vector512<float> v2 = Vector512.Create(17.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f, 10.0f, 11.0f, 12.0f, 13.0f, 14.0f, 15.0f, 16.0f);  // ***** set BP here
            Print(v2.GetElement(15));
        }
    }
}

If you build it (either Debug or Release in Visual Studio) and run it, it prints 16. If you set a breakpoint on the indicated line, then run it in VS, hit the breakpoint, then continue, it prints 0. The act of setting, hitting, and running after a breakpoint in the debugger causes program behavior to differ.

Setting the breakpoint at that location actually sets it on this instruction:

00007FFCD6977091 62 F1 7C 48 10 05 25 00 00 00 vmovups     zmm0,zmmword ptr [avx512.Program.Main(System.String[])+040h (07FFCD69770C0h)]

This is a RIP-relative read instruction.
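The encoding can be checked by hand: 62 F1 7C 48 is the four-byte EVEX prefix, 10 is the opcode, 05 is a ModRM byte with mod=00 and rm=101 (RIP + disp32), and 25 00 00 00 is the displacement. A minimal sketch of the address arithmetic follows; `ReadDisp32` and `RipRelativeTarget` are hypothetical helper names for illustration, not functions from the runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes of the vmovups shown above: EVEX prefix (62 F1 7C 48), opcode (10),
// ModRM (05: mod=00, rm=101, i.e. RIP + disp32), then disp32 (25 00 00 00).
static const uint8_t kVmovupsBytes[] = {0x62, 0xF1, 0x7C, 0x48, 0x10,
                                        0x05, 0x25, 0x00, 0x00, 0x00};

// Hypothetical helper: read a little-endian signed 32-bit displacement.
inline int32_t ReadDisp32(const uint8_t* p)
{
    assert(p != nullptr);
    uint32_t v = uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
                 (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
    return static_cast<int32_t>(v);
}

// Hypothetical helper: a RIP-relative operand resolves to the address of the
// NEXT instruction plus the sign-extended disp32.
inline uint64_t RipRelativeTarget(uint64_t instrAddr, size_t instrLen, int32_t disp32)
{
    return instrAddr + instrLen + static_cast<uint64_t>(static_cast<int64_t>(disp32));
}
```

With the instruction at 0x00007FFCD6977091, a length of 10 bytes, and a displacement of 0x25, `RipRelativeTarget` gives 0x00007FFCD69770C0, which matches the address in the disassembly.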

When the debugger sets a breakpoint, it copies the instruction to a "patch" location. After the breakpoint, it executes the instruction from the patch location. Thus, it needs special handling for RIP-relative addressing, as the executed code RIP will be different from when the code was generated. The debugger disassembles the instruction to determine if it uses RIP-relative addressing. If so, it updates the RIP-relative address in the patch to point to the patch buffer with additional space for data, which is copied from the original location. For write operations, the data is copied back to the original location after the instruction is executed.
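To make the patch-skip fixup concrete, here is a rough sketch of the idea, not the debugger's actual code. `PatchSketch`, `PreparePatch`, and the other names are hypothetical, the buffer sizes are assumptions, and the disp32 offset is passed in by the caller rather than decoded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the debugger's patch: a copy of the instruction
// plus a bypass buffer holding the data the copied instruction will access.
struct PatchSketch
{
    uint8_t patch[16];
    uint8_t bypass[64];   // 64 bytes covers a full zmm operand
};

inline int32_t LoadDisp32(const uint8_t* p)
{
    int32_t d;
    std::memcpy(&d, p, sizeof d);   // assumes a little-endian host such as x64
    return d;
}

// Address a RIP-relative operand resolves to when the instruction sits at `instr`.
inline uint8_t* ResolveRipRelative(const uint8_t* instr, size_t instrLen, size_t dispOffset)
{
    intptr_t next = reinterpret_cast<intptr_t>(instr) + static_cast<intptr_t>(instrLen);
    return reinterpret_cast<uint8_t*>(next + LoadDisp32(instr + dispOffset));
}

// Copy the instruction and its operand data, then retarget the copy's disp32
// at the bypass buffer. Returns the original operand address so that a write
// could be copied back after the instruction executes.
inline uint8_t* PreparePatch(PatchSketch& p, const uint8_t* instr, size_t instrLen,
                             size_t dispOffset, size_t dataSize)
{
    assert(instrLen <= sizeof p.patch && dataSize <= sizeof p.bypass);
    uint8_t* original = ResolveRipRelative(instr, instrLen, dispOffset);
    std::memcpy(p.patch, instr, instrLen);
    std::memcpy(p.bypass, original, dataSize);
    intptr_t delta = reinterpret_cast<intptr_t>(p.bypass) -
                     (reinterpret_cast<intptr_t>(p.patch) + static_cast<intptr_t>(instrLen));
    int32_t disp = static_cast<int32_t>(delta);
    std::memcpy(p.patch + dispOffset, &disp, sizeof disp);
    return original;
}

// Usage: lay out the vmovups bytes and their 64-byte operand in one buffer,
// build the patch, and check that the copy now reads from the bypass buffer.
inline bool DemoPatchRetargetsOperand()
{
    uint8_t code[128] = {};
    const uint8_t instr[] = {0x62, 0xF1, 0x7C, 0x48, 0x10, 0x05, 0x25, 0x00, 0x00, 0x00};
    std::memcpy(code, instr, sizeof instr);
    uint8_t* data = code + sizeof instr + 0x25;   // where disp32 = 0x25 points
    for (int i = 0; i < 64; ++i) data[i] = static_cast<uint8_t>(i + 1);

    PatchSketch p{};
    uint8_t* original = PreparePatch(p, code, sizeof instr, 6, 64);
    return original == data
        && ResolveRipRelative(p.patch, sizeof instr, 6) == p.bypass
        && std::memcmp(p.bypass, data, 64) == 0;
}
```

If the decoder does not recognize the instruction as RIP-relative, neither the data copy nor the displacement rewrite happens, and the copied instruction reads whatever lies at the same displacement from the patch location.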

The debugger figures out characteristics of the instruction, such as whether it contains RIP-relative addressing, using NativeWalker::DecodeInstructionForPatchSkip and the tables that were introduced with dotnet/coreclr#25958: see https://github.com/dotnet/runtime/tree/main/src/coreclr/debug/ee/amd64/gen_amd64InstrDecode.

These tables understand up to VEX encodings, but do not understand EVEX encodings. Either they need to be updated, or some other mechanism needs to be introduced in NativeWalker::DecodeInstructionForPatchSkip for EVEX instructions.
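As an illustration of the "manual parsing" option, a very rough sketch of EVEX recognition might look like the following. This is not the design that was chosen and not code from the runtime: `EvexInfo` and `DecodeEvexSketch` are hypothetical names, and the sketch assumes 64-bit mode (where 0x62 always starts an EVEX prefix), no segment or address-size prefixes before it, and a full-vector memory operand (no broadcast, no scalar form, no embedded rounding).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct EvexInfo
{
    bool     isEvex;       // first byte is the 0x62 EVEX escape
    bool     ripRelative;  // ModRM has mod=00, rm=101
    size_t   dispOffset;   // offset of disp32, valid when ripRelative
    unsigned vectorBytes;  // 16/32/64 from the L'L bits, 0 if reserved
};

// The vmovups from this issue, and a VEX-encoded ymm load for comparison.
static const uint8_t kEvexExample[] = {0x62, 0xF1, 0x7C, 0x48, 0x10,
                                       0x05, 0x25, 0x00, 0x00, 0x00};
static const uint8_t kVexExample[]  = {0xC5, 0xFC, 0x10, 0x05,
                                       0x25, 0x00, 0x00, 0x00};

inline EvexInfo DecodeEvexSketch(const uint8_t* code, size_t len)
{
    assert(code != nullptr);
    EvexInfo info{};
    // Layout assumed here: 0x62, P0, P1, P2, opcode, ModRM, ...
    if (len < 6 || code[0] != 0x62)
        return info;
    info.isEvex = true;

    // P2 bits 6:5 are L'L: 00 = 128-bit, 01 = 256-bit, 10 = 512-bit.
    unsigned ll = (code[3] >> 5) & 0x3;
    info.vectorBytes = (ll == 3) ? 0 : (16u << ll);

    // mod (bits 7:6) == 00 and rm (bits 2:0) == 101 selects RIP + disp32;
    // there is no SIB byte in that form, so disp32 follows ModRM directly.
    uint8_t modrm = code[5];
    if ((modrm & 0xC7) == 0x05)
    {
        info.ripRelative = true;
        info.dispOffset = 6;
    }
    return info;
}
```

For the instruction in this issue the sketch reports a RIP-relative operand with disp32 at offset 6 and a 64-byte vector length, which is the information the patch-skip path needs in order to copy the operand into the bypass buffer.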

Note that the JIT currently generates RIP-relative reads for 512-bit vector constants in the data section. It generally does not generate RIP-relative reads or writes to class static variables (where it is expected it would generate RIP-relative addressing) because 512-bit vector statics get placed in the Frozen (non-GC) heap (so they don't move). Currently, that heap is normally too far away from the generated code heap to allow for RIP-relative addressing. However, that could change (and it is a goal to change it, as with #78292, which was reverted). If the statics are placed closer to the JIT-generated code, RIP-relative addressing will automatically be enabled.
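"Too far away" here means the distance does not fit in the signed 32-bit displacement of a RIP-relative operand. A small sketch of that reachability test, with `FitsInRel32` as a hypothetical name (the JIT's real check is not shown in this issue):

```cpp
#include <cstdint>

// Hypothetical helper: RIP-relative addressing can reach `target` only if its
// distance from the address of the next instruction fits in a signed disp32.
inline bool FitsInRel32(uint64_t nextInstrAddr, uint64_t target)
{
    int64_t delta = static_cast<int64_t>(target - nextInstrAddr);
    return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

The constant load above passes this test (its data sits 0x25 bytes past the instruction), while a static in a heap many gigabytes away from the code heap would not.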

Thus, for RIP-relative reads on which breakpoints are set, executing in the debugger will currently load random data, leading to incorrect program behavior. For RIP-relative writes (which probably don't currently occur), arbitrary data corruption could occur.

ghost · 2023-06-20T21:35:53Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	BruceForstall
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `arch-avx512`
Milestone:	8.0.0

tommcdon · 2023-06-21T13:57:43Z

fyi @hoyosjs

BruceForstall · 2023-06-21T16:48:37Z

cc @dotnet/avx512-contrib

BruceForstall · 2023-06-21T19:21:33Z

Some places that need to change:

runtime/src/coreclr/debug/ee/controller.h

Lines 304 to 308 in 5fd32d9

    
    #if defined(TARGET_AMD64)
        // If you update this value, make sure that it fits in the data payload of a
        // DebuggerHeapExecutableMemoryChunk. This will need to be bumped to 0x40 for AVX 512 support.
        const static int cbBufferBypass = 0x20;
        BYTE    BypassBuffer[cbBufferBypass];

runtime/src/coreclr/debug/ee/controller.cpp

Lines 4896 to 4899 in 5fd32d9

    
        case 16:
        case 32:
            memcpy(reinterpret_cast<void*>(targetFixup), bufferBypass, fixupSize);
            break;

(needs "64" case)
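Taken together, the two changes amount to something like the sketch below. It is illustrative only: `kBypassBufferSize` and `CopyBackFixup` are hypothetical names, the 0x40 size is the value suggested by the controller.h comment, and only the cases visible in the quoted switch are reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed size after the bump described in the controller.h comment (0x20 -> 0x40).
const size_t kBypassBufferSize = 0x40;

// Copies data that the patched instruction wrote into the bypass buffer back
// to the original operand address. Returns false for sizes not handled here.
inline bool CopyBackFixup(void* targetFixup, const uint8_t* bufferBypass, size_t fixupSize)
{
    assert(fixupSize <= kBypassBufferSize);
    switch (fixupSize)
    {
    case 16:
    case 32:
    case 64:   // the new case: a full zmm store
        std::memcpy(targetFixup, bufferBypass, fixupSize);
        return true;
    default:
        return false;
    }
}

// Usage: a 64-byte write lands in the bypass buffer and is copied back.
inline bool DemoCopyBack64()
{
    uint8_t bypass[kBypassBufferSize];
    uint8_t target[64] = {};
    for (size_t i = 0; i < sizeof bypass; ++i) bypass[i] = static_cast<uint8_t>(i);
    return CopyBackFixup(target, bypass, 64) && std::memcmp(target, bypass, 64) == 0;
}
```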

hoyosjs · 2023-06-21T21:29:25Z

Do the instruction tables need the update too?

BruceForstall · 2023-06-21T22:04:49Z

> Do the instruction tables need the update too?

The debugger's instruction decoding needs to learn about EVEX encodings. Whether that is done by replacing the instruction tables, creating a new set of EVEX-specific tables, or writing manual EVEX parsing code (and not using the table code path) is up for design discussion.

BruceForstall added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) and arch-avx512 (Related to the AVX-512 architecture) labels Jun 20, 2023

BruceForstall added this to the 8.0.0 milestone Jun 20, 2023

BruceForstall mentioned this issue Jun 20, 2023

Implement AVX-512 support #77034

Closed

56 tasks

JulieLeeMSFT assigned BruceForstall Jul 6, 2023

BruceForstall mentioned this issue Aug 2, 2023

Support breakpoints on AVX-512 instructions #89705

Merged

BruceForstall closed this as completed in #89705 Aug 11, 2023

ghost locked as resolved and limited conversation to collaborators Sep 10, 2023
