Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support breakpoints on AVX-512 instructions #89705

Merged
merged 12 commits into from
Aug 11, 2023
1,829 changes: 833 additions & 996 deletions src/coreclr/debug/ee/amd64/amd64InstrDecode.h

Large diffs are not rendered by default.

Large diffs are not rendered by default.

125 changes: 72 additions & 53 deletions src/coreclr/debug/ee/amd64/gen_amd64InstrDecode/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,59 +18,79 @@ gcc -g opcodes.cpp -o opcodes
gdb opcodes -batch -ex "set disassembly-flavor intel" -ex "disass /r opcodes" > opcodes.intel

# Parse disassembly and generate code
cat opcodes.intel | dotnet run > ../amd64InstrDecode.h
# Build as a separate step so it will display build errors, if any.
../../../../../../dotnet.sh build
cat opcodes.intel | ../../../../../../dotnet.sh run > new_amd64InstrDecode.h
```

After checking it, copy the generated new_amd64InstrDecode.h to ../amd64InstrDecode.h.

This process can be run using the `createTables.sh` script in this directory.

## Technical design

`amd64InstrDecode.h`'s primary purpose is to provide a reliable
and accurately mechanism to implement
`Amd64 NativeWalker::DecodeInstructionForPatchSkip(..)`.
The primary purpose of `amd64InstrDecode.h` is to provide a reliable
and accurate mechanism to implement the `amd64`
`NativeWalker::DecodeInstructionForPatchSkip(..)` function.

This function needs to be able to decode an arbitrary `amd64`
instruction. The decoder currently must be able to identify:

- Whether the instruction includes an instruction pointer relative memory access
- Whether the instruction includes an instruction pointer relative memory access (RIP relative addressing)
- The location of the memory displacement within the instruction
- The instruction length in bytes
- The size of the memory operation in bytes

To get this right is complicated, because the `amd64` instruction set is
complicated.
To get this right is complicated, because the `amd64` instruction set is complicated.

A high level view of the `amd64` instruction set can be seen by looking at:

`AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions`
`Section 1.1 Instruction Encoding Overview`
`Figure 1-1. Instruction Encoding Syntax`

also:

`Intel(R) 64 and IA-32 Architectures Software Developer's Manual`
`Volume 2: Instruction Set Reference, A-Z`
`Chapter 2 Instruction Format`

A high level view of the `amd64` instruction set can be seen by looking at
`AMD64 Architecture Programmer’s
Manual Volume 3:
General-Purpose and System Instructions`
`Section 1.1 Instruction Encoding Overview`
`Figure 1-1. Instruction Encoding Syntax`
Also useful in the manuals:

AMD: `Appendix A: Opcode and Operand Encodings`
Intel: `Volume 2, Appendix A: Opcode Map`

The general behavior of each instruction can be modified by many of the
bytes in the 1-15 byte instruction.
bytes in the 1-15 byte instruction (15 is the maximum byte length of an instruction).

This set of files generates a metadata table by extracting the data from
sample instruction disassembly.

The process entails
The process entails:
- Generating a necessary set of instructions
- Generating parsable disassembly for the instructions
- Parsing the disassembly
- Generating the tables

### Generating a necessary set of instructions

What set of possible instruction encodings are needed to extract the information
needed in the tables?

#### The necessary set

- All instruction forms which use instruction pointer relative memory accesses.
- All instruction forms which use RIP relative memory accesses.
- All combinations of modifier bits which affect the instruction form
- presence and/or size of the memory access
- size or presence of immediates
- vector size bits

So with modrm.mod = 0, modrm.rm = 0x5 (instruction pointer relative memory access)
So with modrm.mod = 0, modrm.rm = 0x5 (RIP relative memory access)
we need all combinations of:
- `opcodemap`
- `opcode`
- `modrm.reg`
- `pp`, `W`, `L`
- `pp`, `W`, `L`, `L'L`
- Some combinations of `vvvv`
- Optional prefixes: `repe`, `repne`, `opSize`

Expand All @@ -80,7 +100,7 @@ We will iterate through all the necessary set. Many of these combinations
will lead to invalid/undefined encodings. This will cause the disassembler
to give up and mark the disassemble as bad.

The disassemble will then resume trying to disassemble at the next boundary.
The disassembly will then resume trying to disassemble at the next boundary.

To make sure the disassembler attempts to disassemble every instruction,
we need to make sure the preceding instruction is always valid and terminates
Expand All @@ -97,8 +117,7 @@ instruction.

Using a fixed suffix makes disassembly parsing simpler.

After the modrm byte, the generated instructions always include a
`postamble`,
After the modrm byte, the generated instructions always include a `postamble`,

```C++
const char* postamble = "0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59,\n";
Expand All @@ -109,14 +128,14 @@ This meets the padding consistency needs.
#### Ordering

As a convenience to the parser the encoded instructions are logically
ordered. The ordering is generally, but can vary slightly depending on
ordered. The ordering is generally as follows, but can vary slightly depending on
the needs of the particular opcode map:

- map
- opcode
- pp & some prefixes
- modrm.reg
- W, L, vvvv
- W, L, L'L, vvvv

This is to keep related instruction grouped together.

Expand Down Expand Up @@ -160,6 +179,7 @@ their sizes. For instance:
- "OWORD PTR [rip+0x53525150]"
- "XMMWORD PTR [rip+0x53525150]"
- "YMMWORD PTR [rip+0x53525150]"
- "ZMMWORD PTR [rip+0x53525150]"
- "FWORD PTR [rip+0x53525150]"
- "TBYTE PTR [rip+0x53525150]"

Expand All @@ -177,7 +197,7 @@ gdb opcodes -batch -ex "set disassembly-flavor intel" -ex "disass /r opcodes" >

#### Alternative disassemblers

It seems `objdump` could provide similar results. Untested, the parser may need to
It seems `objdump` could provide similar results. This is untested. The parser may need to
be modified for subtle differences.
```bash
objdump -D -M intel -b --insn-width=15 -j .data opcodes
Expand All @@ -186,27 +206,28 @@ objdump -D -M intel -b --insn-width=15 -j .data opcodes
The lldb parser aborts parsing when it observes bad instruction. It
might be usable with additional python scripts.

Windows disassembler may also work. Not attempted.
Windows disassembler may also work. It has not been tried.

### Parsing the disassembly

```bash
# Parse disassembly and generate code
cat opcodes.intel | dotnet run > ../amd64InstrDecode.h
```

#### Finding relevant disassembly lines

We are not interested in all lines in the disassembly. The disassembler
stray comments, recovery and our padding introduce lines we need to ignore.
stray comments, recovery, and our padding introduce lines we need to ignore.

We filter out and ignore non-disassembly lines using a `Regex` for a
disassembly line.

We expect the generated instruction samples to be in a group. The first
instruction in the group is the only one we are interested in. This is
the one we are interested in.
instruction in the group is the only one we are interested in.

The group is terminated by a pair of instructions. The first terminal
instruction must have `0x58` as the last byte in it encoding. The final
instruction must have `0x58` as the last byte in its encoding. The final
terminal instruction must be a `0x59\tpop`.

We continue parsing the first line of each group.
Expand All @@ -216,7 +237,7 @@ We continue parsing the first line of each group.
Many encodings are not valid. For `gdb`, these instructions are marked
`(bad)`. We filter and ignore these.

#### Parsing the disassambly for each instruction sample
#### Parsing the disassembly for each instruction sample

For each sample, we need to calculate the important properties:
- mnemonic
Expand All @@ -226,9 +247,9 @@ For each sample, we need to calculate the important properties:
- map
- opcode position
- Encoding Flags
- pp, W, L, prefix and encoding flags
- pp, W, L, L'L, prefix and encoding flags
- `SuffixFlags`
- presence of instruction relative accesses
- presence of RIP relative accesses
- size of operation
- position in the list of operands
- number of immediate bytes
Expand All @@ -242,37 +263,32 @@ unknown sizes.

#### `opCodeExt`

To facilitate identifying sets of instructions, the creates an `opCodeExt`.
To facilitate identifying sets of instructions, the tool creates an `opCodeExt`.

For the `Primary` map this is simply the encoded opcode from the instruction
shifted left by 4 bits.

For the 3D Now `NOW3D` map this is simply the encoded immediate from the
instruction shifted left by 4 bits.

For the `Secondary` `F38`, and `F39` maps this is the encoded opcode from
the instruction shifted left by 4 bits orred with a synthetic `pp`. The
For the `Secondary`, `F38`, and `F39` maps this is the encoded opcode from
the instruction shifted left by 4 bits or'ed with a synthetic `pp`. The
synthetic `pp` is constructed to match the rules of
`Table 1-22. VEX/XOP.pp Encoding` from the
`AMD64 Architecture Programmer’s
Manual Volume 3:
General-Purpose and System Instructions`. For the case where the opSize
0x66 prefix is present with a `rep*` prefix, the `rep*` prefix is used
to encode `pp`.
`AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions`.
For the case where the opSize 0x66 prefix is present with a `rep*` prefix, the `rep*` prefix is used
to encode `pp`.

For the `VEX*` and `XOP*` maps this is the encoded opcode from
the instruction shifted left by 4 bits orred with `pp`.
For the `VEX*` maps this is the encoded opcode from
the instruction shifted left by 4 bits or'ed with `pp`.

#### Identifying sets of instructions

For most instructions, the opCodeExt will uniquely identify the instruction.
For most instructions, the `opCodeExt` will uniquely identify the instruction.

For many instructions, `modrm.reg` is used to uniquely identify the instruction.
For many instructions, `modrm.reg` is used to help uniquely identify the instruction.
These instruction typically change mnemonic and behavior as `modrm.reg`
changes. These become problematic, when the form of these instructions vary.

For a few other instructions the `L`, `W`, `vvvv` value may the instruction
change behavior. Usually these do not change mnemonic.
For a few other instructions the `L`, `W`, `vvvv` values may change the instruction
behavior. Usually these do not change mnemonic.

The set of instructions is therefore usually grouped by the opcode map and
`opCodeExt` generated above. For these a change in `opCodeExt` or `map`
Expand All @@ -291,7 +307,9 @@ set based on the encoding flags. These are the `sometimesFlags`
`sometimesFlags`, check each rule by calling `TestHypothesis`. This
determines if the rule corresponds to the set of observations.

Encode the rule as a string.
Encode the rule as a string. The rule might encode that the `W` bit or `L`
bit causes a different memory/immediate behavior for the particular
`<map, opCodeExt>` entry.

Add the rule to the set of all observed rules.
Add the set's rule with comment to a dictionary.
Expand Down Expand Up @@ -328,7 +346,7 @@ samples, is restricted to a max compilation/link unit size. Early drafts
were generating more instructions, and couldn't be compiled.

However, there is no restriction that all the samples must come from
single executable. These could easily be separated by opcode map...
single executable. These could easily be separated by opcode map.

## Risks

Expand All @@ -342,7 +360,7 @@ Until a reasonably featured disassembler is created, the new instruction
set can not be supported by this methodology.

The previous methodology of manually encoding these new instruction set
would still be possible....
would still be possible.

### Disassembler errors

Expand All @@ -351,6 +369,7 @@ of the disassembler may have disassembly bugs. Using newer disassemblers
would mitigate this to some extent.

### Bugs

- Inadequate samples. Are there other bits which modify instruction
behavior which we missed?
- Parser/Table generator implementation bugs. Does the parser do what it
Expand All @@ -368,4 +387,4 @@ Regenerate and compare.

### New debugger feature requires more metadata

Add new feature code, regenerate
Add new feature code, regenerate.
Loading