dotnet · BruceForstall · Aug 11, 2023 · Jul 31, 2023 · Jul 31, 2023 · Jul 31, 2023
diff --git a/src/coreclr/debug/ee/amd64/amd64InstrDecode.h b/src/coreclr/debug/ee/amd64/amd64InstrDecode.h
diff --git a/src/coreclr/debug/ee/amd64/gen_amd64InstrDecode/Amd64InstructionTableGenerator.cs b/src/coreclr/debug/ee/amd64/gen_amd64InstrDecode/Amd64InstructionTableGenerator.cs
diff --git a/src/coreclr/debug/ee/amd64/gen_amd64InstrDecode/README.md b/src/coreclr/debug/ee/amd64/gen_amd64InstrDecode/README.md
@@ -18,59 +18,79 @@ gcc -g opcodes.cpp -o opcodes
 gdb opcodes -batch -ex "set disassembly-flavor intel" -ex "disass /r opcodes" > opcodes.intel
 
 # Parse disassembly and generate code
-cat opcodes.intel | dotnet run > ../amd64InstrDecode.h
+# Build as a separate step so it will display build errors, if any.
+../../../../../../dotnet.sh build
+cat opcodes.intel | ../../../../../../dotnet.sh run > new_amd64InstrDecode.h
 ```
 
+After checking it, copy the generated new_amd64InstrDecode.h to ../amd64InstrDecode.h.
+
+This process can be run using the `createTables.sh` script in this directory.
+
 ## Technical design
 
-`amd64InstrDecode.h`'s primary purpose is to provide a reliable
-and accurately mechanism to implement
-`Amd64 NativeWalker::DecodeInstructionForPatchSkip(..)`.
+The primary purpose of `amd64InstrDecode.h` is to provide a reliable
+and accurate mechanism to implement the `amd64`
+`NativeWalker::DecodeInstructionForPatchSkip(..)` function.
 
 This function needs to be able to decode an arbitrary `amd64`
 instruction.  The decoder currently must be able to identify:
 
-- Whether the instruction includes an instruction pointer relative memory access
+- Whether the instruction includes an instruction pointer relative memory access (RIP relative addressing)
 - The location of the memory displacement within the instruction
 - The instruction length in bytes
 - The size of the memory operation in bytes
 
-To get this right is complicated, because the `amd64` instruction set is
-complicated.
+To get this right is complicated, because the `amd64` instruction set is complicated.
+
+A high level view of the `amd64` instruction set can be seen by looking at:
+
+`AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions`
+`Section 1.1 Instruction Encoding Overview`
+`Figure 1-1. Instruction Encoding Syntax`
+
+also:
+
+`Intel(R) 64 and IA-32 Architectures Software Developer's Manual`
+`Volume 2: Instruction Set Reference, A-Z`
+`Chapter 2 Instruction Format`
 
-A high level view of the `amd64` instruction set can be seen by looking at
-`AMD64 Architecture Programmer’s
- Manual Volume 3:
- General-Purpose and System Instructions`
- `Section 1.1 Instruction Encoding Overview`
- `Figure 1-1.  Instruction Encoding Syntax`
+Also useful in the manuals:
+
+AMD: `Appendix A: Opcode and Operand Encodings`
+Intel: `Volume 2, Appendix A: Opcode Map`
 
 The general behavior of each instruction can be modified by many of the
-bytes in the 1-15 byte instruction.
+bytes in the 1-15 byte instruction (15 is the maximum byte length of an instruction).
 
 This set of files generates a metadata table by extracting the data from
 sample instruction disassembly.
 
-The process entails
+The process entails:
 - Generating a necessary set of instructions
 - Generating parsable disassembly for the instructions
 - Parsing the disassembly
+- Generating the tables
 
 ### Generating a necessary set of instructions
 
+What set of possible instruction encodings are needed to extract the information
+needed in the tables?
+
 #### The necessary set
 
-- All instruction forms which use instruction pointer relative memory accesses.
+- All instruction forms which use RIP relative memory accesses.
 - All combinations of modifier bits which affect the instruction form
     - presence and/or size of the memory access
     - size or presence of immediates
+    - vector size bits
 
-So with modrm.mod = 0, modrm.rm = 0x5 (instruction pointer relative memory access)
+So with modrm.mod = 0, modrm.rm = 0x5 (RIP relative memory access)
 we need all combinations of:
 - `opcodemap`
 - `opcode`
 - `modrm.reg`
-- `pp`, `W`, `L`
+- `pp`, `W`, `L`, `L'L`
 - Some combinations of `vvvv`
 - Optional prefixes: `repe`, `repne`, `opSize`
 
@@ -80,7 +100,7 @@ We will iterate through all the necessary set. Many of these combinations
 will lead to invalid/undefined encodings.  This will cause the disassembler
 to give up and mark the disassemble as bad.
 
-The disassemble will then resume trying to disassemble at the next boundary.
+The disassembly will then resume trying to disassemble at the next boundary.
 
 To make sure the disassembler attempts to disassemble every instruction,
 we need to make sure the preceding instruction is always valid and terminates
@@ -97,8 +117,7 @@ instruction.
 
 Using a fixed suffix makes disassembly parsing simpler.
 
-After the modrm byte, the generated instructions always include a
-`postamble`,
+After the modrm byte, the generated instructions always include a `postamble`,
 
 ```C++
 const char* postamble = "0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59,\n";
@@ -109,14 +128,14 @@ This meets the padding consistency needs.
 #### Ordering
 
 As a convenience to the parser the encoded instructions are logically
-ordered.  The ordering is generally, but can vary slightly depending on
+ordered.  The ordering is generally as follows, but can vary slightly depending on
 the needs of the particular opcode map:
 
 - map
 - opcode
 - pp & some prefixes
 - modrm.reg
-- W, L, vvvv
+- W, L, L'L, vvvv
 
 This is to keep related instruction grouped together.
 
@@ -160,6 +179,7 @@ their sizes. For instance:
 - "OWORD PTR [rip+0x53525150]"
 - "XMMWORD PTR [rip+0x53525150]"
 - "YMMWORD PTR [rip+0x53525150]"
+- "ZMMWORD PTR [rip+0x53525150]"
 - "FWORD PTR [rip+0x53525150]"
 - "TBYTE PTR [rip+0x53525150]"
 
@@ -177,7 +197,7 @@ gdb opcodes -batch -ex "set disassembly-flavor intel" -ex "disass /r opcodes" >
 
 #### Alternative disassemblers
 
-It seems `objdump` could provide similar results. Untested, the parser may need to
+It seems `objdump` could provide similar results. This is untested. The parser may need to
 be modified for subtle differences.
 ```bash
 objdump -D -M intel -b --insn-width=15 -j .data opcodes
@@ -186,27 +206,28 @@ objdump -D -M intel -b --insn-width=15 -j .data opcodes
 The lldb parser aborts parsing when it observes bad instruction. It
 might be usable with additional python scripts.
 
-Windows disassembler may also work.  Not attempted.
+Windows disassembler may also work. It has not been tried.
 
 ### Parsing the disassembly
+
 ```bash
 # Parse disassembly and generate code
 cat opcodes.intel | dotnet run > ../amd64InstrDecode.h
 ```
+
 #### Finding relevant disassembly lines
 
 We are not interested in all lines in the disassembly. The disassembler
-stray comments, recovery and our padding introduce lines we need to ignore.
+stray comments, recovery, and our padding introduce lines we need to ignore.
 
 We filter out and ignore non-disassembly lines using a `Regex` for a
 disassembly line.
 
 We expect the generated instruction samples to be in a group.  The first
-instruction in the group is the only one we are interested in.  This is
-the one we are interested in.
+instruction in the group is the only one we are interested in.
 
 The group is terminated by a pair of instructions.  The first terminal
-instruction must have `0x58` as the last byte in it encoding. The final
+instruction must have `0x58` as the last byte in its encoding. The final
 terminal instruction must be a `0x59\tpop`.
 
 We continue parsing the first line of each group.
@@ -216,7 +237,7 @@ We continue parsing the first line of each group.
 Many encodings are not valid.  For `gdb`, these instructions are marked
 `(bad)`.  We filter and ignore these.
 
-#### Parsing the disassambly for each instruction sample
+#### Parsing the disassembly for each instruction sample
 
 For each sample, we need to calculate the important properties:
 - mnemonic
@@ -226,9 +247,9 @@ For each sample, we need to calculate the important properties:
     - map
     - opcode position
 - Encoding Flags
-    - pp, W, L, prefix and encoding flags
+    - pp, W, L, L'L, prefix and encoding flags
 - `SuffixFlags`
-    - presence of instruction relative accesses
+    - presence of RIP relative accesses
         - size of operation
         - position in the list of operands
         - number of immediate bytes
@@ -242,37 +263,32 @@ unknown sizes.
 
 #### `opCodeExt`
 
-To facilitate identifying sets of instructions, the creates an `opCodeExt`.
+To facilitate identifying sets of instructions, the tool creates an `opCodeExt`.
 
 For the `Primary` map this is simply the encoded opcode from the instruction
 shifted left by 4 bits.
 
-For the 3D Now `NOW3D` map this is simply the encoded immediate from the
-instruction shifted left by 4 bits.
-
-For the `Secondary` `F38`, and `F39` maps this is the encoded opcode from
-the instruction shifted left by 4 bits orred with a synthetic `pp`. The
+For the `Secondary`, `F38`, and `F39` maps this is the encoded opcode from
+the instruction shifted left by 4 bits or'ed with a synthetic `pp`. The
 synthetic `pp` is constructed to match the rules of
 `Table 1-22. VEX/XOP.pp Encoding` from the
-`AMD64 Architecture Programmer’s
- Manual Volume 3:
- General-Purpose and System Instructions`.  For the case where the opSize
- 0x66 prefix is present with a `rep*` prefix, the `rep*` prefix is used
- to encode `pp`.
+`AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions`.
+For the case where the opSize 0x66 prefix is present with a `rep*` prefix, the `rep*` prefix is used
+to encode `pp`.
 
-For the `VEX*` and `XOP*` maps this is the encoded opcode from
-the instruction shifted left by 4 bits orred with `pp`.
+For the `VEX*` maps this is the encoded opcode from
+the instruction shifted left by 4 bits or'ed with `pp`.
 
 #### Identifying sets of instructions
 
-For most instructions, the opCodeExt will uniquely identify the instruction.
+For most instructions, the `opCodeExt` will uniquely identify the instruction.
 
-For many instructions, `modrm.reg` is used to uniquely identify the instruction.
+For many instructions, `modrm.reg` is used to help uniquely identify the instruction.
 These instruction typically change mnemonic and behavior as `modrm.reg`
 changes. These become problematic, when the form of these instructions vary.
 
-For a few other instructions the `L`, `W`, `vvvv` value may the instruction
-change behavior. Usually these do not change mnemonic.
+For a few other instructions the `L`, `W`, `vvvv` values may change the instruction
+behavior. Usually these do not change mnemonic.
 
 The set of instructions is therefore usually grouped by the opcode map and
 `opCodeExt` generated above.  For these a change in `opCodeExt` or `map`
@@ -291,7 +307,9 @@ set based on the encoding flags.  These are the `sometimesFlags`
 `sometimesFlags`, check each rule by calling `TestHypothesis`.  This
 determines if the rule corresponds to the set of observations.
 
-Encode the rule as a string.
+Encode the rule as a string. The rule might encode that the `W` bit or `L`
+bit causes a different memory/immediate behavior for the particular
+`<map, opCodeExt>` entry.
 
 Add the rule to the set of all observed rules.
 Add the set's rule with comment to a dictionary.
@@ -328,7 +346,7 @@ samples, is restricted to a max compilation/link unit size. Early drafts
 were generating more instructions, and couldn't be compiled.
 
 However, there is no restriction that all the samples must come from
-single executable.  These could easily be separated by opcode map...
+single executable.  These could easily be separated by opcode map.
 
 ## Risks
 
@@ -342,7 +360,7 @@ Until a reasonably featured disassembler is created, the new instruction
 set can not be supported by this methodology.
 
 The previous methodology of manually encoding these new instruction set
-would still be possible....
+would still be possible.
 
 ### Disassembler errors
 
@@ -351,6 +369,7 @@ of the disassembler may have disassembly bugs.  Using newer disassemblers
 would mitigate this to some extent.
 
 ### Bugs
+
 - Inadequate samples.  Are there other bits which modify instruction
 behavior which we missed?
 - Parser/Table generator implementation bugs. Does the parser do what it
@@ -368,4 +387,4 @@ Regenerate and compare.
 
 ### New debugger feature requires more metadata
 
-Add new feature code, regenerate
+Add new feature code, regenerate.