Add utilities/tests for generating x86-64 instructions #3

Merged
merged 9 commits into from
Sep 2, 2020
24 changes: 24 additions & 0 deletions doc/emitter.md
@@ -0,0 +1,24 @@
# Emitter
x86-64 has a lot of instructions. They are described in Volume 2 of the 5 Volume "Intel® 64 and IA-32 Architectures Software Developer’s Manual". This volume alone is over 2000 pages, which would take forever to fully implement. As a result, we will use only a subset of these instructions. This is the rough plan:

- Most instructions like `add` will only be implemented with `r64 r64` versions.
- To accomplish something like `add rax, 1`, we will use a temporary register `X`
- `mov X, 1`
- `add rax, X`
- The constant propagation system will be able to provide enough information that we could eventually use `add r64 immX` and similar if needed.
- Register allocation should handle the case `(set! x (+ 3 y))` as:
- `mov x, 3`
- `add x, y`
- but `(set! x (+ y 3))`, in cases where `y` is needed after and `x` can't take its place, will become the inefficient
- `mov x, y`
- `mov rtemp, 3`
- `add x, rtemp`
- Loading constants into registers will be done efficiently, using the same strategy used by modern versions of `gcc` and `clang`.
- Memory access will be done in the form `mov rdest, [roff + raddr]` where `roff` is the offset register. Doing memory access in this form was found to be much faster in a simple benchmark test.
- Memory access to the stack will have an extra `sub` and more complicated dereference. GOAL code seems to avoid using the stack in most places, and I suspect the programmers attempted to avoid stack spills.
- `mov rdest, rsp` : coloring move for upcoming subtract
- `sub rdest, roff` : convert real pointer to GOAL pointer
- `mov rdest, [rdest + roff + variable_offset]` : access memory through normal GOAL deref.
- Note - we should check that the register allocator always gets this right, eliminating moves and avoiding a temporary register.
- Again, the constant propagation should give us enough information if we ever want/need to implement more efficient `mov rdest, [rsp + variable_offset]` type instructions.
- Memory access to static data should use `rip` addressing, like `mov rdest, [rip + offset]`. And creating pointers to static data could be `lea rdest, [rip - roff + offset]`
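
As a rough illustration of the `r64 r64` plan, here is a minimal sketch of how `add rax, 1` could be emitted as `mov X, 1` followed by `add rax, X`. The function names are hypothetical and are not the emitter API added in this PR; registers are passed as their hardware numbers (`rax` = 0, `rcx` = 1, ..., `r15` = 15), and the constant is assumed to fit in a sign-extended 32-bit immediate.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only (not the IGen API from this PR).

// REX prefix with W=1 (64-bit operand size). Bit 2 (R) extends ModRM.reg,
// bit 0 (B) extends ModRM.rm, which is how r8-r15 are reached.
inline uint8_t rex_w(int reg, int rm) {
  return static_cast<uint8_t>(0x48 | ((reg >> 3) << 2) | (rm >> 3));
}

// ModRM byte in register-direct mode (mod = 0b11).
inline uint8_t modrm_direct(int reg, int rm) {
  return static_cast<uint8_t>(0xC0 | ((reg & 7) << 3) | (rm & 7));
}

// add dst, src   encoded as REX.W 01 /r (ADD r/m64, r64)
inline void emit_add_r64_r64(std::vector<uint8_t>& out, int dst, int src) {
  out.push_back(rex_w(src, dst));
  out.push_back(0x01);
  out.push_back(modrm_direct(src, dst));
}

// mov dst, imm32 (sign-extended to 64 bits)   encoded as REX.W C7 /0 id
inline void emit_mov_r64_imm32s(std::vector<uint8_t>& out, int dst, int32_t imm) {
  out.push_back(rex_w(0, dst));
  out.push_back(0xC7);
  out.push_back(modrm_direct(0, dst));
  uint32_t u = static_cast<uint32_t>(imm);
  for (int i = 0; i < 4; i++) {
    out.push_back(static_cast<uint8_t>(u >> (8 * i)));  // little-endian immediate
  }
}

// add dst, imm   done through a temporary register, as in the plan above.
inline std::vector<uint8_t> emit_add_r64_const(int dst, int temp, int32_t imm) {
  std::vector<uint8_t> out;
  emit_mov_r64_imm32s(out, temp, imm);
  emit_add_r64_r64(out, dst, temp);
  return out;
}
```

With `rcx` as the temporary, `emit_add_r64_const(0, 1, 1)` produces the bytes for `mov rcx, 1` followed by `add rax, rcx`. A real emitter would probably pick the shortest constant-load encoding instead of always using the 7-byte form.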
111 changes: 111 additions & 0 deletions doc/registers.md
@@ -0,0 +1,111 @@
## Registers
Although modern computers are much faster than the PS2, and we could probably get away with a really inefficient register allocation scheme, I think it's worth it to get this right.


## Register differences between MIPS and x86-64
The PS2's MIPS processor has these categories of register:
- General Purpose. 32 registers, each 128 bits, but usually only the lower 64 bits are used.
- Floating point registers. 32 registers, each for a 32-bit float.
- Vector float registers. 32 registers, each for 4x 32-bit floats. Used only in inline assembly
- `vi` registers. 16 registers, each a 16-bit integer. Used very rarely in inline assembly

There are also some control/special registers (`Q`, `R`...), but code using these will be manually ported.

In comparison, x86-64 has far fewer registers:
- 16 General Purpose. Each 64-bits
- 16 `xmm` registers. 128-bits, and can store either 128-bit integers or 4x 32-bit floats

Here is the mapping:
- MIPS GPR (lower 64 bits only) - x86-64 GPR
- MIPS GPR (128-bits, only special cases) - x86-64 `xmm`
- MIPS floating point - x86-64 `xmm` (lower 32-bits)
- MIPS vector float - x86-64 `xmm` (packed single)
- MIPS `vi` - manually handled??

Here is the MIPS GPR map
- `r0` or `zero` : always zero
- `r1` or `at`: assembler temporary, not saved, not used by compiler
- `r2` or `v0`: return value, not saved
- `r3` or `v1`: not saved
- `r4` or `a0`: not saved, argument 0
- `r5` or `a1`: not saved, argument 1
- `r6` or `a2`: not saved, argument 2
- `r7` or `a3`: not saved, argument 3
- `r8` or `t0`: not saved, argument 4
- `r9` or `t1`: not saved, argument 5
- `r10` or `t2`: not saved, argument 6
- `r11` or `t3`: not saved, argument 7
- `r12` or `t4`: not saved
- `r13` or `t5`: not saved
- `r14` or `t6`: not saved
- `r15` or `t7`: not saved
- `r16` or `s0`: saved
- `r17` or `s1`: saved
- `r18` or `s2`: saved
- `r19` or `s3`: saved
- `r20` or `s4`: saved
- `r21` or `s5`: saved
- `r22` or `s6`: saved, process pointer
- `r23` or `s7`: saved, symbol pointer
- `r24` or `t8`: not saved
- `r25` or `t9`: function call pointer
- `r26` or `k0`: kernel reserved (unused)
- `r27` or `k1`: kernel reserved (unused)
- `r28` or `gp`: saved
- `r29` or `sp`: stack pointer
- `r30` or `fp`: current function pointer
- `r31` or `ra`: return address pointer


And the x86-64 GPR map
- `rax`: return value
- `rcx`: argument 3
- `rdx`: argument 2
- `rbx`: saved
- `rsp`: stack pointer
- `rbp`: saved
- `rsi`: argument 1
- `rdi`: argument 0
- `r8`: argument 4
- `r9`: argument 5
- `r10`: argument 6, saved if not argument
- `r11`: argument 7, saved if not argument
- `r12`: saved
- `r13`: process pointer
- `r14`: symbol table
- `r15`: offset pointer


### Plan for Memory Access
The PS2 uses 32-bit pointers, and changing the pointer size is likely to introduce bugs, so we will keep using 32-bit pointers. Also, GOAL has some hardcoded checks on the value for pointers, so we need to make sure the memory appears to the program at the correct address.

To do this, we have separate "GOAL Pointers" and "real pointers". The "real pointers" are just normal x86-64 pointers, and the "GOAL Pointer" is an offset into a main memory array. A "real pointer" to the main memory array is stored in `r15` (offset pointer) when GOAL code is executing, and the GOAL compiler will automatically add this to all memory accesses.

The overhead from doing this is not as bad as you might expect - x86 has nice addressing modes (Scale Index Base) which are quite fast, and don't require the use of temporary registers. If this does turn out to be much slower than I expect, we can introduce the concept of real pointers in GOAL code, and use them in places where we are limited in accessing memory.

The main RAM is mapped at `0x0` on the PS2, with the first 1 MB reserved for the kernel. We should make sure that the first 1 MB of GOAL main memory will cause a segfault if read/written/executed, to catch null pointer bugs.
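
One possible way to get the "first 1 MB faults" behavior on Linux is to reserve the whole region with no access and then enable access only past the guard area. This is a sketch under assumptions, not the runtime's actual allocation code: `map_main_memory_sketch` and `kGuardSize` are hypothetical names, and the real runtime would also need the usable region to be executable.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical: size of the protected "kernel" area at the start of main memory.
constexpr size_t kGuardSize = 1 << 20;  // 1 MB

// Map total_size bytes. The first kGuardSize bytes stay PROT_NONE, so any
// read/write/execute there (for example through a null GOAL pointer) faults.
// Returns nullptr on failure. total_size should be a multiple of the page size.
inline uint8_t* map_main_memory_sketch(size_t total_size) {
  if (total_size <= kGuardSize) {
    return nullptr;
  }
  void* base = mmap(nullptr, total_size, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
  if (base == MAP_FAILED) {
    return nullptr;
  }
  uint8_t* mem = static_cast<uint8_t*>(base);
  // Only read/write here to keep the sketch portable; add PROT_EXEC if GOAL
  // code is to be run from this memory.
  if (mprotect(mem + kGuardSize, total_size - kGuardSize, PROT_READ | PROT_WRITE) != 0) {
    munmap(base, total_size);
    return nullptr;
  }
  return mem;
}
```

The returned pointer plays the role of the "real pointer" to main memory, so GOAL address `0` maps to the start of the guard area.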

In the C Kernel code, the `r15` pointer doesn't exist. Instead, `g_ee_main_memory` is a global which points to the beginning of GOAL main memory. The `Ptr<T>` template class takes care of converting GOAL and C++ pointers in a convenient way, and catches null pointer access.
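
To make the GOAL pointer / real pointer distinction concrete, here is a minimal sketch of what such a wrapper could look like. It is not the project's `Ptr<T>`: `GoalPtrSketch`, `make_goal_ptr`, and `g_main_memory_sketch` are hypothetical names, and throwing an exception on null access is just one assumed way to "catch" it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-in for the global that points to the start of GOAL main memory.
static uint8_t* g_main_memory_sketch = nullptr;

// A GOAL pointer is a 32-bit offset into main memory. Converting it to a real
// pointer adds the base, which is what r15 does in compiled GOAL code.
template <typename T>
struct GoalPtrSketch {
  uint32_t offset = 0;

  // Convert to a real C++ pointer, rejecting null GOAL pointers.
  T* c() const {
    if (offset == 0) {
      throw std::runtime_error("null GOAL pointer access");
    }
    return reinterpret_cast<T*>(g_main_memory_sketch + offset);
  }
  T* operator->() const { return c(); }
  T& operator*() const { return *c(); }
};

// Convert a real pointer inside main memory back to a GOAL pointer.
template <typename T>
GoalPtrSketch<T> make_goal_ptr(T* real) {
  GoalPtrSketch<T> p;
  p.offset = static_cast<uint32_t>(reinterpret_cast<uint8_t*>(real) - g_main_memory_sketch);
  return p;
}
```

Because the wrapper stores only a `uint32_t`, structures shared with GOAL code keep their 32-bit pointer layout.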

The GOAL stack pointer should likely be a real pointer, for performance reasons. This makes pushing/popping/calling/returning/accessing stack variables much faster; the only cost is that getting a GOAL stack pointer requires some extra work. The stack pointer's value is read/written extremely rarely, so this seems like a good tradeoff.

The other registers are less clear. The process pointer can probably be a real pointer. But the symbol table could go a few ways:
1. Make it a real pointer. Symbol value access is fast, but comparison against false requires two extra operations.
2. Make it a GOAL pointer. Symbol value access requires more complicated addressing modes, but comparison against false is fast.

Right now I'm leaning toward 1, but making it a configurable option in case I'm wrong. It should only be a change in a few places (emitter + where it's set up in the runtime).

### Plan for Function Call and Arguments
In GOAL for MIPS, function calls are weird. Functions are always called by register using `t9`. There seems to be a different register allocator for function pointers, as nested function calls have really wacky register allocation. In GOAL-x86-64, this restriction will be removed, and a function can be called from any register. (see next section for why we can do this)

Unfortunately, GOAL's 128-bit function arguments present a big challenge. When calling a function, we can't know if the function we're calling is expecting an integer, float, or 128-bit integer. In fact, the caller may not even know if it has an integer, float, or 128-bit integer. The easy and foolproof way to get this right is to use 128-bit `xmm` registers for all arguments and return values, but this will cause a massive performance hit and increase code size, as we'll have to move values between register types constantly. The current plan is this:

- Floats go in GPRs for arguments/return values. GOAL does this too, and takes the hit of converting between registers as well. Probably the impact on a modern CPU is even worse, but we can live with it.
- We'll compromise
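
To illustrate what "floats go in GPRs" means at the value level, here is a small sketch of moving a float's bit pattern into a 64-bit integer and back, which is roughly what a `movd` between an `xmm` register and a GPR does. The function names are hypothetical, and it is an assumption here that the upper 32 bits of the GPR are zero when a float is passed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy the raw 32-bit pattern of a float into the low half of a 64-bit value.
// Assumption: the upper 32 bits are left as zero.
inline uint64_t float_to_gpr_bits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}

// Recover the float from the low 32 bits of a GPR value.
inline float gpr_bits_to_float(uint64_t gpr) {
  uint32_t bits = static_cast<uint32_t>(gpr);
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}
```

No numeric conversion happens in either direction; only the register file holding the bits changes, which is why the callee does not need to know in advance which kind of register the caller used.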


### Plan for Static Data

### Plan for Memory

### Other details
3 changes: 1 addition & 2 deletions game/CMakeLists.txt
@@ -7,7 +7,6 @@ set(CMAKE_CXX_FLAGS "-O0 -ggdb -Wall \

enable_language(ASM_NASM)
set(RUNTIME_SOURCE
main.cpp
runtime.cpp
system/SystemThread.cpp
system/IOP_Kernel.cpp
@@ -49,7 +48,7 @@ set(RUNTIME_SOURCE
overlord/stream.cpp)

# the runtime should be built without any static/dynamic libraries.
add_executable(gk ${RUNTIME_SOURCE})
add_executable(gk ${RUNTIME_SOURCE} main.cpp)

# we also build a runtime library for testing. This version is likely unable to call GOAL code correctly, but
# can be used to test other things.
15 changes: 15 additions & 0 deletions game/kernel/fileio.cpp
@@ -199,6 +199,7 @@ char* basename_goal(char* s) {
}
}

/* Original code, has memory bug.
// back up...
for (;;) {
if (pt < input) {
@@ -211,6 +212,20 @@ char* basename_goal(char* s) {
return pt + 1; // and return one past
}
}
*/

// back up...
for (;;) {
if (pt <= input) {
return input;
}
pt--;
char c = *pt;
// until we hit a slash.
if (c == '\\' || c == '/') { // slashes
return pt + 1; // and return one past
}
}
}

/*!
4 changes: 2 additions & 2 deletions game/system/Deci2Server.h
@@ -34,9 +34,9 @@ class Deci2Server {
void accept_thread_func();
bool kill_accept_thread = false;
char* buffer = nullptr;
int server_fd;
int server_fd = -1;
sockaddr_in addr;
int new_sock;
int new_sock = -1;
bool server_initialized = false;
bool accept_thread_running = false;
bool server_connected = false;
4 changes: 2 additions & 2 deletions goalc/emitter/CMakeLists.txt
@@ -1,3 +1,3 @@
add_library(emitter
CodeTester.cpp
registers.cpp)
Register.cpp
CodeTester.cpp)
89 changes: 76 additions & 13 deletions goalc/emitter/CodeTester.cpp
@@ -1,40 +1,61 @@
/*!
* @file CodeTester.cpp
* The CodeTester is a utility to run the output of the compiler as part of a unit test.
* This is effective for tests which try all combinations of registers, etc.
*
* The CodeTester can't be used for tests requiring the full GOAL language/linking.
*/

#include <sys/mman.h>
#include <cstdio>
#include "CodeTester.h"
#include "Instruction.h"
#include "IGen.h"

namespace goal {
namespace emitter {

CodeTester::CodeTester() : m_info(RegisterInfo::make_register_info()) {}

std::string CodeTester::dump_to_hex_string() {
/*!
* Convert to a string for comparison against an assembler or tests.
*/
std::string CodeTester::dump_to_hex_string(bool nospace) {
std::string result;
char buff[32];
for (int i = 0; i < code_buffer_size; i++) {
sprintf(buff, "%02x ", code_buffer[i]);
if (nospace) {
sprintf(buff, "%02X", code_buffer[i]);
} else {
sprintf(buff, "%02x ", code_buffer[i]);
}

result += buff;
}

// remove trailing space
if (!result.empty()) {
if (!nospace && !result.empty()) {
result.pop_back();
}
return result;
}

/*!
* Add an instruction to the buffer.
*/
void CodeTester::emit(const Instruction& instr) {
code_buffer_size += instr.emit(code_buffer + code_buffer_size);
assert(code_buffer_size <= code_buffer_capacity);
}

void CodeTester::emit_set_gpr_as_return(X86R gpr) {
assert(is_gpr(gpr));
emit(IGen::mov_gpr64_gpr64(RAX, gpr));
}

/*!
* Add a return instruction to the buffer.
*/
void CodeTester::emit_return() {
emit(IGen::ret());
}

/*!
* Pop all GPRs off of the stack. Optionally exclude rax.
* Pops RSP always, which is weird, but doesn't cause issues.
*/
void CodeTester::emit_pop_all_gprs(bool exclude_rax) {
for (int i = 16; i-- > 0;) {
if (i != RAX || !exclude_rax) {
@@ -43,6 +64,10 @@ void CodeTester::emit_pop_all_gprs(bool exclude_rax) {
}
}

/*!
* Push all GPRs onto the stack. Optionally exclude RAX.
* Pushes RSP always, which is weird, but doesn't cause issues.
*/
void CodeTester::emit_push_all_gprs(bool exclude_rax) {
for (int i = 0; i < 16; i++) {
if (i != RAX || !exclude_rax) {
@@ -51,14 +76,53 @@ void CodeTester::emit_push_all_gprs(bool exclude_rax) {
}
}

/*!
* Push all xmm registers (all 128-bits) to the stack.
*/
void CodeTester::emit_push_all_xmms() {
emit(IGen::sub_gpr64_imm8s(RSP, 8));
for (int i = 0; i < 16; i++) {
emit(IGen::sub_gpr64_imm8s(RSP, 16));
emit(IGen::store128_gpr64_xmm128(RSP, XMM0 + i));
}
}

/*!
* Pop all xmm registers (all 128-bits) from the stack
*/
void CodeTester::emit_pop_all_xmms() {
for (int i = 0; i < 16; i++) {
emit(IGen::load128_xmm128_gpr64(XMM0 + i, RSP));
emit(IGen::add_gpr64_imm8s(RSP, 16));
}
emit(IGen::add_gpr64_imm8s(RSP, 8));
}

/*!
* Remove everything from the code buffer
*/
void CodeTester::clear() {
code_buffer_size = 0;
}

/*!
* Execute the buffered code with no arguments, return the value of RAX.
*/
u64 CodeTester::execute() {
return ((u64(*)())code_buffer)();
}

/*!
* Execute code buffer with arguments. Use get_c_abi_arg to figure out which registers the
* arguments will appear in (will handle windows/linux differences)
*/
u64 CodeTester::execute(u64 in0, u64 in1, u64 in2, u64 in3) {
return ((u64(*)(u64, u64, u64, u64))code_buffer)(in0, in1, in2, in3);
}

/*!
* Allocate a code buffer of the given size.
*/
void CodeTester::init_code_buffer(int capacity) {
code_buffer = (u8*)mmap(nullptr, capacity, PROT_EXEC | PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
@@ -76,5 +140,4 @@ CodeTester::~CodeTester() {
munmap(code_buffer, code_buffer_capacity);
}
}

} // namespace emitter
} // namespace goal