open-goal · water111 · Sep 2, 2020 · Aug 29, 2020 · Aug 29, 2020 · Aug 30, 2020
diff --git a/doc/emitter.md b/doc/emitter.md
@@ -0,0 +1,24 @@
+# Emitter
+x86-64 has a lot of instructions.  They are described in Volume 2 of the 5 Volume "Intel® 64 and IA-32 Architectures Software Developer’s Manual". This volume alone is over 2000 pages, and implementing all of it would take forever.  As a result, we will use only a subset of these instructions.  This is the rough plan:
+
+- Most instructions like `add` will only be implemented with `r64 r64` versions.
+- To accomplish something like `add rax, 1`, we will use a temporary register `X`
+  - `mov X, 1`
+  - `add rax, X`
+  - The constant propagation system will be able to provide enough information that we could eventually use `add r64 immX` and similar if needed.
+  - Register allocation should handle the case `(set! x (+ 3 y))` as:
+     - `mov x, 3`
+     - `add x, y`
+  - but `(set! x (+ y 3))`, in cases where `y` is needed after and `x` can't take its place, will become the inefficient
+     - `mov x, y`
+     - `mov rtemp, 3`
+     - `add x, rtemp`
+- Loading constants into registers will be done efficiently, using the same strategy used by modern versions of `gcc` and `clang`.
+- Memory access will be done in the form `mov rdest, [roff + raddr]` where `roff` is the offset register. Doing memory access in this form was found to be much faster in a simple benchmark test.
+- Memory access to the stack will have an extra `sub` and more complicated dereference.  GOAL code seems to avoid using the stack in most places, and I suspect the programmers attempted to avoid stack spills.
+  - `mov rdest, rsp` : coloring move for upcoming subtract
+  - `sub rdest, roff` : convert real pointer to GOAL pointer
+  - `mov rdest, [rdest + roff + variable_offset]` : access memory through normal GOAL deref.
+  - Note - we should check that the register allocator always gets this right, eliminating moves and avoiding the use of a temporary register.
+  - Again, the constant propagation should give us enough information if we ever want/need to implement more efficient `mov rdest, [rsp + variable_offset]` type instructions.
+- Memory access to static data should use `rip` addressing, like `mov rdest, [rip + offset]`. And creating pointers to static data could be `lea rdest, [rip - roff + offset]`
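Review note on the `add` lowering rules in doc/emitter.md: the sketch below is one way to express them, assuming a hypothetical `Operand` type and a `lower_add` helper that returns assembly text. Neither name comes from this PR, and the real emitter works on `Instruction` objects rather than strings. It only uses `mov` and the `r64, r64` form of `add`, and it relies on `add` being commutative to avoid clobbering an operand that already lives in the destination.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical operand: either a named register or an integer constant.
struct Operand {
  bool is_constant;
  std::string reg;  // used when !is_constant
  long value;       // used when is_constant
};

Operand reg(const std::string& name) { return {false, name, 0}; }
Operand imm(long v) { return {true, "", v}; }

// Lower (set! dst (+ a b)) to text, using only `mov` and `add r64, r64`.
// `temp` is an assumed scratch register that is distinct from dst and from
// any register operand.
std::vector<std::string> lower_add(const std::string& dst, Operand a, Operand b,
                                   const std::string& temp) {
  assert(dst != temp);
  // add is commutative: if the second operand already lives in dst, swap the
  // operands so the initial mov into dst cannot overwrite it.
  if (!b.is_constant && b.reg == dst) {
    std::swap(a, b);
  }

  std::vector<std::string> out;
  // Step 1: get the first operand into dst.
  if (a.is_constant) {
    out.push_back("mov " + dst + ", " + std::to_string(a.value));
  } else if (a.reg != dst) {
    out.push_back("mov " + dst + ", " + a.reg);
  }
  // Step 2: add the second operand, going through temp if it is a constant.
  if (b.is_constant) {
    out.push_back("mov " + temp + ", " + std::to_string(b.value));
    out.push_back("add " + dst + ", " + temp);
  } else {
    out.push_back("add " + dst + ", " + b.reg);
  }
  return out;
}
```

With these assumptions, `lower_add("x", imm(3), reg("y"), "rtemp")` produces the two-instruction form from the doc, while `lower_add("x", reg("y"), imm(3), "rtemp")` produces the three-instruction form that the doc calls inefficient.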
diff --git a/doc/registers.md b/doc/registers.md
@@ -0,0 +1,111 @@
+## Registers
+Although modern computers are much faster than the PS2, and we could probably get away with a really inefficient register allocation scheme, I think it's worth it to get this right.
+
+
+## Register differences between MIPS and x86-64
+The PS2's MIPS processor has these categories of register:
+- General Purpose. 32 registers, each 128 bits, but usually only the lower 64 bits are used.
+- Floating point registers. 32 registers, each for a 32-bit float.
+- Vector float registers. 32 registers, each for 4x 32-bit floats. Used only in inline assembly
+- `vi` registers. 16 registers, each a 16-bit integer. Used very rarely in inline assembly
+
+There are also some control/special registers (`Q`, `R`...), but code using these will be manually ported.
+
+In comparison, x86-64 has far fewer registers:
+- 16 General Purpose. Each 64-bits
+- 16 `xmm` registers. 128-bits, and can store either 128-bit integers or 4x 32-bit floats
+
+Here is the mapping:
+- MIPS GPR (lower 64 bits only) - x86-64 GPR
+- MIPS GPR (128-bits, only special cases) - x86-64 `xmm`
+- MIPS floating point - x86-64 `xmm` (lower 32-bits)
+- MIPS vector float - x86-64 `xmm` (packed single)
+- MIPS `vi` - manually handled??
+
+Here is the MIPS GPR map
+- `r0` or `zero` : always zero
+- `r1` or `at`: assembler temporary, not saved, not used by compiler
+- `r2` or `v0`: return value, not saved
+- `r3` or `v1`: not saved
+- `r4` or `a0`: not saved, argument 0
+- `r5` or `a1`: not saved, argument 1
+- `r6` or `a2`: not saved, argument 2
+- `r7` or `a3`: not saved, argument 3
+- `r8` or `t0`: not saved, argument 4
+- `r9` or `t1`: not saved, argument 5
+- `r10` or `t2`: not saved, argument 6
+- `r11` or `t3`: not saved, argument 7
+- `r12` or `t4`: not saved
+- `r13` or `t5`: not saved
+- `r14` or `t6`: not saved
+- `r15` or `t7`: not saved
+- `r16` or `s0`: saved
+- `r17` or `s1`: saved
+- `r18` or `s2`: saved
+- `r19` or `s3`: saved
+- `r20` or `s4`: saved
+- `r21` or `s5`: saved
+- `r22` or `s6`: saved, process pointer
+- `r23` or `s7`: saved, symbol pointer
+- `r24` or `t8`: not saved
+- `r25` or `t9`: function call pointer
+- `r26` or `k0`: kernel reserved (unused)
+- `r27` or `k1`: kernel reserved (unused)
+- `r28` or `gp`: saved
+- `r29` or `sp`: stack pointer
+- `r30` or `fp`: current function pointer
+- `r31` or `ra`: return address pointer
+
+
+And the x86-64 GPR map
+- `rax`: return value
+- `rcx`: argument 3
+- `rdx`: argument 2
+- `rbx`: saved
+- `rsp`: stack pointer
+- `rbp`: saved
+- `rsi`: argument 1
+- `rdi`: argument 0
+- `r8`: argument 4
+- `r9`: argument 5
+- `r10`: argument 6, saved if not argument
+- `r11`: argument 7, saved if not argument
+- `r12`: saved
+- `r13`: process pointer
+- `r14`: symbol table
+- `r15`: offset pointer
+
+
+### Plan for Memory Access
+The PS2 uses 32-bit pointers, and changing the pointer size is likely to introduce bugs, so we will keep using 32-bit pointers.  Also, GOAL has some hardcoded checks on the value for pointers, so we need to make sure the memory appears to the program at the correct address.
+
+To do this, we have separate "GOAL Pointers" and "real pointers".  The "real pointers" are just normal x86-64 pointers, and the "GOAL Pointer" is an offset into a main memory array.  A "real pointer" to the main memory array is stored in `r15` (offset pointer) when GOAL code is executing, and the GOAL compiler will automatically add this to all memory accesses.
+
+The overhead from doing this is not as bad as you might expect - x86 has nice addressing modes (Scale Index Base) which are quite fast, and don't require the use of temporary registers. If this does turn out to be much slower than I expect, we can introduce the concept of real pointers in GOAL code, and use them in places where we are limited in accessing memory.
+
+The main RAM is mapped at `0x0` on the PS2, with the first 1 MB reserved for the kernel.  We should make sure that the first 1 MB of GOAL main memory will cause a segfault if read/written/executed, to catch null pointer bugs.
+
+In the C Kernel code, the `r15` pointer doesn't exist. Instead, `g_ee_main_memory` is a global which points to the beginning of GOAL main memory.  The `Ptr<T>` template class takes care of converting GOAL and C++ pointers in a convenient way, and catches null pointer access.
+
+The GOAL stack pointer should likely be a real pointer, for performance reasons.  This makes pushing/popping/calling/returning/accessing stack variables much faster. The only cost is that getting a GOAL stack pointer requires some extra work. The stack pointer's value is read/written extremely rarely, so this seems like a good tradeoff.
+
+The other registers are less clear.  The process pointer can probably be a real pointer.  But the symbol table could go a few ways:
+1. Make it a real pointer.  Symbol value access is fast, but comparison against false requires two extra operations.
+2. Make it a GOAL pointer. Symbol value access requires more complicated addressing modes, but comparison against false is fast.
+
+Right now I'm leaning toward 1, but making it a configurable option in case I'm wrong. It should only be a change in a few places (emitter + where it's set up in the runtime).
+
+### Plan for Function Call and Arguments
+In GOAL for MIPS, function calls are weird.  Functions are always called by register using `t9`. There seems to be a different register allocator for function pointers, as nested function calls have really wacky register allocation.  In GOAL-x86-64, this restriction will be removed, and a function can be called from any register. (see next section for why we can do this)
+
+Unfortunately, GOAL's 128-bit function arguments present a big challenge.  When calling a function, we can't know if the function we're calling is expecting an integer, float, or 128-bit integer. In fact, the caller may not even know if it has an integer, float, or 128-bit integer. The easy and foolproof way to get this right is to use 128-bit `xmm` registers for all arguments and return values, but this will cause a massive performance hit and increase code size, as we'll have to move values between register types constantly. The current plan is this:
+
+- Floats go in GPRs for arguments/return values. GOAL does this too, and takes the hit of converting between registers as well. Probably the impact on a modern CPU is even worse, but we can live with it.
+- We'll compromise 
+
+
+### Plan for Static Data
+
+### Plan for Memory
+
+### Other details
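Review note on the "Plan for Memory Access" section of doc/registers.md: the split between GOAL pointers (32-bit offsets) and real pointers can be sketched as below. This is not the `Ptr<T>` class from the C Kernel. `GoalPtr`, `g_main_memory_base`, and `to_goal_ptr` are hypothetical names used only for illustration, with `g_main_memory_base` standing in for the role the doc gives to `g_ee_main_memory` and `r15`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the real pointer to the start of GOAL main memory.
static uint8_t* g_main_memory_base = nullptr;

// A GOAL pointer is a 32-bit offset into the main memory array.
template <typename T>
struct GoalPtr {
  uint32_t offset;

  explicit GoalPtr(uint32_t o) : offset(o) {}

  // Read the value: real address = base + offset.
  T load() const {
    assert(offset != 0);  // treat offset 0 as a null GOAL pointer
    T value;
    std::memcpy(&value, g_main_memory_base + offset, sizeof(T));
    return value;
  }

  // Write the value at base + offset.
  void store(const T& value) const {
    assert(offset != 0);
    std::memcpy(g_main_memory_base + offset, &value, sizeof(T));
  }
};

// Convert a real pointer inside main memory back to a GOAL pointer.
template <typename T>
GoalPtr<T> to_goal_ptr(const void* real) {
  const uint8_t* p = static_cast<const uint8_t*>(real);
  return GoalPtr<T>(static_cast<uint32_t>(p - g_main_memory_base));
}
```

The sketch only asserts on offset 0. The doc's stronger goal, making the whole first 1 MB fault on access, would need page protection and is not shown here.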
diff --git a/game/CMakeLists.txt b/game/CMakeLists.txt
@@ -7,7 +7,6 @@ set(CMAKE_CXX_FLAGS "-O0 -ggdb -Wall \
 
 enable_language(ASM_NASM)
 set(RUNTIME_SOURCE
-        main.cpp
         runtime.cpp
         system/SystemThread.cpp
         system/IOP_Kernel.cpp
@@ -49,7 +48,7 @@ set(RUNTIME_SOURCE
         overlord/stream.cpp)
 
 # the runtime should be built without any static/dynamic libraries.
-add_executable(gk ${RUNTIME_SOURCE})
+add_executable(gk ${RUNTIME_SOURCE} main.cpp)
 
 # we also build a runtime library for testing. This version is likely unable to call GOAL code correctly, but
 # can be used to test other things.

diff --git a/game/kernel/fileio.cpp b/game/kernel/fileio.cpp
@@ -199,6 +199,7 @@ char* basename_goal(char* s) {
     }
   }
 
+  /* Original code, has memory bug.
   // back up...
   for (;;) {
     if (pt < input) {
@@ -211,6 +212,20 @@ char* basename_goal(char* s) {
       return pt + 1;              // and return one past
     }
   }
+   */
+
+  // back up...
+  for (;;) {
+    if (pt <= input) {
+      return input;
+    }
+    pt--;
+    char c = *pt;
+    // until we hit a slash.
+    if (c == '\\' || c == '/') {  // slashes
+      return pt + 1;              // and return one past
+    }
+  }
 }
 
 /*!
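Review note on the `basename_goal` change: the revised loop checks `pt <= input` before decrementing, so `pt` never moves below the start of the buffer. A standalone sketch of the same back-up loop is below. `basename_sketch` is a hypothetical name, and it assumes the scan starts at the string terminator, since the first half of the real function is not part of this hunk.

```cpp
#include <cassert>
#include <cstring>

// Return a pointer to the part of `input` after the last slash or backslash,
// or `input` itself if there is no separator.
const char* basename_sketch(const char* input) {
  assert(input != nullptr);
  // Assumption: start at the null terminator and walk backward.
  const char* pt = input + std::strlen(input);
  for (;;) {
    // Check the bound before decrementing so we never read input[-1].
    if (pt <= input) {
      return input;
    }
    pt--;
    char c = *pt;
    if (c == '\\' || c == '/') {
      return pt + 1;  // one past the separator
    }
  }
}
```

Strings with no separator, an empty string, and a string with a trailing slash all stay inside the buffer with this ordering of the check and the decrement.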

diff --git a/game/system/Deci2Server.h b/game/system/Deci2Server.h
@@ -34,9 +34,9 @@ class Deci2Server {
   void accept_thread_func();
   bool kill_accept_thread = false;
   char* buffer = nullptr;
-  int server_fd;
+  int server_fd = -1;
   sockaddr_in addr;
-  int new_sock;
+  int new_sock = -1;
   bool server_initialized = false;
   bool accept_thread_running = false;
   bool server_connected = false;

diff --git a/goalc/emitter/CMakeLists.txt b/goalc/emitter/CMakeLists.txt
@@ -1,3 +1,3 @@
 add_library(emitter
-        CodeTester.cpp
-        registers.cpp)
+        Register.cpp
+        CodeTester.cpp)
diff --git a/goalc/emitter/CodeTester.cpp b/goalc/emitter/CodeTester.cpp
@@ -1,40 +1,61 @@
+/*!
+ * @file CodeTester.cpp
+ * The CodeTester is a utility to run the output of the compiler as part of a unit test.
+ * This is effective for tests which try all combinations of registers, etc.
+ *
+ * The CodeTester can't be used for tests requiring the full GOAL language/linking.
+ */
+
 #include <sys/mman.h>
-#include <cstdio>
 #include "CodeTester.h"
-#include "Instruction.h"
 #include "IGen.h"
 
-namespace goal {
+namespace emitter {
+
+CodeTester::CodeTester() : m_info(RegisterInfo::make_register_info()) {}
 
-std::string CodeTester::dump_to_hex_string() {
+/*!
+ * Convert to a string for comparison against an assembler or tests.
+ */
+std::string CodeTester::dump_to_hex_string(bool nospace) {
   std::string result;
   char buff[32];
   for (int i = 0; i < code_buffer_size; i++) {
-    sprintf(buff, "%02x ", code_buffer[i]);
+    if (nospace) {
+      sprintf(buff, "%02X", code_buffer[i]);
+    } else {
+      sprintf(buff, "%02x ", code_buffer[i]);
+    }
+
     result += buff;
   }
 
   // remove trailing space
-  if (!result.empty()) {
+  if (!nospace && !result.empty()) {
     result.pop_back();
   }
   return result;
 }
 
+/*!
+ * Add an instruction to the buffer.
+ */
 void CodeTester::emit(const Instruction& instr) {
   code_buffer_size += instr.emit(code_buffer + code_buffer_size);
   assert(code_buffer_size <= code_buffer_capacity);
 }
 
-void CodeTester::emit_set_gpr_as_return(X86R gpr) {
-  assert(is_gpr(gpr));
-  emit(IGen::mov_gpr64_gpr64(RAX, gpr));
-}
-
+/*!
+ * Add a return instruction to the buffer.
+ */
 void CodeTester::emit_return() {
   emit(IGen::ret());
 }
 
+/*!
+ * Pop all GPRs off of the stack. Optionally exclude rax.
+ * Pops RSP always, which is weird, but doesn't cause issues.
+ */
 void CodeTester::emit_pop_all_gprs(bool exclude_rax) {
   for (int i = 16; i-- > 0;) {
     if (i != RAX || !exclude_rax) {
@@ -43,6 +64,10 @@ void CodeTester::emit_pop_all_gprs(bool exclude_rax) {
   }
 }
 
+/*!
+ * Push all GPRs onto the stack. Optionally exclude RAX.
+ * Pushes RSP always, which is weird, but doesn't cause issues.
+ */
 void CodeTester::emit_push_all_gprs(bool exclude_rax) {
   for (int i = 0; i < 16; i++) {
     if (i != RAX || !exclude_rax) {
@@ -51,14 +76,53 @@ void CodeTester::emit_push_all_gprs(bool exclude_rax) {
   }
 }
 
+/*!
+ * Push all xmm registers (all 128-bits) to the stack.
+ */
+void CodeTester::emit_push_all_xmms() {
+  emit(IGen::sub_gpr64_imm8s(RSP, 8));
+  for (int i = 0; i < 16; i++) {
+    emit(IGen::sub_gpr64_imm8s(RSP, 16));
+    emit(IGen::store128_gpr64_xmm128(RSP, XMM0 + i));
+  }
+}
+
+/*!
+ * Pop all xmm registers (all 128-bits) from the stack, in the reverse of the push order.
+ */
+void CodeTester::emit_pop_all_xmms() {
+  for (int i = 16; i-- > 0;) {
+    emit(IGen::load128_xmm128_gpr64(XMM0 + i, RSP));
+    emit(IGen::add_gpr64_imm8s(RSP, 16));
+  }
+  emit(IGen::add_gpr64_imm8s(RSP, 8));
+}
+
+/*!
+ * Remove everything from the code buffer
+ */
 void CodeTester::clear() {
   code_buffer_size = 0;
 }
 
+/*!
+ * Execute the buffered code with no arguments, return the value of RAX.
+ */
 u64 CodeTester::execute() {
   return ((u64(*)())code_buffer)();
 }
 
+/*!
+ * Execute code buffer with arguments. Use get_c_abi_arg to figure out which registers the
+ * arguments will appear in (will handle windows/linux differences)
+ */
+u64 CodeTester::execute(u64 in0, u64 in1, u64 in2, u64 in3) {
+  return ((u64(*)(u64, u64, u64, u64))code_buffer)(in0, in1, in2, in3);
+}
+
+/*!
+ * Allocate a code buffer of the given size.
+ */
 void CodeTester::init_code_buffer(int capacity) {
   code_buffer = (u8*)mmap(nullptr, capacity, PROT_EXEC | PROT_READ | PROT_WRITE,
                           MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
@@ -76,5 +140,4 @@ CodeTester::~CodeTester() {
     munmap(code_buffer, code_buffer_capacity);
   }
 }
-
-}  // namespace goal
+}  // namespace emitter
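Review note on `dump_to_hex_string`: the two output modes differ in case as well as spacing (`%02X` without spaces, `%02x ` with spaces), which matters if a test compares against an assembler's output. The free function below is a sketch of the same formatting over a `std::vector<uint8_t>`. `to_hex_string` is a hypothetical name, and it uses `snprintf` instead of the member's `sprintf`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Format bytes as hex. With nospace, output is uppercase and packed ("4889C3");
// otherwise it is lowercase and space separated ("48 89 c3").
std::string to_hex_string(const std::vector<uint8_t>& bytes, bool nospace) {
  std::string result;
  char buff[8];
  for (uint8_t b : bytes) {
    if (nospace) {
      std::snprintf(buff, sizeof(buff), "%02X", static_cast<unsigned>(b));
    } else {
      std::snprintf(buff, sizeof(buff), "%02x ", static_cast<unsigned>(b));
    }
    result += buff;
  }
  // Only the spaced form leaves a trailing separator to remove.
  if (!nospace && !result.empty()) {
    result.pop_back();
  }
  return result;
}
```

For the three bytes `0x48, 0x89, 0xc3`, the spaced form is `48 89 c3` and the packed form is `4889C3`.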