Add ObjectGenerator and Register Allocator (#10)
* start the ObjectFileGenerator

* finish v3 generation

* add analysis for register allocator

* add register allocator

* fix const

* fix build

* fix formatting for clang-format

* attempt to fix windows build

* windows 2

* windows 3

* windows 4

* windows 5

* windows 6
water111 authored Sep 5, 2020
1 parent 660eeda commit 2075dd6
Showing 37 changed files with 2,815 additions and 20 deletions.
1 change: 1 addition & 0 deletions common/type_system/CMakeLists.txt
@@ -1,4 +1,5 @@
add_library(type_system
SHARED
TypeSystem.cpp
Type.cpp
TypeSpec.cpp)
2 changes: 1 addition & 1 deletion common/type_system/TypeSystem.cpp
@@ -145,7 +145,7 @@ DerefInfo TypeSystem::get_deref_info(const TypeSpec& ts) {
* Create a simple typespec. The type must be defined or forward declared for this to work.
* If you really need a TypeSpec which refers to a non-existent type, just construct your own.
*/
TypeSpec TypeSystem::make_typespec(const std::string& name) {
TypeSpec TypeSystem::make_typespec(const std::string& name) const {
if (m_types.find(name) != m_types.end() ||
m_forward_declared_types.find(name) != m_forward_declared_types.end()) {
return TypeSpec(name);
2 changes: 1 addition & 1 deletion common/type_system/TypeSystem.h
@@ -37,7 +37,7 @@ class TypeSystem {

DerefInfo get_deref_info(const TypeSpec& ts);

TypeSpec make_typespec(const std::string& name);
TypeSpec make_typespec(const std::string& name) const;
TypeSpec make_function_typespec(const std::vector<std::string>& arg_types,
const std::string& return_type);

78 changes: 78 additions & 0 deletions doc/object_file_generation.md
@@ -0,0 +1,78 @@
# CGO/DGO Files
The CGO and DGO file formats are exactly the same; the only difference is the name of the file. The DGO name indicates that the file contains all the data for a level. The engine loads these files into a level heap, which can then be cleared and replaced with a different level.

I suspect that the DGO file name came first, as a package containing all the data in the level which can be loaded very quickly. Names in the code all say `dgo`, and the `MakeFileName` system shows that both CGO and DGO files are stored in the `game/dgo` folder. Probably the engine and kernel were packed into a CGO file after the file format was created for loading levels.

Each CGO/DGO file contains a bunch of individual object files. Each file has a name. There are some duplicate names: sometimes two files with the same name are very different (for example, code for an enemy and art for an enemy), and other times they are very similar (tiny differences in code/data). The files come in two versions, v3 and v4, and both CGOs and DGOs contain both versions. If an object file has code in it, it is always a v3. It is possible to have a v3 file with just data, but usually the data is pretty small. The v4 files tend to have a lot of data in them. My theory is that the compiler creates v3 files out of GOAL source code files, and that other tools for creating things like textures/levels/art-groups generate v4 objects. There are a number of optimizations in the loading process for v4 objects that are better suited for larger files. To always stay at 60 fps, a v3 object must be smaller than around 750 kB. A v4 object does not have this limitation.

# The V3 format
The v3 format is divided into three segments:
1. Main: this contains all of the functions/data that will be used by the game.
2. Debug: this is only loaded in debug mode, and is always stored on a separate `kdebugheap`.
3. Top Level: this contains some initialization code to add functions/variables to the symbol table, and any user-written initialization. It runs once, immediately after the object is loaded, then is thrown away.

Each segment also has linking data, which tells the linker how to link references to symbols, types, and memory (possibly in a different segment).

This format will be different between the PC and PS2 versions, as linking data for x86-64 will need to look different from MIPS.

Each segment can contain functions and data. The top-level segment must start with a function, which is run to initialize the object. All the data here goes through the GOAL compiler and type system.
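
To make the layout concrete, here is a minimal C++ sketch of the three segments, each carrying its own object data and linking data. All of the names (`SegmentKind`, `V3Segment`, `V3Object`, `total_data_size`) are hypothetical and only illustrate the description above; they are not the names used by the ObjectGenerator in this commit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names; the order mirrors the list above.
enum class SegmentKind : std::size_t { Main = 0, Debug = 1, TopLevel = 2 };
constexpr std::size_t kSegmentCount = 3;

// Each segment carries its own functions/data plus the linking data for it.
struct V3Segment {
  std::vector<std::uint8_t> data;       // functions and static data
  std::vector<std::uint8_t> link_data;  // records for symbols, types, memory
};

struct V3Object {
  std::array<V3Segment, kSegmentCount> segments;

  // Look up a segment by kind.
  V3Segment& segment(SegmentKind kind) {
    const auto idx = static_cast<std::size_t>(kind);
    assert(idx < kSegmentCount);
    return segments[idx];
  }

  // Bytes of object data across all segments (linking data not included).
  std::size_t total_data_size() const {
    std::size_t total = 0;
    for (const auto& seg : segments) {
      total += seg.data.size();
    }
    return total;
  }
};
```

Keeping the linking data next to the segment it describes makes it easy to throw away the top-level segment and its link records together once initialization has run.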

# The V4 format
The V4 format contains just data. Like v3, the data is GOAL objects, but was probably generated by a tool that wasn't the compiler. A V4 object has no segments, but must start with a `basic` object. After being linked, the `relocate` method of this `basic` will be called, which should do any additional linking required for the specific object.

Because this is just data, there's no reason for the PC version to change this format. This means we can also check the

Note: you may see references to v2 format in the code. I believe v4 format is identical to v2, except the linking data is stored at the end, to enable a "don't copy the last object" optimization. The game's code uses the `work_v2` function on v4 objects as a result, and some of my comments may refer to v2, when I really mean v4. I believe there are no v2 objects in any games.

# Plan
- Create a library for generating obj files in V3/V4 format
- V4 should match game exactly. Doesn't support code.
- V3 is our own thing. Must support code.

We'll eventually create tools which use the library in V4 mode to generate object files for rebuilding levels and textures. We may need to wait until more about these formats is understood before trying this.

The compiler will use the library in V3 mode to generate object files for each `gc` (GOAL source code) file.

# CGO files
The only CGO files read are `KERNEL.CGO` and `GAME.CGO`.

The `KERNEL.CGO` contains the GOAL kernel and some very basic libraries (`gcommon`, `gstring`, `gstate`, ...). I believe that `KERNEL` was always loaded on boot during development, as it's required for the Listener to function.

The `GAME.CGO` file combines the contents of the `ENGINE`, `COMMON` and `ART` CGO files. `ENGINE` contains the game engine code, and `COMMON` contains level-specific code (outside of the game engine) that is always loaded. If code is used in basically all the levels, it makes sense to put it in `COMMON`, so it doesn't have to be loaded for each currently active level. The `ART` CGO contains common art/textures/models, like Jak and his animations.

The `JUNGLE.CGO`, `MAINCAVE.CGO`, and `SUNKEN.CGO` files contain some copies of files used in the jungle, cave, and LPC levels. Some are a tiny bit different. I believe they are unused.

The `L1.CGO` file contains basically all the level-specific code/Jak animations and some textures. It doesn't seem to contain any 3D models. It's unused, but I'm still interested in understanding its format, as the Jak 1 demos have this file.

The `RACERP.CGO` file contains (I think) everything needed for the Zoomer. Unused. The same data appears in the levels as needed, maybe with some slight differences.

The `VILLAGEP.CGO` file contains common code shared in village levels, which isn't much (oracle, warp gate). Unused. The same data appears in the levels as needed.

The `WATER-AN.CGO` file contains some small code/data for water animations. Unused. The same data appears in the levels as needed.

# CGO/DGO Loading Process
A CGO/DGO file is loaded onto a "heap", which is just a chunk of contiguous memory. The loading process is designed to be fast, to be able to fill the entire heap, and to allow each object to allocate memory after it is loaded. The process works like this:

1. Two temporary buffers are allocated at the end of the heap. These are sized so that they can fit the largest object file, not including the last object file.
2. The IOP begins loading, and is permitted to load the first two object files into the two temporary buffers.
3. The main CPU waits for the first object file to be loaded.
4. While the second object file is being loaded, the first object is "linked". The first step of linking is to copy the object file data from the temporary buffer to the bottom of the heap, kicking out all the other data in the process. The linking data is checked to see if it is in the top of the heap, and is moved there if it isn't already. The run-once initialization code is copied to another temporary allocation on top of the heap, and the debug data is copied to the debug heap.
5. Still, while the second object file is being loaded, the linker runs on the first object file.
6. Still, while the second object file is being loaded, the first object's initialization code is run (located in the top of the heap). The first object may allocate from this heap, and will get valid memory without creating gaps in the heap.
7. Memory allocated from the top during linking is freed.
8. The IOP is allowed to load into the first buffer again.
9. The main CPU waits for the second object to be loaded, if the IOP hasn't finished yet.
10. This double-buffering pattern continues: while one object is loaded into a buffer, the other one is copied to the bottom of the heap, linked, and initialized. When the second-to-last object is loaded, the IOP waits one additional time, until the main CPU has finished linking it, before loading the last object, because the last object is a special case.
11. The last object will be loaded directly onto the bottom of the heap, as there may not be enough memory to use the temporary buffers and load the last object. The temporary buffers are freed.
12. If the last object is a v3, its linking data will be moved to the top-level, and the object data will be moved to fill in the gap left behind. If the last object is a v4 (handled by the v2 code path), the main data will be at the beginning of the object data, so there is an optimization that avoids copying the object data to save time if the data is already close to being in the right place.
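
The buffer assignment in the steps above can be sketched as follows. This is a minimal C++ illustration, not the game's loader: `LoadTarget`, `plan_load_targets`, and `temp_buffer_size` are hypothetical names, and the sketch only captures where each file is loaded and how big the temporary buffers need to be, not the linking itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Where an object file is loaded (hypothetical names).
enum class LoadTarget { Buffer0, Buffer1, HeapBottom };

// For a DGO with `object_count` files, return the destination of each file.
// Files alternate between the two temporary buffers; the last file goes
// directly onto the bottom of the heap.
std::vector<LoadTarget> plan_load_targets(std::size_t object_count) {
  std::vector<LoadTarget> targets;
  targets.reserve(object_count);
  for (std::size_t i = 0; i < object_count; ++i) {
    if (i + 1 == object_count) {
      targets.push_back(LoadTarget::HeapBottom);
    } else {
      targets.push_back(i % 2 == 0 ? LoadTarget::Buffer0 : LoadTarget::Buffer1);
    }
  }
  return targets;
}

// Size needed for each temporary buffer: the largest file, not counting
// the last one.
std::size_t temp_buffer_size(const std::vector<std::size_t>& file_sizes) {
  assert(!file_sizes.empty());
  std::size_t largest = 0;
  for (std::size_t i = 0; i + 1 < file_sizes.size(); ++i) {
    if (file_sizes[i] > largest) {
      largest = file_sizes[i];
    }
  }
  return largest;
}
```

For file sizes of 100, 300, 200, and 900 bytes, the buffers only need to hold 300 bytes, because the 900-byte file comes last and never passes through a temporary buffer.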


Generally the last file in a level DGO will be the largest v4 object. You can only have one file larger than a temporary buffer, and it must come last. The last file also doesn't have to be copied after being loaded into memory if it is a v4.

V3 max size:
A V3 object is copied all at once with a single `ultimate-memcpy`. Usually linking gets to run for around 3 to 5% of a total frame. The `ultimate-memcpy` routine does a to/from scratchpad transfer. In practice, mem/spr transfers are around 1800 MB/sec, and the data has to be copied twice, so the effective bandwidth is 900 MB/sec.

`900 MB / second * (0.04 * 0.0167 seconds) = 601 kilobytes`

This estimate is backed up by the chunk size of the v4 copy routine, which copies one chunk per frame. It picks 524 kB as the maximum amount that's safe to copy per frame.
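
The same arithmetic as a small C++ helper, in case you want to try other assumptions. The function name `max_copy_bytes` and its parameters are hypothetical, and the inputs are the rough estimates quoted above, not measurements.

```cpp
#include <cassert>

// Bytes that can be copied within the linker's share of one frame.
// raw_mb_per_second: mem/spr transfer rate (about 1800 in the estimate above)
// copy_passes:       how many times the data is moved (2: to and from scratchpad)
// frame_fraction:    share of a frame the linker gets (0.03 to 0.05)
// frame_seconds:     length of one frame (0.0167 at 60 fps)
double max_copy_bytes(double raw_mb_per_second,
                      int copy_passes,
                      double frame_fraction,
                      double frame_seconds) {
  assert(copy_passes > 0);
  const double effective_bytes_per_second = raw_mb_per_second * 1.0e6 / copy_passes;
  return effective_bytes_per_second * frame_fraction * frame_seconds;
}
```

Calling `max_copy_bytes(1800.0, 2, 0.04, 0.0167)` reproduces the roughly 601 kB figure. With a frame fraction of 0.05, the same formula gives about 751 kB, which is close to the "around 750 kB" limit mentioned earlier.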

24 changes: 18 additions & 6 deletions doc/registers.md
@@ -87,25 +87,37 @@ The main RAM is mapped at `0x0` on the PS2, with the first 1 MB reserved for the

In the C Kernel code, the `r15` pointer doesn't exist. Instead, `g_ee_main_memory` is a global which points to the beginning of GOAL main memory. The `Ptr<T>` template class takes care of converting GOAL and C++ pointers in a convenient way, and catches null pointer access.

The GOAL stack pointer should likely be a real pointer, for performance reasons. This makes pushing/popping/calling/returning/accessing stack variables much faster, with the only cost being getting a GOAL stack pointer requiring some extra work. The stack pointer's value is read/written extremely rarely, so this seems like a good tradeoff.
The GOAL stack pointer should likely be a real pointer, for performance reasons. This makes pushing/popping/calling/returning/accessing stack variables much faster (can use actual `push`, `pop`), with the only cost being getting a GOAL stack pointer requiring some extra work. The stack pointer's value is read/written extremely rarely (only in kernel code that will be rewritten anyway), so this seems like a good tradeoff.

The other registers are less clear. The process pointer can probably be a real pointer. But the symbol table could go a few ways:
1. Make it a real pointer. Symbol value access is fast, but comparison against false requires two extra operations.
2. Make it a GOAL pointer. Symbol value access requires more complicated addressing modes, but comparison against false is fast.
2. Make it a GOAL pointer. Symbol value access requires more complicated addressing modes to be one instruction, but comparison against false is fast.

Right now I'm leaning toward 1, but making it a configurable option in case I'm wrong. It should only be a change in a few places (emitter + where it's set up in the runtime).
Right now I'm leaning toward 2, but it shouldn't be a huge amount of work to change if I'm wrong.

### Plan for Function Call and Arguments
In GOAL for MIPS, function calls are weird. Functions are always called by register using `t9`. There seems to be a different register allocator for function pointers, as nested function calls have really wacky register allocation. In GOAL-x86-64, this restriction will be removed, and a function can be called from any register. (see next section for why we can do this)

Unfortunately, GOAL's 128-bit function arguments present a big challenge. When calling a function, we can't know if the function we're calling is expecting an integer, float, or 128-bit integer. In fact, the caller may not even know if it has an integer, float, or 128-bit integer. The easy and foolproof way to get this right is to use 128-bit `xmm` registers for all arguments and return values, but this will cause a massive performance hit and increase code size, as we'll have to move values between register types constantly. The current plan is this:

- Floats go in GPRs for arguments/return values. GOAL does this too, and takes the hit of converting between registers as well. Probably the impact on a modern CPU is even worse, but we can live with it.
- We'll compromise

- We'll compromise for 128-bit function calls. When the compiler can figure out that the function being called expects or returns a 128-bit value, it will use the 128-bit calling convention. In all other cases, it will use 64-bit. There aren't many places where 128-bit integers are used outside of inline assembly, so I suspect this will just work. If there are more complicated instances (call a function pointer and get either a 64 or 128-bit result), we will need to special case them.
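
A minimal C++ sketch of that decision, assuming the compiler has some record of what it knows about the callee. `CallConv`, `CalleeInfo`, and `choose_call_conv` are hypothetical names for illustration and are not part of the compiler in this commit.

```cpp
#include <cassert>
#include <vector>

enum class CallConv { Bits64, Bits128 };

// What the compiler knows about a callee at a call site (hypothetical type).
struct CalleeInfo {
  bool type_known = false;    // false for e.g. an untyped function pointer
  std::vector<int> arg_bits;  // width of each argument, if known
  int return_bits = 64;       // width of the return value, if known
};

// Use the 128-bit convention only when the callee is known to take or
// return a 128-bit value. Everything else falls back to 64-bit.
CallConv choose_call_conv(const CalleeInfo& callee) {
  if (!callee.type_known) {
    return CallConv::Bits64;
  }
  assert(callee.return_bits <= 128);
  if (callee.return_bits == 128) {
    return CallConv::Bits128;
  }
  for (int bits : callee.arg_bits) {
    assert(bits <= 128);
    if (bits == 128) {
      return CallConv::Bits128;
    }
  }
  return CallConv::Bits64;
}
```

The unknown-callee case is exactly the one that would need special handling if a function pointer can return either a 64-bit or a 128-bit result.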

### Plan for Static Data
The original GOAL implementation always called functions by using the `t9` register. So, on entry to a function, the `t9` register contains the address of the function. If the function needs to access static data, it will move this to `fp`, then do `fp` relative addressing to load data. Example:
```
function-start:
daddiu sp, sp, -16 ;; allocate space on stack
sd fp, 8(sp) ;; back up old fp on stack
or fp, t9, r0 ;; set fp to address of function
lwc1 f0, L345(fp) ;; load relative to function start
```

To copy this exactly on x86 would require reserving two registers equivalent to `t9` and `fp`. A better approach for x86-64 is to use "RIP relative addressing". This can be used to load memory relative to the current instruction pointer. This addressing mode can be used with "load effective address" (`lea`) to create pointers to static data as well.

### Plan for Memory
Access memory by GOAL pointer in `rx` with constant offset (optionally zero):
```
mov rdest, [roff + rx + offset]
```
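
The same address computation, written as a C++ sketch of what the instruction does. Here `base` plays the role of `roff` and `goal_ptr` the role of `rx`. The function name `load_goal` and the choice of a 32-bit GOAL pointer are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read a T from GOAL memory at [base + goal_ptr + offset].
// base:     start of GOAL memory (the role of roff)
// goal_ptr: GOAL pointer, an offset from base (the role of rx)
// offset:   constant offset, which may be zero or negative
template <typename T>
T load_goal(const std::uint8_t* base, std::uint32_t goal_ptr, std::int32_t offset) {
  assert(base != nullptr);
  T value;
  // memcpy avoids alignment and aliasing problems for arbitrary T.
  std::memcpy(&value, base + goal_ptr + offset, sizeof(T));
  return value;
}
```

Because the base is added in the addressing mode itself, no separate instruction is needed to convert the GOAL pointer into a real pointer before the load.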

### Other details
4 changes: 4 additions & 0 deletions doc/runtime_todo.md
@@ -0,0 +1,4 @@
# Runtime To-Do for Compiler Upgrade
- Handle `xmm` registers correctly on Windows
- Change offset, etc
- Memory mapping so null pointer dereference causes a crash
25 changes: 22 additions & 3 deletions goalc/CMakeLists.txt
@@ -1,18 +1,37 @@
add_subdirectory(util)
add_subdirectory(goos)

IF (WIN32)
# TODO - implement windows listener
message("Windows Listener Not Implemented!")
ELSE()
add_subdirectory(listener)
ENDIF()
add_subdirectory(emitter)

add_library(compiler
SHARED
emitter/CodeTester.cpp
emitter/ObjectFileData.cpp
emitter/ObjectGenerator.cpp
emitter/Register.cpp
compiler/Compiler.cpp
compiler/Env.cpp
compiler/Val.cpp
compiler/IR.cpp
compiler/CodeGenerator.cpp
logger/Logger.cpp
regalloc/IRegister.cpp
regalloc/Allocator.cpp
regalloc/allocate.cpp
compiler/Compiler.cpp
)

add_executable(goalc main.cpp
compiler/Compiler.cpp)
)

IF (WIN32)
set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
target_link_libraries(compiler util goos type_system mman)
ENDIF()

target_link_libraries(goalc util goos type_system)
target_link_libraries(goalc util goos compiler type_system)
3 changes: 3 additions & 0 deletions goalc/compiler/CodeGenerator.cpp
@@ -0,0 +1,3 @@


#include "CodeGenerator.h"
8 changes: 8 additions & 0 deletions goalc/compiler/CodeGenerator.h
@@ -0,0 +1,8 @@


#ifndef JAK_CODEGENERATOR_H
#define JAK_CODEGENERATOR_H

class CodeGenerator {};

#endif // JAK_CODEGENERATOR_H
23 changes: 23 additions & 0 deletions goalc/compiler/Compiler.cpp
@@ -1 +1,24 @@
#include "Compiler.h"
#include "goalc/logger/Logger.h"

Compiler::Compiler() {
init_logger();
m_ts.add_builtin_types();
}

void Compiler::execute_repl() {}

Compiler::~Compiler() {
gLogger.close();
}

void Compiler::init_logger() {
gLogger.set_file("compiler.txt");
gLogger.config[MSG_COLOR].kind = LOG_FILE;
gLogger.config[MSG_DEBUG].kind = LOG_IGNORE;
gLogger.config[MSG_TGT].color = COLOR_GREEN;
gLogger.config[MSG_TGT_INFO].color = COLOR_BLUE;
gLogger.config[MSG_WARN].color = COLOR_RED;
gLogger.config[MSG_ICE].color = COLOR_RED;
gLogger.config[MSG_ERR].color = COLOR_RED;
}
6 changes: 6 additions & 0 deletions goalc/compiler/Compiler.h
@@ -5,7 +5,13 @@

class Compiler {
public:
Compiler();
~Compiler();
void execute_repl();

private:
void init_logger();

TypeSystem m_ts;
};

3 changes: 3 additions & 0 deletions goalc/compiler/Env.cpp
@@ -0,0 +1,3 @@


#include "Env.h"
19 changes: 19 additions & 0 deletions goalc/compiler/Env.h
@@ -0,0 +1,19 @@


#ifndef JAK_ENV_H
#define JAK_ENV_H

class Env {};

// global
// noemit
// objectfile
// configuration
// function
// block
// lexical
// label
// symbolmacro
// get parent env of type.

#endif // JAK_ENV_H
3 changes: 3 additions & 0 deletions goalc/compiler/IR.cpp
@@ -0,0 +1,3 @@


#include "IR.h"
21 changes: 21 additions & 0 deletions goalc/compiler/IR.h
@@ -0,0 +1,21 @@
#ifndef JAK_IR_H
#define JAK_IR_H

#include <string>
#include "CodeGenerator.h"
#include "goalc/regalloc/allocate.h"

class IR {
public:
virtual std::string print() = 0;
virtual RegAllocInstr to_rai() = 0;
virtual void do_codegen(CodeGenerator* gen) = 0;
};

class IR_Set : public IR {
public:
std::string print() override;
RegAllocInstr to_rai() override;
};

#endif // JAK_IR_H
22 changes: 22 additions & 0 deletions goalc/compiler/Val.cpp
@@ -0,0 +1,22 @@
#include "Val.h"

/*!
* Fallback to_gpr if a more optimized one is not provided.
*/
RegVal* Val::to_gpr(FunctionEnv* fe) const {
(void)fe;
throw std::runtime_error("Val::to_gpr NYI");
}

/*!
* Fallback to_xmm if a more optimized one is not provided.
*/
RegVal* Val::to_xmm(FunctionEnv* fe) const {
(void)fe;
throw std::runtime_error("Val::to_xmm NYI");
}

RegVal* None::to_reg(FunctionEnv* fe) const {
(void)fe;
throw std::runtime_error("Cannot put None into a register.");
}