SW Stack Arch
- Reference: RISC-V unprivileged ISA specification, chapter 25 ("RISC-V Assembly Programmer's Handbook")
Sizes and offsets below are calculated for a 2 KiB D_MEM. For other D_MEM sizes, scale each region by its relative size.
Offset (bytes) | Size (bytes) | Region | Relative size |
---|---|---|---|
1936 | 112 | Stack[3] | 7/128 |
1824 | 112 | Stack[2] | 7/128 |
1712 | 112 | Stack[1] | 7/128 |
1600 | 112 | Stack[0] | 7/128 |
1264 | 336 | Local data[3] | 21/128 |
928 | 336 | Local data[2] | 21/128 |
592 | 336 | Local data[1] | 21/128 |
256 | 336 | Local data[0] | 21/128 |
128 | 128 | Global data | 1/8 (including GPC params) |
96 | 32 | GPC params[3] | Absolute size |
64 | 32 | GPC params[2] | Absolute size |
32 | 32 | GPC params[1] | Absolute size |
0 | 32 | GPC params[0] | Absolute size |
Notes about global data:
- The GPC params are carved out of the global data region, so the relative size of global data (1/8) includes them
- GPC params have an absolute size (32 bytes each), independent of the D_MEM size
- Global data must be large enough to hold all GPC params (4 × 32 = 128 bytes). With the default relative size of 1/8, this means D_MEM must be at least 1 KiB
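To check the relative sizes, the following C sketch derives the region sizes and the start offsets of the local data and stacks for a given D_MEM size. It assumes 4 threads and a D_MEM size that is a multiple of 128 bytes. `dmem_layout_t` and `dmem_layout()` are hypothetical names and are not part of any existing API.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_THREADS     4u   /* the layout table assumes 4 threads */
#define GPC_PARAMS_SIZE 32u  /* absolute size of one GPC params struct */

/* Sizes and offsets in bytes, relative to the start of D_MEM. */
typedef struct {
    uint32_t global_data_size;   /* includes the GPC params */
    uint32_t local_data_size;    /* one thread-local area */
    uint32_t stack_size;         /* one thread stack */
    uint32_t local_data_offset;  /* start of Local data[0] */
    uint32_t stack_offset;       /* start of Stack[0] */
} dmem_layout_t;

/* Scales the regions by their relative sizes (16 + 4*21 + 4*7 = 128 units,
 * so the regions fill D_MEM exactly). Returns 0 on success, or -1 if the
 * global data region is too small to hold all GPC params. */
static int dmem_layout(uint32_t dmem_size, dmem_layout_t *out) {
    assert(dmem_size % 128u == 0);           /* relative sizes are in 1/128 units */
    uint32_t unit = dmem_size / 128u;

    out->global_data_size  = 16u * unit;     /* 1/8 = 16/128 */
    out->local_data_size   = 21u * unit;     /* 21/128 */
    out->stack_size        = 7u * unit;      /* 7/128 */
    out->local_data_offset = out->global_data_size;
    out->stack_offset      = out->local_data_offset + NUM_THREADS * out->local_data_size;

    return out->global_data_size >= NUM_THREADS * GPC_PARAMS_SIZE ? 0 : -1;
}
```

For `dmem_size = 2048` this gives 256, 336 and 112 bytes, with local data at offset 256 and stacks at offset 1600, which matches the table. For `dmem_size = 512` it returns -1, because 64 bytes of global data cannot hold 4 × 32 bytes of GPC params.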
- `BASE`: base address of data memory (defined in HW)
- `GLOBAL_DATA_BASE`: base address of the global data region, which starts with the GPC params
  GLOBAL_DATA_BASE = BASE
- `GLOBAL_DATA_SIZE`: size of the shared (global) memory region, including the GPC params
- `GPC_PARAMS_BASE`: base address of the GPC params
  GPC_PARAMS_BASE = GLOBAL_DATA_BASE
- `GPC_PARAMS_SIZE`: size of a single GPC params struct
  GPC_PARAMS_SIZE = 32
- `LOCAL_DATA_BASE`: start of thread-local storage
  LOCAL_DATA_BASE = GLOBAL_DATA_BASE + GLOBAL_DATA_SIZE
- `LOCAL_DATA_SIZE`: size of one thread-local storage area (there are 4)
- `STACK_BASE`: start of the stacks region
  STACK_BASE = LOCAL_DATA_BASE + 4 * LOCAL_DATA_SIZE
- `STACK_SIZE`: size of a single thread stack
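These definitions can also be written as C preprocessor constants, so the compiler can check them against the 2k table. The sketch below makes three assumptions. `BASE` is 0, although the real value comes from the hardware. `GLOBAL_DATA_BASE = BASE`, so that the GPC params sit at offset 0 as in the table. Finally, `dmem_region_of()` is a hypothetical helper that classifies an address.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: BASE is the HW-defined D_MEM base; 0 is a placeholder. */
#define BASE             0x0u
#define DMEM_SIZE        2048u   /* 2k D_MEM, as in the layout table */
#define NUM_THREADS      4u

#define GPC_PARAMS_SIZE  32u
#define GLOBAL_DATA_BASE (BASE)                  /* starts with the GPC params */
#define GLOBAL_DATA_SIZE (DMEM_SIZE / 8u)        /* includes the GPC params */
#define GPC_PARAMS_BASE  (GLOBAL_DATA_BASE)
#define LOCAL_DATA_BASE  (GLOBAL_DATA_BASE + GLOBAL_DATA_SIZE)
#define LOCAL_DATA_SIZE  (DMEM_SIZE / 128u * 21u)
#define STACK_BASE       (LOCAL_DATA_BASE + NUM_THREADS * LOCAL_DATA_SIZE)
#define STACK_SIZE       (DMEM_SIZE / 128u * 7u)

/* Compile-time checks against the 2k table. */
_Static_assert(GLOBAL_DATA_SIZE >= NUM_THREADS * GPC_PARAMS_SIZE, "GPC params must fit in global data");
_Static_assert(LOCAL_DATA_BASE == BASE + 256u, "Local data[0] at offset 256");
_Static_assert(STACK_BASE == BASE + 1600u, "Stack[0] at offset 1600");
_Static_assert(STACK_BASE + NUM_THREADS * STACK_SIZE == BASE + DMEM_SIZE, "stacks end at the top of D_MEM");

typedef enum {
    REGION_GPC_PARAMS,
    REGION_GLOBAL_DATA,
    REGION_LOCAL_DATA,
    REGION_STACK,
    REGION_OUTSIDE
} dmem_region_t;

/* Hypothetical helper: which region of the layout an address falls into. */
static dmem_region_t dmem_region_of(uint32_t addr) {
    if (addr - BASE >= DMEM_SIZE)  /* unsigned wrap also catches addr < BASE */
        return REGION_OUTSIDE;
    if (addr < GPC_PARAMS_BASE + NUM_THREADS * GPC_PARAMS_SIZE)
        return REGION_GPC_PARAMS;
    if (addr < LOCAL_DATA_BASE)
        return REGION_GLOBAL_DATA;
    if (addr < STACK_BASE)
        return REGION_LOCAL_DATA;
    return REGION_STACK;
}
```

If a constant drifts from the table (for example, if `GLOBAL_DATA_BASE` is moved past the GPC params), one of the static assertions fails at build time instead of corrupting memory at run time.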
Register names in this section (sp, fp, gp, tp) are the ABI names from the Assembly Programmer's Handbook chapter of the RISC-V spec. Refer to the register table there.
Architectural constraint: the program is loaded into I-MEM, and the PC (instruction pointer) of every thread starts at address 0. The code must therefore be linked so that its entry point is at address 0.
The entry point flow is divided into the following stages:
- Reset code: common per-thread initialization (registers and local memory)
- Global initialization code: the primary thread performs global initialization and invokes the app's global setup handler, while the other threads wait
- Local initialization code: the primary thread wakes up the other threads, and all threads call the app's local setup handler
- Main loop: all threads call the app's loop handler in an infinite loop (each invocation is a single iteration)
Reset code:
- Clear the pipeline with `nop`s (5 `nop`s should be enough for now)
- Initialize registers to zero
TODO: is this really needed? Can be removed if not
- Read the thread ID from CR space (referred to as `tid` below). See the hardware-software interface page for the offset
- Reset the stack pointer to the top of this thread's stack (STACK_BASE already includes BASE):
sp = STACK_BASE + tid * STACK_SIZE + STACK_SIZE - 4
- Reset frame pointer:
fp = sp
- Calculate the address of this thread's GPC params:
GPC_PARAMS = GPC_PARAMS_BASE + tid * GPC_PARAMS_SIZE
- Fill GPC params (see other calculations below)
- Reset global pointer:
gp = GLOBAL_DATA_BASE + 4 * GPC_PARAMS_SIZE
- Reset the thread pointer:
tp = LOCAL_DATA_BASE + tid * LOCAL_DATA_SIZE
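On the target, these steps are a few lines of assembly, but the arithmetic is easy to check in C. The sketch below computes the initial sp, fp, gp and tp values and the GPC params address for one thread, using the 2k offsets from the table. It treats STACK_BASE, LOCAL_DATA_BASE and GLOBAL_DATA_BASE as absolute addresses, so BASE is not added twice. It also assumes BASE is 0. `reset_regs_t` and `reset_regs_for()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 2k D_MEM layout; BASE = 0 is a placeholder for the HW value. */
#define BASE             0x0u
#define NUM_THREADS      4u
#define GPC_PARAMS_SIZE  32u
#define GLOBAL_DATA_BASE (BASE)           /* region starts with the GPC params */
#define GPC_PARAMS_BASE  (GLOBAL_DATA_BASE)
#define LOCAL_DATA_BASE  (BASE + 256u)
#define LOCAL_DATA_SIZE  336u
#define STACK_BASE       (BASE + 1600u)
#define STACK_SIZE       112u

/* Initial register values for one hardware thread. */
typedef struct {
    uint32_t sp, fp, gp, tp;
    uint32_t gpc_params;  /* address of this thread's GPC params struct */
} reset_regs_t;

static reset_regs_t reset_regs_for(uint32_t tid) {
    assert(tid < NUM_THREADS);
    reset_regs_t r;
    /* Top of this thread's stack, minus one word. */
    r.sp = STACK_BASE + tid * STACK_SIZE + STACK_SIZE - 4u;
    r.fp = r.sp;
    r.gpc_params = GPC_PARAMS_BASE + tid * GPC_PARAMS_SIZE;
    /* First byte of global data after the GPC params (shared by all threads). */
    r.gp = GLOBAL_DATA_BASE + NUM_THREADS * GPC_PARAMS_SIZE;
    r.tp = LOCAL_DATA_BASE + tid * LOCAL_DATA_SIZE;
    return r;
}
```

For tid 3, this gives sp = fp = 2044 (one word below the top of D_MEM), gp = 128 and tp = 1264, which is the start of Local data[3] in the table.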
TODO: revise everything below (will we have this scheme in the end?)
- If bootstrap thread:
- Call global setup handler
- Wake up other threads
- Else:
- Wait until woken up
- Call local setup handler
- Call loop handler in an infinite loop
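The bootstrap scheme above can be sketched in C as follows. C11 atomics stand in for whatever wake-up mechanism the hardware ends up providing, and thread 0 is assumed to be the bootstrap thread. `gpc_entry()`, `g_woken` and the counting handlers are hypothetical. The spin-wait relies on `g_woken` being zero before any thread reads it. That is exactly the guarantee questioned in the open issues at the end of this page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define BOOTSTRAP_TID 0u  /* assumption: thread 0 is the bootstrap thread */

/* Trimmed stand-in: only the first field (tid, offset 0) of the GPC params. */
typedef struct { uint32_t tid; } gpc_params_t;

/* Hypothetical wake-up flag in shared memory; must read as 0 after reset. */
static atomic_int g_woken = 0;

/* Stand-in application handlers that only count their invocations. */
static unsigned global_calls, local_calls, loop_calls;
static void gpc_global_setup(const gpc_params_t *p) { (void)p; global_calls++; }
static void gpc_local_setup(const gpc_params_t *p)  { (void)p; local_calls++; }
static void gpc_loop(const gpc_params_t *p)         { (void)p; loop_calls++; }

/* Runs after the reset code. max_iters bounds the loop for host testing only;
 * on the target, the loop handler is called forever. */
static void gpc_entry(const gpc_params_t *params, unsigned max_iters) {
    if (params->tid == BOOTSTRAP_TID) {
        gpc_global_setup(params);
        /* Publish the global setup results, then wake the other threads. */
        atomic_store_explicit(&g_woken, 1, memory_order_release);
    } else {
        /* Wait until woken up by the bootstrap thread. */
        while (!atomic_load_explicit(&g_woken, memory_order_acquire)) {
        }
    }
    gpc_local_setup(params);
    for (unsigned i = 0; i < max_iters; i++)
        gpc_loop(params);
}
```

A single-threaded host test can call `gpc_entry()` for tids 0 to 3 in order. This runs global setup once, local setup four times, and the loop handler `4 × max_iters` times.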
TODO: how should reset be implemented?
```c
/**
 * Global setup entry point
 *
 * Invoked by a single thread on reset. Other threads are asleep
 *
 * @param gpc_params GPC parameters
 */
void gpc_global_setup(const gpc_params_t *gpc_params);

/**
 * Local setup entry point
 *
 * Executed once per thread after global setup
 *
 * @param gpc_params GPC parameters
 */
void gpc_local_setup(const gpc_params_t *gpc_params);

/**
 * Loop entry point: executed in an infinite loop after setup
 *
 * @param gpc_params GPC parameters
 */
void gpc_loop(const gpc_params_t *gpc_params);
```
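A minimal application implementing these three entry points might look like the sketch below. The shared table and per-thread counters are hypothetical. `gpc_params_t` is trimmed to its first field (`tid`); the full layout is in the GPC params table. The main contract is that `gpc_loop()` returns after each iteration instead of looping itself.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_THREADS 4u
#define TABLE_LEN   16u

/* Trimmed stand-in: only the first field (tid, offset 0) of the GPC params. */
typedef struct { uint32_t tid; } gpc_params_t;

/* Hypothetical application state, assumed to be linked into global data. */
static uint32_t g_squares[TABLE_LEN];       /* shared, built once */
static uint32_t g_iterations[NUM_THREADS];  /* one counter per thread */

void gpc_global_setup(const gpc_params_t *gpc_params) {
    (void)gpc_params;
    /* Runs on the bootstrap thread only, before the others are woken. */
    for (uint32_t i = 0; i < TABLE_LEN; i++)
        g_squares[i] = i * i;
}

void gpc_local_setup(const gpc_params_t *gpc_params) {
    assert(gpc_params->tid < NUM_THREADS);
    g_iterations[gpc_params->tid] = 0;  /* per-thread state */
}

void gpc_loop(const gpc_params_t *gpc_params) {
    /* One iteration only: the runtime calls this handler again forever. */
    g_iterations[gpc_params->tid]++;
}
```

`g_squares` is filled by `gpc_global_setup()` rather than by a static initializer, so its contents are rebuilt on every reset. This sidesteps the warm-reset issue with load-time-initialized data described at the end of this page.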
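GPC params struct (`gpc_params_t`, 32 bytes, one per thread; every field is a 32-bit word):
Name | Offset (bytes) | Description |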
---|---|---|
tid | 0 | Current thread ID |
global_data | 4 | Global data base |
global_data_size | 8 | Global data size |
local_data_base | 12 | Local data base |
local_data_size | 16 | Local data size |
reserved[3] | 20 | Reserved fields (must be zero) |
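The table maps directly onto a C struct, and static assertions catch any drift between the struct and the documented offsets or GPC_PARAMS_SIZE. The fill helper is a hypothetical sketch of the "Fill GPC params" reset step. The values passed for `global_data` and `local_data_base` (for example, the thread's gp and tp) are an assumption left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field names and offsets follow the GPC params table; all fields are 32-bit. */
typedef struct {
    uint32_t tid;               /* offset 0: current thread ID */
    uint32_t global_data;       /* offset 4: global data base */
    uint32_t global_data_size;  /* offset 8 */
    uint32_t local_data_base;   /* offset 12 */
    uint32_t local_data_size;   /* offset 16 */
    uint32_t reserved[3];       /* offset 20: must be zero */
} gpc_params_t;

/* Compile-time checks that the C layout matches the table and GPC_PARAMS_SIZE. */
_Static_assert(offsetof(gpc_params_t, global_data) == 4, "global_data offset");
_Static_assert(offsetof(gpc_params_t, global_data_size) == 8, "global_data_size offset");
_Static_assert(offsetof(gpc_params_t, local_data_base) == 12, "local_data_base offset");
_Static_assert(offsetof(gpc_params_t, local_data_size) == 16, "local_data_size offset");
_Static_assert(offsetof(gpc_params_t, reserved) == 20, "reserved offset");
_Static_assert(sizeof(gpc_params_t) == 32, "must equal GPC_PARAMS_SIZE");

/* Hypothetical helper for the "Fill GPC params" reset step. */
static void gpc_params_fill(gpc_params_t *p, uint32_t tid,
                            uint32_t global_data, uint32_t global_data_size,
                            uint32_t local_data_base, uint32_t local_data_size) {
    assert(p != NULL);
    p->tid = tid;
    p->global_data = global_data;
    p->global_data_size = global_data_size;
    p->local_data_base = local_data_base;
    p->local_data_size = local_data_size;
    for (size_t i = 0; i < 3; i++)
        p->reserved[i] = 0;  /* reserved fields must be zero */
}
```

For example, thread 1 in the 2k layout could be filled with `gpc_params_fill(p, 1, 128, 128, 592, 336)`. This assumes that `global_data` points just past the GPC params and that `local_data_base` is the thread's own local data area.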
- The current design seems broken: there is probably no way to guarantee that the other threads wait until global initialization is finished, because nothing ensures that the shared memory they check has been initialized by the time they reach the wait point. This may require a hardware change (e.g. start only one thread per core and let it start the others when ready)
- Application data after warm reset: D_MEM and I_MEM are currently not reloaded on warm reset. This breaks global data that is initialized at load time (e.g. initialized global variables and data from ".rodata"), because any values modified during the previous run remain instead of the expected initial values