SW Stack Arch

DL8 edited this page Jun 21, 2021 · 10 revisions

References

  • RISC-V unprivileged ISA chapter 25 (Assembly programmer's handbook)

Default Data Memory Layout

Sizes and offsets are calculated for a 2k D_MEM. For other D_MEM sizes, derive each region's size from its relative size (a fraction of the total D_MEM size).

| Offset | Size | Region | Relative size |
|---|---|---|---|
| 1936 | 112 | Stack[3] | 7/128 |
| 1824 | 112 | Stack[2] | 7/128 |
| 1712 | 112 | Stack[1] | 7/128 |
| 1600 | 112 | Stack[0] | 7/128 |
| 1264 | 336 | Local data[3] | 21/128 |
| 928 | 336 | Local data[2] | 21/128 |
| 592 | 336 | Local data[1] | 21/128 |
| 256 | 336 | Local data[0] | 21/128 |
| 128 | 128 (256 including GPC params) | Global data | 1/8 (including GPC params) |
| 96 | 32 | GPC params[3] | Absolute size |
| 64 | 32 | GPC params[2] | Absolute size |
| 32 | 32 | GPC params[1] | Absolute size |
| 0 | 32 | GPC params[0] | Absolute size |

Notes about global data:

  • GPC params are carved out of the global data region, so the relative size of global data includes the GPC params
  • GPC params have an absolute size
  • Global data must be big enough to contain GPC params. For the default configuration this implies that D_MEM must be at least 1k
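The scaling rule can be sketched in C as follows. This is an illustrative helper, not part of the actual software stack: `dmem_layout_t`, `dmem_layout` and the field names are hypothetical, and offsets are relative to the start of D_MEM. It assumes the global data region starts at offset 0 with the GPC params at its beginning, as in the table above.

```c
#include <assert.h>
#include <stdint.h>

enum {
    NUM_THREADS     = 4,
    GPC_PARAMS_SIZE = 32   /* absolute size, independent of D_MEM size */
};

/* Hypothetical container for the computed layout (offsets from D_MEM start). */
typedef struct {
    uint32_t gpc_params_offset;  /* GPC params[0] */
    uint32_t global_data_offset; /* global data after the GPC params */
    uint32_t global_data_size;   /* usable bytes, GPC params excluded */
    uint32_t local_data_offset;  /* Local data[0] */
    uint32_t local_data_size;    /* per thread */
    uint32_t stack_offset;       /* Stack[0] */
    uint32_t stack_size;         /* per thread */
} dmem_layout_t;

/* Scale the default layout to a D_MEM of dmem_size bytes. */
static dmem_layout_t dmem_layout(uint32_t dmem_size)
{
    /* Relative sizes are multiples of 1/128; keep the arithmetic exact. */
    assert(dmem_size % 128 == 0);
    /* Global data (1/8 of D_MEM) must hold all GPC params, so D_MEM >= 1k. */
    assert(dmem_size / 8 >= NUM_THREADS * GPC_PARAMS_SIZE);

    uint32_t global_total = dmem_size / 8; /* includes GPC params */
    dmem_layout_t l;
    l.gpc_params_offset  = 0;
    l.global_data_offset = NUM_THREADS * GPC_PARAMS_SIZE;
    l.global_data_size   = global_total - l.global_data_offset;
    l.local_data_offset  = global_total;
    l.local_data_size    = dmem_size * 21 / 128;
    l.stack_offset       = l.local_data_offset + NUM_THREADS * l.local_data_size;
    l.stack_size         = dmem_size * 7 / 128;
    return l;
}
```

For a 2k D_MEM this reproduces the offsets in the table; for a 1k D_MEM the GPC params consume the whole global data region, which is why 1k is the lower bound.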

Definitions

Variables

  • BASE - base address of data memory (defined in HW)
  • GLOBAL_DATA_BASE - base address of global data

GLOBAL_DATA_BASE = BASE

  • GLOBAL_DATA_SIZE - size of shared memory region
  • GPC_PARAMS_BASE - base address of GPC params

GPC_PARAMS_BASE = GLOBAL_DATA_BASE

  • GPC_PARAMS_SIZE - size of a single GPC params struct

GPC_PARAMS_SIZE = 32

  • LOCAL_DATA_BASE - start of thread-local storage

LOCAL_DATA_BASE = GLOBAL_DATA_BASE + GLOBAL_DATA_SIZE

  • LOCAL_DATA_SIZE - size of one thread-local storage area (there are 4)
  • STACK_BASE - start of stacks region
  • STACK_SIZE - size of single thread stack

STACK_BASE = LOCAL_DATA_BASE + 4 * LOCAL_DATA_SIZE

Software stack work

Register names in this section follow the Assembly Programmer's Handbook chapter of the RISC-V unprivileged ISA spec (see References); refer to the register table there.

Architectural constraints: the program is loaded into I_MEM, and the PC (instruction pointer) points to address 0 on all threads. Therefore, the code must be built so that the entry point is at address 0.

Entry point flow is divided as follows:

  1. Reset code: common per-thread initialization (initialize registers and local memory)
  2. Global initialization code: the primary thread performs global initialization and invokes the app's global setup handler while the other threads wait
  3. Local initialization code: primary thread wakes up other threads. All threads call the app's local setup handler
  4. All threads call the app's loop handler in an infinite loop (each invocation represents a single iteration)

Entry point flow

  1. Clear pipeline with nops

5 nops should be enough for now

  2. Initialize registers to zero

TODO: is this really needed? Can be removed if not

  3. Read thread ID from CR space (annotate as tid)

See hardware-software interface page for offset

  4. Reset stack pointer: sp = STACK_BASE + tid * STACK_SIZE + STACK_SIZE - 4
  5. Reset frame pointer: fp = sp
  6. Calculate GPC params address: GPC_PARAMS = GPC_PARAMS_BASE + tid * GPC_PARAMS_SIZE
  7. Fill GPC params (see other calculations below)
  8. Reset global pointer: gp = GLOBAL_DATA_BASE + 4 * GPC_PARAMS_SIZE
  9. Reset thread pointer: tp = LOCAL_DATA_BASE + tid * LOCAL_DATA_SIZE
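As a cross-check of the per-thread pointer arithmetic, here is a minimal host-side sketch in C. It is not the actual reset code (which would be assembly). `thread_regs_t` and `reset_regs` are hypothetical names, the constants are the default 2k D_MEM values, and `BASE` is set to 0 as a placeholder for the hardware-defined value. The sketch assumes that the GPC params address is the base plus `tid * GPC_PARAMS_SIZE`, and that `STACK_BASE` already accounts for `BASE`.

```c
#include <assert.h>
#include <stdint.h>

/* Default 2k D_MEM values; BASE is a placeholder (the real value comes from HW). */
#define BASE             0x0u
#define NUM_THREADS      4u
#define GPC_PARAMS_SIZE  32u
#define GLOBAL_DATA_BASE BASE    /* assumption: region starts with the GPC params */
#define GLOBAL_DATA_SIZE 256u    /* includes the GPC params */
#define GPC_PARAMS_BASE  GLOBAL_DATA_BASE
#define LOCAL_DATA_BASE  (GLOBAL_DATA_BASE + GLOBAL_DATA_SIZE)
#define LOCAL_DATA_SIZE  336u
#define STACK_BASE       (LOCAL_DATA_BASE + NUM_THREADS * LOCAL_DATA_SIZE)
#define STACK_SIZE       112u

/* Hypothetical snapshot of the values computed during reset. */
typedef struct {
    uint32_t sp;         /* stack pointer: last word of this thread's stack */
    uint32_t fp;         /* frame pointer: same as sp at reset */
    uint32_t gp;         /* global pointer: global data after the GPC params */
    uint32_t tp;         /* thread pointer: this thread's local data */
    uint32_t gpc_params; /* address of this thread's GPC params struct */
} thread_regs_t;

static thread_regs_t reset_regs(uint32_t tid)
{
    assert(tid < NUM_THREADS);
    thread_regs_t r;
    r.sp         = STACK_BASE + tid * STACK_SIZE + STACK_SIZE - 4;
    r.fp         = r.sp;
    r.gpc_params = GPC_PARAMS_BASE + tid * GPC_PARAMS_SIZE;
    r.gp         = GLOBAL_DATA_BASE + NUM_THREADS * GPC_PARAMS_SIZE;
    r.tp         = LOCAL_DATA_BASE + tid * LOCAL_DATA_SIZE;
    return r;
}
```

With these constants, thread 0 gets `sp = 1708`, `gp = 128` and `tp = 256`, matching the Stack[0], Global data and Local data[0] rows of the default layout.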

TODO: revise everything below (will we have this scheme in the end?)

  1. If bootstrap thread:
    1. Call global setup handler
    2. Wake up other threads
  2. Else:
    1. Wait until woken up
  3. Call local setup handler
  4. Call loop handler in an infinite loop

TODO: how should reset be implemented?

User code entry points

/**
 * Global setup entry point
 *
 * Invoked by a single thread on reset. Other threads are asleep
 *
 * @param gpc_params GPC parameters
 */
void gpc_global_setup(const gpc_params_t *gpc_params);

/**
 * Local setup entry point
 *
 * Executed once per thread after global setup
 *
 * @param gpc_params GPC parameters
 */
void gpc_local_setup(const gpc_params_t *gpc_params);

/**
 * Loop entry point: executed in an infinite loop after setup
 *
 * @param gpc_params GPC parameters
 */
void gpc_loop(const gpc_params_t *gpc_params);
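To illustrate how an application might implement these entry points, here is a small sketch in C. The handler bodies, the `step` and `counter` variables and the `host_run` driver are all hypothetical. `host_run` only mimics the documented call order sequentially on a host machine and does not model real concurrency or the wake-up mechanism. The `gpc_params_t` definition is a stand-in that follows the GPC params table.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_THREADS 4u

/* Stand-in for the real definition; field order follows the GPC params table. */
typedef struct gpc_params {
    uint32_t tid;
    uint32_t global_data;
    uint32_t global_data_size;
    uint32_t local_data_base;
    uint32_t local_data_size;
    uint32_t reserved[3];
} gpc_params_t;

static uint32_t step;                 /* shared state, written once in global setup */
static uint32_t counter[NUM_THREADS]; /* per-thread state, indexed by tid */

/* Runs on the bootstrap thread only: initialize shared state. */
void gpc_global_setup(const gpc_params_t *gpc_params)
{
    (void)gpc_params;
    step = 2;
}

/* Runs once on every thread: initialize this thread's state. */
void gpc_local_setup(const gpc_params_t *gpc_params)
{
    counter[gpc_params->tid] = 0;
}

/* One loop iteration: each thread touches only its own counter. */
void gpc_loop(const gpc_params_t *gpc_params)
{
    counter[gpc_params->tid] += step;
}

/* Hypothetical host-side driver: replays the call order sequentially. */
static void host_run(uint32_t iterations)
{
    gpc_params_t params[NUM_THREADS] = {0};
    for (uint32_t t = 0; t < NUM_THREADS; t++)
        params[t].tid = t;

    gpc_global_setup(&params[0]);            /* bootstrap thread */
    for (uint32_t t = 0; t < NUM_THREADS; t++)
        gpc_local_setup(&params[t]);         /* every thread */
    for (uint32_t i = 0; i < iterations; i++)
        for (uint32_t t = 0; t < NUM_THREADS; t++)
            gpc_loop(&params[t]);            /* bounded here, infinite on target */
}
```

Keeping shared writes inside the global setup handler and per-thread writes indexed by `tid` avoids data races between threads in this example.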

GPC params

| Name | Offset | Description |
|---|---|---|
| tid | 0 | Current thread ID |
| global_data | 4 | Global data base |
| global_data_size | 8 | Global data size |
| local_data_base | 12 | Local data base |
| local_data_size | 16 | Local data size |
| reserved[3] | 20 | Reserved fields (must be zero) |
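The table above maps naturally onto a C struct. The following is a sketch of one possible definition, assuming every field is a 32-bit word (which the 4-byte offset spacing and the 32-byte total suggest). The address fields are modeled as `uint32_t` rather than pointers so that the layout check also holds on a 64-bit host. The helper `gpc_params_fill` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct gpc_params {
    uint32_t tid;              /* offset 0:  current thread ID */
    uint32_t global_data;      /* offset 4:  global data base */
    uint32_t global_data_size; /* offset 8:  global data size */
    uint32_t local_data_base;  /* offset 12: local data base */
    uint32_t local_data_size;  /* offset 16: local data size */
    uint32_t reserved[3];      /* offset 20: must be zero */
} gpc_params_t;

/* Compile-time checks that the struct matches the documented layout. */
static_assert(offsetof(gpc_params_t, tid) == 0, "tid offset");
static_assert(offsetof(gpc_params_t, global_data) == 4, "global_data offset");
static_assert(offsetof(gpc_params_t, global_data_size) == 8, "global_data_size offset");
static_assert(offsetof(gpc_params_t, local_data_base) == 12, "local_data_base offset");
static_assert(offsetof(gpc_params_t, local_data_size) == 16, "local_data_size offset");
static_assert(offsetof(gpc_params_t, reserved) == 20, "reserved offset");
static_assert(sizeof(gpc_params_t) == 32, "must equal GPC_PARAMS_SIZE");

/* Hypothetical helper: fill one struct and keep the reserved fields zero. */
static void gpc_params_fill(gpc_params_t *p, uint32_t tid,
                            uint32_t global_data, uint32_t global_data_size,
                            uint32_t local_data_base, uint32_t local_data_size)
{
    memset(p, 0, sizeof *p);
    p->tid              = tid;
    p->global_data      = global_data;
    p->global_data_size = global_data_size;
    p->local_data_base  = local_data_base;
    p->local_data_size  = local_data_size;
}
```

The static assertions fail the build if a field is added or reordered without updating the documented offsets.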

Opens and future ideas

  • Current design seems broken: there is probably no way to guarantee that all threads wait until global initialization has finished, because nothing ensures that shared memory is initialized by the time the other threads reach that point. This may require a hardware change (e.g. start only one thread per core and let it start the others when ready)
  • Application data after warm reset: currently D_MEM and I_MEM don't change on warm reset. This breaks global data that is initialized at load time (e.g. global variables, contents of ".rodata"), because modified values persist instead of the expected initial values