Elbrus 2000 (Elbrus or e2k for short), is a SPARC-inspired VLIW architecture developed by the Moscow Center for SPARC Technology (MCST).
Elbrus machine code is organized into very long instruction words (VLIW), which consist of multiple so-called syllables that are executed together.
Several useful documents about Elbrus are available on the internet, albeit mostly in Russian.
- Руководство по эффективному программированию на платформе «Эльбрус» (elbrus-prog)
- Микропроцессоры и вычислительные комплексы семейства «Эльбрус»
- A series of articles about porting Embox to Elbrus: introduction, part 1, part 2, part 3
- Pictures of various Elbrus mainboards
Most operations in Elbrus code either:
- Take the values of one or more registers, compute a function, and write the result to another register, or
- Load a value from memory into a register or store a value from a register into memory.
The 256 general-purpose registers of the Register File (RF/РгФ) are divided into two categories:
- 224 registers are part of the procedure stack in a windowed way. They can become available or unavailable during procedure calls and returns. (See also elbrus-prog chapter 9.3.1.1)
- 32 registers are global registers. They are available during the whole runtime of a program.
32-bit | 64-bit | description |
---|---|---|
%g0 |
%dg0 |
Global register (0-31) |
%r0 |
%dr0 |
Procedure stack register, relative to start of current window |
%b[0] |
%db[0] |
Mobile base registers, relative to the start of the current window, plus BR |
TODO: last eight global registers are designated rotatable area
The procedure stack contains parameters and local data of procedures. Its top area is stored in the register file (RF). On overflow or underflow of the register file, its contents are automatically swapped in/out of memory. Launch of a new procedure allocates a window on the procedure stack, which may overlap with the calling procedure's window.
Stack of return addresses. It can only be manipulated by the operating system and the hardware. Its top area is stored in CF (chain file) registers.
On this stack the following information is encoded in two quad words:
- return address
- compilation unit index descriptor (CUIR)
- window base (wbs) in the register file
- presence of real 80 (?)
- predicate file
- user stack descriptor
- rotatable area base
- processor status register
On overflow or underflow of the chain file, its contents are automatically swapped in/out of memory.
Comparison operations produce one-bit results (true or false) that can be stored in the predicate registers.
Predicates can be used in conditional control transfers (jumps/calls), or in the conditional execution of individual operations.
There are 32 predicate registers in the predicate file, which appear as
%pred0
to %pred31
in assembly code.
Special purpose registers can be read using the rrs
and rrd
operations, and
writing using the rws
and rwd
operations.
Name | Description |
---|---|
CUIR | compilation unit index register, индекс дескрипторов модуля компиляции |
PSHTP | procedure stack hardware top pointer |
PSP | procedure stack pointer - contains the virtual base address of the procedure stack. |
WD | window descriptor - contains the base and the size of the current procedure's window into the procedure stack. |
PCSHTP | procedure chain stack hardware top pointer |
USBR | user stack base pointer, РгБСП |
USD | user stack descriptor, ДСП |
Elbrus' wide instructions (широкая команда, ШК) are comprised of a header syllable and zero or more additional syllables. Wide instructions are 8 byte aligned and up to 16 words (64 bytes) long.
Abbreviation | Description |
---|---|
HS | Header syllable - it encodes length and structure of a wide instruction |
SS | Stubs syllable - short operations that take only a few bits to encode |
ALS | Arithmetic logic channel syllable |
CS | Control syllable |
ALES | Arithmetic logic extension channel semi-syllable. They extend corresponding ALS. ALES2 and ALES5 are only available on Elbrus v4 and higher. |
AAS | Array access semi-syllable |
LTS | Literal syllable - literals to be used as operands |
PLS | Predicate logic syllable - processing of boolean values |
CDS | Conditional syllable - specified which operations are to be executed under which condition |
The first syllable is the header syllable. It is always present. Presence of other syllables depend on the purpose of the command. Syllables occur in the following order:
- HS
- SS
- ALS0, ALS1, ALS2, ALS3, ALS4, ALS5
- CS0
- ALES2, ALES5
- CS1
- ALES0, ALES1, ALES3, ALES4
- AAS0, AAS1, AAS2, AAS3, AAS4, AAS5
- LTS3, LTS2, LTS1, LTS0
- PLS2, PLS1, PLS0
- CDS2, CDS1, CDS0
Semi-syllables ALES and AAS are a half-word (2 bytes) long. All other syllables are one word (4 bytes) long.
Syllables SS, ALS* and CS0 occur as indicated in the header syllable in the order described above. They are packed, e.g. if header bits indicate presence of ALS0 and ALS2 but not SS nor ALS1, then the syllable ALS0 follows directly after HS and ALS2 follows directly after ALS0.
If presence of ALES2 or ALES5 is indicated, then a whole word is allocated for them, whether both are present or not. The first of both to be present occupies the more significant half of the word, the second is encoded in the less significant half. For example, when looking at the syllables as bytes, if ALES2 and ALES5 are present, then the first two bytes of the little endian word contain ALES5 and the last two bytes contain ALES2. If only ALES5 is present, the first two bytes are empty and the last two bytes contain ALES5.
CS1 may follow right after the previously described syllables.
ALES{0,1,3,4} and AAS* start at the word indicated by the "middle pointer" from the header syllable. Their ordering is the same as for ALES2 and ALES5 (high half first, low half second) but they are all packed. This means that any two syllables of ALES{0,1,3,4} and AAS{0,1} may share a word. ALES* may not share a word with AAS{2,3,4,5} because presence of the latter implies presence of AAS0 and/or AAS1. For example, if ALES0, ALES1, ALES4, AAS0 and AAS2 are indicated, then they are encoded as ALES1, ALES0, AAS0, ALES4, two bytes left empty, and finally AAS2.
LTS*, PLS* and CDS* are decoded starting from the end of the wide command. CDS* and PLS* are not indicated by individual flags but rather by their number. For example, there cannot be a PLS2 without a PLS0 and PLS1. LTS take any remaining words between the other syllables. For example, if after the AAS there are five words remaining in the wide command and two CDS and one PLS are indicated, then two words for LTS are left. They would be encoded as LTS1, LTS0, PLS0, CDS1, CDS0.
We do not know what happens if more syllables are indicated than there is space allocated or if syllables are encoded to overlap.
Bit | Name | Description |
---|---|---|
31 | ALS5 | arithmetic-logic syllable 5 presence |
30 | ALS4 | arithmetic-logic syllable 4 presence |
29 | ALS3 | arithmetic-logic syllable 3 presence |
28 | ALS2 | arithmetic-logic syllable 2 presence |
27 | ALS1 | arithmetic-logic syllable 1 presence |
26 | ALS0 | arithmetic-logic syllable 0 presence |
25 | ALES5 | arithmetic-logic extension syllable 5 presence |
24 | ALES4 | arithmetic-logic extension syllable 4 presence |
23 | ALES3 | arithmetic-logic extension syllable 3 presence |
22 | ALES2 | arithmetic-logic extension syllable 2 presence |
21 | ALES1 | arithmetic-logic extension syllable 1 presence |
20 | ALES0 | arithmetic-logic extension syllable 0 presence |
19:18 | PLS | number of predicate logic syllables |
17:16 | CDS | number of conditional execution syllables |
15 | CS1 | control syllable 1 presence |
14 | CS0 | control syllable 0 presence |
13 | set_mark | |
12 | SS | stub syllable presence |
11 | -- | unused |
10 | loop_mode | |
9:7 | nop | |
6:4 | Length of instruction, in multiples of 8 bytes, minus 8 bytes | |
3:0 | Number of words occupied by SS, ALS, CS, ALES2, ALES5 - called "middle pointer" |
Bit | Name | Description |
---|---|---|
31:30 | ipd | instruction prefetch depth |
29 | eap | end array prefetch |
28 | bap | begin array prefetch |
27 | srp | |
26 | vfdi | |
25 | crp (?) | |
24 | abgi | |
23 | abgd | |
22 | abnf | |
21 | abnt | |
20 | type | type is 0 for SF1 |
19 | abpf | |
18 | abpt | |
17 | alcf | |
16 | alct | |
15 | array access syllable 0 and 2 presence | |
14 | array access syllable 0 and 3 presence | |
13 | array access syllable 1 and 4 presence | |
12 | array access syllable 1 and 5 presence | |
11:10 | ctop | ctpr number used in control transfer (ct ) instructions |
9 | ? | |
8:0 | ctcond | condition code for control transfers (ct ) |
Bit | Name | Description |
---|---|---|
31:30 | ipd | instruction prefetch depth |
29:28 | encodes invts and flushts, see below | |
27 | srp (?) | |
26 | encodes invts and flushts, see below | |
25 | crp (?) | |
20 | type | type is 1 for SF2 |
4:0 | pred | pred num |
(ss >> 27 & 6) | (ss >> 26 & 1) |
Description |
---|---|
2 | invts |
3 | flushts |
6 | invts ? %predN |
7 | invts ? ~ %predN |
The condition code in the stubs syllable controls under which conditions a control transfer operation is executed.
Bit | description |
---|---|
4:0 | Predicate number (from pred0 to pred31 ) |
8:5 | Condition type |
Type | syntax | description |
---|---|---|
0 | -- | never |
1 | always | |
2 | ? %pred0 |
if predicate is true |
3 | ? ~ %pred0 |
if predicate is false |
4 | ? #LOOP_END |
|
5 | ? #NOT_LOOP_END |
|
6 | ? %pred0 || #LOOP_END |
|
7 | ? ~ %pred0 && #NOT_LOOP_END |
|
8 | (TODO, depends on syllable) | |
9 | (TODO, depends on syllable) | |
10 | (reserved) | |
11 | (reserved) | |
12 | (reserved) | |
13 | (reserved) | |
14 | ? ~ %pred0 || #LOOP_END |
|
15 | ? %pred0 && #NOT_LOOP_END |
#LOOP_END
and #NOT_LOOP_END
are sometimes spelled as %LOOP_END
and %NOT_LOOP_END
.
Bit | Description |
---|---|
31 | Speculative mode |
30:24 | Opcode |
23:16 | Operand src1, or opcode extension |
15:8 | Operand src2 |
7:0 | Operand src3, dst, or cmp opcode extension |
See chapter 'Arithmetic-logical operations' for more information on the operands.
Bit | Description |
---|---|
15:8 | Opcode2 |
7:0 | src3 (in ALEF1) or opcode extension 2 or cmp opcode extension (in ALEF2) |
CS0 and CS1 encode different operations.
Syllable | pattern | name | description |
---|---|---|---|
CS0, CS1 | 0xxxxxxx |
set* | setwd/setbn/setbp/settr |
CS1 | 1xxxxxxx |
vrfpsz | vrfpsz + setwd/setbn/setbp/settr |
CS0 | 2xxxxxxx |
puttsd | puttsd with a multiple-of-8 parameter relative to the start of the current instruction |
CS1 | 200000xx |
setei | |
CS1 | 28000000 |
setsft | |
CS0, CS1 | 300000xx |
wait | wait for specified kinds of operations to complete |
CS0 | 4xxxxxxx |
disp | prepare a relative jump in ctpr1 |
CS0 | 5xxxxxxx |
ldisp | prepare an array prefetch program (?) in ctpr1 |
CS0 | 6xxxxxxx |
sdisp | prepare a system call in ctpr1 |
CS0 | 70000000 |
return | prepare to return from procedure in ctpr1 |
CS0 | 8xxxxxxx+ |
-- | disp/ldisp/sdisp/return with ctpr2 |
CS0 | cxxxxxxx+ |
-- | disp/ldisp/sdisp/return with ctpr3 |
CS1 | 6xxxx000 |
setmas | Set memory address specifier for load and store operations |
The set* operation sets several parameters related to register windows. Most bits are encoded in the CS0 syllable itself, but some are also read from the LTS0 syllable.
According to ldis
, setwd is always performed, but settr, setbn, and setbp
have to be enabled by setting the corrsponding bits in CS0.
Syl. | bit | name | description |
---|---|---|---|
CS1 | 28 | enable vfrpsz | |
CS | 27 | enable settr | |
CS | 26 | enable setbn | |
CS | 25 | enable setbp | |
CS | 22:18 | setbp psz=x | |
CS | 17:12 | setbn rcur=x | |
CS | 11:6 | setbn rsz=x | |
CS | 5:0 | setbn rbs=x | |
LTS0 | 16:12 | vfrpsz rpsz=x | |
LTS0 | 11:5 | setwd wsz=x | |
LTS0 | 4 | setwd nfx=x | |
LTS0 | 3 | setwd dbl=x |
Bit | name | description |
---|---|---|
5 | ma_c |
wait for all previous memory access operations to complete |
4 | fl_c |
wait for all previous cache flush operations to complete |
3 | ls_c |
wait for all previous load operations to complete |
2 | st_c |
wait for all previous store operations to complete |
1 | all_e |
wait for all previous operations to issue all possible exceptions |
0 | all_c |
wait for all previous operations to complete |
The disp
operation prepares a jump to a different location by using one of
the control transfer preparation registers (ctpr1
to ctpr3
).
bit | description |
---|---|
31:30 | can be 1, 2, or 3 for ctpr1 , ctpr2 , or ctpr3 respectively |
29:28 | can be 0, 1, 2, or 3, for disp , ldisp , sdisp , or return respectively |
27:0 | offset or system call number |
For disp
and ldisp
, the offset is relative to the start of the current
instruction, and in multiples of eight bytes. For example, in an instruction at
0x1000
, with CS0=40000042
, we get disp %ctpr1, 0x1210
.
ldisp
is only allowed with ctpr2
.
For sdisp
, the system call number is not shifted. CS0=6000001a
is
sdisp %ctpr1, 0x1a
.
The return
operation doesn't take an offset. The offset field should be zero
in this case.
Memory address specifiers control multiple aspects of load and store operations. Their 7-bit format is described elsewhere.
The MAS can be independently specified for load and store operations, in CS1:
CS1 bits | description |
---|---|
27:21 | MAS for load operations |
20:14 | MAS for store operations |
Array prefetch instructions are run asynchronously on the array access unit.
They are always 16 bytes long.
To assemble array prefetch instructions, the mnemonic fapb
is used.
To call an array prefetch program, load its address with ldisp to %ctpr2 (no need to call or ct).
Even though array prefetch instructions should only ever be called by ldisp and are not processed using the same facilities as
regular instructions, they always seem to be terminated by a regular branch instruction.
The maximum length of an array prefetch program is 32 instructions.
ALU operations are generally identified by several aspects:
- The opcode field in the ALS
- If a corrsponding ALES exists, the opcode2 field in the ALES
- Opcode extension, opcode extension 2, and cmp opcode extension, depending on the opcode
- The ALUs in which the operation can be performed. Sometimes the same opcode can mean different operations in different ALUs (numbered from 0 to 5)
The format of an arithmetic-logical operation (ALOPF) is determined by opcode, channel, and presence of an ALES. The presence and location of additional identifying criteria of an operation as well as operands depend on the ALOPF.
Other variations:
- Some operations require two ALS
- Some operations require a Memory Address Specifier (MAS) in CS1
- Some operations have predicates. Some operations require additional data from CDS.
- ALOPF1, ALOPF2, ALOPF3, ALOPF7, ALOPF8 require no ALES, all others seem to require an ALES.
Field | encoded in | comment |
---|---|---|
opcode | ales[30:24] |
|
opcode2 | ales[15:8] |
|
opcode extension | als[23:16] |
|
opcode extension 2 | ales[7:0] |
|
cmp opcode extension | als[7:5] or ales[7:0] |
|
src1 | als[23:16] |
source operand 1 |
src2 | als[15:8] |
source operand 2 - can encode access to literal syllables (LTS) |
src3 | als[7:0] or ales[7:0] |
source operand 3 - for ALOPF3 and ALOPF13 it is in ALS, for ALOPF21 it is in ALES |
dst | als[7:0] , or als[4:0] for predicate registers |
destination register |
Pattern | Range | Description |
---|---|---|
0xxx xxxx | 00-7f | Rotatable area procedure stack register |
10xx xxxx | 80-bf | procedure stack register |
110x xxxx | c0-df | constant between 0 and 31 |
111x xxxx | e0-ff | global register |
src2 that are not status register numbers are encoded as follows:
Pattern | Range | Description |
---|---|---|
0xxx xxxx | 00-7f | Rotatable area procedure stack register |
10xx xxxx | 80-bf | procedure stack register |
1100 xxxx | c0-cf | constant between 0 and 15 |
1101 000x | d0-d1 | reference to 16 bit literal semi-syllable, low half of LTS0 or LTS1 |
1101 010x | d4-d5 | reference to 16 bit literal semi-syllable, high half of LTS0 and LTS1 |
1101 10xx | d8-db | reference to 32 bit literal syllable LTS0, LTS1, LTS2, or LTS3 |
1101 11xx | dc-de | reference to 64 bit literal syllable pair LTS1:LTS0, LTS2:LTS1, or LTS3:LTS2 |
111x xxxx | e0-ff | global register |
Literal half-syllables are sign-extended on access. Thus, values 0-0x7fff and 0xffff8000-0xffffffff (-0x8000 to -1) can be encoded in a literal half-syllable.
Pattern | Range | Description |
---|---|---|
0xxx xxxx | 00-7f | Rotatable area procedure stack register |
10xx xxxx | 80-bf | procedure stack register |
111x xxxx | e0-ff | global register |
dst that are not predicate register numbers or status register numbers are encoded as follows:
Pattern | Range | Description |
---|---|---|
0xxx xxxx | 00-7f | Rotatable area procedure stack register |
10xx xxxx | 80-bf | procedure stack register |
1100 1101 | cd | %tst |
1100 1110 | ce | %tc |
1100 1111 | cf | %tcd |
1101 0001 | d1 | %ctpr1 |
1101 0010 | d2 | %ctpr2 |
1101 0011 | d3 | %ctpr3 |
1101 1110 | de | %empty.lo |
1101 1111 | df | %empty.hi |
111x xxxx | e0-ff | global register |
Opcode2 | Name |
---|---|
0x01 | EXT |
0x02 | EXT1 |
0x03 | EXT2 |
0x04 | FLB |
0x05 | FLH |
0x06 | FLW |
0x07 | FLD |
0x08 | ICMB0 |
0x09 | ICMB1 |
0x0a | ICMB2 |
0x0b | ICMB3 |
0x0c | FCMB0 |
0x0d | FCMB1 |
0x0e | PFCMB0 |
0x0f | PFCMB1 |
0x10 | LCMBD0 |
0x11 | LCMBD1 |
0x12 | LCMBQ0 |
0x13 | LCMBQ1 |
0x16 | QPFCMB0 |
0x17 | QPFCMB1 |
Several operand formats are defined.
Format | Has ALES? | src1 | src2 | src3 | dst | opcode ext | opcode ext 2 | cmp opcode ext | Example | Comment |
---|---|---|---|---|---|---|---|---|---|---|
1 | x | x | x | adds, ld{b,h,w,d} | ||||||
2 | x | x | x | movx, popcnts | ||||||
3 | x | x | als[7:0] |
st{b,h,w,d} | ||||||
7 | x | x | x | als[7:5] |
cmposb | dst is a predicate register | ||||
8 | x | x | als[7:5] |
cctopo | dst is a predicate register | |||||
11 | x | x | x | x | x | muls | ||||
11 (with literal) | x | x | x | x | psllqh | These opcodes require a literal in ales[7:0] |
||||
12 | x | x | x | x | x | fsqrts | Opcode pshufh is special as it requires a literal in ales[7:0] . |
|||
13 | x | x | x | als[7:0] |
x | stq | ||||
15 | x | x | x | x | rws, rwd | dst is a status register; opcode2 is EXT; opcode extension 2 is 0xc0 | ||||
16 | x | x | x | x | rrs, rrd | src2 is a status register; opcode2 is EXT; opcode extension 2 is 0xc0 | ||||
17 | x | x | x | x | ales[7:0] |
pcmpeqbop | dst is a predicate register; opcode2 is EXT1 | |||
21 | x | x | x | ales[7:0] |
x | incs_fb | ||||
22 | x | x | x | x | x | movtq | opcode2 is EXT; ALES opcode extension is 0xc0 |
For the locations of operands where none is explicitly specified here, see table 'Operands and other fields'.
TODO: ALOPF5, ALOPF6, ALOPF7, ALOPF9, ALOPF10, ALOPF19
NOTE: ALOPF9 and ALOPF10 have a 16 bit opcode extension
The following tables are grouped by opcode2 and sorted by opcode.
Opcode | ALUs | name | ALS[23:16] | ALS[15:8] | ALS[7:0] | data width | description |
---|---|---|---|---|---|---|---|
0x00 | all | ands | src1 | src2 | dst | 32 bits | Compute bit-wise AND of src1 and src2, store result in dst |
0x01 | all | andd | src1 | src2 | dst | 64 bits | Compute bit-wise AND of src1 and src2, store result in dst |
0x10 | all | adds | src1 | src2 | dst | 32 bits | Compute bit-wise AND of src1 and src2, store result in dst |
0x11 | all | addd | src1 | src2 | dst | 64 bits | Compute bit-wise AND of src1 and src2, store result in dst |
0x24 | 25 | stb | src1 | src2 | src3 | 8 bits | store 8-bit value from src3 to address at src1+src2 |
0x25 | 25 | sth | src1 | src2 | src3 | 16 bits | store 16-bit value from src3 to address at src1+src2 |
0x26 | 25 | stw | src1 | src2 | src3 | 32 bits | store 32-bit value from src3 to address at src1+src2 |
0x26 | 0134 | bitrevs | 0xc0 | src2 | dst | 32 bits | |
0x27 | 25 | std | src1 | src2 | src3 | 64 bits | store 64-bit value from src3 to address at src1+src2 |
0x27 | 0134 | bitrevd | 0xc0 | src2 | dst | 64 bits | |
0x64 | 0235 | ldb | src1 | src2 | dst | 8 bits | load 8-bit value from address at src1+src2, store into dst |
0x65 | 0235 | ldh | src1 | src2 | dst | 16 bits | load 16-bit value from address at src1+src2, store into dst |
0x66 | 0235 | ldw | src1 | src2 | dst | 32 bits | load 32-bit value from address at src1+src2, store into dst |
0x67 | 0235 | ldd | src1 | src2 | dst | 64 bits | load 64-bit value from address at src1+src2, store into dst |
Opcode | ALUs | name | ALS[23:16] | ALS[15:8] | ALS[7:0] | ALES[7:0] | data width | description |
---|---|---|---|---|---|---|---|---|
0x58 | 0 | getsp | 0xec | src2 | dst | unused | 32 -> 64 | Add src2 to user stack pointer, store in user stack pointer and dst |