CAP: 0046-01 (formerly 0046)
Title: Soroban Runtime Environment
Working Group:
Owner: Graydon Hoare <@graydon>
Authors: Graydon Hoare <@graydon>
Consulted: Leigh McCulloch <@leighmcculloch>, Tomer Weller <@tomerweller>, Jon Jove <@jonjove>, Nicolas Barry <@MonsieurNicolas>, Thibault de Lacheze-Murel <@C0x41lch0x41>
Status: Final
Created: 2022-04-18
Discussion: https://groups.google.com/g/stellar-dev/c/X0oRzJoIr10
Protocol version: 20
This CAP specifies the lowest-level code execution and data model aspects of a WebAssembly-based (Wasm) "smart contract" system for the Stellar network, called Soroban. Wasm smart contract code runs as a guest inside of a virtual machine (VM) which is embedded in a host environment.
Higher-level components of a smart contract system such as ledger entries, host objects and host functions, and transactions to manage and invoke contracts will be specified in additional CAPs. This CAP focuses only on the lowest-level components.
No new operations or ledger entries are introduced in this CAP. Nothing observably changes in the protocol available to users. This CAP is best understood as a set of building blocks for later CAPs, introducing a vocabulary of concepts, data types and implementation components.
The design in this CAP is derived from a working and much more complete prototype that includes much that is left out of this CAP. This CAP is being proposed separately to facilitate early discussion of the building blocks, and to help decompose the inevitably-large volume of interrelated changes required for a complete smart contract system into smaller, more understandable pieces.
This protocol change was authored by Graydon Hoare, with input from the consulted individuals mentioned at the top of this document.
See the Soroban overview CAP.
The primary requirement for any smart contract system is to enable, within certain parameters, arbitrary new functionality to be added to a blockchain's state-transition function by users. This can be further decomposed into two requirements of concern in this CAP:
- Code: stellar-core's state transition function must be extended with some means of executing, within parameters, some form of user-provided Turing-complete instruction code. Preferably in a compact form that can be stored within the ledger.
- Data: stellar-core's model of data -- comprising transaction input, output, persistent state and temporary working memory -- must be extended to include data of concern to smart contracts: their input, output, persistent state, and temporary working memory during execution. Any transformations between each of these sorts of data must be specified, even if partially delegated to contract logic.
While the primary requirements seem simple enough to meet -- "just add a VM" -- there are many risks associated with a naive implementation. Therefore subsequent requirements take the form of parameters that constrain implementations in order to mitigate risks, including:
- Secure: Soroban should be secure against benign or malicious smart contract code as well as contract-code input that could imperil system availability, integrity, or confidentiality (in the few cases where secret data exists). In particular at the level of this CAP, the design should guard against:
  - The risk of resource exhaustion, leading to denial of service by validators.
  - The risk of VM escape, leading to arbitrary Byzantine failures on validators, including data corruption or unauthorized transactions.
  - The risk of side channels, allowing VM code to extract validator private keys or other secret data on validators.
  - The risk of unintended contract behaviour due to invocation with malicious input data.
  - The risk of unintended contract behaviour due to calls to or from malicious contracts.
- Well-defined: Soroban should not compromise the network's bit-precise consensus or historical replay functions, and should have a well-defined and unambiguous semantics for any code or data added by users. Where possible this should be maintained by reference to existing, well-defined standards. In particular at the level of this CAP, the design should guard against:
  - The risk of underspecified or nondeterministic VM code.
  - The risk of underspecified or nondeterministic datatypes.
- Performant: Soroban should not compromise the performance of the network, and should perform competitively with other smart contract systems. Users should not be subject to a significant performance penalty for using smart contracts instead of built-in transactions. In particular at the level of this CAP, the design should guard against:
  - The risk of needing to load, compile, instantiate or run a large amount of VM code per transaction. Contracts should be small.
  - The risk of contending on shared mutable data that may defeat parallel execution of transactions. Contracts should be isolated.
  - The risk of requiring smart contract developers to do extensive optimization to achieve acceptable performance.
- Interoperable: Soroban will necessarily introduce some new user-defined semantics which are by definition unknown to some users and 3rd parties. But beyond such necessary risks, Soroban should avoid introducing unnecessary hazards to interoperability, especially through choice of data encoding for input, output and persistent state. In particular at the level of this CAP, the design should guard against:
  - The risk of being unable to share data between different contracts, or different versions of the same contract.
  - The risk of being forced to write contracts in, or invoke contracts from, a single programming language.
  - The risk of having no tools or only immature tools for working with any programming language targeting the VM.
  - The risk of being unable to passively observe contract state for testing, debugging, diagnosis or monitoring.
  - The risk of 3rd parties being unable to exchange data with contracts.
- Simple: Soroban should be as simple as possible while achieving other requirements. It should not require excessive innovation or expensive engineering by either developers or users of stellar-core. Smart contracts are late in coming to the Stellar Network, there is plenty of prior art to draw from, and there is a limited window of time to complete the work. At the level of this CAP, the design should guard against:
  - The risk of designing or implementing a novel VM, programming language, client library, or serialization format.
  - The risk of selecting an existing platform that is incompatible with or causes major changes to stellar-core.
  - The risk of delivering a system that is too challenging to learn for users or 3rd parties.
The specification consists of three parts:
- A general description of the concepts of host and guest contexts, their relationships, constraints, and methods of implementation.
- A specification of the new components that provide the host and guest contexts, their means of interaction, and their lifecycle phases.
- A specification of the data model shared between host and guest.
This CAP specifies aspects of two separate but related contexts:
- The host context: this consists of portions of the existing C++ code making up stellar-core that can be accessed by smart contracts, as well as some new C++ and Rust code implied by this CAP. New C++ and Rust code includes the implementation of a WebAssembly (Wasm) virtual machine, a set of host objects, and a host environment that contains and manages the lifecycle and interaction of the host objects and virtual machines. The host environment, like the rest of stellar-core, is compiled to native code and runs with full access to its enclosing operating system environment, the ledger, the network, etc. The term "host environment" here corresponds to the term with that name in the WebAssembly specification.
- The guest context: this consists of Wasm code executed by a Wasm virtual machine embedded in the host environment. Guest code may originate in any programming language able to target Wasm, and will be provided by means unspecified in this CAP. Guest code has very limited access to its enclosing host environment: it can only consume CPU and memory resources to the extent that the host environment permits, and it can only call host functions that the host environment explicitly provides access to. The purpose of the guest context is to act as a so-called "sandbox" to attenuate potential harms caused by erroneous or malicious guest code, while allowing "just enough" programmability to satisfy the needs of users.
The guest and host contexts are provided by two new components added to stellar-core: a virtual machine and a host environment.
Code for a WebAssembly 1.0 virtual machine (VM) is embedded in stellar-core. The VM can be instantiated multiple times in the same stellar-core process, effectively supporting multiple separate guest contexts. The VM is configured with specific limits, and excludes support for any subsequent WebAssembly specification revisions or proposals.
Furthermore, to limit potential nondeterminism risks (see below), floating point instructions are prohibited: any Wasm code that includes floating point instructions will not proceed past validation, and is rejected with an error.
Input guest code for a guest context is a single Wasm module in the specified Wasm binary format, and guest code will pass through all 4 semantic phases defined in the Wasm specification: decoding, validation, instantiation and execution. See the linked specification for details.
A new structure called a host environment is added to the transaction-processing subsystem of stellar-core. A host environment is a container carrying:
- Zero or more Wasm VMs.
- Any host objects that guest code in a Wasm VM can refer to.
- Any resource-accounting mechanisms for guest code.
- Any host functions that guest code in a Wasm VM can import.
- A set of in-memory XDR values called "storage", representing a portion of the ledger.
The interface between the host environment and guest code is very narrow and is defined as a subset of the Wasm specification of "embedding". A summary of some relevant aspects is repeated here:
- Guest memory ("Wasm linear memory") is separated from host memory. The host may have a mechanism to access guest memory, but the guest has no mechanism to access host memory.
- Wasm itself supports only 4 types of data value: i32, i64, f32, and f64. To further simplify the interface, we restrict it to support exactly one type of data value: i64. Everything Soroban passes back and forth between guest and host is encoded in one or more i64 values. The bits comprising such an i64 may be interpreted in one of 3 ways depending on context: as a signed 64-bit 2s complement integer, as an unsigned 64-bit integer, or as a polymorphic value type as described below in the "Data Model" section.
- Guest code modules carry a list of exported functions (that the guest provides and the host can call) and a list of imported "host functions" (that the host provides and the guest can call). Both imported and exported functions can only pass a sequence of i64 parameters and return a single i64 value, or a trap. The set of host functions available for import is detailed in CAP-0046-03 - Smart Contract Host Functions.
- Various error conditions may result in a guest trap condition, which is a terminal state for the Wasm VM running the guest code: no further VM execution can occur after it traps. A trap may be generated by guest code due to an execution error, or may be generated by a host function called from guest code. Therefore any call from guest to host or host to guest may produce a trap result rather than a value.
A host environment has its own lifecycle: it is created before any of the host objects or VMs it contains, and destroyed after any of the host objects or VMs it contains.
When a host environment is created, it contains no host objects and no VMs.
Adding a Wasm VM to a host environment involves passing Wasm code through the 4 lifecycle phases in the Wasm specification: decoding, validation, instantiation and invocation. If any phase fails, no further phases will be performed on the failed Wasm VM.
Multiple Wasm VMs can coexist in a single host environment. The intention is that one host environment and one Wasm VM will be created for an "outermost" invocation of a smart contract, and that "inner" contracts can be invoked by guest code calling a host function that constructs an additional VM and invokes a guest function in that new VM, within the same shared host environment. The specific mechanism of calling between contracts is not specified in this CAP.
Multiple Wasm VMs in the same host environment can refer to the same host objects: this is the mechanism for passing (immutable) information between different smart contracts.
The host environment's storage is initialized with some set of XDR objects loaded from the ledger. The set of XDR objects to load is statically declared by the transaction that causes instantiation of the host environment. After execution, when a host environment is being finalized, the modified portion of the host's storage is written back to the ledger. Between initialization and finalization, storage exists only in the host environment's memory. For more details on the semantics of storage see CAP 0046-05 Smart Contract Data.
The host maintains a per-transaction budget of CPU and memory resources, and as resources are consumed both by host functions and by the Wasm VM execution steps, the budget is reduced until it is exhausted. If the budget is exhausted before transaction completion, the host will trap with an error.
An important aspect of resource limiting is that it is performed against a deterministic model of the computational budget -- with "model" costs incrementally deducted from the budget model by explicit calls placed throughout the host function and VM code -- rather than by measuring real computational resources (time or memory) consumed during execution. This is necessary to maintain deterministic execution: any resource exhaustion that might occur must occur exactly the same way, at exactly the same instant, on every node in the Stellar network processing a Soroban transaction.
The detailed structure and logic for the budget is given in CAP-0046-10 - Smart Contract Budget Metering.
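As a rough illustration of the model-cost approach described above, the following Rust sketch deducts explicit model costs from a budget and fails once the budget is exhausted. The names `Budget`, `charge` and `BudgetExceeded`, and the cost figures, are hypothetical and chosen only for this example; the actual budget is specified in CAP-0046-10.

```rust
/// Hypothetical error returned when the modelled budget runs out.
#[derive(Debug, PartialEq)]
struct BudgetExceeded;

/// Hypothetical per-transaction budget tracking modelled (not measured) costs.
struct Budget {
    cpu_remaining: u64,
    mem_remaining: u64,
}

impl Budget {
    fn new(cpu_limit: u64, mem_limit: u64) -> Self {
        Budget { cpu_remaining: cpu_limit, mem_remaining: mem_limit }
    }

    /// Deduct a model cost. The outcome depends only on the arguments and the
    /// remaining budget, never on wall-clock time or real memory use.
    fn charge(&mut self, cpu: u64, mem: u64) -> Result<(), BudgetExceeded> {
        self.cpu_remaining = self.cpu_remaining.checked_sub(cpu).ok_or(BudgetExceeded)?;
        self.mem_remaining = self.mem_remaining.checked_sub(mem).ok_or(BudgetExceeded)?;
        Ok(())
    }
}

fn main() {
    let mut budget = Budget::new(100, 64);
    assert_eq!(budget.charge(60, 16), Ok(()));
    // The second charge exceeds the CPU budget, so it fails the same way
    // wherever this sequence of charges is replayed.
    assert_eq!(budget.charge(60, 16), Err(BudgetExceeded));
}
```

Because the charges are placed explicitly in host and VM code, two nodes running the same transaction issue the same sequence of `charge` calls and therefore fail (or succeed) at the same point.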
Both guest code and any part of the host environment controlled by guest code must execute deterministically in response to inputs, and must be sufficiently well-specified that replaying historical guest code in an upgraded host environment (i.e. a new version of stellar-core) will produce observably-identical results. This includes the result of observable resource exhaustion within host-controlled CPU or memory limits, which implies the need for careful resource accounting on all guest-controlled actions.
The Wasm spec has carefully limited nondeterminism to a small set of cases, which we consider here:
- New features: only minor, fully deterministic Wasm features beyond the 1.0 spec are supported by Soroban. Specifically the sign-ext and mutable-globals extensions, which are commonly included as target features in high level language compilers (e.g. both Rust and C/C++ compilers).
- Threads: not supported by Soroban.
- NaN-related behaviour for floating point: all floating point code is prohibited.
- SIMD-related behaviour: all SIMD extensions are prohibited.
- Environment-resource limit exhaustion: enforced through a deterministic budget model as discussed above.
This CAP defines a data model shared between guest and host environments. It consists of a set of values and a set of objects:
- Values can be packed into a 64-bit integer, and can therefore be easily passed back and forth between the host environment and guest code, as arguments or return values from imported or exported functions.
- Objects (also called "host objects") exist only in host memory, in the host context, and can only be referenced by guest code through values containing handles that refer to objects. If guest code wishes to perform an operation on a host object, it must call a host function with values containing handles that refer to any host object(s) to operate on.
Host objects are immutable: they cannot be changed once created. Any operation on a host object that implies a modification of the object's state will allocate a new object (with a new handle) containing the modified state, and return a value that refers to the new object by its new handle. Objects must therefore be relatively small. Objects are not necessarily unique; two objects may be equal (in the sense of containing the same data) but have different handles.
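The copy-on-modify behaviour described above can be sketched as follows. This is a minimal illustration, assuming a host that stores objects in a vector and uses the index as the handle; `Host`, `HostObject`, `add` and `vec_push` are hypothetical names for this example and are not the host functions specified elsewhere.

```rust
/// A tiny stand-in for the host object type (real hosts have many more cases).
#[derive(Clone, Debug, PartialEq)]
enum HostObject {
    Vec(Vec<u64>),
    Bytes(Vec<u8>),
}

/// Hypothetical host that owns all objects; handles are indexes into `objects`.
struct Host {
    objects: Vec<HostObject>,
}

impl Host {
    fn new() -> Self {
        Host { objects: Vec::new() }
    }

    /// Store a new object and return its handle.
    fn add(&mut self, obj: HostObject) -> u32 {
        self.objects.push(obj);
        (self.objects.len() - 1) as u32
    }

    /// "Pushing" never mutates the original: it copies the vector, extends
    /// the copy, and returns the handle of the newly allocated object.
    fn vec_push(&mut self, handle: u32, item: u64) -> Option<u32> {
        let mut copy = match self.objects.get(handle as usize)? {
            HostObject::Vec(v) => v.clone(),
            _ => return None, // wrong object type for this operation
        };
        copy.push(item);
        Some(self.add(HostObject::Vec(copy)))
    }
}

fn main() {
    let mut host = Host::new();
    let v0 = host.add(HostObject::Vec(vec![1, 2]));
    let v1 = host.vec_push(v0, 3).expect("v0 is a vector");
    assert_ne!(v0, v1);
    assert_eq!(host.objects[v0 as usize], HostObject::Vec(vec![1, 2]));
    assert_eq!(host.objects[v1 as usize], HostObject::Vec(vec![1, 2, 3]));

    let b = host.add(HostObject::Bytes(vec![0xff]));
    assert_eq!(host.vec_push(b, 4), None);
}
```

The full copy on every modification is what makes the "objects must therefore be relatively small" constraint matter.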
Values may also be considered "immutable" in some sense, but since they are typically machine primitives and any two equal values are indistinguishable, mutability or immutability is not a particularly meaningful concept for values.
The data model is specified in two separate forms:
- In XDR, for inclusion in serial forms such as transactions and ledger entries.
- In a set of "host types", of which the "host value type" is shared between host and guest.
The rationale for the two separate forms is given below, in the rationale section.
See the new XDR files in CAP-0046 - Soroban overview for a complete listing.
One XDR union type, and its variants, are worth discussing in this CAP: SCVal.
SCVal is a new XDR type. Its name is short for smart contract ("SC") value. It is a general, polymorphic type in the sense that it is a union with many possible cases: numbers, strings, booleans, maps, vectors, error codes, and several special cases. It exists because many subsystems of the smart contract system, as well as many smart contracts themselves, must often act on values of interest to contracts without knowing their specific types ahead of time.
For example, the smart contract transaction invocation path must pass user-provided values to a contract and return values from a contract, and must do so generically without knowledge of the types of those values, so it accepts and returns SCVals. Similarly the smart contract storage system allows loading and storing SCVals in the ledger. And within a contract's own code, often some logic wishes to deal with values without knowing their precise type, such as forwarding values from one contract to another or extracting them from containers.
SCVal is keyed by the enum SCValType, which has 22 variants. They are described in comments in Stellar-contract.x.
The host value type -- in the Rust host and SDK code this is simply called Val -- is a 64-bit integer carrying a bit-packed disjoint union of several cases, each identified by a different Tag value.
The low 8 bits of a Val are referred to as the tag and the remaining high 56 bits are referred to as the body. The tag's value determines the interpretation of the body. In some cases the body is itself further subdivided into 24 low bits, called the body's minor component, and 32 high bits, called the body's major component.
In other words, a value schematically looks like one of the following two cases:
    bit 64      56      48      40      32      24      16      8       0
        +-------+-------+-------+-------+-------+-------+-------+-------+
        |                         body                          |  tag  |
        +-------+-------+-------+-------+-------+-------+-------+-------+

    bit 64      56      48      40      32      24      16      8       0
        +-------+-------+-------+-------+-------+-------+-------+-------+
        |             major             |         minor         |  tag  |
        +-------+-------+-------+-------+-------+-------+-------+-------+
When accessing the body, the bit pattern may be considered as either a signed or unsigned 64-bit value. If signed, the body is extracted by a signed (arithmetic) right shift, properly sign-extending from 56 to 64 bits any negative values stored in the body. Similarly the major component may be treated as a signed or unsigned 32-bit integer. The minor component is only ever treated as an unsigned 32-bit integer, and is zero-extended from 24 to 32 bits on access.
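The extraction rules above reduce to a handful of shifts and masks. The sketch below works on a raw `u64` rather than the real `Val` type, and the helper names (`tag`, `body_signed`, `major`, `minor`, and so on) are assumptions made for this example only.

```rust
const TAG_BITS: u32 = 8;
const MINOR_BITS: u32 = 24;

/// The low 8 bits.
fn tag(val: u64) -> u8 {
    (val & 0xff) as u8
}

/// Body as an unsigned value: logical shift, zero-extended.
fn body_unsigned(val: u64) -> u64 {
    val >> TAG_BITS
}

/// Body as a signed value: the arithmetic shift sign-extends from 56 to 64 bits.
fn body_signed(val: u64) -> i64 {
    (val as i64) >> TAG_BITS
}

/// The high 32 bits of the value (the high 32 bits of the body).
fn major(val: u64) -> u32 {
    (val >> (TAG_BITS + MINOR_BITS)) as u32
}

/// The low 24 bits of the body, zero-extended to 32 bits.
fn minor(val: u64) -> u32 {
    ((val >> TAG_BITS) & 0x00ff_ffff) as u32
}

/// Pack a body and tag; the top 8 bits of `body` are shifted out.
fn from_body_and_tag(body: u64, tag: u8) -> u64 {
    (body << TAG_BITS) | tag as u64
}

fn main() {
    // Tag 7 (I64Small): store -2 in the 56-bit body and read it back.
    let v = from_body_and_tag((-2i64) as u64, 7);
    assert_eq!(tag(v), 7);
    assert_eq!(body_signed(v), -2);
    assert_eq!(body_unsigned(v), 0x00ff_ffff_ffff_fffe);

    // Major/minor form with tag 3 (Error).
    let e = (0xdead_beef_u64 << 32) | (0x1234_u64 << 8) | 3;
    assert_eq!(major(e), 0xdead_beef);
    assert_eq!(minor(e), 0x1234);
    assert_eq!(tag(e), 3);
}
```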
The different cases of the XDR value type SCVal are differentiated by the XDR enum SCValType, which is subsequently encoded as Tags in a Val, though the mapping is 1:N rather than 1:1. Specifically, for each 1 SCVal case (i.e. SCValType code) at the XDR level, there may be N (usually 1 or 2) different refinements of that type as a specialized Tag case in the host value type, usually to enable a more compact representation when small special cases of SCVal are projected into host values.
Tag values are organized in two contiguous blocks:
- A low-valued block (initially between values 0 and 15 inclusive) that covers "small" Vals, where the entire semantic content of the Val is contained in its body.
- A high-valued block (initially between values 64 and 77 inclusive) that covers "object handle" values, where the body of the Val just carries an object handle in its "major" component.
The two blocks are kept separate to enable an efficient single-comparison Tag test for all object handle values. The split between blocks happens at tag value 64 rather than 128 (as might be expected given the 8 bit range of Tag) so that all initially assigned tags are less than 127, which is the maximum size of a single Wasm ULEB128 code unit (another minor space optimization). We anticipate the system will grow to support some additional tags in the future, but believe the available tag space will be sufficient to accommodate such growth.
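A sketch of the single-comparison test, assuming the tag has already been checked to be one of the assigned values (reserved tags are not handled here). The constant and function names are hypothetical.

```rust
/// Lowest tag value of the object-handle block (name is hypothetical).
const OBJECT_TAG_START: u8 = 64;

/// One comparison separates the "small" block from the "object handle" block.
/// Assumes `tag` is an assigned tag value.
fn is_object_tag(tag: u8) -> bool {
    tag >= OBJECT_TAG_START
}

fn main() {
    assert!(!is_object_tag(0)); // Tag::False
    assert!(!is_object_tag(15)); // Tag::LedgerKeyContractInstance
    assert!(is_object_tag(64)); // Tag::U64Object
    assert!(is_object_tag(77)); // Tag::AddressObject
}
```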
The specific Tag values are:
- Tag::False = 0, a refinement of the SCVal case for SCV_BOOL encoding just boolean false. The body is zero.
- Tag::True = 1, a refinement of the SCVal case for SCV_BOOL encoding just boolean true. The body is zero.
- Tag::Void = 2, corresponding to the SCVal case for SCV_VOID. The body is zero.
- Tag::Error = 3, corresponding to the SCVal case for SCV_ERROR. The body takes the major/minor form:
  - The minor component is an "error type", one of the values of the XDR enumeration SCErrorType.
  - The major component is an "error code":
    - If the "error type" is SCE_CONTRACT, the major component is the uint32 error code in the SCE_CONTRACT case of SCError, a contract-defined error code with no specific meaning to the runtime.
    - Otherwise the major component is the SCErrorCode value of the corresponding SCE_* case of SCError.
- Tag::U32Val = 4, corresponding to the SCVal case for SCV_U32. The major component carries an unsigned 32-bit integer.
- Tag::I32Val = 5, corresponding to the SCVal case for SCV_I32. The major component carries a signed 32-bit integer.
- Tag::U64Small = 6, a refinement of the SCVal case for SCV_U64 for unsigned 64-bit integer values that are small enough to fit in the 56 bits of the Val's body without data loss. Specifically those values in the range from 0 to 0x00ff_ffff_ffff_ffff inclusive.
- Tag::I64Small = 7, a refinement of the SCVal case for SCV_I64 for signed 64-bit integer values that are small enough to fit in the 56 bits of the Val's body without data loss. Specifically those int64 values in the range from -36_028_797_018_963_968 to 36_028_797_018_963_967 inclusive.
- Tag::TimepointSmall = 8, the same as U64Small but for the SCVal case for SCV_TIMEPOINT.
- Tag::DurationSmall = 9, the same as U64Small but for the SCVal case for SCV_DURATION.
- Tag::U128Small = 10, the same as U64Small but for the SCVal case for SCV_U128.
- Tag::I128Small = 11, the same as I64Small but for the SCVal case for SCV_I128.
- Tag::U256Small = 12, the same as U64Small but for the SCVal case for SCV_U256.
- Tag::I256Small = 13, the same as I64Small but for the SCVal case for SCV_I256.
- Tag::SymbolSmall = 14, a refinement of the SCVal case for SCV_SYMBOL for small symbols up to 9 characters long. The body of the Val contains between 0 and 9 characters, with each character encoded as a 6-bit, 1-based code that indexes into the 63-character repertoire allowed by the general SCV_SYMBOL type: [_0-9A-Za-z]. That is, the character _ is coded by the six bits 0b00_0001, the character 0 is coded by the six bits 0b00_0010, and so on, with the final allowed character z coded by the six bits 0b11_1111. Then these 6-bit codes are packed into the 56 bit body such that the lowest 6 bits of the body always code for the last character in the symbol, and if the symbol is shorter than 9 characters then the body's high bits are padded with all-zero 6-bit codes (this representation optimizes for encoding in Wasm's ULEB128 format).
- Tag::LedgerKeyContractInstance = 15, a refinement of the SCVal case for SCV_LEDGER_KEY_CONTRACT_INSTANCE, a special value reserved for use as a key identifying contract instances in the storage system. The body is zero.
- Tag::U64Object = 64, for object-handle Vals referring to the SCVal case for SCV_U64, typically only used when the uint64 is larger than 56 bits and so cannot fit in a U64Small, though small integers stored in U64Object are legal. The body's major component is a 32-bit object handle, referring to a host object. The minor component is zero.
- Tag::I64Object = 65, the same as U64Object but for the SCVal case for SCV_I64.
- Tag::TimepointObject = 66, the same as U64Object but for the SCVal case for SCV_TIMEPOINT.
- Tag::DurationObject = 67, the same as U64Object but for the SCVal case for SCV_DURATION.
- Tag::U128Object = 68, the same as U64Object but for the SCVal case for SCV_U128.
- Tag::I128Object = 69, the same as U64Object but for the SCVal case for SCV_I128.
- Tag::U256Object = 70, the same as U64Object but for the SCVal case for SCV_U256.
- Tag::I256Object = 71, the same as U64Object but for the SCVal case for SCV_I256.
- Tag::BytesObject = 72, for object-handle Vals referring to the SCVal case for SCV_BYTES.
- Tag::StringObject = 73, for object-handle Vals referring to the SCVal case for SCV_STRING.
- Tag::SymbolObject = 74, for object-handle Vals referring to the SCVal case for SCV_SYMBOL, typically only used when the symbol is longer than 9 characters, so cannot fit in a SymbolSmall.
- Tag::VecObject = 75, for object-handle Vals referring to the SCVal case for SCV_VEC.
- Tag::MapObject = 76, for object-handle Vals referring to the SCVal case for SCV_MAP.
- Tag::AddressObject = 77, for object-handle Vals referring to the SCVal case for SCV_ADDRESS.
The Rust code defining the Tag datatype includes some additional symbolic names for the boundaries of the assigned tag codes, as well as a sentinel for unassigned tags, but these are not part of the interface specified by this CAP. All tag values not described above are reserved for future use.
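To make the SymbolSmall packing concrete, here is a minimal sketch of the encoding described in the tag list: each character maps to a 1-based 6-bit code, codes are shifted in so the last character occupies the lowest 6 bits of the body, and the tag value 14 occupies the low 8 bits. It operates on a raw `u64`, and the function names are hypothetical.

```rust
/// Tag value for SymbolSmall, as listed above.
const TAG_SYMBOL_SMALL: u64 = 14;

/// Map an allowed symbol character to its 1-based 6-bit code:
/// '_' => 1, '0'..'9' => 2..11, 'A'..'Z' => 12..37, 'a'..'z' => 38..63.
fn encode_char(c: char) -> Option<u64> {
    let code = match c {
        '_' => 1,
        '0'..='9' => 2 + (c as u32 - '0' as u32),
        'A'..='Z' => 12 + (c as u32 - 'A' as u32),
        'a'..='z' => 38 + (c as u32 - 'a' as u32),
        _ => return None,
    };
    Some(u64::from(code))
}

/// Pack a symbol of up to 9 characters into a raw 64-bit value.
/// Returns `None` if the symbol is too long or has a disallowed character.
fn symbol_small(s: &str) -> Option<u64> {
    if s.chars().count() > 9 {
        return None;
    }
    let mut body: u64 = 0;
    for c in s.chars() {
        // Earlier characters move up; the last one ends in the low 6 bits.
        body = (body << 6) | encode_char(c)?;
    }
    Some((body << 8) | TAG_SYMBOL_SMALL)
}

fn main() {
    assert_eq!(symbol_small(""), Some(14));
    assert_eq!(symbol_small("_"), Some((1 << 8) | 14));
    assert_eq!(symbol_small("z"), Some((0b11_1111 << 8) | 14));
    // Two characters: 'a' (38) is shifted up, '_' (1) sits in the low 6 bits.
    assert_eq!(symbol_small("a_"), Some((((38 << 6) | 1) << 8) | 14));
    assert_eq!(symbol_small("too_long_x"), None); // 10 characters
    assert_eq!(symbol_small("a-b"), None); // '-' is not in the repertoire
}
```

Short symbols leave the high bits of the body zero, which is what keeps the resulting integer small when encoded as a variable-length constant in Wasm.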
There are many different host object types, and we refer to the disjoint union of all possible host object types as the host object type. This may be implemented in terms of a variant type, an object hierarchy, or any other similar mechanism in the host.
Every host object is held in host memory and cannot be accessed directly from guest code. Host objects can be referred to by host values in either host or guest code: specifically those values with tags between 64 and 77 inclusive refer to host objects by handle.
Host object handles are integers that identify host objects. They come in two forms: relative handles and absolute handles. Relative handles are, as their name suggests, only meaningful relative to a specific Wasm VM: they are indexes into an indirection table attached to each Wasm VM that maps relative handles to absolute handles. Absolute handles identify host objects within the host independently of any Wasm VM. When guest code running in a Wasm VM has a value of some object-handle type, it is always a relative handle. When guest code calls the host, any relative handle being passed is translated to an absolute handle, and when an absolute handle is returned from the host to the guest it is translated from an absolute to a relative handle. This way guests never see absolute handles, and cannot access any host objects that they have not explicitly been passed references to (eg. as invocation arguments or return values from host functions).
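The relative/absolute translation can be pictured as a small per-VM table. The sketch below is an assumption-laden illustration: `HandleTable` and its methods are hypothetical names, and it always appends a new entry rather than reusing an existing one, a choice this CAP does not specify.

```rust
/// Hypothetical per-VM indirection table: the index is the relative handle,
/// the stored value is the absolute handle.
struct HandleTable {
    absolute: Vec<u32>,
}

impl HandleTable {
    fn new() -> Self {
        HandleTable { absolute: Vec::new() }
    }

    /// Host-to-guest direction: record an absolute handle and return the
    /// relative handle the guest will see.
    fn to_relative(&mut self, abs: u32) -> u32 {
        self.absolute.push(abs);
        (self.absolute.len() - 1) as u32
    }

    /// Guest-to-host direction: fails for any handle this VM was never given.
    fn to_absolute(&self, rel: u32) -> Option<u32> {
        self.absolute.get(rel as usize).copied()
    }
}

fn main() {
    let mut vm_a = HandleTable::new();
    let mut vm_b = HandleTable::new();

    vm_b.to_relative(7); // vm_b was passed some other object first
    let rel_a = vm_a.to_relative(42);
    let rel_b = vm_b.to_relative(42);

    // The same absolute handle has different relative handles per VM.
    assert_eq!(rel_a, 0);
    assert_eq!(rel_b, 1);
    assert_eq!(vm_a.to_absolute(rel_a), Some(42));
    assert_eq!(vm_b.to_absolute(rel_b), Some(42));

    // vm_a cannot reach objects it was never passed.
    assert_eq!(vm_a.to_absolute(1), None);
}
```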
If a host object is accessed through an invalid handle -- a number that does not identify an object -- the access fails with an error.
If a host object is accessed through a value with a tag that does not match the actual type of the underlying host object, the access fails with an error. While not strictly necessary -- it would be possible to simply ignore the tag -- this helps catch coding errors. Similarly if a host function expects a host object handle argument with a specific tag, and is passed a value with a different tag, it is rejected with an error even if the object handle number is valid.
The specific operations that can be performed on each host object are defined by host functions, described in CAP-0046-03 - Smart Contract Host Functions.
Values and objects in the data model have a total order. When comparing two values A and B:
- If both values have an equal bit-pattern, their order is equal.
- If either value is an object-handle type, they are compared through object comparison (via the host function obj_cmp) as described below.
- Otherwise A and B are both small-value types:
  - If A's Tag differs from B's Tag, they are ordered by numeric Tag value (which, for small values, matches the order of the corresponding XDR SCValTypes).
  - Otherwise A and B have the same Tag value:
    - If A and B have common tag Tag::False, Tag::True, Tag::Void, or Tag::LedgerKeyContractInstance, A and B are equal.
    - If A and B have common tag Tag::Error, A and B are ordered first by their minor components (the "error type"), then by their major components (the "error code"), both treated as unsigned 32-bit integers.
    - If A and B have common tag Tag::U32Val, A and B are ordered by their major components, treated as unsigned 32-bit integers.
    - If A and B have common tag Tag::I32Val, A and B are ordered by their major components, treated as signed 32-bit integers.
    - If A and B have common tag Tag::U64Small, Tag::U128Small or Tag::U256Small, A and B are ordered by their bodies, treated as unsigned 64-bit integers.
    - If A and B have common tag Tag::I64Small, Tag::I128Small or Tag::I256Small, A and B are ordered by their bodies, treated as signed 64-bit integers.
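The small-value rules above can be sketched as a single function over raw 64-bit values. This is only an illustration: the helper names are hypothetical, object handles are left to obj_cmp, and the function returns `None` for any tag that the rules listed above do not mention.

```rust
use std::cmp::Ordering;

fn tag(v: u64) -> u8 { (v & 0xff) as u8 }
fn body_unsigned(v: u64) -> u64 { v >> 8 }
fn body_signed(v: u64) -> i64 { (v as i64) >> 8 }
fn major(v: u64) -> u32 { (v >> 32) as u32 }
fn minor(v: u64) -> u32 { ((v >> 8) & 0x00ff_ffff) as u32 }

/// Compare two small values following the rules listed above.
/// Returns `None` for object handles and for tags the rules do not cover.
fn compare_small(a: u64, b: u64) -> Option<Ordering> {
    if a == b {
        return Some(Ordering::Equal); // equal bit patterns are equal
    }
    let (ta, tb) = (tag(a), tag(b));
    if ta >= 64 || tb >= 64 {
        return None; // object handles go through obj_cmp instead
    }
    if ta != tb {
        return Some(ta.cmp(&tb)); // different tags: order by tag value
    }
    match ta {
        0 | 1 | 2 | 15 => Some(Ordering::Equal),
        3 => Some(minor(a).cmp(&minor(b)).then(major(a).cmp(&major(b)))),
        4 => Some(major(a).cmp(&major(b))),
        5 => Some((major(a) as i32).cmp(&(major(b) as i32))),
        6 | 10 | 12 => Some(body_unsigned(a).cmp(&body_unsigned(b))),
        7 | 11 | 13 => Some(body_signed(a).cmp(&body_signed(b))),
        _ => None,
    }
}

/// Hypothetical constructors used only for the demonstration below.
fn from_major(m: u32, tag: u8) -> u64 { ((m as u64) << 32) | tag as u64 }
fn from_body(body: u64, tag: u8) -> u64 { (body << 8) | tag as u64 }

fn main() {
    // I32Val: the major component is compared as a signed integer.
    let neg = from_major((-1i32) as u32, 5);
    let pos = from_major(1, 5);
    assert_eq!(compare_small(neg, pos), Some(Ordering::Less));

    // U32Val: the same bit pattern is a large unsigned integer.
    let big = from_major(u32::MAX, 4);
    let one = from_major(1, 4);
    assert_eq!(compare_small(big, one), Some(Ordering::Greater));

    // Different tags: False (0) orders before True (1).
    assert_eq!(compare_small(from_body(0, 0), from_body(0, 1)), Some(Ordering::Less));

    // I64Small: bodies are compared as signed integers.
    let a = from_body((-5i64) as u64, 7);
    let b = from_body(3, 7);
    assert_eq!(compare_small(a, b), Some(Ordering::Less));
}
```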
Object comparison can be accessed by either guest or host: it is provided to guests as a host function obj_cmp via the host environment interface. It performs a recursive structural comparison of objects, as well as values embedded in objects, using the following rules:
- If A and B have the same
Tag
value, they are directly compared as objects:- If A and B have common tag
Tag::VecObject
, they are ordered by lexicographic extension of the value order. - If A and B have common tag
Tag::MapObject
objects, they are ordered lexicographically as ordered vectors of (key, value) pairs. - If A and B have common tag
Tag::U64Object
,Tag::I64Object
,Tag::U128Object
,Tag::I128Object
,Tag::U256Object
orTag::I256Object
, they are ordered using the numerical order for those types. - If A and B have common tag
Tag::BytesObject
,Tag::StringObject
,Tag::SymbolObject
, orTag::Address
they are ordered (recursively) in the natural order of their corresponding XDR representations: lexicographically by structure field order, sequence order, union discriminant and structure field numerical orders.
- If A and B have common tag
- Otherwise only one of A or B is an object handle:
  - If either has tag Tag::U64Small and the other has tag Tag::U64Object, both are compared as their underlying unsigned 64-bit integers.
  - Similarly when comparing a combination of tags Tag::I64Small and Tag::I64Object, or Tag::TimepointSmall and Tag::TimepointObject, or Tag::DurationSmall and Tag::DurationObject, or Tag::U128Small and Tag::U128Object, or Tag::I128Small and Tag::I128Object, or Tag::U256Small and Tag::U256Object, or Tag::I256Small and Tag::I256Object, a small-value case and large-value case of the same underlying numeric type are compared in terms of that underlying numeric type.
  - Similarly if either has tag Tag::SymbolSmall and the other has tag Tag::SymbolObject, both are compared lexicographically as the underlying sequence of characters in each symbol.
  - Otherwise some object type and an unrelated non-object type are being compared, so their actual values are ignored and they are compared by the numerical value of the SCValType of the un-refined XDR SCVal type they represent (i.e. both Tag::I64Small and Tag::I64Object are projected to their SCValType SCV_I64 for numerical code-comparison with the SCValType of the other value).
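The cross-representation rules above can be illustrated with a minimal Rust sketch. It is not the host's implementation: the Value enum, its variants and the numbers returned by type_code are hypothetical stand-ins chosen for illustration (a real host resolves object handles through its environment and compares SCValType discriminants), and only the 64-bit integer and vector cases are modeled.

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified host value. "Object" variants stand for values whose
/// handle has already been dereferenced to the underlying data.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    U64Small(u64),
    U64Object(u64),
    I64Small(i64),
    I64Object(i64),
    VecObject(Vec<Value>),
}

/// Stand-in for the SCValType each tag projects to. The numbers are
/// illustrative assumptions, not taken from the XDR definitions.
fn type_code(v: &Value) -> u32 {
    match v {
        Value::U64Small(_) | Value::U64Object(_) => 5,
        Value::I64Small(_) | Value::I64Object(_) => 6,
        Value::VecObject(_) => 16,
    }
}

fn compare(a: &Value, b: &Value) -> Ordering {
    use Value::*;
    match (a, b) {
        // Small and object forms of one numeric type compare by numeric value.
        (U64Small(x) | U64Object(x), U64Small(y) | U64Object(y)) => x.cmp(y),
        (I64Small(x) | I64Object(x), I64Small(y) | I64Object(y)) => x.cmp(y),
        // Vectors: lexicographic extension of the value order.
        (VecObject(xs), VecObject(ys)) => {
            for (x, y) in xs.iter().zip(ys.iter()) {
                let ord = compare(x, y);
                if ord != Ordering::Equal {
                    return ord;
                }
            }
            xs.len().cmp(&ys.len())
        }
        // Unrelated types: values are ignored, only the type codes are compared.
        _ => type_code(a).cmp(&type_code(b)),
    }
}
```

In this sketch a U64Small holding 5 orders before a U64Object holding 7, while a U64Object holding 100 still orders before an I64Small holding -5, because only the type codes are consulted for unrelated types.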
The following additional validity constraints are imposed on the XDR types. Values not conforming to these constraints are rejected during conversion to host form:
- SCVal.sym must consist only of the characters [_0-9A-Za-z] and be no longer than SCSYMBOL_LIMIT (currently 32 characters).
- SCVal.map and SCVal.vec must not be empty (they are optional in the XDR only to enable type-recursion).
- SCVal.map must be populated by SCMapEntry pairs in increasing key-order, with no duplicate keys.
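A minimal sketch of two of these checks follows. The function names are hypothetical, and the key check uses Rust's generic Ord as a stand-in for the value order defined earlier in this CAP, so it only shows the shape of the check rather than the host's actual code.

```rust
/// Mirrors SCSYMBOL_LIMIT as stated in the text (32 characters).
const SCSYMBOL_LIMIT: usize = 32;

/// True if `s` uses only the characters [_0-9A-Za-z] and is within the length limit.
fn is_valid_symbol(s: &str) -> bool {
    s.len() <= SCSYMBOL_LIMIT
        && s.bytes().all(|b| b == b'_' || b.is_ascii_alphanumeric())
}

/// True if keys are in strictly increasing order, which also rules out duplicates.
fn keys_strictly_increasing<K: Ord>(keys: &[K]) -> bool {
    keys.windows(2).all(|pair| pair[0] < pair[1])
}
```

Checking for strictly increasing keys in a single pass covers both the ordering requirement and the no-duplicates requirement.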
Conversion from an XDR SCVal to a host value Val is as follows:
- The true and false cases of SCV_BOOL are separately encoded as Vals with Tag::True or Tag::False, and zero bodies.
- The SCV_VOID and SCV_LEDGER_KEY_CONTRACT_INSTANCE cases are encoded as Vals with Tag::Void and Tag::LedgerKeyContractInstance, respectively, and zero bodies.
- The SCV_ERROR case is encoded as a Val with Tag::Error, with the SCErrorType stored in the Val's minor component and the major component either storing:
  - The uint32 in the contractCode field, if the SCError is in case SCE_CONTRACT
  - Otherwise the numeric value of the SCErrorCode in the code field of all other SCE_* cases.
- Case SCV_U32 is encoded as a Val with Tag::U32, with the u32 field stored in its major component.
- Case SCV_I32 is encoded as a Val with Tag::I32, with the i32 field stored in its major component.
- Cases SCV_U64, SCV_TIMEPOINT, SCV_DURATION, SCV_U128, SCV_U256 are encoded by first considering whether the underlying numeric value, when considered as an unsigned 64-bit value, fits in 56 bits. If so, it is encoded as a Val with Tag::U64Small, Tag::TimepointSmall, Tag::DurationSmall, Tag::U128Small or Tag::U256Small respectively, with the small unsigned integer value packed into the body. Otherwise they are stored as new host objects and the handle to the object is stored in the major component of a Val with Tag::U64Object, Tag::TimepointObject, Tag::DurationObject, Tag::U128Object or Tag::U256Object respectively.
- Similarly cases SCV_I64, SCV_I128, and SCV_I256 are encoded either as the 56-bit body of Vals with their corresponding small value tags Tag::I64Small, Tag::I128Small or Tag::I256Small, or as object handles in the 32-bit major component of Vals with their corresponding general object tags Tag::I64Object, Tag::I128Object, Tag::I256Object, depending on whether their underlying numeric value, when considered as a signed 64-bit value, can be encoded in 56 bits without data loss.
- Similarly case SCV_SYMBOL is bit-packed as 6-bit codes (as described above) in the body of a Val with Tag::SymbolSmall if the symbol's length is 9 characters or less, otherwise it is stored as a new host object with its handle stored in the major component of a Val with Tag::SymbolObject.
- Cases SCV_BYTES, SCV_STRING and SCV_ADDRESS are each stored unconditionally as a new host object, with the object handle stored as the major component of a Val with Tag::BytesObject, Tag::StringObject and Tag::Address respectively.
- Case SCV_VEC unconditionally stores a new host object, with the object handle stored as the major component of a Val with Tag::VecObject, but only after recursively converting its contained SCVals to Vals using the same rules specified here. In other words the host object stores a vector of converted Vals, not unconverted SCVals.
- Similarly case SCV_MAP unconditionally stores a new host object, with the object handle stored as the major component of a Val with Tag::MapObject, and only after recursively converting its contained SCMapEntrys to pairs of Vals using the same rules specified here. In other words the host object stores a vector of pairs of converted Vals, not unconverted SCMapEntrys or SCVals.
- Cases SCV_LEDGER_KEY_NONCE and SCV_CONTRACT_INSTANCE are reserved for host-managed storage keys, and are only ever represented in their XDR form. They therefore do not have corresponding cases in Tag, so attempted conversion to Val fails with an error.
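The "fits in 56 bits" decision that selects between a small-value tag and an object tag can be sketched as follows. The constant and function names are hypothetical; the sketch only shows the range tests, not the construction of a Val or host object.

```rust
/// Width of a Val body in bits, as described in the text.
const BODY_BITS: u32 = 56;

/// An unsigned value can use a small tag if its top 8 bits are all zero.
fn u64_fits_small(v: u64) -> bool {
    v >> BODY_BITS == 0
}

/// A signed value can use a small tag if sign-extending its low 56 bits
/// reproduces the original value, i.e. no data is lost.
fn i64_fits_small(v: i64) -> bool {
    let shift = 64 - BODY_BITS;
    (v << shift) >> shift == v
}

/// Wider unsigned types (for example the 128-bit case) take the small form
/// only when the value is also below 2^56.
fn u128_fits_small(v: u128) -> bool {
    v >> BODY_BITS == 0
}
```

Under these tests the unsigned small range is 0 to 2^56 - 1 and the signed small range is -2^55 to 2^55 - 1; anything outside is stored as a host object.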
Conversion from a host value Val to an XDR SCVal is as follows:
- Vals with Tag::True or Tag::False are encoded as booleans in SCVal case SCV_BOOL.
- Vals with Tag::Void and Tag::LedgerKeyContractInstance are encoded as the void SCVal cases SCV_VOID and SCV_LEDGER_KEY_CONTRACT_INSTANCE, respectively.
- Vals with case Tag::Error are encoded as case SCV_ERROR with SCError cases chosen by the Val's minor component interpreted as an SCErrorType:
  - In case SCE_CONTRACT, the major component becomes the uint32 field contractCode
  - In all other SCE_* cases, the major component becomes the SCErrorCode field code
- Vals with Tag::U32 are encoded as case SCV_U32 with the u32 field taken from the Val's major component interpreted as an unsigned 32-bit integer.
- Vals with Tag::I32 are encoded as case SCV_I32 with the i32 field taken from the Val's major component interpreted as a signed 32-bit integer.
- Vals with Tag::U64Small, Tag::TimepointSmall, Tag::DurationSmall, Tag::U128Small, or Tag::U256Small are encoded as SCV_U64, SCV_TIMEPOINT, SCV_DURATION, SCV_U128 and SCV_U256 with their numeric values taken from the Val's body interpreted as an unsigned 64-bit integer.
- Similarly, Vals with Tag::I64Small, Tag::I128Small, or Tag::I256Small are encoded as SCV_I64, SCV_I128 and SCV_I256 with their numeric values taken from the Val's body interpreted as a signed 64-bit integer.
- Vals with Tag::SymbolSmall are encoded as SCV_SYMBOL with characters extracted from the sequence of characters bit-packed into the body of the Val.
- Vals that encode object handles are dereferenced and the underlying object is converted back to its unique SCVal case: Tag::U64Object to SCV_U64, Tag::I64Object to SCV_I64, Tag::TimepointObject to SCV_TIMEPOINT, Tag::DurationObject to SCV_DURATION, Tag::U128Object to SCV_U128, Tag::I128Object to SCV_I128, Tag::U256Object to SCV_U256, Tag::I256Object to SCV_I256, Tag::SymbolObject to SCV_SYMBOL, Tag::BytesObject to SCV_BYTES, Tag::StringObject to SCV_STRING, Tag::VecObject to SCV_VEC, Tag::MapObject to SCV_MAP, and Tag::Address to SCV_ADDRESS. As with conversion into Val, converting the container types Tag::VecObject and Tag::MapObject back to SCVals first recursively converts their contained Val elements to SCVals, using the same rules described here.
WebAssembly was chosen as a basis for this CAP after extensive evaluation of alternative virtual machines. See "choosing wasm" for details, or the underlying stack selection criteria document.
Relative to requirements listed in this CAP, Wasm addresses many of them:
- Secure:
  - Resource limits: Wasm has good (though not ideal) mechanisms for enforcing resource limits.
  - VM escape and side channels: Wasm is designed as a secure sandbox and has a good security track record so far.
- Well-defined:
  - Wasm has a rigorous formal semantics and conformance testsuite; it is well specified.
  - Wasm's nondeterminism is narrowly circumscribed and this CAP excludes all cases.
- Performance:
  - Code size: Wasm code is compact but low level, so it risks being large. The host-centric data model in this CAP minimizes code size.
  - Optimization: stock compilers emit efficient Wasm code.
- Interoperable:
  - Multi-language: many PLs have at least preliminary Wasm target support, though only a few are mature enough to use.
  - Tool maturity: languages targeting Wasm -- especially Rust -- have high quality, mature tools.
- Simple:
  - Non-novelty: Wasm is a complete, mature, well-supported spec with many off-the-shelf implementations to choose from.
  - Compatibility: many Wasm interpreters are written in C++ and/or Rust, and can be embedded easily in stellar-core.
  - Learnability: Wasm is not as familiar as EVM but is relatively widely known and appears easy to learn.
The split between host value types (Vals that can traverse the host/guest interface) and host objects (that remain on the host side, are identified only by handles, and are managed by host functions) is justified as a response to a number of observations we made when considering existing blockchains:
- Many systems spend a lot of guest code footprint (time and space) implementing data serialization and deserialization to and from opaque byte arrays. This code suffers from a variety of problems:
  - It is often to and from an opaque, non-standard or contract-specific format, making a contract's data difficult to browse or debug, and making SDKs that invoke contracts need to carry special code to serialize and deserialize data for the contract.
  - It is often coupled to a specific version or layout of a data structure, such that data cannot easily be migrated between versions of a contract.
  - It requires that a contract potentially contains extra copies of serialization support code for the formats used by any contracts it calls.
  - It is often intermixed with argument processing and contract logic, representing a significant class of security problems in contracts.
  - It is usually unshared code: each contract implements its own copy of serialization and deserialization, and does so inefficiently in the guest rather than efficiently on the host.
- Similarly, when guest code is CPU-intensive it is often performing numerical or cryptographic operations which would be better supported by a common library of efficient (native) host functions.
- As of this writing, Wasm defines no standardized, mature, widely-supported mechanism of directly sharing code, which makes it impossible to reuse common guest functions needed by many contracts. Possibly in the future the Wasm component model may present such a mechanism for sharing code between modules, but at present it is still incomplete and not widely implemented. Sharing common host functions is comparatively straightforward, and much more so if we define a common data model on which host functions operate.
- The more time is spent in the guest, the more the overall system performance depends directly on the speed of the guest VM's bytecode-dispatch mechanism (a.k.a. the VM's "inner loop"). By contrast, if the guest VM spends most of its time making a sequence of host calls, the bytecode-dispatch speed of the guest VM is less of a concern. This gives us much more flexibility in choice of VM, for example to choose simple, low-latency and comparatively-secure interpreters rather than complex, high-latency and fragile JITs.
Some systems mitigate these issues by providing byte-buffers of data to guests in a guaranteed input format, such as JSON. This eliminates some of the interoperability concerns but none of the efficiency concerns: the guest still spends too much time parsing input and building data structures.
Ultimately we settled on an approach in which the system will spend as little time in the guest as possible, and will furnish the guest with a rich enough repertoire of host objects that it should not need many or any of its own guest-local data structures. Our experience suggests that many guests will be able to run without a guest memory allocator at all.
There are various costs and benefits to this strategy. We compared in detail to many other blockchains with different approaches before settling on this one.
Costs:
- Larger host-object API attack surface to defend.
- Larger host-object API compatibility surface to maintain.
- More challenging task to quantify memory and CPU costs.
- More specification work to do defining host interface.
- Risks redundant work, guest may choose to ignore host objects.
Benefits:
- Much faster execution due to more logic being in natively-compiled host Rust code.
- Smaller guest input-parsing attack surfaces to defend.
- Smaller guest data compatibility surfaces to maintain.
- Much smaller guest code, minimizing storage and instantiation costs:
  - Little or no code to serialize or deserialize data in guest.
  - Little or no common memory-management or data structure code in guest.
- Auxiliary benefits from common data model:
  - Easier to browse contract data by 3rd party tools.
  - Easier to debug contracts by inspecting state.
  - Easier to test contracts by generating / capturing data.
  - Easier to pass data from one contract to another.
  - Easier to use same data model from different source languages.
It is especially important to note that the (enlarged) attack and maintenance surfaces on the host are costs borne by Soroban's developers, while the (diminished) attack and maintenance surfaces are benefits that accrue to smart contract developers. We believe this is a desirable balance of costs and benefits, as contract developers are likely to significantly outnumber Soroban developers.
The value and object type repertoires are chosen based on two criteria:
- Reasonably-foreseeable use in a large number of smart contracts.
- Widely-available implementations with efficient immutable forms.
In addition, values are constrained by the ability to be packed into a 64-bit tagged disjoint union. Special cases for common small values such as symbols, booleans, integer types and error codes are provided on the basis of presumed utility in a variety of contexts.
The value repertoire includes signed and unsigned integer types as its sole number types:
- 32 and 64-bit types, as these are standard Wasm types and useful for most purposes
- 128-bit types, which are natively supported by Rust (the host and guest language Soroban ships with support for). This type is also large enough to act as a very high precision fixed-point number for currency calculations: 19 decimal digits on either side of the decimal point. As this is larger than the standard 18 decimal places used by default by Ethereum's ERC20 token standard, 128-bit integers are used by Soroban's native contract interface as a common type for expressing quantities.
- 256-bit types, which are useful for two distinct reasons:
- For interoperation with Ethereum or other 256-bit integer blockchains
- To store and operate on various cryptographic values as scalars: several hash functions and encryption functions use 256-bit values as inputs or outputs, and it is frequently convenient to perform 256-bit integer-arithmetic or bitwise operations when working with those functions.
Two additional integral-wrapper types -- Duration and TimePoint -- exist merely for the sake of avoiding errors and enabling meaningful display formatting when working with time values (e.g. to hint to a user interface to display a TimePoint as 2023-08-24T11:00:18+00:00 rather than 1692874818). Internally both types are u64.
Floating-point arithmetic is disabled in the Wasm VM, and floating-point types are not used anywhere in the SCVal value repertoire or the host interface, out of concern for nondeterminism and survey feedback from potential users that they would not be used.
Fixed-point arithmetic functions could potentially be provided in the host, but feedback during development indicated that most users would be doing fixed point calculations with the 128-bit type, which is expected to remain on the guest as a 128-bit guest arithmetic operation costs roughly the same amount of CPU work as a host call. Users are therefore encouraged to simply include their own fixed-point library code in contracts. Some support code for this may be added to the Soroban guest SDK.
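As an illustration of the kind of guest-side fixed-point code this paragraph has in mind, here is a minimal sketch over the 128-bit signed type. The scale of 7 decimal places and the function names are assumptions made for the example, not part of this CAP or of the Soroban SDK; results are truncated toward zero.

```rust
/// Hypothetical scale: values are stored as integers with 7 implied decimal places.
const SCALE: i128 = 10_000_000;

/// Multiplies two fixed-point values, returning None if the intermediate
/// product overflows the 128-bit range.
fn fixed_mul(a: i128, b: i128) -> Option<i128> {
    a.checked_mul(b).map(|product| product / SCALE)
}

/// Divides two fixed-point values, returning None on overflow or division by zero.
fn fixed_div(a: i128, b: i128) -> Option<i128> {
    a.checked_mul(SCALE)?.checked_div(b)
}
```

With this scale, 1.5 is represented as 15_000_000 and 2.25 as 22_500_000, and their product under fixed_mul is 33_750_000, which represents 3.375.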
Implementations of the map and vector object types are based on Rust's standard vector type, are always precisely sized to their data and immutable once constructed. The map type is a sorted vector of key-value pairs that is binary searched during map lookup, but otherwise lacks any advanced structure.
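The map design described above can be sketched as follows. This is an illustrative model only: SortedMap and its methods are hypothetical names, and the sketch uses Rust's generic Ord for keys where the host would use the value order defined in this CAP.

```rust
/// Immutable map stored as a vector of (key, value) pairs sorted by key.
#[derive(Debug, Clone, PartialEq)]
struct SortedMap<K, V> {
    entries: Vec<(K, V)>,
}

impl<K: Ord + Clone, V: Clone> SortedMap<K, V> {
    fn new() -> Self {
        SortedMap { entries: Vec::new() }
    }

    /// Lookup by binary search over the sorted entries.
    fn get(&self, key: &K) -> Option<&V> {
        self.entries
            .binary_search_by(|(k, _)| k.cmp(key))
            .ok()
            .map(|index| &self.entries[index].1)
    }

    /// "Modification" copies the entries into a new map, sized for the
    /// result, and leaves the original untouched.
    fn insert(&self, key: K, value: V) -> Self {
        let mut entries = Vec::with_capacity(self.entries.len() + 1);
        entries.extend_from_slice(&self.entries);
        let position = entries.binary_search_by(|(k, _)| k.cmp(&key));
        match position {
            Ok(index) => entries[index] = (key, value),
            Err(index) => entries.insert(index, (key, value)),
        }
        SortedMap { entries }
    }
}
```

Because insert returns a fresh map, any earlier handle continues to observe the old contents, which is the duplicate-on-modify behaviour the following paragraph contrasts with shared-substructure containers.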
Earlier versions of this CAP suggested the use of container objects with "shared substructure" such as HAMTs, functional red-black trees or RRBs. These were used early in Soroban's development, but it was observed that most host objects were small due to pressure from the persistent storage system and transaction system, and the overhead of objects with shared substructure exceeded the cost of a simpler approach of merely duplicating objects in full every time they are modified. As a result, the simpler approach was adopted.
Containers are nonetheless converted from their XDR forms to internal forms. The host's internal form of an SCVec is a vector of Val host values, each only 64 bits, rather than a vector of arbitrarily large SCVals. Similarly, the host's internal form of an SCMap is a map of pairs of Val host values. In both cases this helps minimize the size overhead of the (frequently duplicated) host containers, and simplifies accounting for operations on them, since all Vals within them are the same small size.
Three types in the SCVal / Val repertoire are all variations on "a byte buffer":
- Bytes which carries no implication about its content. This is the most general type.
- String which carries an implication that its content is text in some format (most likely UTF-8 unicode). No structure is mandated for String but at a user-interface level it is often helpful to parse and display text differently from general byte sequences.
- Symbol is like String but imposes additional constraints: a maximum size of 32 characters, and a repertoire of characters drawn from the set [a-zA-Z0-9_]. The size limit is imposed to help support Symbols in guest code without needing a heap allocator. The limited repertoire is chosen for several reasons:
  - It is visually unambiguous in many typefaces, and so reduces the security risks from confusable Unicode codepoints or non-canonical code sequences, which can result in Strings that "look the same" but contain different bytes.
  - It has only 63 codes, which (combined with a code for null) is small enough to be packed into 6 bits, which in turn enables bit-packing small 9 character XDR Symbols into the body of the SymbolSmall case of the host Val type, an important space optimization as Symbols are relatively ubiquitous.
  - It is a widely-used repertoire in surveys of the ecosystem and legacy systems: it covers most program identifiers, such as datatype and function names, as well as most asset identifier codes.
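The arithmetic behind the 6-bit packing can be sketched in Rust: 63 character codes plus a null code fit in 6 bits, and 9 characters at 6 bits each occupy 54 bits, which fits in a 56-bit body. The specific code assignment below (underscore, then digits, then upper case, then lower case, with 0 as null padding) and the function names are assumptions made for this example; the normative assignment is the one given in the symbol encoding description earlier in this CAP.

```rust
/// Longest symbol that fits in a small value, per the text.
const MAX_SMALL_CHARS: usize = 9;

/// Maps a character to a 6-bit code in 1..=63 (assumed assignment; 0 is null).
fn encode_char(c: u8) -> Option<u64> {
    match c {
        b'_' => Some(1),
        b'0'..=b'9' => Some(2 + (c - b'0') as u64),
        b'A'..=b'Z' => Some(12 + (c - b'A') as u64),
        b'a'..=b'z' => Some(38 + (c - b'a') as u64),
        _ => None,
    }
}

/// Packs up to 9 characters into an integer, first character most significant.
/// Returns None for symbols that are too long or use other characters.
fn pack_small_symbol(s: &str) -> Option<u64> {
    if s.len() > MAX_SMALL_CHARS {
        return None;
    }
    let mut body = 0u64;
    for c in s.bytes() {
        body = (body << 6) | encode_char(c)?;
    }
    Some(body)
}

/// Recovers the characters from a packed body, skipping null (0) codes.
fn unpack_small_symbol(mut body: u64) -> String {
    let mut chars = Vec::new();
    while body != 0 {
        let code = (body & 0x3f) as u8;
        body >>= 6;
        let c = match code {
            1 => b'_',
            2..=11 => b'0' + (code - 2),
            12..=37 => b'A' + (code - 12),
            38..=63 => b'a' + (code - 38),
            _ => continue, // code 0 carries no character
        };
        chars.push(c);
    }
    chars.reverse();
    String::from_utf8(chars).unwrap()
}
```

Since every character maps to a nonzero code, leading zero bits unambiguously act as padding for symbols shorter than 9 characters.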
It would be possible to store all data in memory in the host in its XDR format, but we choose instead to define a separate "host form" for both values and objects in this specification for the following reasons:
- In the host form, values are bit-packed in order to fit in exactly 64 bits. This bit-packing is implemented in Rust code in the Soroban host (and partially available to Rust guest code) but many parts of it are host-specific, and quite delicate, and would in any case be undesirable to reimplement in every client SDK and data browser. In the XDR form, the various cases that make up the value union are represented in a standard XDR union, which is automatically supported by many languages' XDR bindings.
- In the host form, objects and values are separated for reasons explained above, and their separation is mediated through object handles and the host environment that maps references to objects. In the XDR form, objects and values are not separated, because they should not be: there is no implicit context in which to resolve handles, and even if there were it would introduce a new category of potential handle-mismatch error in the serialized form to support it. Instead, in the XDR form values directly contain objects.
- As mentioned above, containers in the host form are actually more efficient and simpler to work with, having been converted from containers of XDR SCVals to containers of host Vals.
We considered the potential costs and benefits of immutable objects, and decided in favor of them.
Costs:
- More memory allocation.
- Risk of referring to an old/stale object rather than a fresh/new one.
Benefits:
- Reduced risk of error through mutating a shared object.
- Stable total order, for using structured values as map keys.
- Simple model of security: no covert channels, only passed values.
- Simple model for transactions: discard objects on rollback.
Since we expect smart contracts to run to completion very quickly, and then free all objects allocated, we do not consider the additional memory allocation cost a likely problem in practice. Furthermore as mentioned in the object-repertoire rationale above, most objects are small.
Therefore the only real risk we foresee is the increased risk of unintentionally referring to an old/stale object, and we believe this is outweighed by the reduced risk of unintentionally referring to a shared mutable object that is mutated through an alias.
The initial protocol upgrade to enable Soroban is outside the scope of this CAP, as it will simply enable Soroban transaction types where no previous Soroban transactions were allowed.
Subsequent protocol upgrades must be carefully managed to ensure compatibility. Specifically the following mechanisms will assist in maintaining compatibility across upgrades:
- Every contract must carry a custom Wasm section called contractenvmetav0. This section must contain the serialized bytes of a sequence of the XDR type SCEnvMetaEntry which is a union switching on SCEnvMetaEntryKind that, initially, only contains a single possible case SC_ENV_META_KIND_INTERFACE_VERSION. This carries a uint64 that defines an "interface version" of the contract, which encodes both a protocol version number (in the high 32 bits) and a prerelease number (in the low 32 bits). The prerelease number is only meaningful during Soroban's development and must be zero once Soroban is enabled. The SDK currently arranges to include this information automatically, based on the version of the Rust soroban-env-common crate it is compiled against.
- A contract's protocol number indicates the minimum required protocol for a contract to run, and is checked by the host when instantiating the contract: instantiating a contract with an unsupported protocol number results in an error before execution.
- Extensions to the host interface will always be accompanied by a protocol change. This allows contracts to be deployed before they are fully supported, and to activate only when the network votes to support new features.
- If the host needs to intentionally deprecate or change the behaviour of any host function or any other aspect of the host interface, it should also accompany this change with a protocol change. Since historical ledgers always specify the protocol number they were recorded under, marking different ledgers with different protocols is the intended (and only reliable) way to enable the host to switch between different forms of logic, replaying old ledgers on old backward-compatibility logic and new ledgers on new logic.
- To minimize the risk of unintentional changes to the host's logic (and divergence among versions) entering the network due to, say, periodic software maintenance and dependency updates, the host is designed to support (and stellar-core is equipped to provide) multiversioning: to embed two full copies of the entire transitive tree of software dependencies of the host in process simultaneously, and to "switch over" between one version and another instantaneously, during a protocol upgrade. This allows delaying and then grouping together "all potentially risky" changes to dependencies until the next protocol-upgrade boundary, and then deploying them all simultaneously across the network. In other words, it is expected that the Soroban host will remain relatively static between protocol versions, only taking very minor updates whose observable semantics we are highly certain remain identical.
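The interface version layout and the minimum-protocol check described in the list above can be sketched as follows. The function names are hypothetical and the check is a simplified reading of the text (a contract runs only if its protocol number does not exceed the ledger's protocol), not the host's actual instantiation logic.

```rust
/// Builds the 64-bit interface version: protocol in the high 32 bits,
/// prerelease number in the low 32 bits.
fn make_interface_version(protocol: u32, prerelease: u32) -> u64 {
    ((protocol as u64) << 32) | prerelease as u64
}

/// Splits an interface version back into (protocol, prerelease).
fn split_interface_version(version: u64) -> (u32, u32) {
    ((version >> 32) as u32, version as u32)
}

/// Hypothetical check: the contract's protocol is a minimum requirement,
/// so it must not be newer than the protocol of the ledger running it.
fn protocol_is_supported(contract_version: u64, ledger_protocol: u32) -> bool {
    let (contract_protocol, _prerelease) = split_interface_version(contract_version);
    contract_protocol <= ledger_protocol
}
```

For example, a contract built against protocol 20 with prerelease 0 carries the interface version 85899345920 (20 shifted left by 32 bits), and under this sketch it would be accepted on a protocol 20 or 21 ledger but not on a protocol 19 ledger.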
The process of safely upgrading the network with Soroban enabled is described in more detail in this document inside the stellar-core repository.
This CAP does not introduce any backward incompatibilities.
TBD. Performance evaluation is ongoing on in-progress implementation.
In order to describe the security implications of this CAP we use the STRIDE methodology. This is a common framework used in the industry to identify security threats. For each category we use attack scenarios to better explain the threat.
- Spoofing: Attackers are able to make the system believe they are privileged users
  - A logical vulnerability exists in the Wasm code of the smart contract and lets a standard user perform privileged tasks
  - A logical vulnerability exists in a host function and leads to a failure in access control checks
- Tampering: Attackers are able to modify unauthorized data in the ledger database
  - A write-anywhere vulnerability exists in the Wasm interpreter. Specially crafted Wasm code triggers this bug and lets a user write custom data in the host memory, which then gets reflected in the database
  - A write-anywhere vulnerability exists in a host function. Smart-contract code calls the vulnerable host function and triggers the vulnerability. A user calls the smart contract and uses it to write custom data in the host memory or directly in the database
  - A logical vulnerability exists in the implementation of the serialization and deserialization of the data model. Smart-contract code instantiates specific objects on the host side and triggers the vulnerable part of the serializer to tamper with the data saved in the database
- Repudiation: Not applicable here
- Information disclosure: Attackers are able to access unauthorized information on the validators (secret seed for example), on the ledger database (other smart contract data) or guest memory data from another contract:
  - A read-anywhere vulnerability exists in the Wasm interpreter. Specially crafted Wasm code triggers this vulnerability and lets a user read custom data in the host memory
  - A read-anywhere vulnerability exists in a host function. Smart-contract code calls the vulnerable host function and triggers the vulnerability. A user calls the smart contract and uses it to read custom data in the host memory
  - During a smart contract execution a function from another smart contract is called. This call exploits a read-anywhere vulnerability in the access control checks of new contract data. This results in the caller contract being able to programmatically access the data of the callee contract. This is an issue for contracts like Oracles.
- Denial of Service: Network halts because consensus cannot be reached
  - A logical vulnerability exists in the implementation which validates that only deterministic Wasm code is executed. Specially crafted Wasm code triggers this vulnerability and creates nondeterminism across the network
  - A logical vulnerability exists in the implementation which computes the amount of gas needed to execute smart-contract code. Smart-contract code exploits this vulnerability and requires too many computing resources from the validators, preventing them from closing the ledger in an acceptable time frame
- Elevation of privilege: Attackers are able to execute unauthorized code on the validators
  - A code execution vulnerability exists in the Wasm interpreter. Specially crafted Wasm code triggers this vulnerability and lets a user execute code within the host context (stellar-core process)
  - A code execution vulnerability exists in a host function. Smart-contract code calls the vulnerable host function and triggers the vulnerability. A user calls the smart contract and uses it to execute code within the host context (stellar-core process)
TBD. See in-progress implementation.
An implementation is provided in two parts:
- The rs-soroban-env repository which contains three Rust crates defining:
  - soroban-env-host: a Rust implementation of the host environment
  - soroban-env-guest: a Rust interface for Rust guest code to interact with the host environment
  - soroban-env-common: a set of definitions common to both
- The stellar-core repository which contains (by reference) the XDR definitions above and provides an embedding of the soroban-env-host crate inside stellar-core.