Preamble

CAP: 0046-01 (formerly 0046)
Title: Soroban Runtime Environment
Working Group:
    Owner: Graydon Hoare <@graydon>
    Authors: Graydon Hoare <@graydon>
    Consulted: Leigh McCulloch <@leighmcculloch>, Tomer Weller <@tomerweller>, Jon Jove <@jonjove>, Nicolas Barry <@MonsieurNicolas>, Thibault de Lacheze-Murel <@C0x41lch0x41>
Status: Final
Created: 2022-04-18
Discussion: https://groups.google.com/g/stellar-dev/c/X0oRzJoIr10
Protocol version: 20

Simple Summary

This CAP specifies the lowest-level code execution and data model aspects of a WebAssembly-based (Wasm) "smart contract" system for the Stellar network, called Soroban. Wasm smart contract code runs as a guest inside of a virtual machine (VM) which is embedded in a host environment.

Higher-level components of a smart contract system such as ledger entries, host objects and host functions, and transactions to manage and invoke contracts will be specified in additional CAPs. This CAP focuses only on the lowest-level components.

No new operations or ledger entries are introduced in this CAP. Nothing observably changes in the protocol available to users. This CAP is best understood as a set of building blocks for later CAPs, introducing a vocabulary of concepts, data types and implementation components.

The design in this CAP is derived from a working and much more complete prototype that includes much that is left out of this CAP. This CAP is being proposed separately to facilitate early discussion of the building blocks, and to help decompose the inevitably-large volume of interrelated changes required for a complete smart contract system into smaller, more understandable pieces.

Working Group

This protocol change was authored by Graydon Hoare, with input from the consulted individuals mentioned at the top of this document.

Motivation and Goals Alignment

See the Soroban overview CAP.

Requirements

Primary requirements

The primary requirement for any smart contract system is to enable, within certain parameters, arbitrary new functionality to be added to a blockchain's state-transition function by users. This can be further decomposed into two requirements of concern in this CAP:

Code: stellar-core's state transition function must be extended with some means of executing, within parameters, some form of user-provided Turing-complete instruction code, preferably in a compact form that can be stored within the ledger.
Data: stellar-core's model of data -- comprising transaction input, output, persistent state and temporary working memory -- must be extended to include data of concern to smart contracts: their input, output, persistent state, and temporary working memory during execution. Any transformations between each of these sorts of data must be specified, even if partially delegated to contract logic.

Required parameters to mitigate risks

While the primary requirements seem simple enough to meet -- "just add a VM" -- there are many risks associated with a naive implementation. Therefore subsequent requirements take the form of parameters that constrain implementations in order to mitigate risks, including:

Secure: Soroban should be secure against benign or malicious smart contract code as well as contract-code input that could imperil system availability, integrity, or confidentiality (in the few cases where secret data exists). In particular at the level of this CAP, the design should guard against:
- The risk of resource exhaustion, leading to denial of service by validators.
- The risk of VM escape, leading to arbitrary Byzantine failures on validators, including data corruption or unauthorized transactions.
- The risk of side channels, allowing VM code to extract validator private keys or other secret data on validators.
- The risk of unintended contract behaviour due to invocation with malicious input data.
- The risk of unintended contract behaviour due to calls to or from malicious contracts.
Well-defined: Soroban should not compromise the network's bit-precise consensus or historical replay functions, and should have a well-defined and unambiguous semantics for any code or data added by users. Where possible this should be maintained by reference to existing, well-defined standards. In particular at the level of this CAP, the design should guard against:
- The risk of underspecified or nondeterministic VM code.
- The risk of underspecified or nondeterministic datatypes.
Performant: Soroban should not compromise the performance of the network, and should perform competitively with other smart contract systems. Users should not be subject to a significant performance penalty for using smart contracts instead of built-in transactions. In particular at the level of this CAP, the design should guard against:
- The risk of needing to load, compile, instantiate or run a large amount of VM code per transaction. Contracts should be small.
- The risk of contending on shared mutable data that may defeat parallel execution of transactions. Contracts should be isolated.
- The risk of requiring smart contract developers to do extensive optimization to achieve acceptable performance.
Interoperable: Soroban will necessarily introduce some new user-defined semantics which are by definition unknown to some users and 3rd parties. But beyond such necessary risks, Soroban should avoid introducing unnecessary hazards to interoperability, especially through choice of data encoding for input, output and persistent state. In particular at the level of this CAP, the design should guard against:
- The risk of being unable to share data between different contracts, or different versions of the same contract.
- The risk of being forced to write contracts in, or invoke contracts from, a single programming language.
- The risk of having no tools or only immature tools for working with any programming language targeting the VM.
- The risk of being unable to passively observe contract state for testing, debugging, diagnosis or monitoring.
- The risk of 3rd parties being unable to exchange data with contracts.
Simple: Soroban should be as simple as possible while achieving other requirements. It should not require excessive innovation or expensive engineering by either developers or users of stellar-core. Smart contracts are late in coming to the Stellar Network: there is plenty of prior art to draw from, and there is a limited window of time to complete the work. At the level of this CAP, the design should guard against:
- The risk of designing or implementing a novel VM, programming language, client library, or serialization format.
- The risk of selecting an existing platform that is incompatible with or causes major changes to stellar-core.
- The risk of delivering a system that is too challenging to learn for users or 3rd parties.

Abstract

The specification consists of three parts:

A general description of the concepts of host and guest contexts, their relationships, constraints, and methods of implementation.
A specification of the new components that provide the host and guest contexts, their means of interaction, and their lifecycle phases.
A specification of the data model shared between host and guest.

Specification

Context

This CAP specifies aspects of two separate but related contexts:

The host context: this consists of portions of the existing C++ code making up stellar-core that can be accessed by smart contracts, as well as some new C++ and Rust code implied by this CAP. New C++ and Rust code includes the implementation of a WebAssembly (Wasm) virtual machine, a set of host objects, and a host environment that contains and manages the lifecycle and interaction of the host objects and virtual machines. The host environment, like the rest of stellar-core, is compiled to native code and runs with full access to its enclosing operating system environment, the ledger, the network, etc. The term "host environment" here corresponds to the term with that name in the WebAssembly specification.
The guest context: this consists of Wasm code executed by a Wasm virtual machine embedded in the host environment. Guest code may originate in any programming language able to target Wasm, and will be provided by means unspecified in this CAP. Guest code has very limited access to its enclosing host environment: it can only consume CPU and memory resources to the extent that the host environment permits, and it can only call host functions that the host environment explicitly provides access to. The purpose of the guest context is to act as a so-called "sandbox" to attenuate potential harms caused by erroneous or malicious guest code, while allowing "just enough" programmability to satisfy the needs of users.

Components

The guest and host contexts are provided by two new components added to stellar-core: a virtual machine and a host environment.

Virtual Machine

Code for a WebAssembly 1.0 virtual machine (VM) is embedded in stellar-core. The VM can be instantiated multiple times in the same stellar-core process, effectively supporting multiple separate guest contexts. The VM is configured with specific limits, and excludes support for subsequent WebAssembly specification revisions or proposals, apart from the small set of deterministic extensions listed under "Determinism" below.

Furthermore, to limit potential nondeterminism risks (see below), floating point instructions are prohibited: any Wasm code that includes floating point instructions will not proceed past validation, and is instead rejected with an error.

Input guest code for a guest context is a single Wasm module in the specified Wasm binary format, and guest code will pass through all 4 semantic phases defined in the Wasm specification: decoding, validation, instantiation and execution. See the linked specification for details.

Host environment

A new structure called a host environment is added to the transaction-processing subsystem of stellar-core. A host environment is a container carrying:

Zero or more Wasm VMs.
Any host objects that guest code in a Wasm VM can refer to.
Any resource-accounting mechanisms for guest code.
Any host functions that guest code in a Wasm VM can import.
A set of in-memory XDR values called "storage", representing a portion of the ledger.

Interface

The interface between the host environment and guest code is very narrow and is defined as a subset of the Wasm specification of "embedding". A summary of some relevant aspects is repeated here:

Guest memory ("Wasm linear memory") is separated from host memory. The host may have a mechanism to access guest memory, but the guest has no mechanism to access host memory.
Wasm itself supports only 4 types of data value: i32, i64, f32, and f64. To further simplify the interface, we restrict it to support exactly one type of data value: i64. Everything Soroban passes back and forth between guest and host is encoded in one or more i64 values. The bits comprising such an i64 may be interpreted in one of 3 ways depending on context: as a signed 64-bit 2s complement integer, as an unsigned 64-bit integer, or as a polymorphic value type as described below in the "Data Model" section.
Guest code modules carry a list of exported functions (that the guest provides and the host can call) and a list of imported "host functions" (that the host provides and the guest can call). Both imported and exported functions can only pass a sequence of i64 parameters and return a single i64 value, or a trap. The set of host functions available for import is detailed in CAP-0046-03 - Smart Contract Host Functions.
Various error conditions may result in a guest trap condition, which is a terminal state for the Wasm VM running the guest code: no further VM execution can occur after it traps. A trap may be generated by guest code due to an execution error, or may be generated by a host function called from guest code. Therefore any call from guest to host or host to guest may produce a trap result rather than a value.
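
As an illustration of the interface shape described above, the following minimal Rust sketch models a call that takes only i64 parameters and yields either a single i64 or a trap. The names Trap, CallResult and host_add_u64 are hypothetical and are not part of the Soroban host function set; the function only demonstrates that the same 64 bits can be reinterpreted as unsigned without changing them.

```rust
/// Hypothetical trap marker: once produced, the VM that saw it is finished.
#[derive(Debug, PartialEq)]
pub struct Trap(pub &'static str);

/// Every imported or exported function takes i64 parameters and yields
/// either one i64 value or a trap.
pub type CallResult = Result<i64, Trap>;

/// A stand-in host function (not a real Soroban import): adds two values
/// interpreted as unsigned 64-bit integers, trapping on overflow.
pub fn host_add_u64(args: &[i64]) -> CallResult {
    match args {
        [a, b] => {
            // Reinterpret the same 64 bits as unsigned; no bits change.
            let (a, b) = (*a as u64, *b as u64);
            a.checked_add(b)
                .map(|sum| sum as i64)
                .ok_or(Trap("overflow"))
        }
        _ => Err(Trap("wrong argument count")),
    }
}
```

Under these assumptions, host_add_u64(&[1, 2]) returns Ok(3), while host_add_u64(&[-1, 1]) traps, because the bit pattern of -1 is the largest unsigned 64-bit value.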

Lifecycles

A host environment has its own lifecycle: it is created before any of the host objects or VMs it contains, and destroyed after any of the host objects or VMs it contains.

When a host environment is created, it contains no host objects and no VMs.

Adding a Wasm VM to a host environment involves passing Wasm code through the 4 lifecycle phases in the Wasm specification: decoding, validation, instantiation and invocation. If any phase fails, no further phases will be performed on the failed Wasm VM.

Multiple Wasm VMs can coexist in a single host environment. The intention is that one host environment and one Wasm VM will be created for an "outermost" invocation of a smart contract, and that "inner" contracts can be invoked by guest code calling a host function that constructs an additional VM and invokes a guest function in that new VM, within the same shared host environment. The specific mechanism of calling between contracts is not specified in this CAP.

Multiple Wasm VMs in the same host environment can refer to the same host objects: this is the mechanism for passing (immutable) information between different smart contracts.

Storage

The host environment's storage is initialized with some set of XDR objects loaded from the ledger. The set of XDR objects to load is statically declared by the transaction that causes instantiation of the host environment. After execution, when a host environment is being finalized, the modified portion of the host's storage is written back to the ledger. Between initialization and finalization, storage exists only in the host environment's memory. For more details on the semantics of storage see CAP 0046-05 Smart Contract Data.
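
The storage lifecycle above (load the declared entries, work in memory, write back only what was modified) can be sketched as follows. This is a simplified model under stated assumptions: string keys and byte-vector values stand in for XDR ledger keys and entries, and the Storage type and its methods are hypothetical names, not the host's actual API. The precise rules are those of CAP 0046-05.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical in-memory storage: string keys and byte values stand in
/// for XDR ledger keys and ledger entries.
pub struct Storage {
    entries: BTreeMap<String, Vec<u8>>,
    modified: BTreeSet<String>,
}

impl Storage {
    /// Initialization: load only the entries the transaction declared.
    pub fn init(ledger: &BTreeMap<String, Vec<u8>>, declared: &[&str]) -> Self {
        let entries = declared
            .iter()
            .filter_map(|k| ledger.get(*k).map(|v| (k.to_string(), v.clone())))
            .collect();
        Storage { entries, modified: BTreeSet::new() }
    }

    /// During execution, writes touch only host memory and are tracked.
    pub fn put(&mut self, key: &str, value: Vec<u8>) {
        self.entries.insert(key.to_string(), value);
        self.modified.insert(key.to_string());
    }

    /// Finalization: write back only the modified portion.
    pub fn finalize(self, ledger: &mut BTreeMap<String, Vec<u8>>) {
        for key in self.modified {
            if let Some(value) = self.entries.get(&key) {
                ledger.insert(key, value.clone());
            }
        }
    }
}
```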

Limits

The host maintains a per-transaction budget of CPU and memory resources, and as resources are consumed both by host functions and by the Wasm VM execution steps, the budget is reduced until it is exhausted. If the budget is exhausted before transaction completion, the host will trap with an error.

An important aspect of resource limiting is that it is performed against a deterministic model of the computational budget -- with "model" costs incrementally deducted from the budget model by explicit calls placed throughout the host function and VM code -- rather than by measuring real computational resources (time or memory) consumed during execution. This is necessary to maintain deterministic execution: any resource exhaustion that might occur must occur exactly the same way, at exactly the same instant, on every node in the Stellar network processing a Soroban transaction.

The detailed structure and logic for the budget is given in CAP-0046-10 - Smart Contract Budget Metering.
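
To make the idea of a deterministic budget model concrete, here is a minimal Rust sketch. The Budget type, the charge method and the cost formula in metered_copy are all assumptions chosen for illustration; the real cost model is defined in the budget metering CAP. The point is only that costs are deducted by explicit calls using fixed formulas, never by measuring wall-clock time or actual allocation.

```rust
/// Hypothetical error returned when the model budget runs out.
#[derive(Debug, PartialEq)]
pub struct BudgetExceeded;

/// A deterministic budget: plain counters, never a real clock or allocator.
pub struct Budget {
    cpu_remaining: u64,
    mem_remaining: u64,
}

impl Budget {
    pub fn new(cpu: u64, mem: u64) -> Self {
        Budget { cpu_remaining: cpu, mem_remaining: mem }
    }

    /// Deduct model costs; fail identically on every node once exhausted.
    pub fn charge(&mut self, cpu: u64, mem: u64) -> Result<(), BudgetExceeded> {
        let cpu_left = self.cpu_remaining.checked_sub(cpu).ok_or(BudgetExceeded)?;
        let mem_left = self.mem_remaining.checked_sub(mem).ok_or(BudgetExceeded)?;
        self.cpu_remaining = cpu_left;
        self.mem_remaining = mem_left;
        Ok(())
    }
}

/// A stand-in host function that charges a made-up cost before doing work.
pub fn metered_copy(budget: &mut Budget, src: &[u8]) -> Result<Vec<u8>, BudgetExceeded> {
    let n = src.len() as u64;
    // Illustrative cost formula only: fixed overhead plus one unit per byte.
    budget.charge(10 + n, n)?;
    Ok(src.to_vec())
}
```

Because the charge depends only on the input length, two nodes replaying the same transaction exhaust the budget at exactly the same call.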

Determinism

Both guest code and any part of the host environment controlled by guest code must execute deterministically in response to inputs, and must be sufficiently well-specified that replaying historical guest code in an upgraded host environment (i.e. a new version of stellar-core) will produce observably-identical results. This includes the result of observable resource exhaustion within host-controlled CPU or memory limits, which implies the need for careful resource accounting on all guest-controlled actions.

The Wasm spec has carefully limited nondeterminism to a small set of cases, which we consider here:

New features: only minor, fully deterministic Wasm features beyond the 1.0 spec are supported by Soroban, specifically the sign-ext and mutable-globals extensions, which are commonly included as target features in high-level language compilers (e.g. both Rust and C/C++ compilers).
Threads: not supported by Soroban.
NaN-related behaviour for floating point: all floating point code is prohibited.
SIMD-related behaviour: all SIMD extensions are prohibited.
Environment-resource limit exhaustion: enforced through a deterministic budget model as discussed above.

Data Model

This CAP defines a data model shared between guest and host environments. It consists of a set of values and a set of objects:

Values can be packed into a 64-bit integer, and can therefore be easily passed back and forth between the host environment and guest code, as arguments or return values from imported or exported functions.
Objects (also called "host objects") exist only in host memory, in the host context, and can only be referenced by guest code through values containing handles that refer to objects. If guest code wishes to perform an operation on a host object, it must call a host function with values containing handles that refer to any host object(s) to operate on.

Immutability

Host objects are immutable: they cannot be changed once created. Any operation on a host object that implies a modification of the object's state will allocate a new object (with a new handle) containing the modified state, and return a value that refers to the new object by its new handle. Objects must therefore be relatively small. Objects are not necessarily unique; two objects may be equal (in the sense of containing the same data) but have different handles.

Values may also be considered "immutable" in some sense, but since they are typically machine primitives and any two equal values are indistinguishable, mutability or immutability is not a particularly meaningful concept for values.
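
The immutability rule for host objects can be sketched as a small object table in which every "modifying" operation copies the object and returns a new handle. The Host type and its vec_new, vec_push and vec_get methods are hypothetical names used only for this sketch, and objects are modeled as vectors of raw i64 values.

```rust
/// Hypothetical host holding immutable vector objects, addressed by handle.
pub struct Host {
    objects: Vec<Vec<i64>>,
}

impl Host {
    pub fn new() -> Self {
        Host { objects: Vec::new() }
    }

    /// Store a new object and return its handle.
    fn add(&mut self, obj: Vec<i64>) -> u32 {
        self.objects.push(obj);
        (self.objects.len() - 1) as u32
    }

    pub fn vec_new(&mut self) -> u32 {
        self.add(Vec::new())
    }

    /// "Pushing" never mutates: it copies the object, appends to the copy,
    /// and returns the handle of the new object.
    pub fn vec_push(&mut self, handle: u32, item: i64) -> Option<u32> {
        let mut copy = self.objects.get(handle as usize)?.clone();
        copy.push(item);
        Some(self.add(copy))
    }

    pub fn vec_get(&self, handle: u32) -> Option<&[i64]> {
        self.objects.get(handle as usize).map(|v| v.as_slice())
    }
}
```

In this model, pushing the same item onto the same vector twice yields two distinct handles whose objects compare equal, which matches the statement that objects are not necessarily unique. The copy on every modification is also why objects should stay relatively small.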

Forms

The data model is specified in two separate forms:

In XDR, for inclusion in serial forms such as transactions and ledger entries.
In a set of "host types", of which the "host value type" is shared between host and guest.

The rationale for the two separate forms is given below, in the rationale section.

XDR changes

See the new XDR files in CAP-0046 - Soroban overview for a complete listing.

One XDR union type, and its variants, are worth discussing in this CAP: SCVal.

SCVal

SCVal is a new XDR type. Its name is short for smart contract ("SC") value. It is a general, polymorphic type in the sense that it is a union with many possible cases: numbers, strings, booleans, maps, vectors, error codes, and several special cases. It exists because many subsystems of the smart contract system, as well as many smart contracts themselves, must often act on values of interest to contracts without knowing their specific types ahead of time.

For example, the smart contract transaction invocation path must pass user-provided values to a contract and return values from a contract, and must do so generically without knowledge of the types of those values, so it accepts and returns SCVals. Similarly the smart contract storage system allows loading and storing SCVals in the ledger. And within a contract's own code, often some logic wishes to deal with values without knowing their precise type, such as forwarding values from one contract to another or extracting them from containers.

SCVal is keyed by the enum SCValType which has 22 variants. They are described in comments in Stellar-contract.x.

Host value type

The host value type -- in the Rust host and SDK code this is simply called Val -- is a 64-bit integer carrying a bit-packed disjoint union of several cases, each identified by a different Tag value.

Bit-packed representation

The low 8 bits of a Val are referred to as the tag and the remaining high 56 bits are referred to as the body. The tag's value determines the interpretation of the body. In some cases the body is itself further subdivided into 24 low bits, called the body's minor component, and 32 high bits, called the body's major component.

In other words, a value schematically looks like one of the following two cases:

bit   64      56      48      40      32      24      16       8       0
       +-------+-------+-------+-------+-------+-------+-------+-------+
       |                           body                        |  tag  |
       +-------+-------+-------+-------+-------+-------+-------+-------+


bit   64      56      48      40      32      24      16       8       0
       +-------+-------+-------+-------+-------+-------+-------+-------+
       |             major             |         minor         |  tag  |
       +-------+-------+-------+-------+-------+-------+-------+-------+

When accessing the body, the bit pattern may be considered as either a signed or unsigned 64-bit value. If signed, the body is extracted by a signed (arithmetic) right shift, properly sign-extending from 56 to 64 bits any negative values stored in the body. Similarly the major component may be treated as a signed or unsigned 32-bit integer. The minor component is only ever treated as an unsigned 32-bit integer, and is zero-extended from 24 to 32 bits on access.
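
The layout and access rules above translate into a handful of shifts and masks. The following Rust sketch uses a plain u64 in place of Val; the function and constant names are illustrative assumptions, not the names used in the Soroban host or SDK.

```rust
pub const TAG_BITS: u32 = 8;
pub const MINOR_BITS: u32 = 24;
const MINOR_MASK: u64 = 0x00ff_ffff;

/// Low 8 bits.
pub fn tag(val: u64) -> u8 {
    (val & 0xff) as u8
}

/// High 56 bits, zero-extended (logical shift).
pub fn body_unsigned(val: u64) -> u64 {
    val >> TAG_BITS
}

/// High 56 bits, sign-extended from 56 to 64 bits (arithmetic shift).
pub fn body_signed(val: u64) -> i64 {
    (val as i64) >> TAG_BITS
}

/// High 32 bits of the body; cast to i32 for the signed interpretation.
pub fn major(val: u64) -> u32 {
    (val >> (TAG_BITS + MINOR_BITS)) as u32
}

/// Low 24 bits of the body, zero-extended to 32 bits.
pub fn minor(val: u64) -> u32 {
    ((val >> TAG_BITS) & MINOR_MASK) as u32
}

/// Pack a 56-bit body and a tag; bits of `body` above bit 55 are discarded.
pub fn from_body_and_tag(body: u64, tag: u8) -> u64 {
    (body << TAG_BITS) | (tag as u64)
}

/// Pack major and minor components; `minor` is truncated to 24 bits.
pub fn from_major_minor_and_tag(major: u32, minor: u32, tag: u8) -> u64 {
    let body = ((major as u64) << MINOR_BITS) | ((minor as u64) & MINOR_MASK);
    from_body_and_tag(body, tag)
}
```

For instance, packing the body -5 with tag 7 and reading it back through body_signed recovers -5, because the arithmetic right shift replicates the sign bit stored at bit 63 of the packed value.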

Tag values

The different cases of the XDR value type SCVal are differentiated by the XDR enum SCValType, which is subsequently encoded as Tags in a Val, though the mapping is 1:N rather than 1:1. Specifically, for each SCVal case (i.e. SCValType code) at the XDR level, there may be N (usually 1 or 2) different refinements of that type, each a specialized Tag case in the host value type, usually to enable a more compact representation when small special cases of SCVal are projected into host values.

Tag values are organized in two contiguous blocks:

A low-valued block (initially between values 0 and 15 inclusive) that covers "small" Vals, where the entire semantic content of the Val is contained in its body.
A high-valued block (initially between values 64 and 77 inclusive) that covers "object handle" values, where the body of the Val just carries an object handle in its "major" component.

The two blocks are kept separate to enable an efficient single-comparison Tag test for all object handle values. The split between blocks happens at tag value 64 rather than 128 (as might be expected given the 8 bit range of Tag) so that all initially assigned tags are no greater than 127, the largest value that fits in a single byte of Wasm's ULEB128 encoding (another minor space optimization). We anticipate the system will grow to support some additional tags in the future, but believe the available tag space will be sufficient to accommodate such growth.
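
A short Rust sketch of both points in the paragraph above: the single-comparison object test and the one-byte ULEB128 property. The constant and function names are assumptions made for this sketch, and is_object_tag presumes the tag has already been checked to be an assigned value.

```rust
/// First tag of the object-handle block (hypothetical constant name).
pub const OBJECT_TAG_LOWER_BOUND: u8 = 64;

/// One comparison separates object handles from small values, assuming
/// the tag has already been validated as an assigned tag.
pub fn is_object_tag(tag: u8) -> bool {
    tag >= OBJECT_TAG_LOWER_BOUND
}

/// Number of bytes needed to encode `x` in unsigned LEB128:
/// each byte carries 7 payload bits.
pub fn uleb128_len(mut x: u64) -> usize {
    let mut len = 1;
    while x >= 0x80 {
        x >>= 7;
        len += 1;
    }
    len
}
```

Every tag from 0 to 77 yields uleb128_len(tag) == 1, whereas a split at 128 would have pushed object tags into two-byte encodings.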

The specific Tag values are:

Tag::False = 0, a refinement of the SCVal case for SCV_BOOL encoding just boolean false. The body is zero.
Tag::True = 1, a refinement of the SCVal case for SCV_BOOL encoding just boolean true. The body is zero.
Tag::Void = 2, corresponding to the SCVal case for SCV_VOID. The body is zero.
Tag::Error = 3, corresponding to the SCVal case for SCV_ERROR. The body takes the major/minor form:
- The minor component is an "error type", one of the values of the XDR enumeration SCErrorType.
- The major component is an "error code":
  - If the "error type" is SCE_CONTRACT, the major component is the uint32 error code in the SCE_CONTRACT case of SCError, a contract-defined error code with no specific meaning to the runtime.
  - Otherwise the major component is the SCErrorCode value of the corresponding SCE_* case of SCError.
Tag::U32Val = 4, corresponding to the SCVal case for SCV_U32. The major component carries an unsigned 32-bit integer.
Tag::I32Val = 5, corresponding to the SCVal case for SCV_I32. The major component carries a signed 32-bit integer.
Tag::U64Small = 6, a refinement of the SCVal case for SCV_U64 for unsigned 64-bit integer values that are small enough to fit in the 56 bits of the Val's body without data loss. Specifically those values in the range from 0 to 0x00ff_ffff_ffff_ffff inclusive.
Tag::I64Small = 7, a refinement of the SCVal case for SCV_I64 for signed 64-bit integer values that are small enough to fit in the 56 bits of the Val's body without data loss. Specifically those int64 values in the range from -36_028_797_018_963_968 to 36_028_797_018_963_967 inclusive.
Tag::TimepointSmall = 8, the same as U64Small but for the SCVal case for SCV_TIMEPOINT.
Tag::DurationSmall = 9, the same as U64Small but for the SCVal case for SCV_DURATION.
Tag::U128Small = 10, the same as U64Small but for the SCVal case for SCV_U128.
Tag::I128Small = 11, the same as I64Small but for the SCVal case for SCV_I128.
Tag::U256Small = 12, the same as U64Small but for the SCVal case for SCV_U256.
Tag::I256Small = 13, the same as I64Small but for the SCVal case for SCV_I256.
Tag::SymbolSmall = 14, a refinement of the SCVal case for SCV_SYMBOL for small symbols up to 9 characters long. The body of the Val contains between 0 and 9 characters, with each character encoded as a 6-bit, 1-based code that indexes into the 63-character repertoire allowed by the general SCV_SYMBOL type: [_0-9A-Za-z]. That is, the character _ is coded by the six bits 0b00_0001, the character 0 is coded by the six bits 0b00_0010, and so on, with the final allowed character z coded by the six bits 0b11_1111. These 6-bit codes are then packed into the 56-bit body such that the lowest 6 bits of the body always code for the last character in the symbol, and if the symbol is shorter than 9 characters then the body's high bits are padded with all-zero 6-bit codes (this representation optimizes for encoding in Wasm's ULEB128 format).
Tag::LedgerKeyContractInstance = 15, a refinement of the SCVal case for SCV_LEDGER_KEY_CONTRACT_INSTANCE, a special value reserved for use as a key identifying contract instances in the storage system. The body is zero.
Tag::U64Object = 64, for object-handle Vals referring to the SCVal case for SCV_U64, typically only used when the uint64 is larger than 56 bits and so cannot fit in a U64Small, though small integers stored in U64Object are legal. The body's major component is a 32-bit object handle, referring to a host object. The minor component is zero.
Tag::I64Object = 65, the same as U64Object but for the SCVal case for SCV_I64.
Tag::TimepointObject = 66, the same as U64Object but for the SCVal case for SCV_TIMEPOINT.
Tag::DurationObject = 67, the same as U64Object but for the SCVal case for SCV_DURATION.
Tag::U128Object = 68, the same as U64Object but for the SCVal case for SCV_U128.
Tag::I128Object = 69, the same as U64Object but for the SCVal case for SCV_I128.
Tag::U256Object = 70, the same as U64Object but for the SCVal case for SCV_U256.
Tag::I256Object = 71, the same as U64Object but for the SCVal case for SCV_I256.
Tag::BytesObject = 72, for object-handle Vals referring to the SCVal case for SCV_BYTES.
Tag::StringObject = 73, for object-handle Vals referring to the SCVal case for SCV_STRING.
Tag::SymbolObject = 74, for object-handle Vals referring to the SCVal case for SCV_SYMBOL, typically only used when the symbol is longer than 9 characters, so cannot fit in a SymbolSmall.
Tag::VecObject = 75, for object-handle Vals referring to the SCVal case for SCV_VEC.
Tag::MapObject = 76, for object-handle Vals referring to the SCVal case for SCV_MAP.
Tag::AddressObject = 77, for object-handle Vals referring to the SCVal case for SCV_ADDRESS.
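
The choice between a "small" refinement and an object handle for 64-bit integers comes down to a range check against the 56-bit body. The Rust sketch below uses hypothetical function names and a plain u64 for the packed Val; returning None stands for "this integer needs a U64Object or I64Object instead".

```rust
const TAG_U64_SMALL: u64 = 6;
const TAG_I64_SMALL: u64 = 7;

/// Pack as Tag::U64Small if the value fits in 56 unsigned bits.
pub fn try_u64_small(x: u64) -> Option<u64> {
    if x <= 0x00ff_ffff_ffff_ffff {
        Some((x << 8) | TAG_U64_SMALL)
    } else {
        None // would need an object handle instead
    }
}

/// Pack as Tag::I64Small if the value fits in 56 signed bits,
/// i.e. lies in the range -2^55 ..= 2^55 - 1.
pub fn try_i64_small(x: i64) -> Option<u64> {
    let limit = 1i64 << 55;
    if (-limit..limit).contains(&x) {
        // The left shift drops the redundant sign bits above bit 55.
        Some(((x as u64) << 8) | TAG_I64_SMALL)
    } else {
        None // would need an object handle instead
    }
}
```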

The Rust code defining the Tag datatype includes some additional symbolic names for the boundaries of the assigned tag codes, as well as a sentinel for unassigned tags, but these are not part of the interface specified by this CAP. All tag values not described above are reserved for future use.
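
The Tag::SymbolSmall packing described in the list above can be sketched in Rust as follows. The function names are assumptions made for this sketch; the character codes follow the stated 1-based order of the repertoire (_ then 0-9, then A-Z, then a-z), and the packed Val is modeled as a plain u64.

```rust
const TAG_SYMBOL_SMALL: u64 = 14;

/// Map one character to its 6-bit, 1-based code, or None if not allowed.
fn encode_char(c: u8) -> Option<u64> {
    let code = match c {
        b'_' => 1,
        b'0'..=b'9' => 2 + (c - b'0'),
        b'A'..=b'Z' => 12 + (c - b'A'),
        b'a'..=b'z' => 38 + (c - b'a'),
        _ => return None,
    };
    Some(code as u64)
}

/// Pack a symbol of at most 9 characters into a SymbolSmall value.
/// Returns None when the symbol is too long or has a disallowed character.
pub fn symbol_small(s: &str) -> Option<u64> {
    if s.len() > 9 {
        return None;
    }
    let mut body: u64 = 0;
    for &c in s.as_bytes() {
        // Shifting first leaves the last character in the lowest 6 bits
        // and keeps the unused high bits zero.
        body = (body << 6) | encode_char(c)?;
    }
    Some((body << 8) | TAG_SYMBOL_SMALL)
}
```

Nine characters occupy 54 of the 56 body bits, so the longest small symbol, nine z characters, produces a body of 2^54 - 1.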

Host object type(s)

There are many different host object types, and we refer to the disjoint union of all possible host object types as the host object type. This may be implemented in terms of a variant type, an object hierarchy, or any other similar mechanism in the host.

Every host object is held in host memory and cannot be accessed directly from guest code. Host objects can be referred to by host values in either host or guest code: specifically those values with tags between 64 and 77 inclusive refer to host objects by handle.

Host object handles are integers that identify host objects. They come in two forms: relative handles and absolute handles. Relative handles are, as their name suggests, only meaningful relative to a specific Wasm VM: they are indexes into an indirection table attached to each Wasm VM that maps relative handles to absolute handles. Absolute handles identify host objects within the host independently of any Wasm VM. When guest code running in a Wasm VM has a value of some object-handle type, it is always a relative handle. When guest code calls the host, any relative handle being passed is translated to an absolute handle, and when an absolute handle is returned from the host to the guest it is translated from an absolute to a relative handle. This way guests never see absolute handles, and cannot access any host objects that they have not explicitly been passed references to (e.g. as invocation arguments or return values from host functions).
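
A per-VM indirection table of this kind might look like the following Rust sketch. The VmHandleTable type and its method names are hypothetical, and the sketch always appends a fresh entry when a handle crosses from host to guest; whether an implementation reuses existing entries is not specified here.

```rust
/// Hypothetical per-VM indirection table: index = relative handle,
/// stored value = absolute handle.
pub struct VmHandleTable {
    rel_to_abs: Vec<u32>,
}

impl VmHandleTable {
    pub fn new() -> Self {
        VmHandleTable { rel_to_abs: Vec::new() }
    }

    /// Host to guest: expose an absolute handle under a fresh relative one.
    pub fn absolute_to_relative(&mut self, abs: u32) -> u32 {
        self.rel_to_abs.push(abs);
        (self.rel_to_abs.len() - 1) as u32
    }

    /// Guest to host: a relative handle the VM was never given resolves
    /// to nothing, so the guest cannot reach arbitrary host objects.
    pub fn relative_to_absolute(&self, rel: u32) -> Option<u32> {
        self.rel_to_abs.get(rel as usize).copied()
    }
}
```

A guest that guesses a relative handle outside its own table gets None from relative_to_absolute, which the host would surface as an error.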

If a host object is accessed through an invalid handle -- a number that does not identify an object -- the access fails with an error.

If a host object is accessed through a value with a tag that does not match the actual type of the underlying host object, the access fails with an error. While not strictly necessary -- it would be possible to simply ignore the tag -- this helps catch coding errors. Similarly if a host function expects a host object handle argument with a specific tag, and is passed a value with a different tag, it is rejected with an error even if the object handle number is valid.

The specific operations that can be performed on each host object are defined by host functions, described in CAP-0046-03 - Smart Contract Host Functions.

Comparison

Values and objects in the data model have a total order. When comparing two values A and B:

If both values have an equal bit-pattern, their order is equal.
If either value is an object-handle type, they are compared through object comparison (via the host function obj_cmp) as described below.
Otherwise A and B are both small-value types:
- If A's Tag differs from B's Tag, they are ordered by numeric Tag value (which, for small values, matches the order of the corresponding XDR SCValTypes).
- Otherwise A and B have the same Tag value:
  - If A and B have common tag Tag::False, Tag::True, Tag::Void, or Tag::LedgerKeyContractInstance, A and B are equal.
  - If A and B have common tag Tag::Error, A and B are ordered first by their minor components (the "error type"), then by their major components (the "error code"), both treated as unsigned 32-bit integers.
  - If A and B have common tag Tag::U32Val, A and B are ordered by their major components, treated as unsigned 32-bit integers.
  - If A and B have common tag Tag::I32Val, A and B are ordered by their major components, treated as signed 32-bit integers.
  - If A and B have common tag Tag::U64Small, Tag::U128Small or Tag::U256Small, A and B are ordered by their bodies, treated as unsigned 64-bit integers.
  - If A and B have common tag Tag::I64Small, Tag::I128Small or Tag::I256Small, A and B are ordered by their bodies, treated as signed 64-bit integers.
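
The small-value rules above can be sketched as a single Rust function over packed u64 values. The name cmp_small is an assumption for this sketch. It returns None when either value is an object handle (where obj_cmp applies) and also for same-tag cases that the list above does not spell out, rather than guessing at their order.

```rust
use std::cmp::Ordering;

/// Compare two packed values using only the small-value rules.
pub fn cmp_small(a: u64, b: u64) -> Option<Ordering> {
    if a == b {
        return Some(Ordering::Equal); // identical bit patterns
    }
    let (ta, tb) = ((a & 0xff) as u8, (b & 0xff) as u8);
    if ta >= 64 || tb >= 64 {
        return None; // object handles are compared via obj_cmp
    }
    if ta != tb {
        return Some(ta.cmp(&tb)); // order by numeric Tag value
    }
    let major = |v: u64| (v >> 32) as u32;
    let minor = |v: u64| ((v >> 8) & 0x00ff_ffff) as u32;
    match ta {
        // False, True, Void, LedgerKeyContractInstance
        0 | 1 | 2 | 15 => Some(Ordering::Equal),
        // Error: error type (minor) first, then error code (major)
        3 => Some((minor(a), major(a)).cmp(&(minor(b), major(b)))),
        // U32Val: unsigned major component
        4 => Some(major(a).cmp(&major(b))),
        // I32Val: signed major component
        5 => Some((major(a) as i32).cmp(&(major(b) as i32))),
        // U64Small, U128Small, U256Small: unsigned bodies
        6 | 10 | 12 => Some((a >> 8).cmp(&(b >> 8))),
        // I64Small, I128Small, I256Small: sign-extended bodies
        7 | 11 | 13 => Some(((a as i64) >> 8).cmp(&((b as i64) >> 8))),
        // Tags not covered by the rules listed above
        _ => None,
    }
}
```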

Object comparison can be accessed by either guest or host: it is provided to guests as a host function obj_cmp via the host environment interface. It performs a recursive structural comparison of objects, as well as values embedded in objects, using the following rules:

If A and B have the same Tag value, they are directly compared as objects:
- If A and B have common tag Tag::VecObject, they are ordered by lexicographic extension of the value order.
- If A and B have common tag Tag::MapObject, they are ordered lexicographically as ordered vectors of (key, value) pairs.
- If A and B have common tag Tag::U64Object, Tag::I64Object, Tag::U128Object, Tag::I128Object, Tag::U256Object or Tag::I256Object, they are ordered using the numerical order for those types.
- If A and B have common tag Tag::BytesObject, Tag::StringObject, Tag::SymbolObject, or Tag::AddressObject, they are ordered (recursively) in the natural order of their corresponding XDR representations: lexicographically by structure field order, sequence order, union discriminant and structure field numerical orders.
Otherwise only one of A or B is an object handle:
- If either has tag Tag::U64Small and the other has tag Tag::U64Object, both are compared as their underlying unsigned 64-bit integers.
- Similarly when comparing a combination of tags Tag::I64Small and Tag::I64Object, or Tag::TimepointSmall and Tag::TimepointObject, or Tag::DurationSmall and Tag::DurationObject, or Tag::U128Small and Tag::U128Object, or Tag::I128Small and Tag::I128Object, or Tag::U256Small and Tag::U256Object, or Tag::I256Small and Tag::I256Object, a small-value case and large-value case of the same underlying numeric type are compared in terms of that underlying numeric type.
- Similarly if either has tag Tag::SymbolSmall and the other has tag Tag::SymbolObject, both are compared lexicographically as the underlying sequence of characters in each symbol.
- Otherwise some object type and an unrelated non-object type are being compared, so their actual values are ignored and they are compared by the numerical value of the SCValType of the un-refined XDR SCVal type they represent (i.e. both Tag::I64Small and Tag::I64Object are projected to their SCValType SCV_I64 for numerical code-comparison with the SCValType of the other value).
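A minimal sketch of the recursive comparison described above, under stated assumptions: the `Node` type is a hypothetical, already-resolved view of values (handles dereferenced), `Node::U64` stands for both Tag::U64Small and Tag::U64Object, and the numeric SCValType codes in `type_code` are assumed for illustration and should be checked against the XDR definitions.

```rust
use std::cmp::Ordering;

/// Hypothetical resolved value; only three representative cases are modeled.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Node {
    U64(u64),
    Vec(Vec<Node>),
    /// Entries are assumed to be kept sorted by key.
    Map(Vec<(Node, Node)>),
}

/// SCValType codes, assumed here for illustration only.
fn type_code(n: &Node) -> u32 {
    match n {
        Node::U64(_) => 5,
        Node::Vec(_) => 16,
        Node::Map(_) => 17,
    }
}

pub fn obj_cmp_sketch(a: &Node, b: &Node) -> Ordering {
    match (a, b) {
        // Small and object forms of the same numeric type compare as numbers.
        (Node::U64(x), Node::U64(y)) => x.cmp(y),
        // Vectors: lexicographic extension of the element order.
        (Node::Vec(x), Node::Vec(y)) => {
            for (p, q) in x.iter().zip(y.iter()) {
                let ord = obj_cmp_sketch(p, q);
                if ord != Ordering::Equal {
                    return ord;
                }
            }
            x.len().cmp(&y.len())
        }
        // Maps: lexicographic over (key, value) pairs.
        (Node::Map(x), Node::Map(y)) => {
            for ((ka, va), (kb, vb)) in x.iter().zip(y.iter()) {
                let ord = obj_cmp_sketch(ka, kb).then_with(|| obj_cmp_sketch(va, vb));
                if ord != Ordering::Equal {
                    return ord;
                }
            }
            x.len().cmp(&y.len())
        }
        // Unrelated types: values are ignored, only the type codes are compared.
        _ => type_code(a).cmp(&type_code(b)),
    }
}
```

Under these assumed codes, any U64 sorts before any vector, and any vector before any map, regardless of contents.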

Validity

The following additional validity constraints are imposed on the XDR types. Values not conforming to these constraints are rejected during conversion to host form:

SCVal.sym must consist only of the characters [_0-9A-Za-z] and be no longer than SCSYMBOL_LIMIT (currently 32 characters).
SCVal.map and SCVal.vec must not be empty (they are optional in the XDR only to enable type-recursion)
SCVal.map must be populated by SCMapEntry pairs in increasing key-order, with no duplicate keys.
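The symbol and map-ordering constraints above could be checked along these lines. This is a sketch only: the function names are hypothetical, and a generic `Ord` bound stands in for the host's own key ordering.

```rust
/// Maximum symbol length stated in this section.
pub const SCSYMBOL_LIMIT: usize = 32;

/// True if `s` uses only [_0-9A-Za-z] and is at most SCSYMBOL_LIMIT long.
/// Byte length equals character count here because any non-ASCII byte
/// already fails the character check.
pub fn is_valid_symbol(s: &str) -> bool {
    s.len() <= SCSYMBOL_LIMIT
        && s.bytes().all(|b| b == b'_' || b.is_ascii_alphanumeric())
}

/// True if keys are in strictly increasing order, which also rules out
/// duplicate keys.
pub fn keys_strictly_increasing<K: Ord>(keys: &[K]) -> bool {
    keys.windows(2).all(|w| w[0] < w[1])
}
```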

Conversion

Conversion from an XDR SCVal to a host value Val is as follows:

The true and false cases of SCV_BOOL are separately encoded as Vals with Tag::True or Tag::False, and zero bodies.
The SCV_VOID and SCV_LEDGER_KEY_CONTRACT_INSTANCE cases are encoded as Vals with Tag::Void and Tag::LedgerKeyContractInstance, respectively, and zero bodies.
The SCV_ERROR case is encoded as a Val with Tag::Error, with the SCErrorType stored in the Val's minor component and the major component either storing:
- The uint32 in the contractCode field, if the SCError is in case SCE_CONTRACT
- Otherwise the numeric value of the SCErrorCode in the code field of all other SCE_* cases.
Case SCV_U32 is encoded as a Val with Tag::U32Val, with the u32 field stored in its major component.
Case SCV_I32 is encoded as a Val with Tag::I32Val, with the i32 field stored in its major component.
Cases SCV_U64, SCV_TIMEPOINT, SCV_DURATION, SCV_U128, SCV_U256 are encoded by first considering whether the underlying numeric value, when considered as an unsigned 64-bit value, fits in 56 bits. If so, it is encoded as a Val with Tag::U64Small, Tag::TimepointSmall, Tag::DurationSmall, Tag::U128Small or Tag::U256Small respectively, with the small unsigned integer value packed into the body. Otherwise they are stored as new host objects and the handle to the object is stored in the major component of a Val with Tag::U64Object, Tag::TimepointObject, Tag::DurationObject, Tag::U128Object or Tag::U256Object respectively.
Similarly cases SCV_I64, SCV_I128, and SCV_I256 are encoded either as the 56-bit body of Vals with their corresponding small value tags Tag::I64Small, Tag::I128Small or Tag::I256Small or as object handles in the 32-bit major component of Vals with their corresponding general object tags Tag::I64Object, Tag::I128Object, Tag::I256Object depending on whether their underlying numeric value, when considered as a signed 64-bit value, can be encoded in 56 bits without data loss.
Similarly case SCV_SYMBOL is bit-packed as 6-bit codes (as described above) in the body of a Val with Tag::SymbolSmall if the symbol's length is 9 characters or less, otherwise it's stored as a new host object with its handle stored in the major component of a Val with Tag::SymbolObject.
Cases SCV_BYTES, SCV_STRING and SCV_ADDRESS are each stored unconditionally as a new host object, with the object handle stored as the major component of a Val with Tag::BytesObject, Tag::StringObject and Tag::Address respectively.
Case SCV_VEC unconditionally stores a new host object, with the object handle stored as the major component of a Val with Tag::VecObject, but only after recursively converting its contained SCVals to Vals using the same rules specified here. In other words the host object stores a vector of converted Vals, not unconverted SCVals.
Similarly case SCV_MAP unconditionally stores a new host object, with the object handle stored as the major component of a Val with Tag::MapObject, and only after recursively converting its contained SCMapEntrys to pairs of Vals using the same rules specified here. In other words the host object stores a vector of pairs of converted Vals, not unconverted SCMapEntrys or SCVals.
Cases SCV_LEDGER_KEY_NONCE and SCV_CONTRACT_INSTANCE are reserved for host-managed storage keys, and are only ever represented in their XDR form. They therefore do not have corresponding cases in Tag, so attempted conversion to Val fails with an error.
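The small-versus-object decision for integer cases reduces to a 56-bit range check. A minimal sketch, with hypothetical function names:

```rust
/// True if an unsigned value fits in the 56-bit body of a small Val.
pub fn u64_fits_small(v: u64) -> bool {
    v >> 56 == 0
}

/// Same check for the wider unsigned cases (for example SCV_U128).
pub fn u128_fits_small(v: u128) -> bool {
    v >> 56 == 0
}

/// True if a signed value survives truncation to 56 bits: shifting left by 8
/// discards the top byte, and the arithmetic shift right sign-extends from
/// bit 55, so the round trip is lossless only for -2^55 <= v < 2^55.
pub fn i64_fits_small(v: i64) -> bool {
    (v << 8) >> 8 == v
}
```

Values that fail the check are the ones stored as host objects and referenced by handle.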

Conversion from a host value Val to an XDR SCVal is as follows:

Vals with Tag::True or Tag::False are encoded as booleans in SCVal case SCV_BOOL
Vals with Tag::Void and Tag::LedgerKeyContractInstance are encoded as the void SCVal cases SCV_VOID and SCV_LEDGER_KEY_CONTRACT_INSTANCE, respectively.
Vals with Tag::Error are encoded as case SCV_ERROR with the SCError case chosen by the Val's minor component interpreted as an SCErrorType:
- In case SCE_CONTRACT, the major component becomes the uint32 field contractCode
- In all other SCE_* cases, the major component becomes the SCErrorCode field code
Vals with Tag::U32Val are encoded as case SCV_U32 with the u32 field taken from the Val's major component interpreted as an unsigned 32-bit integer.
Vals with Tag::I32Val are encoded as case SCV_I32 with the i32 field taken from the Val's major component interpreted as a signed 32-bit integer.
Vals with Tag::U64Small, Tag::TimepointSmall, Tag::DurationSmall, Tag::U128Small, or Tag::U256Small are encoded as SCV_U64, SCV_TIMEPOINT, SCV_DURATION, SCV_U128 and SCV_U256 with their numeric values taken from the Val's body interpreted as an unsigned 64-bit integer.
Similarly, Vals with Tag::I64Small, Tag::I128Small, or Tag::I256Small are encoded as SCV_I64, SCV_I128 and SCV_I256 with their numeric values taken from the Val's body interpreted as a signed 64-bit integer.
Vals with Tag::SymbolSmall are encoded as SCV_SYMBOL with characters extracted from the sequence of characters bit-packed into the body of the Val.
Vals that encode object handles are dereferenced and the underlying object is converted back to its unique SCVal case: Tag::U64Object to SCV_U64, Tag::I64Object to SCV_I64, Tag::TimepointObject to SCV_TIMEPOINT, Tag::DurationObject to SCV_DURATION, Tag::U128Object to SCV_U128, Tag::I128Object to SCV_I128, Tag::U256Object to SCV_U256, Tag::I256Object to SCV_I256, Tag::SymbolObject to SCV_SYMBOL, Tag::BytesObject to SCV_BYTES, Tag::StringObject to SCV_STRING, Tag::VecObject to SCV_VEC, Tag::MapObject to SCV_MAP, and Tag::Address to SCV_ADDRESS. As with conversion into Val, converting the container types Tag::VecObject and Tag::MapObject back to SCVals first recursively converts their contained Val elements to SCVals, using the same rules described here.
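As an illustration of the two directions being inverses, the error case can be sketched as a round trip. This follows the placement given for the SCVal-to-Val direction (error type in the minor component, code in the major component). The type names are hypothetical, the value 0 for SCE_CONTRACT is an assumption, and the sketch assumes the `Other` case is never constructed with the SCE_CONTRACT type.

```rust
/// SCErrorType discriminant for contract-defined errors (assumed to be 0).
pub const SCE_CONTRACT: u32 = 0;

/// Hypothetical simplified view of the XDR SCError union.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScErrorSketch {
    Contract { contract_code: u32 },
    Other { error_type: u32, code: u32 },
}

/// Hypothetical decoded view of a Val with Tag::Error.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ErrorValSketch {
    pub minor: u32, // error type
    pub major: u32, // contract code or error code
}

pub fn error_to_val(e: ScErrorSketch) -> ErrorValSketch {
    match e {
        ScErrorSketch::Contract { contract_code } => ErrorValSketch {
            minor: SCE_CONTRACT,
            major: contract_code,
        },
        ScErrorSketch::Other { error_type, code } => ErrorValSketch {
            minor: error_type,
            major: code,
        },
    }
}

pub fn val_to_error(v: ErrorValSketch) -> ScErrorSketch {
    if v.minor == SCE_CONTRACT {
        ScErrorSketch::Contract { contract_code: v.major }
    } else {
        ScErrorSketch::Other { error_type: v.minor, code: v.major }
    }
}
```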

Design Rationale

Rationale for Wasm

WebAssembly was chosen as a basis for this CAP after extensive evaluation of alternative virtual machines. See "choosing wasm" for details, or the underlying stack selection criteria document.

Relative to requirements listed in this CAP, Wasm addresses many of them:

Secure:
- Resource limits: Wasm has good (though not ideal) mechanisms for enforcing resource limits.
- VM escape and side channels: Wasm is designed as a secure sandbox and has a good security track record so far.
Well-defined:
- Wasm has a rigorous formal semantics and conformance testsuite; it is well specified.
- Wasm's nondeterminism is narrowly circumscribed and this CAP excludes all cases.
Performance:
- Code size: Wasm code is compact but low level, so it risks being large. The host-centric data model in this CAP minimizes code size.
- Optimization: stock compilers emit efficient Wasm code.
Interoperable:
- Multi-language: many programming languages have at least preliminary Wasm target support, though only a few are mature enough to use.
- Tool maturity: languages targeting Wasm -- especially Rust -- have high quality, mature tools.
Simple:
- Non-novelty: Wasm is a complete, mature, well-supported spec with many off-the-shelf implementations to choose from.
- Compatibility: many Wasm interpreters are written in C++ and/or Rust, and can be embedded easily in stellar-core.
- Learnability: Wasm is not as familiar as EVM but is relatively widely known and appears easy to learn.

Rationale for host value / host object split

The split between host value types (Vals that can traverse the host/guest interface) and host objects (that remain on the host side, are identified only by handles, and are managed by host functions) is justified as a response to a number of observations we made when considering existing blockchains:

Many systems spend a lot of guest code footprint (time and space) implementing data serialization and deserialization to and from opaque byte arrays. This code suffers from a variety of problems:
- It is often to and from an opaque, non-standard or contract-specific format, making a contract's data difficult to browse or debug, and making SDKs that invoke contracts need to carry special code to serialize and deserialize data for the contract.
- It is often coupled to a specific version or layout of a data structure, such that data cannot easily be migrated between versions of a contract.
- It requires that a contract potentially contains extra copies of serialization support code for the formats used by any contracts it calls.
- It is often intermixed with argument processing and contract logic, representing a significant class of security problems in contracts.
- It is usually unshared code: each contract implements its own copy of serialization and deserialization, and does so inefficiently in the guest rather than efficiently on the host.
Similarly, when guest code is CPU-intensive it is often performing numerical or cryptographic operations which would be better supported by a common library of efficient (native) host functions.
As of this writing, Wasm defines no standardized, mature, widely-supported mechanism of directly sharing code, which makes it impossible to reuse common guest functions needed by many contracts. Possibly in the future the Wasm component model may present such a mechanism for sharing code between modules, but at present it is still incomplete and not widely implemented. Sharing common host functions is comparatively straightforward, and much more so if we define a common data model on which host functions operate.
The more time is spent in the guest, the more the overall system performance depends directly on the speed of the guest VM's bytecode-dispatch mechanism (a.k.a. the VM's "inner loop"). By contrast, if the guest VM spends most of its time making a sequence of host calls, the bytecode-dispatch speed of the guest VM is less of a concern. This gives us much more flexibility in choice of VM, for example to choose simple, low-latency and comparatively-secure interpreters rather than complex, high-latency and fragile JITs.

Some systems mitigate these issues by providing byte-buffers of data to guests in a guaranteed input format, such as JSON. This eliminates some of the interoperability concerns but none of the efficiency concerns: the guest still spends too much time parsing input and building data structures.

Ultimately we settled on an approach in which the system will spend as little time in the guest as possible, and will furnish the guest with a rich enough repertoire of host objects that it should not need many or any of its own guest-local data structures. Our experience suggests that many guests will be able to run without a guest memory allocator at all.

There are various costs and benefits to this strategy. We compared in detail to many other blockchains with different approaches before settling on this one.

Costs:

Larger host-object API attack surface to defend.
Larger host-object API compatibility surface to maintain.
More challenging task to quantify memory and CPU costs.
More specification work to do defining host interface.
Risks redundant work, guest may choose to ignore host objects.

Benefits:

Much faster execution due to more logic being in natively-compiled host Rust code.
Smaller guest input-parsing attack surfaces to defend.
Smaller guest data compatibility surfaces to maintain.
Much smaller guest code, minimizing storage and instantiation costs:
- Little or no code to serialize or deserialize data in guest.
- Little or no common memory-management or data structure code in guest.
Auxiliary benefits from common data model:
- Easier to browse contract data by 3rd party tools.
- Easier to debug contracts by inspecting state.
- Easier to test contracts by generating / capturing data.
- Easier to pass data from one contract to another.
- Easier to use same data model from different source languages.

It is especially important to note that the (enlarged) attack and maintenance surfaces on the host are costs borne by Soroban's developers, while the (diminished) attack and maintenance surfaces are benefits that accrue to smart contract developers. We believe this is a desirable balance of costs and benefits, as contract developers are likely to significantly outnumber Soroban developers.

Rationale for value and object type repertoires

These are chosen based on two criteria:

Reasonably-foreseeable use in a large number of smart contracts.
Widely-available implementations with efficient immutable forms.

In addition, values are constrained by the ability to be packed into a 64-bit tagged disjoint union. Special cases for common small values such as symbols, booleans, integer types and error codes are provided on the basis of presumed utility in a variety of contexts.

Numeric types

The value repertoire includes signed and unsigned integer types as its sole number types:

32 and 64-bit types, as these are standard Wasm types and useful for most purposes
128-bit types, which are natively supported by Rust (the host and guest language Soroban ships with support for). This type is also large enough to act as a very high precision fixed-point number for currency calculations: 19 decimal digits on either side of the decimal point. As this is larger than the standard 18 decimal places used by default by Ethereum's ERC20 token standard, 128-bit integers are used by Soroban's native contract interface as a common type for expressing quantities.
256-bit types, which are useful for two distinct reasons:
- For interoperation with Ethereum or other 256-bit integer blockchains
- To store and operate on various cryptographic values as scalars: several hash functions and encryption functions use 256-bit values as inputs or outputs, and it is frequently convenient to perform 256-bit integer-arithmetic or bitwise operations when working with those functions.

Two additional integral-wrapper types -- Duration and TimePoint -- exist merely to help avoid errors and to enable meaningful display formatting when working with time values (e.g. to hint to a user interface to display a TimePoint as 2023-08-24T11:00:18+00:00 rather than 1692874818). Internally both types are u64.

Floating-point arithmetic is disabled in the Wasm VM, and floating-point types are not used anywhere in the SCVal value repertoire or the host interface, out of concern for nondeterminism and survey feedback from potential users that they would not be used.

Fixed-point arithmetic functions could potentially be provided in the host, but feedback during development indicated that most users would be doing fixed point calculations with the 128-bit type, which is expected to remain on the guest as a 128-bit guest arithmetic operation costs roughly the same amount of CPU work as a host call. Users are therefore encouraged to simply include their own fixed-point library code in contracts. Some support code for this may be added to the Soroban guest SDK.
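As an illustration of the kind of guest-side fixed-point code this refers to, here is a minimal sketch on i128. The 18-decimal `SCALE` and the function names are assumptions chosen for the example, not part of any Soroban interface.

```rust
/// Hypothetical scale: 18 decimal places, i.e. 10^18.
pub const SCALE: i128 = 1_000_000_000_000_000_000;

/// Multiplies two fixed-point numbers, truncating toward zero.
/// Returns None if the intermediate product overflows i128.
pub fn fixed_mul(a: i128, b: i128) -> Option<i128> {
    a.checked_mul(b)?.checked_div(SCALE)
}

/// Divides two fixed-point numbers, truncating toward zero.
/// Returns None on overflow or division by zero.
pub fn fixed_div(a: i128, b: i128) -> Option<i128> {
    a.checked_mul(SCALE)?.checked_div(b)
}
```

The intermediate product is the limiting factor: with an 18-decimal scale, operands much above roughly 10^19 in raw units overflow in `fixed_mul`, so a production library would typically widen the intermediate or reorder operations.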

Container types

Implementations of the map and vector object types are based on Rust's standard vector type, are always precisely sized to their data and immutable once constructed. The map type is a sorted vector of key-value pairs that is binary searched during map lookup, but otherwise lacks any advanced structure.
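The map design described above can be sketched as a sorted vector with binary-search lookup, where every update returns a new map instead of mutating the old one. The `SortedMap` type is hypothetical, uses a generic `Ord` key in place of the host's Val ordering, and does not model the precise-sizing guarantee.

```rust
/// Hypothetical immutable map backed by a key-sorted vector.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SortedMap<K, V> {
    entries: Vec<(K, V)>,
}

impl<K: Ord + Clone, V: Clone> SortedMap<K, V> {
    pub fn new() -> Self {
        Self { entries: Vec::new() }
    }

    /// Binary search over the sorted entries.
    pub fn get(&self, key: &K) -> Option<&V> {
        self.entries
            .binary_search_by(|(k, _)| k.cmp(key))
            .ok()
            .map(|i| &self.entries[i].1)
    }

    /// Returns a full copy with the entry added or replaced; `self` is unchanged.
    pub fn insert(&self, key: K, value: V) -> Self {
        let mut entries = self.entries.clone();
        match entries.binary_search_by(|(k, _)| k.cmp(&key)) {
            Ok(i) => entries[i].1 = value,
            Err(i) => entries.insert(i, (key, value)),
        }
        Self { entries }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Copying the whole vector on each update is linear in the map size, which is acceptable only because host objects are expected to be small.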

Earlier versions of this CAP suggested the use of container objects with "shared substructure" such as HAMTs, functional red-black trees or RRBs. These were used early in Soroban's development, but it was observed that most host objects were small due to pressure from the persistent storage system and transaction system, and the overhead of objects with shared substructure exceeded the cost of a simpler approach of merely duplicating objects in full every time they are modified. As a result, the simpler approach was adopted.

Containers are nonetheless converted from their XDR forms to internal forms. The host's internal form of an SCVec is a vector of Val host values, each only 64 bits, rather than a vector of arbitrarily large SCVals. Similarly, the host's internal form of an SCMap is a map of pairs of Val host values. In both cases this helps minimize the size overhead of the (frequently duplicated) host containers, and simplifies accounting for operations on them, since all Vals within them are the same small size.

Buffer types

Three types in the SCVal / Val repertoire are all variations on "a byte buffer":

Bytes which carries no implication about its content. This is the most general type.
String which carries an implication that its content is text in some format (most likely UTF-8 unicode). No structure is mandated for String but at a user-interface level it is often helpful to parse and display text differently from general byte sequences.
Symbol is like String but imposes additional constraints: a maximum size of 32 characters, and a repertoire of characters drawn from the set [a-zA-Z0-9_]. The size limit is imposed to help support Symbols in guest code without needing a heap allocator. The limited repertoire is chosen for several reasons:
- It is visually unambiguous in many typefaces, and so reduces the security risks from confusible Unicode codepoints or non-canonical code sequences, which can result in Strings that "look the same" but contain different bytes.
- It has only 63 codes, which (combined with a code for null) is small enough to be packed into 6 bits, which in turn enables bit-packing XDR Symbols of up to 9 characters into the body of the SymbolSmall case of the host Val type, an important space optimization as Symbols are relatively ubiquitous.
- It is a widely-used repertoire in surveys of the ecosystem and legacy systems: it covers most program identifiers, such as datatype and function names, as well as most asset identifier codes.
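The arithmetic behind the packing is that 9 characters at 6 bits each occupy 54 bits, which fits in a 56-bit body. A minimal sketch follows. The specific code assignment used here (`_` as 1, digits as 2 to 11, uppercase as 12 to 37, lowercase as 38 to 63, with 0 reserved for null) and the first-character-in-high-bits ordering are assumptions for illustration; the normative mapping is the one given where SymbolSmall is defined.

```rust
/// Packs a symbol of up to 9 characters into a 54-bit value, or returns
/// None if it is too long or contains a character outside [a-zA-Z0-9_].
pub fn pack_small_symbol(s: &str) -> Option<u64> {
    if s.len() > 9 {
        return None;
    }
    let mut body = 0u64;
    for b in s.bytes() {
        // Assumed 6-bit code assignment; 0 is reserved for "no character".
        let code: u8 = match b {
            b'_' => 1,
            b'0'..=b'9' => 2 + (b - b'0'),
            b'A'..=b'Z' => 12 + (b - b'A'),
            b'a'..=b'z' => 38 + (b - b'a'),
            _ => return None,
        };
        body = (body << 6) | code as u64;
    }
    Some(body)
}
```

Longer symbols (10 to 32 characters) do not fit this form and are the ones stored as host objects.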

Rationale for separate XDR and host forms

It would be possible to store all data in memory in the host in its XDR format, but we choose instead to define a separate "host form" for both values and objects in this specification for the following reasons:

In the host form, values are bit-packed in order to fit in exactly 64 bits. This bit-packing is implemented in Rust code in the Soroban host (and partially available to Rust guest code) but many parts of it are host-specific, and quite delicate, and would in any case be undesirable to reimplement in every client SDK and data browser. In the XDR form, the various cases that make up the value union are represented in a standard XDR union, which is automatically supported by many languages' XDR bindings.
In the host form, objects and values are separated for reasons explained above, and their separation is mediated through object handles and the host environment that maps references to objects. In the XDR form, objects and values are not separated, because they should not be: there is no implicit context in which to resolve handles, and even if there were it would introduce a new category of potential handle-mismatch error in the serialized form to support it. Instead, in the XDR form values directly contain objects.
As mentioned above, containers in the host form are actually more efficient and simpler to work with, having been converted from containers of XDR SCVals to containers of host Vals.

Rationale for immutable objects

We considered the potential costs and benefits of immutable objects, and decided in favor of them.

Costs:

More memory allocation.
Risk of referring to an old/stale object rather than a fresh/new one.

Benefits:

Reduced risk of error through mutating a shared object.
Stable total order, for using structured values as map keys.
Simple model of security: no covert channels, only passed values.
Simple model for transactions: discard objects on rollback.

Since we expect smart contracts to run to completion very quickly, and then free all objects allocated, we do not consider the additional memory allocation cost a likely problem in practice. Furthermore as mentioned in the object-repertoire rationale above, most objects are small.

Therefore the only real risk we foresee is the increased risk of unintentionally referring to an old/stale object, and we believe this is outweighed by the reduced risk of unintentionally referring to a shared mutable object that is mutated through an alias.

Protocol Upgrade Transition

The initial protocol upgrade to enable Soroban is outside the scope of this CAP, as it will simply enable Soroban transaction types where no previous Soroban transactions were allowed.

Subsequent protocol upgrades must be carefully managed to ensure compatibility. Specifically the following mechanisms will assist in maintaining compatibility across upgrades:

Every contract must carry a custom Wasm section called contractenvmetav0. This section must contain the serialized bytes of a sequence of the XDR type SCEnvMetaEntry which is a union switching on SCEnvMetaEntryKind that, initially, only contains a single possible case SC_ENV_META_KIND_INTERFACE_VERSION. This carries a uint64 that defines an "interface version" of the contract, which encodes both a protocol version number (in the high 32 bits) and a prerelease number (in the low 32 bits). The prerelease number is only meaningful during Soroban's development and must be zero once Soroban is enabled. The SDK currently arranges to include this information automatically, based on the version of the Rust soroban-env-common crate it is compiled against.
A contract's protocol number indicates the minimum required protocol for a contract to run, and is checked by the host when instantiating the contract: instantiating a contract with an unsupported protocol number results in an error before execution.
Extensions to the host interface will always be accompanied by a protocol change. This allows contracts to be deployed before they are fully supported, and to activate only when the network votes to support new features.
If the host needs to intentionally deprecate or change the behaviour of any host function or any other aspect of the host interface, it should also accompany this change with a protocol change. Since historical ledgers always specify the protocol number they were recorded under, marking different ledgers with different protocols is the intended (and only reliable) way to enable the host to switch between different forms of logic, replaying old ledgers on old backward-compatibility logic and new ledgers on new logic.
To minimize the risk of unintentional changes to the host's logic (and divergence among versions) entering the network due to, say, periodic software maintenance and dependency updates, the host is designed to support (and stellar-core is equipped to provide) multiversioning: to embed two full copies of the entire transitive tree of software dependencies of the host in process simultaneously, and to "switch over" between one version and another instantaneously, during a protocol upgrade. This allows delaying and then grouping together "all potentially risky" changes to dependencies until the next protocol-upgrade boundary, and then deploying them all simultaneously across the network. In other words, it is expected that the Soroban host will remain relatively static between protocol versions, only taking very minor updates whose observable semantics we are highly certain remain identical.
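The interface-version layout and instantiation check described above can be sketched as follows. The function names and error strings are hypothetical, and the check reflects only the two rules stated here (the contract's protocol number must be supported by the host, and the prerelease number must be zero once Soroban is enabled).

```rust
/// Packs a protocol number (high 32 bits) and prerelease number (low 32 bits).
pub fn make_interface_version(protocol: u32, prerelease: u32) -> u64 {
    ((protocol as u64) << 32) | prerelease as u64
}

pub fn protocol_of(version: u64) -> u32 {
    (version >> 32) as u32
}

pub fn prerelease_of(version: u64) -> u32 {
    version as u32 // truncation keeps the low 32 bits
}

/// Hypothetical check performed before a contract is instantiated.
pub fn check_interface_version(version: u64, host_protocol: u32) -> Result<(), &'static str> {
    if prerelease_of(version) != 0 {
        return Err("prerelease number must be zero");
    }
    if protocol_of(version) > host_protocol {
        return Err("contract requires a newer protocol than the host supports");
    }
    Ok(())
}
```

For example, a contract built for protocol 20 carries the value `20 << 32`, and would be rejected by a host still running protocol 19.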

The process of safely upgrading the network with Soroban enabled is described in more detail in this document inside the stellar-core repository.

Backwards Incompatibilities

This CAP does not introduce any backward incompatibilities.

Resource Utilization

TBD. Performance evaluation is ongoing on in-progress implementation.

Security Concerns

In order to describe the security implications of this CAP we use the STRIDE methodology. This is a common framework used in the industry to identify security threats. For each category we use attack scenarios to better explain the threat.

Spoofing: Attackers are able to let the system believe they are privileged users
- A logical vulnerability exists in the Wasm code of the smart contract and lets a standard user perform privileged tasks
- A logical vulnerability exists in a host function and leads to a failure in access control checks
Tampering: Attackers are able to modify unauthorized data in the ledger database
- A write-anywhere vulnerability exists in the Wasm interpreter. A specially crafted Wasm code triggers this bug and lets a user write custom data in the host memory which then gets reflected in the database
- A write-anywhere vulnerability exists in a host function. A smart-contract code calls the vulnerable host function and triggers the vulnerability. A user calls the smart-contract and uses it to write custom data in the host memory or directly in the database
- A logical vulnerability exists in the implementation of the serialization and deserialization of the data model. A smart-contract code instantiates specific objects on the host side and triggers the vulnerable part of the serializer to tamper with the data saved in the database
Repudiation: Not applicable here
Information disclosure: Attackers are able to access unauthorized information on the validators (secret seed for example), on the ledger database (other smart contract data) or guest memory data from another contract:
- A read-anywhere vulnerability exists in the Wasm interpreter. A specially crafted Wasm code triggers this vulnerability and lets a user read custom data in the host memory
- A read-anywhere vulnerability exists in a host function. A smart-contract code calls the vulnerable host function and triggers the vulnerability. A user calls the smart-contract and uses it to read custom data in the host memory
- During a smart contract execution a function from another smart contract is called. This call exploits a read-anywhere vulnerability in the access control checks of new contract data. This results in the caller contract being able to programmatically access the data of the callee contract. This is an issue for contracts like Oracles.
Denial of Service: Network halts because consensus cannot be reached
- A logical vulnerability exists in the implementation which validates that only deterministic Wasm code is executed. A specially crafted Wasm code triggers this vulnerability and creates nondeterminism across the network
- A logical vulnerability exists in the implementation which computes the amount of gas needed to execute a smart-contract code. A smart-contract code exploits this vulnerability and requires too many computing resources for the validators, preventing them from closing the ledger in an acceptable time frame
Elevation of privilege: Attackers are able to execute unauthorized code on the validators
- A code execution vulnerability exists in the Wasm interpreter. A specially crafted Wasm code triggers this vulnerability and lets a user execute code within the host context (stellar-core process)
- A code execution vulnerability exists in a host function. A smart-contract code calls the vulnerable host function and triggers the vulnerability. A user calls the smart-contract and uses it to execute code within the host context (stellar-core process)

Test Cases

TBD. See in-progress implementation.

Implementation

An implementation is provided in two parts:

The rs-soroban-env repository which contains three Rust crates defining:
- soroban-env-host: a Rust implementation of the host environment
- soroban-env-guest: a Rust interface for Rust guest code to interact with the host environment
- soroban-env-common: a set of definitions common to both
The stellar-core repository which contains (by reference) the XDR definitions above and provides an embedding of the soroban-env-host crate inside stellar-core.
