Ethereum Virtual Machine

What is it?

Compare to

  • Virtual Machine (Virtualbox, QEMU, cloud computing)

  • Java VM

Virtual machine technologies such as VirtualBox and QEMU/KVM differ from the EVM in that their purpose is to provide hypervisor functionality: a software abstraction that handles system calls, task scheduling, and resource management between a guest OS and the underlying host OS and hardware.

Certain aspects of the Java VM (JVM) specification, however, are similar to the EVM. At a high level, the JVM is designed to provide a runtime environment that is independent of the underlying host OS or hardware, enabling compatibility across a wide variety of systems. High-level programming languages such as Java or Scala that run on the JVM are compiled into JVM bytecode. This is comparable to compiling a Solidity source file into bytecode that runs on the EVM.

EVM Machine Language (Bytecode Operations)

The EVM Machine Language is divided into specific instruction set groups, such as arithmetic operations, logical and comparison operations, control flow, system calls, stack operations, and memory operations. In addition to the typical bytecode operations, the EVM must also manage account information (i.e. address and balance), current gas price, and block information.

Common Stack Operations

Opcode instructions for stack and memory management:

POP     //Pop item off the stack
PUSH    //Push item on the stack
MLOAD   //Load a word from memory onto the stack
MSTORE  //Store a word from the stack in memory
JUMP    //Alter the location of program counter (PC)
PC      //Program counter
MSIZE   //Active memory size
GAS     //Amount of gas still available for the current execution
DUP     //Stack item duplication
SWAP    //Stack item exchange operation
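To make the stack opcodes concrete, here is a minimal Python sketch of an operand stack. It is an illustration only, not how any EVM client is implemented: the class name Stack and its method names are made up for this example, and stack items are modelled as plain Python integers.

```python
class Stack:
    """Toy model of the EVM operand stack (hypothetical helper class)."""

    MAX_DEPTH = 1024  # the EVM stack holds at most 1024 items

    def __init__(self):
        self.items = []  # last element is the top of the stack

    def push(self, value):
        if len(self.items) >= self.MAX_DEPTH:
            raise OverflowError("stack overflow")
        self.items.append(value)

    def pop(self):
        return self.items.pop()

    def dup(self, n):
        # DUPn copies the n-th item from the top (1-based) onto the top
        self.push(self.items[-n])

    def swap(self, n):
        # SWAPn exchanges the top item with the item n positions below it
        self.items[-1], self.items[-1 - n] = self.items[-1 - n], self.items[-1]


stack = Stack()
stack.push(0x01)
stack.push(0x02)
stack.dup(1)     # stack, bottom to top: 1, 2, 2
stack.swap(2)    # stack, bottom to top: 2, 2, 1
top = stack.pop()
print(top, stack.items)
```

In the real instruction set, PUSH, DUP, and SWAP are families of opcodes (PUSH1 to PUSH32, DUP1 to DUP16, SWAP1 to SWAP16); the integer argument in the sketch stands in for the numeric suffix.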

Common System Operations

Opcode instructions for the system executing the program:

CREATE  //Create a new account with associated code
CALL    //Instruction for message passing between accounts
RETURN  //Execution halt, returning output data
REVERT  //Execution halt, reverting state changes
SELFDESTRUCT //Execution halt, and flag account for deletion

Arithmetic Operations

Common arithmetic opcode instructions:

ADD      //Add
MUL      //Multiplication
SUB      //Subtraction
DIV      //Integer division
SDIV     //Signed integer division
MOD      //Modulo (Remainder) operation
SMOD     //Signed modulo operation
ADDMOD   //Modulo addition
MULMOD   //Modulo multiplication
EXP      //Exponent operation
STOP     //Halt operation
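EVM arithmetic operates on 256-bit words, so results wrap around modulo 2^256, and division or modulo by zero yields 0 rather than raising an error. The following Python sketch models a few of the opcodes above under those rules. The function names are chosen for this example only.

```python
WORD = 2**256        # EVM words are 256 bits wide
SIGN_BIT = 2**255    # values at or above this are negative in two's complement


def add(a, b): return (a + b) % WORD
def sub(a, b): return (a - b) % WORD
def mul(a, b): return (a * b) % WORD
def div(a, b): return 0 if b == 0 else a // b
def mod(a, b): return 0 if b == 0 else a % b
def exp(a, b): return pow(a, b, WORD)

# ADDMOD and MULMOD compute the intermediate result without wrapping
def addmod(a, b, n): return 0 if n == 0 else (a + b) % n
def mulmod(a, b, n): return 0 if n == 0 else (a * b) % n


def to_signed(x):
    """Interpret a 256-bit word as a two's complement signed integer."""
    return x - WORD if x >= SIGN_BIT else x


def to_unsigned(x):
    """Encode a signed integer as a 256-bit word."""
    return x % WORD


def sdiv(a, b):
    a, b = to_signed(a), to_signed(b)
    if b == 0:
        return 0
    q = abs(a) // abs(b)            # truncate toward zero
    return to_unsigned(-q if (a < 0) != (b < 0) else q)


def smod(a, b):
    a, b = to_signed(a), to_signed(b)
    if b == 0:
        return 0
    r = abs(a) % abs(b)             # result takes the sign of the dividend
    return to_unsigned(-r if a < 0 else r)


print(add(WORD - 1, 1))             # wraps around to 0
print(to_signed(sdiv(to_unsigned(-7), 2)))
```

Note the difference between ADDMOD and an ADD followed by MOD: the former does not wrap at 2^256 before reducing, so the two can give different answers for large operands.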

State

As with any computing system, the concept of state is an important one. Just like a CPU keeping track of a process in execution, the EVM must keep track of the status of various components in order to support a transaction. Changes to the state of these components are what ultimately get recorded in the overarching blockchain. This aspect leads to the description of Ethereum as a transaction-based state machine containing the following components:

World State

A mapping between 160-bit address identifiers and account state, maintained in an immutable Merkle Patricia Tree data structure.

Account State

Contains the following four components:

  • nonce: For an externally owned account, the number of transactions sent from this account; for a contract account, the number of contracts it has created.

  • balance: The number of Wei owned by the account address.

  • storageRoot: A 256-bit hash of the root node of the Merkle Patricia Tree that encodes the account's storage contents.

  • codeHash: An immutable hash value of the EVM code for the respective account.
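The relationship between the world state and the account state can be pictured as a mapping from addresses to four-field records. The Python sketch below is only a mental model: a real client stores this mapping in a Merkle Patricia Tree rather than a dictionary, the field names are adapted to Python conventions, and the address and hash values are placeholders, not real hashes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    nonce: int
    balance: int          # denominated in wei
    storage_root: bytes   # 32-byte root hash of the account's storage tree
    code_hash: bytes      # 32-byte hash of the account's EVM code


# World state: 160-bit (20-byte) address -> account state
world_state = {}

address = bytes.fromhex("00" * 19 + "01")   # hypothetical 20-byte address
world_state[address] = Account(
    nonce=0,
    balance=10**18,                 # 1 ether expressed in wei
    storage_root=b"\x00" * 32,      # placeholder, not a real root hash
    code_hash=b"\x00" * 32,         # placeholder, not a real code hash
)

print(len(address) * 8, world_state[address].balance)
```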

Storage State

Account specific state information maintained at runtime on the EVM.

Block Information

The state values needed for a transaction include the following:

  • blockhash: The hash of one of the 256 most recently completed blocks.

  • coinbase: The address of the current block's beneficiary, that is, the recipient of the block reward.

  • timestamp: The current block’s timestamp.

  • number: The current block’s number.

  • difficulty: The current block’s difficulty.

  • gaslimit: The current block’s gas-limit.

Runtime Environment Information

Information used to facilitate transactions:

  • gasprice: Current gas price, which is specified by the transaction initiator.

  • codesize: Size of the code running in the current environment.

  • caller: Address of the account that is directly responsible for the current execution (the immediate caller).

  • origin: Address of the current transaction's original sender.
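The difference between caller and origin only shows up when one contract calls another. The Python sketch below models a chain in which an account named alice sends a transaction to contract_b, which in turn calls contract_c. The Env class, the call function, and the account names are all hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Env:
    origin: str   # account that signed and sent the transaction
    caller: str   # account that made the current call


def call(env, this_address):
    """Environment seen by the callee when this_address makes a nested call."""
    # origin is carried through unchanged; caller becomes the calling contract
    return Env(origin=env.origin, caller=this_address)


# alice -> contract_b -> contract_c
env_b = Env(origin="alice", caller="alice")   # what contract_b sees
env_c = call(env_b, "contract_b")             # what contract_c sees

print(env_c)
```

Inside contract_c the caller is contract_b, while the origin is still alice.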

State transitions are calculated with the following functions:

Ethereum State Transition Function

Used to calculate a valid state transition.

Block Finalization State Transition Function

Used to determine the state of a finalized block as part of the mining process, including block reward.

Block Level State Transition Function

Used to compute the state after a whole block: the result of applying the Block Finalization State Transition Function to the state produced by executing the block's transactions.
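How the three functions compose can be sketched in a few lines of Python. This is a heavily simplified model under stated assumptions: state is a plain dictionary of balances, a transaction is a (sender, recipient, value) tuple with no gas, nonce, or signature, and BLOCK_REWARD is an arbitrary illustrative number. All function names are invented for this example.

```python
from functools import reduce

BLOCK_REWARD = 5   # hypothetical reward, in arbitrary units


def apply_transaction(state, tx):
    """Transaction-level transition: return the new state, or raise if invalid."""
    sender, recipient, value = tx
    if state.get(sender, 0) < value:
        raise ValueError("insufficient balance")
    new_state = dict(state)   # never mutate the previous state
    new_state[sender] = new_state.get(sender, 0) - value
    new_state[recipient] = new_state.get(recipient, 0) + value
    return new_state


def finalize_block(state, miner):
    """Block finalization: credit the block reward to the miner."""
    new_state = dict(state)
    new_state[miner] = new_state.get(miner, 0) + BLOCK_REWARD
    return new_state


def apply_block(state, transactions, miner):
    """Block-level transition: run every transaction in order, then finalize."""
    after_txs = reduce(apply_transaction, transactions, state)
    return finalize_block(after_txs, miner)


genesis = {"alice": 10, "bob": 0}
txs = [("alice", "bob", 3), ("bob", "carol", 1)]
final = apply_block(genesis, txs, miner="miner")
print(final)
```

Each function returns a new state instead of modifying its input, which mirrors the idea of a state machine moving from one well-defined state to the next.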

Compiling Solidity to EVM bytecode

Compiling a Solidity source file to EVM bytecode can be accomplished via the command line. For a list of additional compile options, simply run the following command:

$ solc --help

Generating the raw opcode stream of a Solidity source file is easily achieved with the --opcodes command line option. This opcode stream leaves out some information (the --asm option produces the full information), but is sufficient for this first introduction. For example, compiling an example Solidity file Example.sol and populating the opcode output into a directory named BytecodeDir is accomplished with the following command:

$ solc -o BytecodeDir --opcodes Example.sol

The output opcode files generated will depend on the specific contracts contained within the Solidity source file. Our simple Solidity file Example.sol has only one contract named "example".

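pragma solidity ^0.4.19;

contract example {

  // Address of the account that deployed the contract
  address contractOwner;

  // Constructor (named after the contract in Solidity 0.4.x); runs once at deployment
  function example() public {
    contractOwner = msg.sender;
  }
}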

If you look in the BytecodeDir directory, you will see the opcode file example.opcode which contains the EVM machine language instructions of the "example" contract. Opening up the example.opcode file in a text editor will show the following:

PUSH1 0x60 PUSH1 0x40 MSTORE CALLVALUE ISZERO PUSH2 0xF JUMPI PUSH1 0x0 DUP1
/*snip*/

Let’s examine the first two instructions:

PUSH1 0x60 PUSH1 0x40

Here we have the mnemonic "PUSH1" followed by a raw byte of value "0x60". This corresponds to the EVM instruction that interprets the single byte following the opcode as a literal value and pushes it onto the stack. It is possible to push values of up to 32 bytes onto the stack, using the opcodes PUSH1 through PUSH32. For example, the following bytecode pushes a 4-byte value onto the stack:

PUSH4 0x7f1baa12

The second push opcode pushes "0x40" onto the stack (on top of "0x60" already present there).
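The way a PUSH opcode consumes the bytes that follow it can be illustrated with a tiny disassembler. The Python sketch below knows only the handful of opcodes that appear in this section, and the hex strings fed to it are an assumed byte encoding of the mnemonics shown above (PUSH1 is byte 0x60, PUSH32 is byte 0x7f, MSTORE is byte 0x52). It prints operands in full, whereas solc strips leading zeros.

```python
# Byte values for the few non-PUSH opcodes used in this section
NAMES = {
    0x00: "STOP",
    0x15: "ISZERO",
    0x34: "CALLVALUE",
    0x52: "MSTORE",
    0x57: "JUMPI",
    0x80: "DUP1",
}


def disassemble(code):
    """Turn raw bytecode into a list of mnemonic strings."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if 0x60 <= op <= 0x7F:              # PUSH1 (0x60) .. PUSH32 (0x7f)
            n = op - 0x5F                   # number of operand bytes
            operand = code[pc:pc + n]
            pc += n                         # skip over the literal bytes
            out.append(f"PUSH{n} 0x{operand.hex()}")
        else:
            out.append(NAMES.get(op, f"UNKNOWN(0x{op:02x})"))
    return out


print(disassemble(bytes.fromhex("6060604052")))
print(disassemble(bytes.fromhex("637f1baa12")))
```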

Moving on to the next two instructions:

MSTORE CALLVALUE

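MSTORE is a stack/memory operation that saves a value to memory. It pops two items off the stack: first the memory offset (here "0x40") and then the value to store (here "0x60"), which it writes as a 32-byte word starting at that offset. CALLVALUE is an opcode that pushes onto the stack the amount of Wei sent with the executing message call.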
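The effect of the first three instructions (PUSH1 0x60, PUSH1 0x40, MSTORE) on the stack and on memory can be traced with a short Python sketch. This is a toy model, not client code: the stack is a list, memory is a byte array that grows on demand, and the helper names push and mstore are made up for this example.

```python
stack = []              # last element is the top of the stack
memory = bytearray()    # byte-addressed, initially empty


def push(value):
    stack.append(value)


def mstore():
    offset = stack.pop()    # top of stack: where to write
    value = stack.pop()     # next item: what to write
    end = offset + 32       # MSTORE always writes a full 32-byte word
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))   # grow memory with zero bytes
    memory[offset:end] = value.to_bytes(32, "big")


push(0x60)   # PUSH1 0x60
push(0x40)   # PUSH1 0x40
mstore()     # MSTORE

print(len(memory), int.from_bytes(memory[0x40:0x60], "big"), stack)
```

After these three instructions the stack is empty again and the 32-byte word at memory offset 0x40 holds the value 0x60.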

Execution of EVM bytecode

Gas, Accounting

Every transaction specifies a gas limit and a gas price, which together determine the maximum fee for an EVM execution (the gas limit multiplied by the gas price). These fees pay for the resources a transaction consumes, such as computation and memory. Gas is also charged for the creation of accounts and smart contracts.
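The fee arithmetic can be worked through in Python. The sketch below assumes the simple model described here, in which the sender pays the gas used multiplied by the gas price and gas that was reserved but not used is refunded. The gas limit and gas price are illustrative numbers, and 21,000 is the gas cost of a plain value transfer.

```python
GWEI = 10**9     # 1 gwei  = 10^9 wei
ETHER = 10**18   # 1 ether = 10^18 wei


def max_fee(gas_limit, gas_price):
    """Upper bound on the fee, reserved from the sender up front (in wei)."""
    return gas_limit * gas_price


def actual_fee(gas_used, gas_limit, gas_price):
    """Fee actually paid once execution finishes (in wei)."""
    if gas_used > gas_limit:
        raise ValueError("execution cannot use more gas than the limit")
    return gas_used * gas_price


gas_limit = 50_000        # illustrative limit chosen by the sender
gas_price = 20 * GWEI     # illustrative price chosen by the sender
gas_used = 21_000         # cost of a plain value transfer

reserved = max_fee(gas_limit, gas_price)
paid = actual_fee(gas_used, gas_limit, gas_price)
refund = reserved - paid

print(reserved / ETHER, paid / ETHER, refund / ETHER)
```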

Turing Completeness and Gas