[WIP] the Implementation of Parallel EVM 2.0 #30

setunapo · 2022-05-26T13:02:12Z

Description

This is part 2 of the implementation of BEP-130: Parallel Transaction Execution.
Part 1 of the implementation can be found at: Parallel 1.0 Implementation

For the motivation and architecture design, refer to the BEP-130 document.
As noted in Parallel 1.0, Parallel 2.0 is a performance enhancement version: it builds on the architecture of Parallel 1.0 and tries to improve performance by introducing more advanced methodologies.

Specification

Architecture

The architecture of Parallel 2.0 is based on Parallel 1.0. It only touches the execution layer, mainly state_processor.go, state_db.go and state_object.go. As with 1.0, the architecture can be briefly described with 3 diagrams:

Module
Pipeline
Lifecycle of transaction

Module

Here are the major components of Parallel 2.0.

Pipeline

Pipeline of Parallel 2.0

Take a concurrency of 8 as an example for a brief view.

The pipeline of Parallel 1.0 is posted for comparison.

The pipelines of 1.0 and 2.0 are quite different. There are lots of changes; the most obvious ones are:

There is no waiting state, a transaction can be executed without waiting for its previous transaction.
A new shadow routine is created for each slot.
Stage1 & Stage2 are introduced.
A new routine is created: RT Conflict Check.
Conflict Check, Finalize, Merge are all moved into the main routine dispatcher.

Lifecycle of transaction

Lifecycle of Parallel 2.0

The lifecycle of 1.0 is posted for comparison.

For the transaction lifecycle, the main differences are:

No dispatch IPC cost.
No waiting state.
UnconfirmedAccess.
LightCopy.
NewSlotDB is now moved to the execution routine, while conflict detection and finalization are now done in the main dispatcher routine.

Introduce features of 2.0

Streaming Pipeline
Once a transaction's execution stage (EVM) is completed, it does not need to wait for the merge result of its previous transaction. The slot can queue the result to its shadow slot and move on to execute the next transaction in its pending queue.
ConflictDetect, Finalize and Tx Result Merge are all done by the main dispatcher. Each execution slot has a shadow slot, which is a backup slot that does exactly the same job as the primary slot. The shadow slot makes sure a redo can be scheduled ASAP.
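The streaming idea above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `txResult` and `runPipeline` are hypothetical names, transactions are assigned to slots round-robin as a placeholder, and the shadow slot and redo path are left out. Slots push results into a buffered channel without waiting, and the main dispatcher merges them strictly in transaction order.

```go
package main

import (
	"fmt"
	"sync"
)

// txResult is a hypothetical stand-in for one transaction's execution result.
type txResult struct {
	txIndex int
	slot    int
}

// runPipeline "executes" txCount transactions on slotCount slots and returns
// the order in which the main dispatcher merged them.
func runPipeline(txCount, slotCount int) []int {
	// Buffered so that a slot never blocks on the merge of an earlier tx.
	results := make(chan txResult, txCount)
	var wg sync.WaitGroup
	for s := 0; s < slotCount; s++ {
		wg.Add(1)
		go func(slot int) {
			defer wg.Done()
			// Placeholder assignment: every slotCount-th transaction.
			for i := slot; i < txCount; i += slotCount {
				// Queue the result and move on to the next tx at once.
				results <- txResult{txIndex: i, slot: slot}
			}
		}(s)
	}
	go func() { wg.Wait(); close(results) }()

	// Main dispatcher: park results that arrive early, merge in tx order.
	pending := make(map[int]txResult)
	merged := make([]int, 0, txCount)
	next := 0
	for r := range results {
		pending[r.txIndex] = r
		for {
			p, ok := pending[next]
			if !ok {
				break
			}
			delete(pending, next)
			// Conflict check, finalize and merge would happen here.
			merged = append(merged, p.txIndex)
			next++
		}
	}
	return merged
}

func main() {
	fmt.Println(runPipeline(10, 4))
}
```

Whatever order the slots finish in, the merge order stays sequential, which is what lets execution run ahead of confirmation.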

Universal Unconfirmed State Access
This part is quite complicated. With unconfirmed state access, a StateObject is looked up in the following priority order:
Self Dirty ➝ UnConfirmed Dirty ➝ Main StateDB (dirty, pending, snapshot and trie)
In short, a slot should try its best to get the desired information in order to reduce the conflict rate.
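As a rough illustration of the lookup priority above, here is a minimal sketch with plain maps standing in for the real state layers. `slotView`, `getBalance` and the layer names are hypothetical and only model balances; the real StateDB layers (dirty, pending, snapshot, trie) are collapsed into one map.

```go
package main

import "fmt"

type balances map[string]int64

// slotView is a hypothetical read view for one slot.
type slotView struct {
	selfDirty   balances   // changes made by the current transaction
	unconfirmed []balances // results of earlier, not yet merged transactions, nearest first
	mainDB      balances   // stand-in for the main StateDB
}

// getBalance returns the balance and the name of the layer that served it.
func (v *slotView) getBalance(addr string) (int64, string) {
	if b, ok := v.selfDirty[addr]; ok {
		return b, "self"
	}
	for _, u := range v.unconfirmed {
		if b, ok := u[addr]; ok {
			return b, "unconfirmed"
		}
	}
	if b, ok := v.mainDB[addr]; ok {
		return b, "main"
	}
	return 0, "none"
}

func main() {
	v := &slotView{
		selfDirty:   balances{"alice": 5},
		unconfirmed: []balances{{"bob": 7}, {"bob": 3, "alice": 1}},
		mainDB:      balances{"alice": 10, "bob": 10, "carol": 10},
	}
	for _, addr := range []string{"alice", "bob", "carol"} {
		b, layer := v.getBalance(addr)
		fmt.Println(addr, b, layer)
	}
}
```

Reading from an unconfirmed layer is a bet that the earlier transaction will be confirmed unchanged; the read still has to be validated against the main StateDB later.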

Conflict Detect 2.0
In Parallel 1.0, the conflict detector is a simple "double for loop" that checks whether two SlotDBs have overlapping state changes. We mark the execution result as conflicted if it reads a state which has been changed by other transactions within the conflict window.
In Parallel 2.0, the conflict check is based on reads. We no longer care about what has been changed; the only thing we check is whether what we read is correct. We keep a detailed read record and compare it with the main StateDB during conflict detection. This is more straightforward and more accurate.
A new routine called Stage2ConfirmLoop is added to do conflict detection in advance, once most of the transactions have been executed at least once; this is configurable.
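A read-based check of this kind could look like the sketch below. It is a simplified assumption of the idea, not the PR's code: `stateKey` and `hasConflict` are hypothetical, and values are plain strings. The transaction records every value it read, and at confirm time each record is compared with the main state.

```go
package main

import "fmt"

// stateKey identifies one storage slot of one account (hypothetical).
type stateKey struct {
	addr string
	slot string
}

// hasConflict reports whether any value the transaction read differs from
// the value now in the main state. What the transaction wrote is irrelevant.
func hasConflict(reads, mainState map[stateKey]string) bool {
	for k, seen := range reads {
		if mainState[k] != seen {
			return true
		}
	}
	return false
}

func main() {
	mainState := map[stateKey]string{
		{"0xA", "k1"}: "v1",
		{"0xA", "k2"}: "v2",
	}
	reads := map[stateKey]string{{"0xA", "k1"}: "v1"}
	fmt.Println(hasConflict(reads, mainState)) // read still matches

	// An earlier transaction is merged and changes k1.
	mainState[stateKey{"0xA", "k1"}] = "v1-new"
	fmt.Println(hasConflict(reads, mainState)) // stale read, redo needed
}
```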

Parallel KV Conflict Detect
The conflict check is CPU consuming, especially the storage check. We have to go through all the addresses that were read, and each address can have lots of KV read records. This is one of the current bottlenecks, so we run the KV conflict detection in parallel to speed it up.
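One way to parallelize such a check is to split the read records into chunks and validate each chunk in its own goroutine. The sketch below assumes this chunking approach; `kvRead`, `hasKVConflict` and the worker count are hypothetical and not taken from this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// kvRead is one recorded storage read (hypothetical).
type kvRead struct{ key, value string }

// hasKVConflict checks the read records against mainStorage using up to
// `workers` goroutines. mainStorage is only read, so no lock is needed.
func hasKVConflict(reads []kvRead, mainStorage map[string]string, workers int) bool {
	if workers < 1 {
		workers = 1
	}
	chunk := (len(reads) + workers - 1) / workers
	conflicts := make([]bool, workers) // one result cell per worker, no sharing
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		start := w * chunk
		if start >= len(reads) {
			break
		}
		end := start + chunk
		if end > len(reads) {
			end = len(reads)
		}
		wg.Add(1)
		go func(w int, part []kvRead) {
			defer wg.Done()
			for _, r := range part {
				if mainStorage[r.key] != r.value {
					conflicts[w] = true
					return
				}
			}
		}(w, reads[start:end])
	}
	wg.Wait()
	for _, c := range conflicts {
		if c {
			return true
		}
	}
	return false
}

func main() {
	storage := map[string]string{}
	reads := make([]kvRead, 0, 1000)
	for i := 0; i < 1000; i++ {
		k := fmt.Sprintf("key-%d", i)
		storage[k] = "v"
		reads = append(reads, kvRead{k, "v"})
	}
	fmt.Println(hasKVConflict(reads, storage, 8))
}
```

For a small number of reads the goroutine overhead outweighs the gain, so a real implementation would likely only take the parallel path above some size threshold.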

Memory: Shared Memory Pool&LightCopy&ZeroCopy
According to the memory analysis of Parallel 1.0, CopyForSlot allocates 62K of memory every time. Since most of that memory is consumed by maps, we use sync.Pool to manage all the maps, and recycle the maps used by the slot DB asynchronously while the block is being committed.
Parallel 1.0 uses DeepCopy for copy-on-write, which is CPU and memory consuming when the storage contains lots of KV elements. We replace it with LightCopy to avoid redundant memory copies of StateObject. With LightCopy, we do not copy any of the storage. This is in fact not optional but required once we use unconfirmed references: since the storage may be accessed from different unconfirmed DBs, we cannot simply copy all the KV elements of a single state object.
We use map in sequential mode and sync.Map in parallel mode for concurrent StateObject access.
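The map recycling can be sketched with sync.Pool as below. This is a minimal illustration under assumptions: `storagePool`, `getMap` and `putMap` are hypothetical names, and the map type and initial capacity are placeholders rather than the types used by the slot DB.

```go
package main

import (
	"fmt"
	"sync"
)

// storagePool hands out reusable maps instead of allocating new ones
// for every slot DB.
var storagePool = sync.Pool{
	New: func() interface{} { return make(map[string]string, 64) },
}

// getMap returns an empty map, reused from the pool when one is available.
func getMap() map[string]string {
	return storagePool.Get().(map[string]string)
}

// putMap clears the map and returns it to the pool. In the scheme described
// above this would run asynchronously while the block is committing.
func putMap(m map[string]string) {
	for k := range m {
		delete(m, k)
	}
	storagePool.Put(m)
}

func main() {
	m := getMap()
	m["k"] = "v"
	putMap(m)

	reused := getMap()
	fmt.Println(len(reused)) // always empty, whether reused or freshly created
}
```

Clearing on put rather than on get keeps stale state from one block out of the next, at the cost of doing the clearing work on the recycle path.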

Trie prefetch In Advance
Trie prefetch is key to reducing the cost of validation. We do trie prefetch even for unconfirmed results, to make sure the prefetch is scheduled ASAP.

Dispatcher 2.0
Parallel 2.0 actually removes the dispatch action; the dispatch channel IPC is no longer needed. Dispatcher 2.0 has 2 parts:
static dispatch & dynamic dispatch.
Static dispatch is done at the beginning of block processing. It is responsible for making sure that potentially conflicting transactions are dispatched to the same slot, and it tries its best to balance the workload between slots.
Dynamic dispatch happens at runtime. It has a stolen mode: when a slot has finished its statically dispatched tasks, it can steal a transaction from another busy slot.
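The two dispatch parts might be sketched as follows. Several details here are assumptions made for illustration only: transactions from the same sender are treated as "potentially conflicting", a new sender goes to the currently shortest queue, and a thief takes the last pending transaction of the busiest slot. The names `tx`, `staticDispatch` and `steal` are hypothetical, and locking is omitted because the sketch is single-threaded.

```go
package main

import "fmt"

// tx is a hypothetical transaction with its block index and sender.
type tx struct {
	index int
	from  string
}

// staticDispatch builds one queue per slot before execution starts.
// Assumption: same sender means potential conflict, so same slot.
func staticDispatch(txs []tx, slotCount int) [][]tx {
	queues := make([][]tx, slotCount)
	slotOf := make(map[string]int)
	for _, t := range txs {
		s, ok := slotOf[t.from]
		if !ok {
			// New sender: pick the shortest queue to balance the workload.
			s = 0
			for i := range queues {
				if len(queues[i]) < len(queues[s]) {
					s = i
				}
			}
			slotOf[t.from] = s
		}
		queues[s] = append(queues[s], t)
	}
	return queues
}

// steal removes and returns the last pending transaction of the busiest
// slot other than the thief. It returns false if nothing can be stolen.
func steal(queues [][]tx, thief int) (tx, bool) {
	victim := -1
	for i := range queues {
		if i == thief || len(queues[i]) == 0 {
			continue
		}
		if victim == -1 || len(queues[i]) > len(queues[victim]) {
			victim = i
		}
	}
	if victim == -1 {
		return tx{}, false
	}
	q := queues[victim]
	t := q[len(q)-1]
	queues[victim] = q[:len(q)-1]
	return t, true
}

func main() {
	txs := []tx{{0, "A"}, {1, "B"}, {2, "A"}, {3, "C"}, {4, "A"}}
	queues := staticDispatch(txs, 2)
	fmt.Println(len(queues[0]), len(queues[1]))

	// Slot 1 has run out of work and steals from the busiest slot.
	if t, ok := steal(queues, 1); ok {
		fmt.Println("stolen tx", t.index)
	}
}
```

Stealing breaks the "same sender, same slot" grouping for the stolen transaction, so it trades a higher conflict risk for better slot utilization.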

Corner Case

The behavior of parallel execution is somewhat different from sequential execution, and there are corner cases we have to handle specially.

don't panic if anything goes wrong while reading state
skip the balance check for the system address
handle the WBNB contract to reduce the conflict rate by balance makeup; a new interface GetBalanceOpCode is added.

Performance Test

I set up 2 instances to test the performance benefit, with a parallel number of 8 and --pipecommit enabled.
The 2 instances use the same hardware configuration: 16 cores, 64G memory, 7T SSD.
The test ran for ~50 hours. The total block processing cost (execution, validation, commit) was reduced by ~20% to ~50%; the benefit varies with the block pattern.

1. Features of 2.0:
** Streaming Pipeline
** Implement universal unconfirmed state db reference, try best to get account object state.
** New conflict detect, check based on what it has read.
** Do parallel KV conflict check for large KV read
** new Interface StateDBer and ParallelStateDB
** shared memory pool for parallel objects
** use map in sequential mode and sync.Map in parallel mode for concurrent StateObject access
** replace DeepCopy by LightCopy to avoid redundant memory copy of StateObject
** do trie prefetch in advance
** dispatcher 2.0
*** Static Dispatch & Dynamic Dispatch
*** Stolen mode for TxReq when a slot finished its static dispatched tasks
*** RealTime result confirm in Stage2, when most of the tx have been executed at least once
*** Make it configurable
2. Handling of corner cases:
** don't panic if there is anything wrong reading state
** handle system address, skip its balance check
** handle WBNB contract to reduce conflict rate by balance makeup
*** WBNB balance makeup by GetBalanceOpCode & depth
*** add a lock to fix WBNB makeup concurrent crash
*** add a new interface GetBalanceOpCode

setunapo added 7 commits May 21, 2022 21:28

code prune rd:1

ad6765a

code prune rd:2

4be068d

code prune rd:3

f5df794

code prune rd:4

04f6fe8

code prune rd:5

9628a04

code prune rd:6, for review comments

21723fd

setunapo requested review from lunarblock, NashBC, qinglin89, richardrich975 and worldisreal May 26, 2022 13:04

code prune rd:7, typo fixups

e675112

setunapo force-pushed the Parallel_2.0_based_onv1.1.10 branch from f907c9b to e675112 Compare May 27, 2022 03:32

setunapo changed the title ~~[WIP] the Implementaion of Parallel EVM 2.0~~ [WIP] the Implementation of Parallel EVM 2.0 May 27, 2022

brilliant-lx closed this Oct 1, 2022
