ADR-040: Storage and SMT State Commitments #8430

robert-zaremba · 2021-01-25T22:31:01Z

Description

In ADR-040 we propose a change of storage management without changing SDK store API. At the core of the proposed solution we decouple state commitments from state storage and propose to use LazyLedger SMT.

closes: #7100

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

Targeted PR against correct branch (see CONTRIBUTING.md)
Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
Code follows the module structure standards.
Wrote unit and integration tests
Updated relevant documentation (docs/) or specification (x/<module>/spec/)
Added relevant godoc comments.
Added a relevant changelog entry to the Unreleased section in CHANGELOG.md
Re-reviewed Files changed in the Github PR explorer
Review Codecov Report in the comment section below once CI passes

tzdybal · 2021-01-26T09:14:55Z

docs/architecture/adr-040-storage-and-smt-state-commitments.md

+
+### Snapshots
+
+One of the Stargate core features are snapshots and fast sync. Currently this feature is implemented through IAVL.


I can't see a way to use snapshots/versions with BoltDB API. BoltDB provides "snapshot isolation", but do not support explicit creation/usage of snapshots.

That's a good point. We should double-check that the dbs mentioned below actually provide the desired snapshotting mechanism.

Thanks for checking. I'm not completely sure. We can check with the hashicorp team. I found this:

"If a long running read transaction (for example, a snapshot transaction) is needed, you might want to set DB.InitialMmapSize to a large enough value to avoid potential blocking of write transaction. " source

This sovled issue could be relevant, because waypoint is using boltDB and it support snapshots: Server Snapshot/Restore hashicorp/waypoint#682

I'm not sure if we should spend a time here to discuss the backend. I mean - we need to verify the proposals, but we should have a separate discussion about the the "recommended" DB backend.
So, what do you think about shifting the snapshot functionality to DB?

I'm wondering if using database snapshots to access old state roots is overkill, or even necessary for SMT.

By default, you can already access the old state roots in the LazyLedger SMT implementation, because the tree isn't garbage collected currently. Once garbage collection is added, it could be configured to only garbage collect state roots older than a certain version, which would be equivalent to snapshots, no?

"If a long running read transaction (for example, a snapshot transaction) is needed, you might want to set DB.InitialMmapSize to a large enough value to avoid potential blocking of write transaction. "

I saw this before, and I'm pretty sure it's about app-level logic executing app-level transaction (iteration over entire KV-store) to generate app-level snapshots.

I'm wondering if using database snapshots to access old state roots is overkill, or even necessary for SMT.

I think we discuss snapshots because of the "storage" part of proposal not "state commitment" part. For SMT, I also don't see a reason to use DB-level snapshots.

So, what do you think about shifting the snapshot functionality to DB?

I agree that it is more than reasonable to use capabilities of DB to ensure we can access&manage previous versions of state.

Good point.
We need to have old versions of SMT for snapshots. So, I see two options here:

Whenever we schedule a snapshot, we do it in a same way as for state snapshots: using DB builtin feature. If we need to create a new node, and orphan some other node, then we immediately remove orphan nodes. So, effectively, we won't have old roots because we will remove them immediately.

Extend the current LL SMT and add pruning.

PS: I assume in your current implementation, you don't update nodes - instead you create a new one, right? This allows you to keep all old versions. Why do you do that, instead of updating nodes?

We have to add pruning to LL SMT anyway, otherwise the state will keep growing.

PS: I assume in your current implementation, you don't update nodes - instead you create a new one, right? This allows you to keep all old versions. Why do you do that, instead of updating nodes?

Nodes are stored in a key => value store where the key is a hash of the node, and the value is the preimage of the hash (i.e. the hashes of the children of the node). Assuming the hashes are collision resistant, you can't "update" a node since the tree is immutable - you can only "create" a new tree with a new root.

But you have a path to the updated Leaf, so you can use it to remove old nodes:

Given key, we traverse the tree with path=hash(key)

While going down to a node, we can remember all nodes on the path

When going back and updating (or creating new ones) internal nodes, we can remove old node - we use the path we constructed in the step 2.

liamsi

Great start! Left some feedback while reding through it.

docs/architecture/adr-040-storage-and-smt-state-commitments.md

liamsi · 2021-01-26T10:28:21Z

docs/architecture/adr-040-storage-and-smt-state-commitments.md

+
+### Snapshots
+
+One of the Stargate core features are snapshots and fast sync. Currently this feature is implemented through IAVL.


That's a good point. We should double-check that the dbs mentioned below actually provide the desired snapshotting mechanism.

docs/architecture/adr-040-storage-and-smt-state-commitments.md

liamsi · 2021-01-26T10:37:21Z

docs/architecture/adr-040-storage-and-smt-state-commitments.md

+
+## Abstract
+
+Sparse Merke Tree (SMT) is a version of a Merkle Tree with various storage and performance optimizations. This ADR defines a separation of state commitments from data storage and the SDK transition from IAVL to SMT.


I'm wondering if this shouldn't in fact be two ADRs instead? One for separating storage and commitments and one about the SMT.

I was also thinking about it. But they are highly related - one cannot be done without other. Hence, I'm proposing here a general design and leave a space for future ADR for RDMS which will introduce SDK breaking changes.

Well we could separate the two with IAVL right? We don't need SMT for that AFAIK...

@aaronc, we could describe here only SMT, but it will only a half backed idea without a working solution:

keeping IAVL (in it's current implementation) with anything else doesn't make sense because we double the data.

the main value proposition here is to not store objects in SMT (we store only hashes).

Do you have something else in mind?

Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

tzdybal · 2021-01-27T08:10:13Z

docs/architecture/adr-040-storage-and-smt-state-commitments.md

+
+### Snapshots
+
+One of the Stargate core features are snapshots and fast sync. Currently this feature is implemented through IAVL.


"If a long running read transaction (for example, a snapshot transaction) is needed, you might want to set DB.InitialMmapSize to a large enough value to avoid potential blocking of write transaction. "

I saw this before, and I'm pretty sure it's about app-level logic executing app-level transaction (iteration over entire KV-store) to generate app-level snapshots.

I'm wondering if using database snapshots to access old state roots is overkill, or even necessary for SMT.

I think we discuss snapshots because of the "storage" part of proposal not "state commitment" part. For SMT, I also don't see a reason to use DB-level snapshots.

So, what do you think about shifting the snapshot functionality to DB?

I agree that it is more than reasonable to use capabilities of DB to ensure we can access&manage previous versions of state.

docs/architecture/adr-040-storage-and-smt-state-commitments.md

Co-authored-by: Tomasz Zdybał <tomek@zdybal.lap.pl>

Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

jackzampolin · 2021-02-03T19:00:51Z

Just found out about this work. Its very exciting. Where is the best place to find current state of this implementation and how that work is going?

robert-zaremba · 2021-02-03T21:32:41Z

@jackzampolin - here an the #8297 discssion.

The coding started and will be pushed to store/ll-smt branch

tzdybal · 2021-02-04T09:40:35Z

First SMT store PR is ready: #8507

hxrts · 2021-02-04T14:37:02Z

attn @jlandrews

docs/architecture/adr-040-storage-and-smt-state-commitments.md

colin-axner · 2021-04-29T16:37:29Z

docs/architecture/adr-040-storage-and-smt-state-commitments.md

+### Backwards Compatibility
+
+This ADR doesn't introduce any SDK level API changes.
+
+We change the storage layout of the state machine, a storage migration and network upgrade is required to incorporate these changes.
+


a storage migration and network upgrade is required to incorporate these changes.

Is this changing key paths as well?

Can you discuss backwards compatibility with respect to ics23? It sounds like SMT will be responsible maintaining compatibility? This is a little beyond my area of expertise but I guess my question is:

will SMT support ICS 23 (or does it already)?

will SMT require a different proof spec from the existing one?

I ask these questions, because changing how proofs are verified or the proof paths can possibly break IBC resulting in the best case an asynchronous network upgrade and in the worst case a synchronous network upgrade (all IBC chains need to synchronously upgrade to continue communication)

cc @AdityaSripal

Is this changing key paths as well?

Yes, it's essentially a hard fork.

will SMT support ICS 23?

It's not done - it's in the scope of work we defined few weeks ago. I will clarify it here.

will SMT require a different proof spec from the existing one?

No, it's a merkle tree after all.

proof paths can possibly break IBC

This sounds important. The proofs will change (because it will require a hard fork), however the verification will be the same. Shall we discuss the network upgrade mechanism in the ADR, or it's a separate discussion?
I think it should be following:

disable IBC

export chain

update binary and import genesis

enable IBC

Thanks for the response.

I'll defer to @AdityaSripal and @cwgoes to correct me, but I believe we need to consider the following cases:

IBC increases the complexity of upgrades because the counterparty needs to be informed of any changes in how the communication is being conducted.

Lets say we send a packet and construct a packet commitment under the path K. Normally the counterparty light client can verify a proof of the packet commitment, P, at some block height (ie VerifyProof(K, P)). In the code, our packet commitment proof path looks like this, a store prefix is also added before performing verification of this path. Until given further notice (unsupported atm), the counterparty will continue to assume the packet commit will only ever exist under key path K.

If the key paths are changing, then lets say K' represents the key path after the hard fork and P' represents the proof after the hard fork. My understanding is that K != K' but that P may equal P' (if the same value was to be proved at the same height, but just with different proof paths). K' would always fail on counterparties which are incapable of knowing the key path should be K'

Lets say we do the hard fork at height X, ie height X we are on the current code, height X + 1 we use SMT. The problem is all our counterparties need to be aware that at height X + 1 all key paths should use the new format. Not only do they need to be aware of that, but they need to be capable of using the new key paths for any proof heights >= X + 1. If the upgrade can be somewhat asynchronous, then chains need to be capable of supporting the old and new proof paths.

Re: disable/enable IBC

Unfortunately, there is no disabling/enabling IBC. We can disable/enable IBC transfers, but IBC itself is constantly moving forward. Clients may become expired if they are not successfully updated within a finite amount of time. This would result in funds essentially being "stuck". We do have recovery mechanisms, but ideally we coordinate a solution to avoid such a situation.

Currently, changing key paths/migrating IBC store is an unsupported upgrade, meaning existing IBC channels would be incapable of being used in the upgraded version. I believe it is entirely possible to add support for this, but we will need some time to specify the changes and implement/test them. In addition, all existing channels which cannot be lost between upgrades require all the counterparties to upgrade to the new IBC code which supports this upgrade route. Only then can the upgrade be performed without losing access to the channels.

Re: proof spec changes

I believe ics23 is a generic merkle proofs which uses proof specs for specific merkle tree representation. Changing the proof specs is an easier upgrade mechanism (still requires some coordination) than the above (it is supported), but this is would only be needed if it changed things like the hash function

This is a lot of unpack in a github comment and I think it will require more coordination/clarification - maybe best done in a live call. The primary concern is if a proof key path used in the new code cannot be verified using the old code (and vice versa). This requires the most difficult IBC upgrade path, but there may be clever tricks we can do to alleviate the difficulty over a period of time, if we are aware this is something that should be addressed in the near future

With regards to the ADR, if any the areas above will be effected, then it should be mentioned that these upgrade paths will need to be pursued and ibc-go may need to add more functionality, but the specific upgrade path itself is out of scope

Thanks for comment @colin-axner . I think we should move this to the ADR-40 discussion page, so let me answer it there: #8297 (comment)

TL;DR: we need to design a mechanism for IBC which will work support hard forks anyway. They are inevitable.

In my opinion, this should not block ADR-40, for me this is related, but more general problem: how to handle hard forks. That being said - it will block a release - we need to have a general solution before doing any hard fork.

robert-zaremba · 2021-04-30T10:36:16Z

It may be good to clarify how the state migration is expected to work.

We will need to do a hard fork and recreate the SMT when loading data. BTW: this is how all major updates have been done in the past.

roysc

I've suggested some edits to try to disambiguate references to historical versioning (which some DB engines call snapshots) vs. the snapshots used for ABCI state sync. Please correct me if I'm interpreting anything wrong, but I think it's important to be specific with the terminology, especially since the two concerns are related here.

I've also made comments and corrected some typos.

roysc · 2021-05-05T06:52:39Z

docs/architecture/adr-040-storage-and-smt-state-commitments.md

+Database snapshot is a view of DB state at a certain time or transaction. It's not a full copy of a database (it would be too big), usually a snapshot mechanism is based on a _copy on write_ and it allows to efficiently deliver DB state at a certain stage.
+Some DB engines support snapshotting. Hence, we propose to reuse that functionality for the state sync and versioning (described below). It will the supported DB engines to ones which efficiently implement snapshots. In a final section we will discuss evaluated DBs.


Suggested change

Database snapshot is a view of DB state at a certain time or transaction. It's not a full copy of a database (it would be too big), usually a snapshot mechanism is based on a _copy on write_ and it allows to efficiently deliver DB state at a certain stage.

Some DB engines support snapshotting. Hence, we propose to reuse that functionality for the state sync and versioning (described below). It will the supported DB engines to ones which efficiently implement snapshots. In a final section we will discuss evaluated DBs.

Database versioning provides a view of DB state at a certain time or transaction. It's not a full copy of a database (it would be too big), usually a versioning mechanism is based on a _copy on write_ and it allows to efficiently deliver DB state at a certain stage.

Some DB engines support viewing past versions. Hence, we propose to reuse that functionality for the state sync snapshots and versioning (described below). It will limit the supported DB engines to ones which efficiently implement versioning. In a final section we will discuss DBs evaluated for this feature.

I'm defining database snapshot here, so I prefer to use snapshot mechanism here, so I prefer to keep the original language.

Okay I may have misunderstood. This is what some DBs call snapshots, and distinct from state sync snapshots as used in the ABCI, right? (although it can be used to implement ABCI snapshots)

Yes, here we are talking about a database engine mechanism.

docs/architecture/adr-040-storage-and-smt-state-commitments.md

roysc · 2021-05-05T07:18:13Z

docs/architecture/adr-040-storage-and-smt-state-commitments.md

+Database snapshot is a view of DB state at a certain time or transaction. It's not a full copy of a database (it would be too big), usually a snapshot mechanism is based on a _copy on write_ and it allows to efficiently deliver DB state at a certain stage.
+Some DB engines support snapshotting. Hence, we propose to reuse that functionality for the state sync and versioning (described below). It will the supported DB engines to ones which efficiently implement snapshots. In a final section we will discuss evaluated DBs.
+
+New snapshot will be created in every `EndBlocker`. The `rootmulti.Store` keeps track of the version number and implements the `MultiStore` interface. `MultiStore` encapsulates a `Commiter` interface, which has the `Commit`, `SetPruning`, `GetPruning` functions which will be used for creating and removing snapshots. The `Store.Commit` function increments the version on each call, and checks if it needs to remove old versions. We will need to update the SMT interface to implement the `Commiter` interface.


For clarification, copy-on-write is used to maintain historical versions, but the state sync snapshot still involves copying the entire state store at the time of creation (at least, that is how it's currently implemented).

docs/architecture/adr-040-storage-and-smt-state-commitments.md

Co-authored-by: Roy Crihfield <30845198+roysc@users.noreply.github.com>

roysc · 2021-05-07T13:56:42Z

Thanks for the clarifications @robert-zaremba.

My only follow up questions: do snapshots and versioning refer to distinct feature requirements for the state storage backend DB, and if so, how do they differ? And, is there any desire or intention to use DB backup/export (as @tzdybal mentioned here) as part of the implementation of SDK sync snapshots?

aaronc

I imagine we will want to refine certain things as we go along but this is a great start. Good work @robert-zaremba et al!

aaronc · 2021-05-12T17:15:00Z

One thing that I think needs to happen when we transition to SMT which isn't covered in this ADR is removing the multi-store. IMHO the multi-store just has to go (see #6370).

alexanderbez · 2021-05-12T17:59:37Z

One thing that I think needs to happen when we transition to SMT which isn't covered in this ADR is removing the multi-store. IMHO the multi-store just has to go (see #6370).

Yes please! Just use namespaces or something.

aaronc · 2021-05-12T19:43:18Z

One thing that I think needs to happen when we transition to SMT which isn't covered in this ADR is removing the multi-store. IMHO the multi-store just has to go (see #6370).

Yes please! Just use namespaces or something.

So probably it should work something like this:

given a store named foo
in the SC layer we store hash("foo", key) for the key
in the SS layer we store concat(prefix("foo"), key) for the key where prefix("foo") is a single byte deterministically generated prefix

What do you think @robert-zaremba ?

robert-zaremba · 2021-05-18T21:17:18Z

Correct, multistore is not needed. Prefix store a cleaner solution.

Re: @aaronc example, I prefer to be consistent and use same keys in both SC and SS - so in both cases the key which will go in the API call to SC and SS will be concat("foo", key) . NOTE: SC will always hash keys internally when inserting a record in a tree.

prefix("foo") is a single byte deterministically generated prefix

I'm a bit skeptical about this. What if we will have more than 255 "sub-stores" under same "parent" (eg 300 modules)?
We can use Huffman Coding instead if we want to compress the prefix keys.

I will create an update to this ADR to discuss this.

robert-zaremba · 2021-05-18T23:07:33Z

PR: adr-40: use prefix store instead of multistore #9355

zmanian · 2021-05-19T05:32:54Z

Death to the multistore!

robert-zaremba · 2021-05-19T08:24:58Z

Let's continue the discussion about multistore in Replace IAVL and decouple state commitment from storage #8297 discussion.

ADR-040: Storage and SMT State Commitments

11728cf

robert-zaremba added C:Store T: ADR An issue or PR relating to an architectural decision record labels Jan 25, 2021

robert-zaremba requested review from aaronc, alessio and alexanderbez as code owners January 25, 2021 22:31

robert-zaremba requested review from liamsi and erikgrinaker January 25, 2021 22:31

robert-zaremba marked this pull request as draft January 25, 2021 22:32

tzdybal reviewed Jan 26, 2021

View reviewed changes

liamsi reviewed Jan 26, 2021

View reviewed changes

robert-zaremba and others added 5 commits January 26, 2021 12:44

Update docs/architecture/adr-040-storage-and-smt-state-commitments.md

662ec91

Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

Update docs/architecture/adr-040-storage-and-smt-state-commitments.md

5fdbe5d

Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

Added more details for snapshotting and pruning.

fa8e9e3

updated links and references

864927e

add blockchains which already use SMT

78215b2

tzdybal reviewed Jan 27, 2021

View reviewed changes

reorganize versioning and pruning

6dd0323

tzdybal mentioned this pull request Jan 28, 2021

Initial SMT store type celestiaorg/cosmos-sdk#12

Closed

tzdybal reviewed Jan 28, 2021

View reviewed changes

docs/architecture/adr-040-storage-and-smt-state-commitments.md Outdated Show resolved Hide resolved

robert-zaremba and others added 3 commits January 29, 2021 02:23

Update docs/architecture/adr-040-storage-and-smt-state-commitments.md

250b5ff

Co-authored-by: Tomasz Zdybał <tomek@zdybal.lap.pl>

Update docs/architecture/adr-040-storage-and-smt-state-commitments.md

374916f

Co-authored-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

adding a paragraph about state management

e90bf8a

tzdybal mentioned this pull request Feb 4, 2021

Add SMT store type #8507

Merged

tac0turtle reviewed Feb 25, 2021

View reviewed changes

docs/architecture/adr-040-storage-and-smt-state-commitments.md Outdated Show resolved Hide resolved

tac0turtle reviewed Feb 25, 2021

View reviewed changes

docs/architecture/adr-040-storage-and-smt-state-commitments.md Outdated Show resolved Hide resolved

tac0turtle reviewed Apr 29, 2021

View reviewed changes

docs/architecture/adr-040-storage-and-smt-state-commitments.md Outdated Show resolved Hide resolved

colin-axner reviewed Apr 29, 2021

View reviewed changes

review updates

962a28b

roysc reviewed May 5, 2021

View reviewed changes

robert-zaremba and others added 2 commits May 5, 2021 18:15

Apply suggestions from code review

bb89798

Co-authored-by: Roy Crihfield <30845198+roysc@users.noreply.github.com>

review udpates

19d2126

roysc mentioned this pull request May 6, 2021

Implement decoupled state commitments and storage vulcanize/cosmos-sdk#11

Merged

9 tasks

i-norden mentioned this pull request May 6, 2021

Outlook for Tendermint Databases [Tracking issue] tendermint/tendermint#6032

Closed

4 tasks

adding clarification

356f987

roysc mentioned this pull request May 7, 2021

Deep dive into underlying databases: RocksDB and BadgerDB vulcanize/cosmos-sdk#10

Closed

robert-zaremba requested review from aaronc and alexanderbez May 11, 2021 16:53

Merge branch 'master' into robert/adr-040

42e7f08

alexanderbez approved these changes May 11, 2021

View reviewed changes

aaronc approved these changes May 11, 2021

View reviewed changes

aaronc merged commit f1de92f into master May 11, 2021

aaronc deleted the robert/adr-040 branch May 11, 2021 20:45

aaronc removed the Status: In Review label May 11, 2021

UnitylChaos mentioned this pull request Jun 9, 2021

Proposal: Storage Overhaul f-o-a-m/kepler#262

Open

i-norden mentioned this pull request Jun 10, 2021

Epic 2: Replace tm-db database interface with new cosmos-db vulcanize/cosmos-sdk#2

Closed

roysc mentioned this pull request Jun 10, 2021

docs(adr): update ADR-40 by adding low level access to SC #9451

Closed

9 tasks


		### Snapshots

		One of the Stargate core features are snapshots and fast sync. Currently this feature is implemented through IAVL.


		## Abstract

		Sparse Merke Tree (SMT) is a version of a Merkle Tree with various storage and performance optimizations. This ADR defines a separation of state commitments from data storage and the SDK transition from IAVL to SMT.

		Database snapshot is a view of DB state at a certain time or transaction. It's not a full copy of a database (it would be too big), usually a snapshot mechanism is based on a _copy on write_ and it allows to efficiently deliver DB state at a certain stage.
		Some DB engines support snapshotting. Hence, we propose to reuse that functionality for the state sync and versioning (described below). It will the supported DB engines to ones which efficiently implement snapshots. In a final section we will discuss evaluated DBs.

ADR-040: Storage and SMT State Commitments #8430

ADR-040: Storage and SMT State Commitments #8430

Conversation

robert-zaremba commented Jan 25, 2021

Description

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

musalbas Jan 27, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

liamsi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jackzampolin commented Feb 3, 2021

robert-zaremba commented Feb 3, 2021

tzdybal commented Feb 4, 2021

hxrts commented Feb 4, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

colin-axner Apr 30, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robert-zaremba commented Apr 30, 2021

roysc left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

roysc commented May 7, 2021

aaronc left a comment

Choose a reason for hiding this comment

aaronc commented May 12, 2021

alexanderbez commented May 12, 2021

aaronc commented May 12, 2021 • edited Loading

robert-zaremba commented May 18, 2021

robert-zaremba commented May 18, 2021

zmanian commented May 19, 2021

robert-zaremba commented May 19, 2021

musalbas Jan 27, 2021 •

edited

Loading

colin-axner Apr 30, 2021 •

edited

Loading

aaronc commented May 12, 2021 •

edited

Loading