
Vision: Multi-block Migrations #7911

Closed
apopiak opened this issue Jan 15, 2021 · 12 comments
Labels
J0-enhancement An additional feature request. J4-duplicate Issue is a duplicate. Closer should comment with a link to the duplicate. Z3-substantial Can be fixed by an experienced coder with a working knowledge of the codebase.

Comments

@apopiak
Contributor

apopiak commented Jan 15, 2021

In situations where the block time is subject to a hard cap, a Substrate chain needs to be able to execute a(n expensive) storage migration over the course of more than one block. This will likely happen in the context of running as a parachain on Polkadot, as validators will enforce a maximum block time on parachain blocks.

This likely cannot be addressed by the existing task-scheduling mechanisms, because the blockchain is not operational while migrating; it should likely be "frozen" to avoid inconsistent state.

@gui1117
Contributor

gui1117 commented Mar 11, 2021

One solution can be:

do a runtime migration which changes the storage logic to:

  • read from the old storage,
  • write to both the new storage and the old storage,
  • copy N values from the old storage to the new storage per block;

once all values in the old storage and the new storage are identical, do another runtime migration which changes the storage logic to:

  • read from and write to the new storage only,
  • delete N values from the old storage per block (see the sketch below).
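
As a rough illustration of this two-phase scheme, here is a minimal sketch of the per-block steps, modelled with plain in-memory maps instead of FRAME storage; `N`, the key/value types, and the function names are all made up for this sketch and are not an existing API.

```rust
use std::collections::BTreeMap;

const N: usize = 100; // per-block item budget; illustrative only

/// Phase 1, run once per block after the first upgrade: copy up to `N`
/// not-yet-copied entries from the old map into the new one. Runtime
/// writes during this phase go to both maps, so copied entries never go
/// stale. Returns `true` once both maps hold identical data.
fn copy_step(old: &BTreeMap<u32, u64>, new: &mut BTreeMap<u32, u64>) -> bool {
    let mut copied = 0;
    for (k, v) in old {
        if !new.contains_key(k) {
            new.insert(*k, *v);
            copied += 1;
            if copied == N {
                return false; // budget exhausted, continue next block
            }
        }
    }
    true // old and new are identical; the second upgrade can be enacted
}

/// Phase 2, run once per block after the second upgrade: reads and writes
/// now only touch the new map, and up to `N` old entries are deleted per
/// block. Returns `true` once the old map is empty.
fn delete_step(old: &mut BTreeMap<u32, u64>) -> bool {
    let keys: Vec<u32> = old.keys().copied().take(N).collect();
    for k in keys {
        old.remove(&k);
    }
    old.is_empty()
}
```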

@xlc
Contributor

xlc commented Mar 11, 2021

We can always do a lazy upgrade, i.e. only migrate storage upon access. When reading something, try the new storage first; if the value doesn't exist there, try the old one and migrate it to the new one.

After a long enough time, most of the old storage should have been moved to the new one, and we can then do a final migration to move the remaining entries.
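
A minimal sketch of the read path of such a lazy migration, modelled with plain in-memory maps; the maps stand in for old- and new-format storage items, and the tuple for a hypothetical new value layout. Nothing here is an existing FRAME API.

```rust
use std::collections::BTreeMap;

/// Lazy read: try the new storage first and fall back to the old one,
/// migrating the value on the way so the next read hits the fast path.
fn lazy_get(
    key: u32,
    old: &mut BTreeMap<u32, u32>,        // old encoding
    new: &mut BTreeMap<u32, (u32, u32)>, // new encoding
) -> Option<(u32, u32)> {
    if let Some(v) = new.get(&key) {
        return Some(*v);
    }
    // Not migrated yet: take the value out of the old map, convert it and
    // store it in the new one.
    let old_value = old.remove(&key)?;
    let migrated = (old_value, 0); // hypothetical conversion to the new format
    new.insert(key, migrated);
    Some(migrated)
}
```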

@burdges

burdges commented Mar 11, 2021

We know of more efficient storage migration schemes, but nobody has clarified what would ever justify the implementation effort. I'll describe one fairly straightforward approach:

We first require a storage checkpoint protocol in which one passes the storage through the availability and approval system, like parathread blocks. In other words, one node erasure codes the storage, either all or only part of it, distributes the pieces among the validators, and then approval checkers reconstruct the storage and check its Merkle root. We require checkpointing for system parathreads and multiple relay chains too, so it is rather important.

We do migration for the relay chain by first having one node migrate and checkpoint the new storage, akin to publishing the relay chain state in a series of parathread blocks. We then continue running the chain almost as usual, except that we now track both the pre- and post-migration Merkle roots. All nodes validate either pre- or post-migration Merkle root updates, with the checkpoint blocks' approval checkers initially validating post-migration updates while also spot-checking the other. After we approve and finalize the checkpoint, all nodes download it and replay the intervening blocks on top of the checkpointed state.

We need parathreads, storage checkpoints, and this double root tracking before any of this makes sense, but after those it becomes straightforward. We avoid most nodes recomputing the whole storage, since they obtain it from the checkpoint and replay only a small-ish number of blocks.

@apopiak
Contributor Author

apopiak commented Mar 12, 2021

@burdges Your scheme sounds like a way to do relay chain migrations.
The main use case that I see for multi-block migrations stems from the harder limit on block times that parachains will be subject to.

@stale

stale bot commented Jul 7, 2021

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 7, 2021
@apopiak apopiak removed the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 12, 2021
@apopiak
Contributor Author

apopiak commented Jul 12, 2021

Still relevant.

@JoshOrndorff
Contributor

The Moonbeam team has begun to explore a multi-block migration strategy in moonbeam-foundation/moonbeam#527. This design expresses migrations as batches, which allows a migration to be split across multiple blocks. This way the strict execution-time limit can be met and the migration can still be run to completion.

But this multi-block migration support introduces the additional challenge that normal transactions should not be processed while the migration is ongoing or else they may corrupt the state. Ideally transactions would accumulate in the transaction pool during this period and be processed as soon as the migration is complete.

In the Drupal web CMS (and others) there is a notion of "maintenance mode". In this mode, site visitors cannot interact with the site except to view public static files; no database reads or writes are permitted at all. Such a maintenance mode may make sense at the FRAME level. The Drupal docs mention that

A site might be put in maintenance mode during development, to perform site maintenance, or as a security precaution during an attack.

Maintenance mode could also serve as an emergency stop button when a chain is being attacked or a bug is being exploited.
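
As a rough illustration, here is a tiny plain-Rust model of what such a maintenance-mode gate could look like at the call-filter level (e.g. wired in where frame_system's BaseCallFilter sits); the `Call` enum, the flag, and the allow-list are made up for this sketch and are not an existing FRAME API.

```rust
// While the flag is set, only an explicit allow-list of calls is dispatched;
// everything else is rejected by the base filter and stays queued in the
// transaction pool until the migration is complete.
#[allow(dead_code)]
#[derive(Debug)]
enum Call {
    Transfer { to: u64, amount: u128 },
    ExitMaintenanceMode,
}

fn call_allowed(call: &Call, maintenance_mode: bool) -> bool {
    if !maintenance_mode {
        return true; // normal operation: everything passes the base filter
    }
    // During a migration (or an emergency stop), only the call that can
    // switch the mode off again is allowed through.
    matches!(call, Call::ExitMaintenanceMode)
}
```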

@JoshOrndorff
Contributor

I really like @xlc's idea of lazy storage migrations. These could also be run during the on_idle hook so we don't have to wait too long before all the storage has been migrated.

@gui1117
Contributor

gui1117 commented Aug 17, 2021

Yes, a lazy migration is not hard. All accesses to storage go through the type alias (e.g. Foo in type Foo<T> = StorageValue<...>), so one can change this type alias to a new implementation which does the following:

  • on read: read from the new storage or, if the value doesn't exist there, from the old one,
  • on write: write to the new storage and remove the old value.

Then we can use something like half of the normal dispatch block weight to migrate in the background (sketched below). This way the chain is more bloated, but half of the normal dispatch weight is still available for normal extrinsics.

Once all old keys have been removed, we can do another runtime upgrade which only uses the new storage.
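
A minimal sketch of the background step described above, again modelled with plain in-memory maps; the per-item cost, the `budget` parameter (standing in for the weight reserved each block, e.g. in on_initialize or on_idle), and all names are illustrative, not an existing FRAME API.

```rust
use std::collections::BTreeMap;

const COST_PER_ITEM: u64 = 1_000; // assumed weight per migrated entry

/// Each block, spend at most `budget` moving entries from the old map to
/// the new one; returns the weight actually consumed this block.
fn background_migrate(
    old: &mut BTreeMap<u32, u32>,
    new: &mut BTreeMap<u32, (u32, u32)>,
    budget: u64,
) -> u64 {
    let mut spent = 0;
    while spent + COST_PER_ITEM <= budget {
        // Take one not-yet-migrated entry; stop once the old map is empty.
        let Some(&k) = old.keys().next() else { break };
        let v = old.remove(&k).expect("key was just read; qed");
        new.insert(k, (v, 0)); // hypothetical conversion to the new format
        spent += COST_PER_ITEM;
    }
    spent
}
```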

@shawntabrizi shawntabrizi added the Z3-substantial Can be fixed by an experienced coder with a working knowledge of the codebase. label Jun 3, 2022
@gregzaitsev

Hi @JoshOrndorff! Is there a way the Unique team can contribute to this issue?

@mrshiposha
Contributor

Hi everyone!

It seems to me that the lazy migration would have some issues, namely:

  • If we do the migration only on reads/writes and don't use the remaining weight (as in the on_idle hook), then we can't tell when the migration will complete.
  • If we do the migration on writes and continue it using the remaining weight, then the state could become inconsistent. For instance, StorageMap::translate can't work correctly if there are already-migrated values in the storage, because the function can't distinguish migrated values from invalid values -- both simply fail to decode.

So it seems to me that the Multi-Block Migration should run over a span of several consecutive blocks, with a guarantee that no transaction is executed during the migration.

I wrote a gist with thoughts on how we could implement the Multi-Block Migration and what it could look like.
In short, I think it would be nice to add an analog of the translate function -- a multiblock_translate fn -- and possibly a new hook -- an OnMultiblockMigration hook (see the Questions section).
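
As a very rough sketch of the shape such a cursor-based translation could take (this is not the API proposed in the linked gist, nor an existing FRAME function; the types and names are made up to illustrate how a cursor lets the translation resume in the next block):

```rust
use std::collections::BTreeMap;

/// Translate up to `limit` old-format entries per call, resuming from
/// `cursor`. Returns the last key handled, or `None` once nothing is left.
fn multiblock_translate<F>(
    old: &mut BTreeMap<u32, Vec<u8>>, // still-encoded, old-format values
    new: &mut BTreeMap<u32, u64>,     // decoded, new-format values
    cursor: Option<u32>,              // last key handled in the previous block
    limit: usize,                     // items we can afford in this block
    mut f: F,
) -> Option<u32>
where
    F: FnMut(u32, &[u8]) -> Option<u64>,
{
    let start = cursor.map_or(0, |c| c.saturating_add(1));
    let keys: Vec<u32> = old.range(start..).take(limit).map(|(k, _)| *k).collect();
    let mut last = None;
    for k in keys {
        if let Some(encoded) = old.remove(&k) {
            if let Some(value) = f(k, &encoded) {
                new.insert(k, value);
            }
            last = Some(k);
        }
    }
    last
}
```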

@kianenigma kianenigma moved this to Backlog in Runtime / FRAME Aug 21, 2022
@kianenigma kianenigma changed the title Multi-block Migrations Vision: Multi-block Migrations Mar 10, 2023
@juangirini
Contributor

Closed in favor of paritytech/polkadot-sdk#198

@juangirini juangirini closed this as not planned May 29, 2023
@github-project-automation github-project-automation bot moved this from Backlog to Done in Runtime / FRAME May 29, 2023
@juangirini juangirini moved this from Done to Won't Fix in Runtime / FRAME May 29, 2023
@juangirini juangirini added the J4-duplicate Issue is a duplicate. Closer should comment with a link to the duplicate. label Jun 16, 2023