
Vision: Multi-block Migrations #7911

Closed
apopiak opened this issue Jan 15, 2021 · 12 comments
Labels
J0-enhancement An additional feature request. J4-duplicate Issue is a duplicate. Closer should comment with a link to the duplicate. Z3-substantial Can be fixed by an experienced coder with a working knowledge of the codebase.

Comments

@apopiak
Contributor

apopiak commented Jan 15, 2021

In situations where the block time is subject to a hard cap, a Substrate chain needs to be able to execute a(n expensive) storage migration over the course of more than one block. This will likely happen in the context of running as a parachain on Polkadot, as validators will enforce a maximum block time on parachain blocks.

This likely cannot be addressed by the existing task-scheduling mechanisms, because the blockchain is not operational while migrating; it should likely be "frozen" to avoid inconsistent state.

@gui1117
Contributor

gui1117 commented Mar 11, 2021

One solution can be:

do a runtime migration which changes the storage logic to:

  • read from the old storage,
  • write to both the new storage and the old storage,
  • copy N values from the old storage to the new storage per block;

once all values in the old storage and the new storage are identical, do another runtime migration which changes the storage logic to:

  • read from and write to the new storage only,
  • delete N values from the old storage per block (see the sketch below).
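
As a rough illustration of this two-phase scheme, here is a minimal sketch of the per-block steps, modelled with plain in-memory maps instead of FRAME storage; `N`, the key/value types, and the function names are all made up for this sketch and are not an existing API.

```rust
use std::collections::BTreeMap;

const N: usize = 100; // per-block item budget; illustrative only

/// Phase 1, run once per block after the first upgrade: copy up to `N`
/// not-yet-copied entries from the old map into the new one. Runtime
/// writes during this phase go to both maps, so copied entries never go
/// stale. Returns `true` once both maps hold identical data.
fn copy_step(old: &BTreeMap<u32, u64>, new: &mut BTreeMap<u32, u64>) -> bool {
    let mut copied = 0;
    for (k, v) in old {
        if !new.contains_key(k) {
            new.insert(*k, *v);
            copied += 1;
            if copied == N {
                return false; // budget exhausted, continue next block
            }
        }
    }
    true // old and new are identical; the second upgrade can be enacted
}

/// Phase 2, run once per block after the second upgrade: reads and writes
/// now only touch the new map, and up to `N` old entries are deleted per
/// block. Returns `true` once the old map is empty.
fn delete_step(old: &mut BTreeMap<u32, u64>) -> bool {
    let keys: Vec<u32> = old.keys().copied().take(N).collect();
    for k in keys {
        old.remove(&k);
    }
    old.is_empty()
}
```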

@xlc
Contributor

xlc commented Mar 11, 2021

We can always do a lazy upgrade, i.e. only migrate storage upon access. When reading something, try the new storage first; if the value doesn't exist there, try the old one and migrate it to the new one.

After a long enough time, most of the old storage should have been moved to the new one, and we can then do a final migration to move the remaining entries.
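
A minimal sketch of the read path of such a lazy migration, modelled with plain in-memory maps; the maps stand in for old- and new-format storage items, and the tuple for a hypothetical new value layout. Nothing here is an existing FRAME API.

```rust
use std::collections::BTreeMap;

/// Lazy read: try the new storage first and fall back to the old one,
/// migrating the value on the way so the next read hits the fast path.
fn lazy_get(
    key: u32,
    old: &mut BTreeMap<u32, u32>,        // old encoding
    new: &mut BTreeMap<u32, (u32, u32)>, // new encoding
) -> Option<(u32, u32)> {
    if let Some(v) = new.get(&key) {
        return Some(*v);
    }
    // Not migrated yet: take the value out of the old map, convert it and
    // store it in the new one.
    let old_value = old.remove(&key)?;
    let migrated = (old_value, 0); // hypothetical conversion to the new format
    new.insert(key, migrated);
    Some(migrated)
}
```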

@burdges

burdges commented Mar 11, 2021

We know of more efficient storage migration schemes, but nobody has clarified what would ever justify the implementation effort. I'll describe one fairly straightforward approach:

We first require a storage checkpoint protocol in which one passes the storage through the availability and approval system, like parathread blocks. In other words, one node erasure codes the storage, either all or only part of it, distributes the pieces among the validators, and then approval checkers reconstruct the storage and check its Merkle root. We require checkpointing for system parathreads and multiple relay chains too, so it is rather important.

We do migration for the relay chain by first having one node migrate and checkpoint the new storage, akin to publishing the relay chain state in a series of parathread blocks. We then continue running the chain almost as usual, except that we now track both the pre- and post-migration Merkle roots. All nodes validate either pre- or post-migration Merkle root updates, with the checkpoint blocks' approval checkers initially validating post-migration updates while also spot-checking the other. After we approve and finalize the checkpoint, all nodes download it and replay the intervening blocks on top of the checkpointed state.

We need parathreads, storage checkpoints, and this double root tracking before any of this makes sense, but after those it becomes straightforward. We avoid most nodes recomputing the whole storage, since they obtain it from the checkpoint and replay only a small-ish number of blocks.

@apopiak
Contributor Author

apopiak commented Mar 12, 2021

@burdges Your scheme sounds like a way to do relay chain migrations.
The main use case that I see for multi-block migrations stems from the harder limit on block times that parachains will be subject to.

@stale

stale bot commented Jul 7, 2021

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 7, 2021
@apopiak apopiak removed the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 12, 2021
@apopiak
Contributor Author

apopiak commented Jul 12, 2021

Still relevant.

@JoshOrndorff
Contributor

The Moonbeam team has begun to explore a multi-block migration strategy in moonbeam-foundation/moonbeam#527. This design expresses migrations as batches, which allows a migration to be split across multiple blocks. This way the strict execution-time limit can be met and the migration can still be run to completion.

But this multi-block migration support introduces the additional challenge that normal transactions should not be processed while the migration is ongoing or else they may corrupt the state. Ideally transactions would accumulate in the transaction pool during this period and be processed as soon as the migration is complete.

In the Drupal web CMS (and others) there is a notion of "maintenance mode". In this mode, site visitors cannot interact with the site except to view public static files; no database reads or writes are permitted at all. Such a maintenance mode may make sense at the FRAME level. The Drupal docs mention that

A site might be put in maintenance mode during development, to perform site maintenance, or as a security precaution during an attack.

Maintenance mode could also serve as an emergency stop button when a chain is being attacked or a bug is being exploited.
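
As a rough illustration, here is a tiny plain-Rust model of what such a maintenance-mode gate could look like at the call-filter level (e.g. wired in where frame_system's BaseCallFilter sits); the `Call` enum, the flag, and the allow-list are made up for this sketch and are not an existing FRAME API.

```rust
// While the flag is set, only an explicit allow-list of calls is dispatched;
// everything else is rejected by the base filter and stays queued in the
// transaction pool until the migration is complete.
#[allow(dead_code)]
#[derive(Debug)]
enum Call {
    Transfer { to: u64, amount: u128 },
    ExitMaintenanceMode,
}

fn call_allowed(call: &Call, maintenance_mode: bool) -> bool {
    if !maintenance_mode {
        return true; // normal operation: everything passes the base filter
    }
    // During a migration (or an emergency stop), only the call that can
    // switch the mode off again is allowed through.
    matches!(call, Call::ExitMaintenanceMode)
}
```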

@JoshOrndorff
Contributor

I really like @xlc's idea of lazy storage migrations. These could also be run during the on_idle hook so we don't have to wait too long before all the storage has been migrated.

@gui1117
Contributor

gui1117 commented Aug 17, 2021

Yes, a lazy migration is not hard. All accesses to storage go through the type alias (e.g. Foo in type Foo<T> = StorageValue<...>), so one can change this type alias to a new implementation which does the following:

  • on read: read from the new storage or, if the value doesn't exist there, from the old one,
  • on write: write to the new storage and remove the old value.

Then we can use something like half of the normal dispatch block weight to migrate in the background (sketched below). This way the chain is more bloated, but half of the normal dispatch weight is still available for normal extrinsics.

Once all old keys have been removed, we can do another runtime upgrade which only uses the new storage.
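
A minimal sketch of the background step described above, again modelled with plain in-memory maps; the per-item cost, the `budget` parameter (standing in for the weight reserved each block, e.g. in on_initialize or on_idle), and all names are illustrative, not an existing FRAME API.

```rust
use std::collections::BTreeMap;

const COST_PER_ITEM: u64 = 1_000; // assumed weight per migrated entry

/// Each block, spend at most `budget` moving entries from the old map to
/// the new one; returns the weight actually consumed this block.
fn background_migrate(
    old: &mut BTreeMap<u32, u32>,
    new: &mut BTreeMap<u32, (u32, u32)>,
    budget: u64,
) -> u64 {
    let mut spent = 0;
    while spent + COST_PER_ITEM <= budget {
        // Take one not-yet-migrated entry; stop once the old map is empty.
        let Some(&k) = old.keys().next() else { break };
        let v = old.remove(&k).expect("key was just read; qed");
        new.insert(k, (v, 0)); // hypothetical conversion to the new format
        spent += COST_PER_ITEM;
    }
    spent
}
```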

@shawntabrizi shawntabrizi added the Z3-substantial Can be fixed by an experienced coder with a working knowledge of the codebase. label Jun 3, 2022
@gregzaitsev

Hi @JoshOrndorff! Is there a way the Unique team can contribute to this issue?

@mrshiposha
Contributor

Hi everyone!

It seems to me that the lazy migration would have some issues, namely:

  • If we do the migration only on reads/writes and don't use the remaining weight (as in the on_idle hook), then we can't tell when the migration will complete.
  • If we do the migration on writes and continue it using the remaining weight, then the state could become inconsistent. For instance, StorageMap::translate can't work correctly if there are already-migrated values in the storage, because the function can't distinguish migrated values from invalid values -- both simply fail to decode.

So it seems to me that the Multi-Block Migration should run over a span of several consecutive blocks, with a guarantee that no transaction is executed during the migration.

I wrote a gist with thoughts on how we could implement the Multi-Block Migration and what it could look like.
In short, I think it would be nice to add an analog of the translate function -- a multiblock_translate fn -- and possibly a new hook -- an OnMultiblockMigration hook (see the Questions section).
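
As a very rough sketch of the shape such a cursor-based translation could take (this is not the API proposed in the linked gist, nor an existing FRAME function; the types and names are made up to illustrate how a cursor lets the translation resume in the next block):

```rust
use std::collections::BTreeMap;

/// Translate up to `limit` old-format entries per call, resuming from
/// `cursor`. Returns the last key handled, or `None` once nothing is left.
fn multiblock_translate<F>(
    old: &mut BTreeMap<u32, Vec<u8>>, // still-encoded, old-format values
    new: &mut BTreeMap<u32, u64>,     // decoded, new-format values
    cursor: Option<u32>,              // last key handled in the previous block
    limit: usize,                     // items we can afford in this block
    mut f: F,
) -> Option<u32>
where
    F: FnMut(u32, &[u8]) -> Option<u64>,
{
    let start = cursor.map_or(0, |c| c.saturating_add(1));
    let keys: Vec<u32> = old.range(start..).take(limit).map(|(k, _)| *k).collect();
    let mut last = None;
    for k in keys {
        if let Some(encoded) = old.remove(&k) {
            if let Some(value) = f(k, &encoded) {
                new.insert(k, value);
            }
            last = Some(k);
        }
    }
    last
}
```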

@kianenigma kianenigma moved this to Backlog in Runtime / FRAME Aug 21, 2022
@kianenigma kianenigma changed the title Multi-block Migrations Vision: Multi-block Migrations Mar 10, 2023
@juangirini
Contributor

Closed in favor of paritytech/polkadot-sdk#198

@juangirini juangirini closed this as not planned May 29, 2023
@github-project-automation github-project-automation bot moved this from Backlog to Done in Runtime / FRAME May 29, 2023
@juangirini juangirini moved this from Done to Won't Fix in Runtime / FRAME May 29, 2023
@juangirini juangirini added the J4-duplicate Issue is a duplicate. Closer should comment with a link to the duplicate. label Jun 16, 2023