Skip to content
This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Keep existing blocks when restoring a Snapshot #8643

Merged
merged 42 commits into from
Nov 17, 2018
Merged

Conversation

ngotchac
Copy link
Contributor

@ngotchac ngotchac commented May 16, 2018

Closes #6350

In order to restore existing blocks, it just iterates over the blocks of the current DB before swapping it with the snapshot one, from 1 (or first needed block) until all the blocks have been imported.

To test the feature, one can sync without warp-sync for a few thousand blocks, then restart with --warp-barrier

Blocks restoration seemed pretty fast (~15 seconds for 150_000 blocks), but if too slow this could be improved by skipping caches.

@ngotchac ngotchac added A0-pleasereview 🤓 Pull request needs code review. M4-core ⛓ Core client code / Rust. B7-releasenotes 📜 Changes should be mentioned in the release notes of the next minor version release. A1-onice 🌨 Pull request is reviewed well, but should not yet be merged. labels May 16, 2018

// Writting changes to DB and logging every now and then
if block_number % 1_000 == 0 {
next_db.write_buffered(batch);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write_buffered will keep them in memory. we need to flush periodically

let block_receipts = block_receipts.receipts;

next_chain.insert_unordered_block(&mut batch, &raw_block, block_receipts, parent_diff, false, true);
parent_diff = Some(diff);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parent_diff is meant to be a total difficulty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oups, right...

// Try to include every block that will need to be downloaded from the current chain
// Break when no more blocks are available from it.
match (next_chain_info.ancient_block_number, next_chain_info.first_block_number) {
(Some(next_ancient_block), Some(next_first_block)) if next_ancient_block + 1 < next_first_block => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ancient_block_number is always going to be 1 (or 0, can't remember) after a restoration, isn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the blocks we want to import from the old chain: anything available from genesis to first_block_number, but maintaining the invariant that only a single gap exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that's right, it's supposed to always be 1. It does ensure that there is still only one gap of blocks.

next_ancient_block, next_first_block,
);

let mut block_number = next_ancient_block + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really needs to check that the first ancient block we pull from the old client has the right parent_hash.

}
// Break if we already imported some blocks in the current batch and there
// are no more left
} else if parent_diff.is_some() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only break if parent_diff is Some? if there are no more blocks we can import it seems like it should always break the loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that's right, didn't push this change yet.

while block_number < next_first_block {
let chain = cur_chain.read();

if let Some(block_hash) = chain.block_hash(block_number) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have an exclusive lock on the chain's DB. this is racing with client's normal block import. (in practice this isn't a problem right now but let's not lean on assumptions from outside the module)

Copy link
Contributor Author

@ngotchac ngotchac May 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it lock the DB for the time of the restoration? The blocks won't change during it.

Copy link
Contributor

@rphmeier rphmeier May 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should rather reference blocks by hash as opposed to number. Hash -> Block is stable, Number -> Block varies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is easiest when we import blocks in reverse because the parent_hash is always available. But then we should always make sure that next_first_block.parent_hash corresponds to the hash of a block that we import as its parent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So could it be just a check that for each block, the parent's hash is the one expected, ie. the one we got from the previous block ? If there is a mismatch, then just stop right there.

@rphmeier
Copy link
Contributor

Since GitHub marks that comment as hidden I'll continue here:

So could it be just a check that for each block, the parent's hash is the one expected, ie. the one we got from the previous block ? If there is a mismatch, then just stop right there.

sure, but that's a little weird because we know that those blocks are still probably in the DB, just that there was a reorg in the meantime

@ngotchac
Copy link
Contributor Author

But do we keep blocks in DB that has been reorged?

@rphmeier
Copy link
Contributor

rphmeier commented May 16, 2018

yes. because there could be a reorg back at any point, there is no reason to discard them.

@5chdn 5chdn added this to the 1.12 milestone May 16, 2018
@ngotchac ngotchac removed the A1-onice 🌨 Pull request is reviewed well, but should not yet be merged. label May 17, 2018
@ngotchac
Copy link
Contributor Author

@rphmeier I update the PR so that it starts at the best available ancient block, and it iterates backwards from the parent's hash.
It actually also fixes an issue with resuming snapshot restoration with all the needed chunks (it would previously hang).

debris
debris previously requested changes May 25, 2018
Copy link
Collaborator

@debris debris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very important pr, but imo still requires some polishing :)

@@ -845,6 +845,11 @@ impl Client {
*self.exit_handler.lock() = Some(Box::new(f));
}

/// Returns the chain reference
pub fn chain(&self) -> &RwLock<Arc<BlockChain>> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bad practice to expose &RwLock. There is no guarantee this will not deadlock

@@ -16,6 +16,7 @@

//! Snapshot network service implementation.

// use std::cmp;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be removed

@@ -220,7 +221,7 @@ pub struct ServiceParams {
/// Usually "<chain hash>/snapshot"
pub snapshot_root: PathBuf,
/// A handle for database restoration.
pub db_restore: Arc<DatabaseRestore>,
pub client: Arc<Client>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's contradictory to @0x7CFE refactor of ethcore. We don't want any modules to require Client, but only the interface that is actually being used

@@ -103,6 +102,9 @@ fn restored_is_equivalent() {

#[test]
fn guards_delete_folders() {
let gas_prices = vec![1.into(), 2.into(), 3.into(), 999.into()];
let client = generate_dummy_client_with_spec_and_data(Spec::new_null, 400, 5, &gas_prices);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc, this helper function generates client with 400 blocks. you should check in this test if they were actually migrated correctly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. This test only tests if the guarded folders are deleted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from #6350

When restoring from a snapshot parity should try and re-import ancient blocks existing in the current database starting from genesis.

We need a test for that ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah sure, I thought you were commenting on the guards_delete_folders test!

@@ -838,6 +827,80 @@ impl BlockChain {
}
}

/// Update the best ancient block to the given hash, after checking that
/// it's directly linked to the currently known best ancient block
pub fn update_best_ancient_block(&self, hash: &H256) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is called only from a single place. Is it possible for hash to not be linked to any known ancient block? If yes, what does it mean? Should it be handled somehow? Currently the result of this function execution is unknown which makes it very difficult to debug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this function will go from the given hash, and will ensure there is a link between the block at the given hash and the last know best ancient block. Thus, it only update the best ancient block if there is a link.

@5chdn
Copy link
Contributor

5chdn commented Oct 2, 2018

Please reopen when ready

@5chdn 5chdn closed this Oct 2, 2018
@ngotchac
Copy link
Contributor Author

ngotchac commented Oct 2, 2018

Rebased on master

@ngotchac ngotchac reopened this Oct 2, 2018
@ngotchac ngotchac removed the A3-stale 🍃 Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Oct 2, 2018
@5chdn
Copy link
Contributor

5chdn commented Oct 26, 2018

@debris @tomusdrw could you give this a final review?

@ngotchac sorry, could you rebase this again?

@5chdn 5chdn added the B9-blocker 🚧 This pull request blocks the next release from happening. Use only in extreme cases. label Oct 26, 2018
@5chdn 5chdn modified the milestones: 2.2, 2.3 Oct 29, 2018
Copy link
Collaborator

@tomusdrw tomusdrw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this merged and tested in wild, it's too long overdue already.

let mut block_hash = *hash;
let mut is_linked = false;

loop {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ @ngotchac is this addressed?

if let Some(mut receipts_bytes) = io.chain().encoded_block_receipts(&rlp.val_at::<H256>(i)?) {
if let Some(receipts) = io.chain().block_receipts(&rlp.val_at::<H256>(i)?) {
let mut receipts_bytes = ::rlp::encode(&receipts).into_vec();
// if let Some(mut receipts_bytes) = io.chain().encoded_block_receipts(&rlp.val_at::<H256>(i)?) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove please?

@tomusdrw tomusdrw added A8-looksgood 🦄 Pull request is reviewed well. and removed A0-pleasereview 🤓 Pull request needs code review. labels Nov 9, 2018
Copy link
Contributor

@ascjones ascjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally took the time to go through this. Looks good!

ethcore/src/snapshot/service.rs Outdated Show resolved Hide resolved
Co-Authored-By: ngotchac <ngotchac@gmail.com>
@5chdn 5chdn merged commit 9475a2e into master Nov 17, 2018
@5chdn 5chdn deleted the ng-keep-ancient-blocks branch November 17, 2018 23:06
@ngotchac
Copy link
Contributor Author

🎉 !!!

@ngotchac ngotchac mentioned this pull request Nov 22, 2018
@5chdn 5chdn mentioned this pull request Nov 27, 2018
12 tasks
5chdn pushed a commit that referenced this pull request Nov 27, 2018
* Rename db_restore => client

* First step: make it compile!

* Second step: working implementation!

* Refactoring

* Fix tests

* PR Grumbles

* PR Grumbles WIP

* Migrate ancient blocks interating backward

* Early return in block migration if snapshot is aborted

* Remove RwLock getter (PR Grumble I)

* Remove dependency on `Client`: only used Traits

* Add test for recovering aborted snapshot recovery

* Add test for migrating old blocks

* Fix build

* PR Grumble I

* PR Grumble II

* PR Grumble III

* PR Grumble IV

* PR Grumble V

* PR Grumble VI

* Fix one test

* Fix test

* PR Grumble

* PR Grumbles

* PR Grumbles II

* Fix tests

* Release RwLock earlier

* Revert Cargo.lock

* Update _update ancient block_ logic: set local in `commit`

* Update typo in ethcore/src/snapshot/service.rs

Co-Authored-By: ngotchac <ngotchac@gmail.com>
5chdn added a commit that referenced this pull request Nov 29, 2018
* version: bump beta to 2.2.2

* Add experimental RPCs flag (#9928)

* WiP

* Enable experimental RPCs.

* Keep existing blocks when restoring a Snapshot (#8643)

* Rename db_restore => client

* First step: make it compile!

* Second step: working implementation!

* Refactoring

* Fix tests

* PR Grumbles

* PR Grumbles WIP

* Migrate ancient blocks interating backward

* Early return in block migration if snapshot is aborted

* Remove RwLock getter (PR Grumble I)

* Remove dependency on `Client`: only used Traits

* Add test for recovering aborted snapshot recovery

* Add test for migrating old blocks

* Fix build

* PR Grumble I

* PR Grumble II

* PR Grumble III

* PR Grumble IV

* PR Grumble V

* PR Grumble VI

* Fix one test

* Fix test

* PR Grumble

* PR Grumbles

* PR Grumbles II

* Fix tests

* Release RwLock earlier

* Revert Cargo.lock

* Update _update ancient block_ logic: set local in `commit`

* Update typo in ethcore/src/snapshot/service.rs

Co-Authored-By: ngotchac <ngotchac@gmail.com>

* Adjust requests costs for light client (#9925)

* PIP Table Cost relative to average peers instead of max peers

* Add tracing in PIP new_cost_table

* Update stat peer_count

* Use number of leeching peers for Light serve costs

* Fix test::light_params_load_share_depends_on_max_peers (wrong type)

* Remove (now) useless test

* Remove `load_share` from LightParams.Config
Prevent div. by 0

* Add LEECHER_COUNT_FACTOR

* PR Grumble: u64 to u32 for f64 casting

* Prevent u32 overflow for avg_peer_count

* Add tests for LightSync::Statistics

* Fix empty steps (#9939)

* Don't send empty step twice or empty step then block.

* Perform basic validation of locally sealed blocks.

* Don't include empty step twice.

* prevent silent errors in daemon mode, closes #9367 (#9946)

* Fix a deadlock (#9952)

* Update informant:
  - decimal in Mgas/s
  - print every 5s (not randomly between 5s and 10s)

* Fix dead-lock in `blockchain.rs`

* Update locks ordering

* Fix light client informant while syncing (#9932)

* Add `is_idle` to LightSync to check importing status

* Use SyncStateWrapper to make sure is_idle gets updates

* Update is_major_import to use verified queue size as well

* Add comment for `is_idle`

* Add Debug to `SyncStateWrapper`

* `fn get` -> `fn into_inner`

*  ci: rearrange pipeline by logic (#9970)

* ci: rearrange pipeline by logic

* ci: rename docs script

* fix docker build (#9971)

* Deny unknown fields for chainspec (#9972)

* Add deny_unknown_fields to chainspec

* Add tests and fix existing one

* Remove serde_ignored dependency for chainspec

* Fix rpc test eth chain spec

* Fix starting_nonce_test spec

* Improve block and transaction propagation (#9954)

* Refactor sync to add priority tasks.

* Send priority tasks notifications.

* Propagate blocks, optimize transactions.

* Implement transaction propagation. Use sync_channel.

* Tone down info.

* Prevent deadlock by not waiting forever for sync lock.

* Fix lock order.

* Don't use sync_channel to prevent deadlocks.

* Fix tests.

* Fix unstable peers and slowness in sync (#9967)

* Don't sync all peers after each response

* Update formating

* Fix tests: add `continue_sync` to `Sync_step`

* Update ethcore/sync/src/chain/mod.rs

Co-Authored-By: ngotchac <ngotchac@gmail.com>

* fix rpc middlewares

* fix Cargo.lock

* json: resolve merge in spec

* rpc: fix starting_nonce_test

* ci: allow nightl job to fail
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A8-looksgood 🦄 Pull request is reviewed well. B7-releasenotes 📜 Changes should be mentioned in the release notes of the next minor version release. B9-blocker 🚧 This pull request blocks the next release from happening. Use only in extreme cases. M4-core ⛓ Core client code / Rust.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants