Enable conservative out-of-bound snapshot cleaning #8811

Merged

@ryoqun ryoqun commented Mar 12, 2020

Problem

There is no codepath for cleaning unused storages in a snapshot when starting a validator.

Summary of Changes

Our past efforts have paid off; now we just need a little plumbing and some test clarification.
I'm now being very conservative, with good reason: the adventure of #8168 newly introduced bugs. So this PR doesn't contain the aggressive strategy that #8337 originally had.

Let it rip little by little. :)

This PR requires #8724, though not codewise.

Split from #8337
Towards #8168
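
To make the problem statement concrete, here is a toy model of what "cleaning unused storages" could mean when a snapshot is loaded. It is only a sketch under stated assumptions, not the actual `accounts_db` code: the `Storage` type, the `clean_unused_storages` function, and the rule that a storage is unused once every account in it has a newer version in a later slot are all hypothetical simplifications introduced for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for one account storage restored from a snapshot.
#[derive(Debug)]
struct Storage {
    /// Slot in which the accounts below were written.
    slot: u64,
    /// Accounts written in this slot (toy keys instead of real pubkeys).
    accounts: Vec<&'static str>,
}

/// Keep only storages that still hold the newest version of at least one account.
/// Assumption: a storage whose accounts were all rewritten in later slots is unused.
fn clean_unused_storages(storages: Vec<Storage>) -> Vec<Storage> {
    // Record the newest slot in which each account was written.
    let mut newest: HashMap<&str, u64> = HashMap::new();
    for storage in &storages {
        for key in &storage.accounts {
            let slot = newest.entry(*key).or_insert(storage.slot);
            *slot = (*slot).max(storage.slot);
        }
    }
    // Drop every storage that holds no newest version of any account.
    storages
        .into_iter()
        .filter(|s| s.accounts.iter().any(|k| newest[k] == s.slot))
        .collect()
}

fn main() {
    let restored = vec![
        Storage { slot: 1, accounts: vec!["alice"] },
        Storage { slot: 2, accounts: vec!["alice", "bob"] },
        Storage { slot: 3, accounts: vec!["alice"] },
    ];
    // The slot 1 storage only holds a superseded version of "alice", so it is dropped.
    for storage in clean_unused_storages(restored) {
        println!("kept {:?}", storage);
    }
}
```

The sketch errs on the conservative side in the same spirit as this PR: it never rewrites a storage that is still partly in use, it only drops storages that are entirely superseded.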

@@ -2965,10 +2965,9 @@ pub mod tests {
}

#[test]
fn test_accounts_purge_chained_purge_after_snapshot_restore_complex() {
Member Author

I finally understood this mystical test...

@ryoqun ryoqun force-pushed the conservative-outbound-snapshot-cleaning branch from d372bdd to 4cc6cee Compare March 12, 2020 11:48
@ryoqun ryoqun requested a review from sakridge March 12, 2020 11:50
@ryoqun ryoqun added the v1.0 label Mar 12, 2020

codecov bot commented Mar 12, 2020

Codecov Report

Merging #8811 into master will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@          Coverage Diff           @@
##           master   #8811   +/-   ##
======================================
  Coverage    80.0%   80.0%           
======================================
  Files         265     265           
  Lines       57477   57481    +4     
======================================
+ Hits        45998   46011   +13     
+ Misses      11479   11470    -9     

@sakridge sakridge left a comment

lgtm

ryoqun commented Mar 13, 2020

I ran a v1.0 validator with this PR applied overnight, and it still seems to produce valid snapshots.

Overall, the snapshot archive is about 10x smaller (97.7 MB vs. 10.0 MB) and the restore is more than 20x faster (2m49s vs. 6.3s wall-clock for `verify`). That's amazing.

$ time ./solana-ledger-tool --ledger ./verify-big verify 
4uhcVJyU9pJkvQyS88uRDiswHXSCkY3zQawwpjk2NsNY
[2020-03-13T03:59:58.113891099Z INFO  solana_ledger::blockstore] Maximum open file descriptors: 65000
[2020-03-13T03:59:58.264635824Z INFO  solana_ledger::blockstore] "/mnt/solana-data/tds1.0.6/verify-big/rocksdb" open took 150ms
[2020-03-13T03:59:58.264675934Z INFO  solana_ledger::bank_forks_utils] Initializing snapshot path: "/mnt/solana-data/tds1.0.6/verify-big/snapshot"
[2020-03-13T03:59:58.265309913Z INFO  solana_ledger::bank_forks_utils] Loading snapshot package: "/mnt/solana-data/tds1.0.6/verify-big/snapshot-2979390-7T3hMt1aeihazDSj8TV4nPHYrZA7cuZUpKsf61ndMwR6.tar.bz2"
[2020-03-13T04:02:28.463507741Z INFO  solana_ledger::snapshot_utils] snapshot untar took 150.2s
[2020-03-13T04:02:28.463741743Z INFO  solana_ledger::snapshot_utils] snapshot version: 1.0.0
[2020-03-13T04:02:28.463787304Z INFO  solana_ledger::snapshot_utils] Loading bank from "/mnt/solana-data/tds1.0.6/verify-big/snapshot/.tmpmyRa9I/snapshots/2979390/2979390"
[2020-03-13T04:02:28.470340963Z INFO  solana_ledger::snapshot_utils] Rebuilding accounts...
[2020-03-13T04:02:39.718186407Z INFO  solana_ledger::snapshot_utils] Rebuilding status cache...
[2020-03-13T04:02:39.734429305Z INFO  solana_ledger::snapshot_utils] Loaded bank for slot: 2979390
[2020-03-13T04:02:39.734538476Z INFO  solana_runtime::accounts_db] total_stores: 9192, newest_slot: 2979390, oldest_slot: 0, max_slot: 2127304 (num=9), min_slot: 797455 (num=1)
[2020-03-13T04:02:39.734567527Z INFO  solana_metrics::metrics] metrics disabled: SOLANA_METRICS_CONFIG: environment variable not found
[2020-03-13T04:02:39.735095743Z INFO  solana_metrics::metrics] datapoint: accounts_db-stores total_count=9192i
[2020-03-13T04:02:45.834285050Z INFO  solana_runtime::accounts_db] scan took 147us merge took 68us accumulate took 256us
[2020-03-13T04:02:45.834323581Z INFO  solana_runtime::bank] bank frozen: 2979390 hash: TGatxEgmMUy8DD84vnndYVuC4pfYJEhdPnGyiqM9AvB accounts_delta: HWbnv15yEZwtTVC1P4QSQvQ2WvAYV88ZGob9gubmwycf signature_count: 192 last_blockhash: 2kypLJkVsrS1iANBQpBqhWw3VPKuYWxXErhWJZPe2rZq
[2020-03-13T04:02:45.834362271Z INFO  solana_runtime::bank] accounts hash slot: 2979390 stats: BankHashStats { num_removed_accounts: 194, num_added_accounts: 0, num_lamports_stored: 996203915310581396, total_data_len: 511963, num_executable_accounts: 0 }
[2020-03-13T04:02:45.834375302Z INFO  solana_ledger::snapshot_utils] bank rebuild from snapshot took 17.4s
[2020-03-13T04:02:46.019387346Z INFO  solana_ledger::blockstore_processor] processing ledger from root slot 2979390...
[2020-03-13T04:02:46.095345426Z INFO  solana_ledger::blockstore_processor] ledger processed in 75ms. 89 MB allocated. 1 fork at 2979390, with 1 frozen bank
Ok

real    2m48.878s
user    2m15.377s
sys     0m27.926s
$ time ./solana-ledger-tool --ledger ./verify verify 
4uhcVJyU9pJkvQyS88uRDiswHXSCkY3zQawwpjk2NsNY
[2020-03-13T04:03:27.659987553Z INFO  solana_ledger::blockstore] Maximum open file descriptors: 65000
[2020-03-13T04:03:27.734123485Z INFO  solana_ledger::blockstore] "/mnt/solana-data/tds1.0.6/verify/rocksdb" open took 74ms
[2020-03-13T04:03:27.734159665Z INFO  solana_ledger::bank_forks_utils] Initializing snapshot path: "/mnt/solana-data/tds1.0.6/verify/snapshot"
[2020-03-13T04:03:27.735254419Z INFO  solana_ledger::bank_forks_utils] Loading snapshot package: "/mnt/solana-data/tds1.0.6/verify/snapshot-2980756-5WTADy6e5LDbkp8nMSC9ktKEXqwaWunUSYEqXoPZbtv2.tar.bz2"
[2020-03-13T04:03:33.532770687Z INFO  solana_ledger::snapshot_utils] snapshot untar took 5.8s
[2020-03-13T04:03:33.533033361Z INFO  solana_ledger::snapshot_utils] snapshot version: 1.0.0
[2020-03-13T04:03:33.533084181Z INFO  solana_ledger::snapshot_utils] Loading bank from "/mnt/solana-data/tds1.0.6/verify/snapshot/.tmpVdgdLe/snapshots/2980756/2980756"
[2020-03-13T04:03:33.539386848Z INFO  solana_ledger::snapshot_utils] Rebuilding accounts...
[2020-03-13T04:03:33.671787519Z INFO  solana_ledger::snapshot_utils] Rebuilding status cache...
[2020-03-13T04:03:33.685240783Z INFO  solana_ledger::snapshot_utils] Loaded bank for slot: 2980756
[2020-03-13T04:03:33.685339634Z INFO  solana_runtime::accounts_db] total_stores: 413, newest_slot: 2980756, oldest_slot: 0, max_slot: 0 (num=5), min_slot: 2580656 (num=1)
[2020-03-13T04:03:33.685371035Z INFO  solana_metrics::metrics] metrics disabled: SOLANA_METRICS_CONFIG: environment variable not found
[2020-03-13T04:03:33.685592937Z INFO  solana_metrics::metrics] datapoint: accounts_db-stores total_count=413i
[2020-03-13T04:03:33.695538389Z INFO  solana_runtime::accounts_db] scan took 153us merge took 42us accumulate took 180us
[2020-03-13T04:03:33.695574660Z INFO  solana_runtime::bank] bank frozen: 2980756 hash: 3FgZbogiGmXLWrVqHkqrLNPRRquFC23aXwpobyaqVijx accounts_delta: 4SWxmRXEM2TevLrLWcnTjkHYYeWj8JJaE9WegqZyAZEE signature_count: 138 last_blockhash: 4rTUT2rTNR6idZVmAApo7gBf1vbCA7THQTRE4Ritpw8c
[2020-03-13T04:03:33.695599949Z INFO  solana_runtime::bank] accounts hash slot: 2980756 stats: BankHashStats { num_removed_accounts: 144, num_added_accounts: 0, num_lamports_stored: 996189844701729378, total_data_len: 423444, num_executable_accounts: 0 }
[2020-03-13T04:03:33.695611470Z INFO  solana_ledger::snapshot_utils] bank rebuild from snapshot took 162ms
[2020-03-13T04:03:33.711222801Z INFO  solana_ledger::blockstore_processor] processing ledger from root slot 2980756...
[2020-03-13T04:03:33.782578685Z INFO  solana_ledger::blockstore_processor] ledger processed in 71ms. 89 MB allocated. 1 fork at 2980756, with 1 frozen bank
Ok

real    0m6.281s
user    0m5.411s
sys     0m0.996s
$ stat /mnt/solana-data/tds1.0.6/verify-big/snapshot-2979390-7T3hMt1aeihazDSj8TV4nPHYrZA7cuZUpKsf61ndMwR6.tar.bz2 /mnt/solana-data/tds1.0.6/verify/snapshot-2980756-5WTADy6e5LDbkp8nMSC9ktKEXqwaWunUSYEqXoPZbtv2.tar.bz2
  File: /mnt/solana-data/tds1.0.6/verify-big/snapshot-2979390-7T3hMt1aeihazDSj8TV4nPHYrZA7cuZUpKsf61ndMwR6.tar.bz2
  Size: 97738159        Blocks: 190896     IO Block: 4096   regular file
Device: 35h/53d Inode: 6844677     Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/  ubuntu)   Gid: ( 1000/  ubuntu)
Access: 2020-03-13 12:59:58.260774025 +0900
Modify: 2020-03-13 12:59:46.140978219 +0900
Change: 2020-03-13 12:59:46.140978219 +0900
 Birth: -
  File: /mnt/solana-data/tds1.0.6/verify/snapshot-2980756-5WTADy6e5LDbkp8nMSC9ktKEXqwaWunUSYEqXoPZbtv2.tar.bz2
  Size: 9969401         Blocks: 19472      IO Block: 4096   regular file
Device: 35h/53d Inode: 6820768     Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/  ubuntu)   Gid: ( 1000/  ubuntu)
Access: 2020-03-13 12:04:11.341448829 +0900
Modify: 2020-03-13 12:04:02.121603381 +0900
Change: 2020-03-13 12:04:02.121603381 +0900
 Birth: -
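
The reduction factors can be checked against the figures in the output above. The following is a small sketch that only does the arithmetic; the helper names `reduction` and `to_seconds` are made up for this example, and every number is copied from the `time`, log, and `stat` output of the two runs.

```rust
/// Ratio of a "before" measurement to an "after" measurement.
fn reduction(before: f64, after: f64) -> f64 {
    before / after
}

/// Convert a `time`-style reading (minutes plus seconds) to seconds.
fn to_seconds(minutes: f64, seconds: f64) -> f64 {
    minutes * 60.0 + seconds
}

fn main() {
    // Archive sizes in bytes, from the `stat` output.
    let size = reduction(97_738_159.0, 9_969_401.0);
    // Wall-clock ("real") time of `solana-ledger-tool verify`.
    let wall = reduction(to_seconds(2.0, 48.878), to_seconds(0.0, 6.281));
    // Seconds reported by the "snapshot untar took" log lines.
    let untar = reduction(150.2, 5.8);
    // Storage counts reported by the "total_stores" log lines.
    let stores = reduction(9192.0, 413.0);

    println!("archive size:    {:.1}x smaller", size);
    println!("verify (real):   {:.1}x faster", wall);
    println!("snapshot untar:  {:.1}x faster", untar);
    println!("total_stores:    {:.1}x fewer", stores);
}
```

On these figures the archive is a little under 10x smaller, and the wall-clock time, the untar time, and the storage count each drop by more than 20x.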

mvines commented Mar 13, 2020

That is amazing!!

@ryoqun ryoqun merged commit 4bbf09f into solana-labs:master Mar 13, 2020
mergify bot pushed a commit that referenced this pull request Mar 13, 2020
* Enable conservative out-of-bound snapshot cleaning

* Add tests

(cherry picked from commit 4bbf09f)

ryoqun commented Mar 13, 2020

I also tested against the SLP cluster, and there were no problems there either.
