
[wip] Correctly purge zero lamport accounts #7013

Closed
wants to merge 41 commits into master from no-bad-account-in-snapshot

Conversation


@ryoqun ryoqun commented Nov 18, 2019

[READ FIRST] REVIEW POINTS

This is still a draft. The code is full of all kinds of dirtiness/roughness. It seems the current impl mostly works as intended.

I want reviews on the following points, to be sure I can continue in the correct direction before wasting time going in a wrong one:

Problem

Sometimes snapshot restore fails (#6890). At first I thought this was a vote-account-specific issue, but that turned out not to be the case. The more general problem is that snapshot restoration currently can't handle any kind of account with a 0 lamport balance, despite a prior attempt to fix this.
This means it can easily be triggered just by running solana pay XXXXXX 0 lamports. (By the way, solana pay should at least warn about such a meaningless transfer of 0 lamports...)

The problem is that removed zero-lamport accounts can reappear when restoring from snapshots, although we intended to purge them when creating snapshots.
As for in-process state consistency, the validator correctly purges zero-lamport accounts via purge_zero_lamport_accounts. And indeed, it can successfully do this periodically while creating newer snapshots until it restarts, because the running validator never actually uses the snapshot.

Then, when restoring from a snapshot after a simple restart or on a different validator's startup, solana-validator loads the snapshot, and such a bad snapshot aborts the restoration.

The reason zero-lamport accounts can reappear is that while purge_zero_lamport_accounts does correctly purge zero-lamport accounts from the in-process AccountsIndex, it fails to update the ref-counted AccountStorage correctly.
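
To make the mechanism concrete, here is a toy model (stand-in types, not the real accounts_db structures) of why an account purged only from the index comes back: the snapshot serializes the storage, and restore regenerates the index from it.

```rust
use std::collections::HashMap;

type Pubkey = &'static str; // stand-in for a real pubkey
type Slot = u64;

#[derive(Debug)]
struct StoredAccount {
    pubkey: Pubkey,
    lamports: u64,
}

// One append-vec-like storage entry per slot in this toy model.
type Storage = HashMap<Slot, Vec<StoredAccount>>;
// index: pubkey -> slot holding its latest version
type Index = HashMap<Pubkey, Slot>;

// Rebuild the index by scanning storage, as a snapshot restore must do.
fn generate_index(storage: &Storage) -> Index {
    let mut index = Index::new();
    for (&slot, accounts) in storage {
        for account in accounts {
            // the newest slot wins, zero lamports or not
            let entry = index.entry(account.pubkey).or_insert(slot);
            if slot > *entry {
                *entry = slot;
            }
        }
    }
    index
}

fn main() {
    let mut storage = Storage::new();
    storage.insert(1, vec![StoredAccount { pubkey: "A", lamports: 10 }]);
    storage.insert(2, vec![StoredAccount { pubkey: "A", lamports: 0 }]);

    // The running validator purges the zero-lamport account, but only from
    // its in-memory index; the stored data is left untouched.
    let mut index = generate_index(&storage);
    index.remove("A");
    assert!(index.get("A").is_none()); // in-process state looks clean

    // A restart restores from the snapshot (i.e. from storage), so the
    // zero-lamport account reappears in the rebuilt index.
    let restored = generate_index(&storage);
    assert_eq!(restored.get("A"), Some(&2));
}
```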

Solution

First, leaving zero-lamport accounts as-is in the AccountStorage is not acceptable, because this allows malicious snapshot-serving validators to enlarge the active dataset of the account db as they like via a crafted snapshot. They can also shield otherwise legitimate accounts behind a zero balance.

On the other hand, merely decrementing the ref-count of AccountStorage in the hope that its GC nature solves everything for us is not sufficient, because accounts are first grouped, and those groups are ref-counted as the minimum unit. When the group containing a zero-lamport account also contains other active accounts, the group as a whole cannot be purged.

Also, for a sound solution it's preferable to preserve copy-on-write semantics wherever possible, so an in-place update isn't welcome.

Within these constraints, I'd like to propose retaining the zero-lamport account with the greatest slot value, while purging older ones.

This is not exactly a straight fix, because there isn't a good way to convey this info (the purged-ness of accounts) to the next invocation or to a joining validator.

So, in my solution, the retained zero-lamport account works as a special marker for the purged entry. Older account entries shadowed by copy-on-writes will eventually be removed as the cluster moves the state forward and uses fewer old slots, and finally the special marker itself can be removed. I think this introduces exploitable fragmentation, as normal accounts can cause today; to fix that, we'll need a general compaction mechanism.
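
A minimal sketch of the proposed rule, assuming a per-pubkey list of rooted (slot, AccountInfo) versions; these are stand-in types, not the actual AccountsIndex/AccountStorage code, and the real purge also has to reconcile storage ref-counts:

```rust
type Slot = u64;

#[derive(Debug, Clone)]
struct AccountInfo {
    lamports: u64,
    // storage id / offset omitted in this sketch
}

// Given all rooted versions of one account, return the versions to purge:
// if the latest version has zero lamports, purge everything older and keep
// the newest zero-lamport entry as the "purged" marker.
fn versions_to_purge(mut versions: Vec<(Slot, AccountInfo)>) -> Vec<(Slot, AccountInfo)> {
    versions.sort_by_key(|(slot, _)| *slot);
    let latest_is_zero = versions
        .last()
        .map(|(_, info)| info.lamports == 0)
        .unwrap_or(false);
    if latest_is_zero {
        versions.pop(); // retain the marker entry with the greatest slot
        versions        // everything older gets purged
    } else {
        Vec::new() // latest version is funded: nothing special in this sketch
    }
}

fn main() {
    let versions = vec![
        (3, AccountInfo { lamports: 10 }),
        (7, AccountInfo { lamports: 0 }),
        (5, AccountInfo { lamports: 2 }),
    ];
    let purged = versions_to_purge(versions);
    // slots 3 and 5 are purged; the zero-lamport entry at slot 7 remains
    assert_eq!(purged.iter().map(|(s, _)| *s).collect::<Vec<_>>(), vec![3, 5]);
}
```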

Other possible solutions

  • In-place overwrite of accounts data? (Maybe bad for performance, rollback, bankhash, security.)
  • Introduce some end-of-life marker for zero-lamport accounts? (What about the bankhash? Moderate cost to implement ≈ considerable risk of more bugs in such a low-level system component.)
  • Teach the serializer/deserializer to specially handle accounts marked as purged with a tombstone? (This might be a leaky abstraction; on top of that, the same implementation cost and risks apply.)
  • Start serializing/deserializing the index as well? (Takes more time for snapshot creation.)

@ryoqun ryoqun requested review from sakridge and mvines November 18, 2019 17:02
@@ -841,6 +842,25 @@ impl AccountsDB {
infos
}

pub fn verify_zero_lamports(&self, _slot: Slot, ancestors: &HashMap<Slot, usize>) -> bool {
ryoqun (Member, Author):

This moved from here.

let list = accounts_index.account_maps.get(&pubkey).unwrap().read().unwrap();
let list: Vec<(Slot, AccountInfo)> = list.iter().filter(|(slot, _)| accounts_index.is_root(*slot)).cloned().collect();
error!("ryoqun had zero?: {}, {:?}, {}, list: {}", pubkey, account, _b, list.len());
if list.len() > 1 {
ryoqun (Member, Author):

This way, I think we can prevent attackers from shielding legitimate accounts behind a zero balance.

@@ -309,12 +309,13 @@ pub fn process_ping(
let mut submit_count = 0;
let mut confirmed_count = 0;
let mut confirmation_time: VecDeque<u64> = VecDeque::with_capacity(1024);
let lamport = 0;
ryoqun (Member, Author):

These are needed for testing only. I'll create a separate PR for this addition to solana ping.

ryoqun (Member, Author):

By the way, this is done in #7029.

@ryoqun ryoqun force-pushed the no-bad-account-in-snapshot branch from 5571cae to 6c258bb Compare November 21, 2019 09:31
@@ -224,6 +226,7 @@ pub fn bank_from_archive<P: AsRef<Path>>(
unpacked_accounts_dir,
)?;

bank.purge_zero_lamport_accounts();
ryoqun (Member, Author):

As discussed, we now purge zero-lamport accounts both when saving and when loading snapshots.

ryoqun (Member, Author):

In contrast to this, this call cannot be omitted, for security reasons.


pub fn restore_account_count(&self) {
let mut count_and_status = self.count_and_status.write().unwrap();
*count_and_status = (self.all_existing_accounts().len(), count_and_status.1);
ryoqun (Member, Author):

This effectively makes deserialization ignore the count_and_status in the snapshot.

ryoqun (Member, Author):

TODO: rewrite the code to avoid this unneeded serialization in the first place.

self.cleanup_dead_slots(&mut dead_slots, last_root);
error!("ryoqun: zla dead slots 2 {:?}", dead_slots);
for slot in dead_slots {
self.purge_slot(slot);
ryoqun (Member, Author):

Not totally sure, but this is needed?

.cloned()
.collect();
for storage in storage_maps.into_iter() {
storage.restore_account_count();
ryoqun (Member, Author):

Note this comment.

ryoqun (Member, Author):

This code is tightly paired with this code.

slot_id, store.slot_id,
"AccountDB::accounts_index corrupted. Storage should only point to one slot"
);
store.remove_account();
ryoqun (Member, Author):

This maintains the count in count_and_status for updated non-zero-lamport accounts.


pub fn restore_account_count(&self) {
let mut count_and_status = self.count_and_status.write().unwrap();
let new_count = self.all_existing_accounts().len();
@sakridge sakridge (Member) commented Nov 21, 2019:

I don't think this is right actually. This version of the account could be masked by a version of the account on another fork. This should only count accounts where the latest version is represented in this append vec.

@ryoqun ryoqun (Member, Author) commented Nov 21, 2019:

Thanks for the early review!

Yes, your concern is right.

To make sure we don't count older versions of accounts, we gradually decrease the count there by calling remove_account, after first setting it to the full count of stored accounts here. I think calculating the right count directly here is difficult, because this method can't tell whether any newer representation of an account exists elsewhere.

As for the freshness check of the actual account contents, like the counts, we realize it by replacing older accounts with newer ones in chronological order when generating the index, as the original implementation does; the handling is similar to that of the counts.

Yeah, the code is a bit difficult to read. Maybe I can add a comment to this method, or even embed the code directly into generate_index, as that is its sole caller (and should remain so in the future). Even if we completely transition to not serializing the count, I'll leave a comment explaining that.
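
Roughly, the interplay between restore_account_count and remove_account reads like the following stand-in sketch (simplified types and a simplified newest-version rule, not the real AccountStorageEntry / generate_index code):

```rust
use std::collections::HashMap;

type Pubkey = u64;       // stand-in
type WriteVersion = u64; // stand-in

struct StorageEntry {
    stored: Vec<(Pubkey, WriteVersion)>, // accounts physically in this append vec
    count: usize,                        // live-account count being reconstructed
}

impl StorageEntry {
    // Step 1: ignore the count serialized in the snapshot and start from the
    // full number of physically stored accounts.
    fn restore_account_count(&mut self) {
        self.count = self.stored.len();
    }

    // Step 2: called once per stale (shadowed) account; returns the new count.
    fn remove_account(&mut self) -> usize {
        self.count = self.count.saturating_sub(1);
        self.count
    }
}

fn reconcile_counts(storages: &mut [StorageEntry]) {
    // Latest write version per pubkey across all storages, standing in for
    // what index generation learns while rebuilding the index.
    let mut newest: HashMap<Pubkey, WriteVersion> = HashMap::new();
    for storage in storages.iter() {
        for &(pubkey, version) in &storage.stored {
            let entry = newest.entry(pubkey).or_insert(version);
            if version > *entry {
                *entry = version;
            }
        }
    }
    // Any stored version older than the newest one costs a decrement.
    for storage in storages.iter_mut() {
        storage.restore_account_count();
        let stale = storage
            .stored
            .iter()
            .filter(|(pubkey, version)| newest[pubkey] != *version)
            .count();
        for _ in 0..stale {
            storage.remove_account();
        }
    }
}

fn main() {
    let mut storages = vec![
        StorageEntry { stored: vec![(1, 10), (2, 11)], count: 0 },
        StorageEntry { stored: vec![(1, 20)], count: 0 }, // newer version of pubkey 1
    ];
    reconcile_counts(&mut storages);
    assert_eq!(storages[0].count, 1); // (1, 10) is shadowed by (1, 20)
    assert_eq!(storages[1].count, 1);
}
```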

@codecov

codecov bot commented Nov 21, 2019

Codecov Report

Merging #7013 into master will increase coverage by 9.3%.
The diff coverage is 95.9%.

@@           Coverage Diff            @@
##           master   #7013     +/-   ##
========================================
+ Coverage    70.1%   79.4%   +9.3%     
========================================
  Files         230     230             
  Lines       51040   45317   -5723     
========================================
+ Hits        35782   36003    +221     
+ Misses      15258    9314   -5944

@@ -151,6 +151,7 @@ where
}

pub fn add_snapshot<P: AsRef<Path>>(snapshot_path: P, bank: &Bank) -> Result<()> {
trace!("ryoqun saving snapshot");
bank.purge_zero_lamport_accounts();
@ryoqun ryoqun (Member, Author) commented Nov 22, 2019:

Note: We can reduce the frequency of purge_zero_lamport_accounts invocations for a performance gain, because the invocation is completely optional when creating snapshots, other than freeing up some memory consumed by zero-lamport accounts.

storage.all_existing_accounts().iter().for_each(|a| {
if a.meta.pubkey == *pubkey
&& *version != a.meta.write_version
&& storage.remove_account() == 0
ryoqun (Member, Author):

This way we reduce the count by the number of older accounts.

@ryoqun ryoqun (Member, Author) commented Nov 22, 2019

@sakridge Finally, I settled on the algorithm for the count reconstruction when loading snapshots and got CI green. It seems this impl is now working fine locally. I've also replied to your comment. Does this PR look right? There are still rough edges.

TODOs (once the soundness of this PR is confirmed)

  • Update PR description.
  • Stop serializing the count.
  • Clean up unit tests/impl.
  • Add more unit tests and local-cluster tests.

@sakridge sakridge (Member) commented:

> @sakridge Finally, I settled on the algorithm for the count reconstruction when loading snapshots and got CI green. It seems this impl is now working fine locally. I've also replied to your comment. Does this PR look right? There are still rough edges.
>
> TODOs (once the soundness of this PR is confirmed)
>
>   • Update PR description.
>   • Stop serializing the count.
>   • Clean up unit tests/impl.
>   • Add more unit tests and local-cluster tests.

@ryoqun Is there a difference between this logic and just fixing up the count_and_status beforehand and then doing the clean as in #7010?

If that solution is equivalent, it seems like a lot less code and logic than this, and you also have some very expensive loops over all account vecs and the accounts_index pubkey map. I think some of those could be reduced, but if we can share the purge logic rather than having the separate generate_index logic do it, then I think that would be preferable.

@ryoqun ryoqun (Member, Author) commented Nov 24, 2019

> @ryoqun Is there a difference between this logic and just fixing up the count_and_status beforehand and then doing the clean as in #7010?

@sakridge There is no difference between this logic and the final form of PR #7010. Your solution is better than mine. :)

I misunderstood that we must precisely manage the count in the AccountStorageEntry in-band/online to avoid resource leaks across validator restarts, but it seems this is not a hard requirement after all. It will be enough for us to have an out-of-band/offline vacuum tool (if needed in the future) for cleaning up unused AppendVecs that are never referenced from the current account index.

So, I'm closing this. I'm really confident that your fix (#7010) does fix the problem in #6890.

Thanks for working on this and actually writing the better solution!

@sakridge sakridge (Member) commented:

> > @ryoqun Is there a difference between this logic and just fixing up the count_and_status beforehand and then doing the clean as in #7010?
>
> @sakridge There is no difference between this logic and the final form of PR #7010. Your solution is better than mine. :)
>
> I misunderstood that we must precisely manage the count in the AccountStorageEntry in-band/online to avoid resource leaks across validator restarts, but it seems this is not a hard requirement after all. It will be enough for us to have an out-of-band/offline vacuum tool (if needed in the future) for cleaning up unused AppendVecs that are never referenced from the current account index.
>
> So, I'm closing this. I'm really confident that your fix (#7010) does fix the problem in #6890.
>
> Thanks for working on this and actually writing the better solution!

Ok. Thanks so much for your help root-causing the issue. That really helped.

}

#[test]
fn test_accounts_db_serialize_zero_and_free() {

@ryoqun ryoqun force-pushed the no-bad-account-in-snapshot branch from f51930a to 8b54fba Compare November 26, 2019 17:20
let mut current_slot = 1;
accounts.store(current_slot, &[(&pubkey, &account)]);
//accounts.store(current_slot, &[(&pubkey2, &account)]); // <= Enabling this line makes even my PR fail...
error!("#1: {:#?}", accounts.get_storage_entries()); // #1
ryoqun (Member, Author):

Steps showing how things evolve:

#1: Add a test account with some lamports at slot 1: the account data is stored into the index, and a new storage entry is created with slot: 0, count: 1, status: Available.

#2: Make the test account unfunded at slot 2: the account data is stored into the index, and another new storage entry is created with slot: 1, count: 1, status: Available.

#3: Fill the StorageEntry created at #2 with outdated account data: to make that happen, repeatedly store other unrelated accounts, resulting in the StorageEntry with slot: 1, count: 1, status: Full, plus a new StorageEntry containing the spilled ones.

#4: The assertion that the zero-lamport account exists succeeds: at this moment, the index holds the account data (lamports=0) inserted at #2.

#5: Purge the zero-lamport account (simulating the call of this function from periodic snapshot creation).

#6: On master: because #5's purge decrements the #2 StorageEntry's count to 0, it gets flushed, resulting in count: 0, status: Available.
    In my PR: dead_slots are purged in purge_zero_lamport_accounts, so #1's StorageEntry is also purged.

#7: On master: the StorageEntry from #1 is still alive.
    In my PR: all slots containing the test account are now refreshed as blank.
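
For concreteness, here is the same trace as a tiny self-contained toy (stand-in types and hand-picked counts, not the real AccountsDB test), showing why master leaves the #1 entry alive:

```rust
#[derive(Debug)]
enum Status {
    Available,
    Full,
}

#[derive(Debug)]
struct StorageEntry {
    slot: u64,
    count: usize,
    status: Status,
}

fn main() {
    // #1: funded test account stored -> first storage entry
    let mut entries = vec![StorageEntry { slot: 0, count: 1, status: Status::Available }];

    // #2: zero-lamport update lands in a new storage entry for the next slot
    entries.push(StorageEntry { slot: 1, count: 1, status: Status::Available });

    // #3: unrelated stores fill that append vec; spilled accounts go elsewhere
    entries[1].status = Status::Full;
    entries.push(StorageEntry { slot: 1, count: 3, status: Status::Available });

    // #4: (the index lookup of the zero-lamport version is not modeled here)

    // #5: purging the zero-lamport account decrements the count of the
    //     storage entry created at #2
    entries[1].count -= 1;

    // #6 on master: that entry is recycled once its count hits 0 ...
    assert_eq!(entries[1].count, 0);
    entries[1].status = Status::Available;

    // #7 on master: ... but the entry from #1 still holds the old funded
    //    version of the account, so it can resurface from a snapshot.
    assert_eq!(entries[0].count, 1);

    // In this PR, purge_zero_lamport_accounts also purges dead slots, so the
    // #1 entry would be removed as well (not modeled further in this sketch).
    println!("{:#?}", entries);
}
```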

@ryoqun ryoqun (Member, Author) commented Nov 26, 2019:

@sakridge I found that this costly recalculation of the count doesn't really help much... I found this non-passing test scenario even with the impl.

ryoqun (Member, Author):

So, a yet more correct solution would be to stop doing store.remove_account() for zero-lamport accounts, without any of these complex things.

@ryoqun ryoqun (Member, Author) commented Dec 3, 2019

(I will close this PR once we're really sure we've fixed the snapshot stability issue, after some idling on the TdS cluster.)

@ryoqun ryoqun closed this Dec 3, 2019
@ryoqun ryoqun (Member, Author) commented Dec 3, 2019

> (I will close this PR once we're really sure we've fixed the snapshot stability issue, after some idling on the TdS cluster.)

Err, well I clicked the close button. Ok, close 'em all!
