Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Secure sysvars under hash by freezing all strictly #7892

Merged
merged 4 commits into from
Jan 24, 2020

Conversation

ryoqun
Copy link
Member

@ryoqun ryoqun commented Jan 21, 2020

Problem

sysvar account data inside snapshots can be changed without affecting the bank hash (meaning being unable to detect malicious snapshots) because it's excluded from the bank hash calculation.
it's a security risk.
Also, it's desirable to save sysvars in the snapshot instead of not doing so because we can't always dynamically create all of sysvars when starting a validator. Not hashing some of them would relax us from creating deterministically serialized sysvar data; However I've opted not to do so because otherwise we can't guarantee the exact sysvar state when replaying the ledger when debugging ledger inconsistency. And general consistency with normal accounts.

I'm not completely sure the exact reason for the unconditional special handling of all sysvar's exclusion from the bank hash, but I think this is due to some outdated implementation restriction for the historical reasons as far as I glanced the history of git/discord. So, just remove the special handling altogether! :)

Summary of changes

Remove the special handling by adjusting sysvar updates and include sysvar accounts into the bank hash calcuration like any other accounts, resulting in beautifier design. :)

To make it possible:

  • delay the slot hashes sysvar's update from inside Bank::freeze() to Bank::new_from_parent() like other sysvar updates.
  • Correctly carry previous stateful sysvar's account hash by defining Bank::update_sysvar_account().
  • Minor cleanups: unify set_hash and freeze

Todo/Notes

  • Write more tests
  • Additional cost for sorting and hashing shouldn't be problem?

part of #7167

@ryoqun ryoqun changed the title Secure sysvars under hash by freezing all strictly [wip] Secure sysvars under hash by freezing all strictly Jan 21, 2020
@ryoqun ryoqun requested a review from sakridge January 21, 2020 09:41

pub fn freeze(&self) {
if self.set_hash() {
self.update_slot_hashes();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating slot_hashes here results in mutating bank_hash of frozen slot, which complicates things. So just delay updating.

@@ -442,6 +442,7 @@ impl Bank {
new.ancestors.insert(p.slot(), i + 1);
});

new.update_slot_hashes(parent.slot(), parent.hash());
Copy link
Member Author

@ryoqun ryoqun Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, this parent bank's book keeping is duplicated across all the child (= forked) banks; but I think this is a justifiable trade-off to avoid complicated mutable bank hash management otherwise.

runtime/src/accounts_db.rs Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jan 21, 2020

Codecov Report

Merging #7892 into master will increase coverage by <.1%.
The diff coverage is 99.2%.

@@           Coverage Diff            @@
##           master   #7892     +/-   ##
========================================
+ Coverage    81.9%   81.9%   +<.1%     
========================================
  Files         243     243             
  Lines       52568   52621     +53     
========================================
+ Hits        43075   43120     +45     
- Misses       9493    9501      +8

@ryoqun
Copy link
Member Author

ryoqun commented Jan 22, 2020

Hmm, CI passed albeit new errors are observed from buildkite/solana/local-cluster

test test_snapshots_blockstore_floor ... [2020-01-22T04:12:57.853893756Z ERROR solana_local_cluster::local_cluster] leader_pubkey: E7qrNstNKodcMerczZwgnWRYuvo6UyRS5jPLx4J3oKqo
[2020-01-22T04:13:17.950725895Z WARN  solana_vote_program::vote_state] E7qrNstNKodcMerczZwgnWRYuvo6UyRS5jPLx4J3oKqo dropped vote Vote { slots: [45], hash: 3YmawEYV6UiSAkLa71bjVvpqM9cDhBdDdL3NhAL5garR, timestamp: None } failed to match hash 3YmawEYV6UiSAkLa71bjVvpqM9cDhBdDdL3NhAL5garR 4B8aRYtteNBGZQJWcgSqQS6eFFYgzM7ApUoxQyB9dyrf
[2020-01-22T04:13:17.953982789Z WARN  solana_vote_program::vote_state] E7qrNstNKodcMerczZwgnWRYuvo6UyRS5jPLx4J3oKqo dropped vote Vote { slots: [46], hash: DDHRPN1ZjexvEEuEnTdtqrVmRuPfN6SW6nstK3d2umkV, timestamp: None } failed to match hash DDHRPN1ZjexvEEuEnTdtqrVmRuPfN6SW6nstK3d2umkV 6TwdnysCSaz1CrdqmPuawFmyJdxVGtXfgzao66ySyGQJ
[2020-01-22T04:13:17.957110883Z WARN  solana_vote_program::vote_state] E7qrNstNKodcMerczZwgnWRYuvo6UyRS5jPLx4J3oKqo dropped vote Vote { slots: [47], hash: DuToJZen5eAfzWjm3tJTUR9NqktZ4K92WEHT9qB44TNm, timestamp: None } failed to match hash DuToJZen5eAfzWjm3tJTUR9NqktZ4K92WEHT9qB44TNm 7ueej4PEfGfpXE9xkUPeJ61Buw7taoHUhhnteJf8rtGC
[2020-01-22T04:13:17.959878577Z WARN solana_vote_program::vote_state] E7qrNstNKodcMerczZwgnWRYuvo6UyRS5jPLx4J3oKqo dropped vote Vote { slots: [48], hash: BmN8EvbBBcXqnRGSU95bfvMdYjsvhxXHY8xBU7zP8RLi, timestamp: None } failed to match hash BmN8EvbBBcXqnRGSU95bfvMdYjsvhxXHY8xBU7zP8RLi 7tgMwhUcLmPMKModR3sRD5LspCan8cBpsEwwNUUMGR1E

@ryoqun
Copy link
Member Author

ryoqun commented Jan 22, 2020

CC: @rob-solana (I'm guessing you're the sysvar expert.)

@rob-solana
Copy link
Contributor

nice work!

with moving slothash updates to the child, you can remove the code that skips sysvars in hashing, then this PR gets nice and small, and we're removing code (yay!)

@ryoqun
Copy link
Member Author

ryoqun commented Jan 22, 2020

@rob-solana Thanks for confirming!

Sadly, I can't remove the code that skips sysvars in hashing as long as recent_blockhashes isn't deterministic. It changes the contained set of slots differing with older slots. Originally, I thought this is because each validator can have possibly different vote history, so this is plausible. But this occurs even by running the solana-ledger-tool verify....

I think this is very odd or moreover a bug? I think recent blockhashes must be deterministic otherwise old TXes are rejected sometimes and not othertimes

First run:

ryoqun@ubuqun:~/work/solana/solana$ RUST_LOG=solana_runtime=debug,warn solana-ledger-tool verify --ledger config/ledger/ 2>&1 | grep "slot: 264, pubkey: SysvarRecentB1ockHashes11111111111111111111" -A33
[2020-01-22T07:55:05.793739519Z DEBUG solana_runtime::bank] get_sysvar_account: slot: 264, pubkey: SysvarRecentB1ockHashes11111111111111111111, account: None
recent_blockhashes 264
recent_blockhashes 263
recent_blockhashes 262
recent_blockhashes 260
recent_blockhashes 259
recent_blockhashes 261
recent_blockhashes 254
recent_blockhashes 257
recent_blockhashes 256
recent_blockhashes 258
recent_blockhashes 240
recent_blockhashes 251
recent_blockhashes 255
recent_blockhashes 248
recent_blockhashes 252
recent_blockhashes 227
recent_blockhashes 222
recent_blockhashes 250
recent_blockhashes 253
recent_blockhashes 235
recent_blockhashes 231
recent_blockhashes 234
recent_blockhashes 221
recent_blockhashes 249
recent_blockhashes 225
recent_blockhashes 242
recent_blockhashes 233
recent_blockhashes 232
recent_blockhashes 244
recent_blockhashes 243
recent_blockhashes 246
recent_blockhashes 204
recent_blockhashes: RecentBlockhashes(

Next run:

ryoqun@ubuqun:~/work/solana/solana$ RUST_LOG=solana_runtime=debug,warn solana-ledger-tool verify --ledger config/ledger/ 2>&1 | grep "slot: 264, pubkey: SysvarRecentB1ockHashes11111111111111111111" -A33
[2020-01-22T07:56:14.758892926Z DEBUG solana_runtime::bank] get_sysvar_account: slot: 264, pubkey: SysvarRecentB1ockHashes11111111111111111111, account: None
recent_blockhashes 264
recent_blockhashes 263
recent_blockhashes 262
recent_blockhashes 261
recent_blockhashes 260
recent_blockhashes 258
recent_blockhashes 259
recent_blockhashes 233
recent_blockhashes 255
recent_blockhashes 232
recent_blockhashes 254
recent_blockhashes 257
recent_blockhashes 250
recent_blockhashes 256
recent_blockhashes 252
recent_blockhashes 226
recent_blockhashes 217
recent_blockhashes 249
recent_blockhashes 242
recent_blockhashes 228
recent_blockhashes 227
recent_blockhashes 238
recent_blockhashes 247
recent_blockhashes 246
recent_blockhashes 253
recent_blockhashes 237
recent_blockhashes 208
recent_blockhashes 206
recent_blockhashes 245
recent_blockhashes 248
recent_blockhashes 243
recent_blockhashes 222
recent_blockhashes: RecentBlockhashes(

Really odd... I can get a different set of slots for the recent block hashes for the same slot 264 everytime I ran solana-ledger-tool verify.

@rob-solana
Copy link
Contributor

recent_blockhashes needs to be deterministic, that's a bug

@ryoqun
Copy link
Member Author

ryoqun commented Jan 22, 2020

recent_blockhashes needs to be deterministic, that's a bug

I see! Then I'll work on #7918 first and revert 5890a72 and 3e8810a

@ryoqun ryoqun force-pushed the snapshot-sysvar-hardening branch from 3e8810a to a2d2413 Compare January 23, 2020 03:18
@ryoqun
Copy link
Member Author

ryoqun commented Jan 23, 2020

recent_blockhashes needs to be deterministic, that's a bug

I see! Then I'll work on #7918 first and revert 5890a72 and 3e8810a

Rebased on the now-merged #7918.

@ryoqun ryoqun changed the title [wip] Secure sysvars under hash by freezing all strictly Secure sysvars under hash by freezing all strictly Jan 23, 2020
@ryoqun ryoqun marked this pull request as ready for review January 23, 2020 10:30
@ryoqun
Copy link
Member Author

ryoqun commented Jan 23, 2020

@sakridge Every remaining work is done; I'm pretty sure this is having the prime time for the review! :)


// Normally, just use the hash from parent slot. However, use the
// existing stored hash if any for the sake of bank hash's idempotent.
if let Some((hot_account, _)) = self.get_account_modified_since_parent(pubkey) {
Copy link
Member Author

@ryoqun ryoqun Jan 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, I just camed up this naming and like it. new_account <=> (old_account == {hot -> cold -> frozen}_account). So, if this is agreeable, I'll rename get_account_modified_since_parent to get_hot_account().

@mvines @garious Any opnitions? Mention based on https://github.com/solana-labs/solana/pull/2806/files#diff-7145777648f8c12caeb15b4e1755215cR560 #2806

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find the hot/cold labeling here intuitive. If I didn't have the additional context that a hot account is an account_modified_since_parent I don't think I would have guessed that on my own.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your opinion! Then, I just reverted the naming thing: 4f1fe0a

@ryoqun
Copy link
Member Author

ryoqun commented Jan 23, 2020

* [ ]  Additional cost for sorting and hashing shouldn't be problem?

Sorting thing is now reverted; and as far as I can tell from the quoted log of an idling validator, hashing of this size per a slot should not be a problem at all. :)

[2020-01-23T10:54:12.170350774Z INFO  solana_runtime::bank] accounts hash slot: 200 stats: BankHashStats { num_removed_accounts: 8, num_added_accounts: 0, num_lamports_stored: 1103457792838, total_data_len: 156552, num_executable_accounts: 0 }

@@ -38,7 +38,6 @@ use solana_sdk::bank_hash::BankHash;
use solana_sdk::clock::{Epoch, Slot};
use solana_sdk::hash::{Hash, Hasher};
use solana_sdk::pubkey::Pubkey;
use solana_sdk::sysvar;
Copy link
Member Author

@ryoqun ryoqun Jan 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bye bye, leaky abstraction; Feels so good. :)

#[should_panic]
fn test_accounts_empty_account_bank_hash() {
let accounts = Accounts::new(Vec::new());
accounts.store_slow(0, &Pubkey::default(), &Account::new(1, 0, &sysvar::id()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sysvars are now the first citizen in the AccountsDB land. :)

@@ -759,6 +758,14 @@ impl AccountsDB {
let hash_info = bank_hashes
.get(&parent_slot)
.expect("accounts_db::set_hash::no parent slot");
if bank_hashes.get(&slot).is_some() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this additional line in scope for this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rob-solana Yes!

First, this PR added additional tests for the lazy initialization of slot hashes sysvar around here. That exact line internally hits here and bails out with this if condition. So, this if is needed to pass the test correctly. Otherwise, this following assertion for idempotent bank hash would fail because set_file without this if would incorrectly reset the bank_hash for the tested slot, which is shared by bank2 and bank3.

I know such sharing of a slot between child banks doesn't occur in the real validator behavior. Thus, I alternatively could replace this error! with panic! and adjust all tests. But it takes extra effort and introduces an artifical limitation to AccountsDB: Currently, all of other AccountsDB code doesn't forbid such slot sharing, it just works. Only this part didn't work; So I wanted to preserve that flexible property and managed to do that with this tiny additional if clause.

In other words, AccountsDB isn't aware of anything called forks and free of leaky abstraction now and forever. I think that's nice and beautiful design and separation of concern. :)

Copy link
Contributor

@rob-solana rob-solana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

sysvars' store(create_account) pattern was expedient when bank hashes were actually hashes.

Now that bank hashes are xors (and/or changing to RSA inclusion proofs), we might consider changing sysvars' pattern to update a mutable account, just to alleviate confusion for people visiting this code in the future. sysvars really jump through some hoops to get updated now...

@ryoqun
Copy link
Member Author

ryoqun commented Jan 24, 2020

@rob-solana Thanks for reviewing and approvals!

FYI: @sakridge I'm going to merge this, although I originally added you as a reviewer. :)

sysvars' store(create_account) pattern was expedient when bank hashes were actually hashes.

Now that bank hashes are xors (and/or changing to RSA inclusion proofs), we might consider changing sysvars' pattern to update a mutable account, just to alleviate confusion for people visiting this code in the future. sysvars really jump through some hoops to get updated now...

I see! Thanks for explaining! I'll keep that in mind. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants