Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Break up AccountsIndex lock #12787

Merged
merged 1 commit into from
Oct 22, 2020
Merged

Conversation

carllin
Copy link
Contributor

@carllin carllin commented Oct 10, 2020

Problem

Holding AccountsIndex lock during scans blocks account stores

Summary of Changes

Builds on top of #12126

Benchmark bench_concurrent_scan_write here shows 1000 inserts of new pubkeys into the storage taking about 9-10ms (80% still waiting on the AccountsIndex write lock), even as the index/accountsdb storage size grows. Before this change, the inserts tarting taking hundreds of ms.

Maybe we would use something like a https://github.com/crossbeam-rs/crossbeam/tree/master/crossbeam-skiplist to optimize contention the core lock farther, but there does not seem to be any stable alternatives yet.
Fixes #

@carllin carllin requested a review from ryoqun October 10, 2020 03:33
@carllin carllin force-pushed the FixAccounts2 branch 2 times, most recently from daa7fe7 to ba3311d Compare October 13, 2020 04:17
runtime/Cargo.toml Outdated Show resolved Hide resolved
runtime/benches/accounts.rs Outdated Show resolved Hide resolved
runtime/benches/accounts.rs Outdated Show resolved Hide resolved
runtime/benches/accounts.rs Outdated Show resolved Hide resolved
runtime/benches/accounts.rs Outdated Show resolved Hide resolved
runtime/benches/accounts.rs Outdated Show resolved Hide resolved
Comment on lines 189 to 200
// Write to a different slot than the one being read from. Because
// there's a new account pubkey being written to every time, will
// compete for the accounts index lock on every store
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good comment!

runtime/benches/accounts.rs Outdated Show resolved Hide resolved
runtime/src/accounts.rs Outdated Show resolved Hide resolved
Comment on lines 48 to 31
#[derive(Clone, Debug)]
pub struct AccountMapEntry<T> {
ref_count: Arc<AtomicU64>,
pub slot_list: Arc<RwLock<SlotList<T>>>,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder we could reduce Arc to 1 from 2 by line this, it's a bit hideous but maybe worth effort? (we should be keen the size of AcccontIndex memory footprint (Arc costs 2-word memory for each, so we're increasing entry size by 4 * usize here per Account.) and reduced general update cost from updating Arc twice everytime accessed).

pub struct AccountMapEntryInner<T> {
    ref_count: AtomicU64,
    pub slot_list: RwLock<SlotList<T>>,
}
type AccountMapEntry<T> = Arc<AccountMapEntryInner>;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryoqun this is much cleaner with the ouroboros crate here: 970b2c7

@ryoqun
Copy link
Member

ryoqun commented Oct 13, 2020

Benchmark bench_concurrent_scan_write here shows 1000 inserts of new pubkeys into the storage taking about 9-10ms (80% still waiting on the AccountsIndex write lock), even as the index/accountsdb storage size grows. Before this change, the inserts tarting taking hundreds of ms.

I'm really happy to see this report. :)

How about any perf changes of colo-gpu-perf-4-val-1-client.yml with this(#12787) & previous dashmap (#12126), compared to the base commit?

@ryoqun
Copy link
Member

ryoqun commented Oct 13, 2020

first-pass of review done. :)

@carllin carllin force-pushed the FixAccounts2 branch 2 times, most recently from c9e2367 to 256a009 Compare October 14, 2020 10:06
runtime/benches/accounts.rs Outdated Show resolved Hide resolved
@carllin carllin force-pushed the FixAccounts2 branch 7 times, most recently from a3c18af to 00fbb85 Compare October 15, 2020 01:24
@codecov
Copy link

codecov bot commented Oct 15, 2020

Codecov Report

Merging #12787 into master will increase coverage by 0.0%.
The diff coverage is 99.5%.

@@           Coverage Diff            @@
##           master   #12787    +/-   ##
========================================
  Coverage    82.1%    82.1%            
========================================
  Files         366      366            
  Lines       86103    86237   +134     
========================================
+ Hits        70739    70872   +133     
- Misses      15364    15365     +1     

Comment on lines 114 to 116
// already present, then the function will return back Some(account_info) which
// the caller can then take the write lock and do an 'insert' with the item.
// It returns None if the item is already present and thus successfully updated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outdated comment?

@ryoqun
Copy link
Member

ryoqun commented Oct 16, 2020

@ryoqun been running for past 20 hours or so with snapshot verification: https://metrics.solana.com:3000/d/monitor-edge/cluster-telemetry-edge?orgId=2&from=1602723051575&to=1602796347486&panelId=35&fullscreen&var-datasource=Solana%20Metrics%20(read-only)&var-testnet=mainnet-beta&var-hostid=aeypRbHck9k5zGuftrdSaUMfjJyTrjtaGJEtALt835y, no issues yet!

cool. :)

also, had the validator been slummed by getProgramAccounts RPC reqs? and no slowdowns of replay and no odd panic!s?

How about any perf changes of colo-gpu-perf-4-val-1-client.yml with this(#12787) & previous dashmap (#12126), compared to the base commit?

As a ground wrapping-up of past various AccountsDB related improvements by you, how about writing like this? Too much hustle? This is done for AccountIndex's HashMap => BTreeMap change.:

#9527 (comment)

you can peek around this build jobs to know how to create these pdfs or ask me :) : https://buildkite.com/solana-labs/system-performance-tests/builds?branch=pull%2F9527%2Fhead

also, please take a look at my previous similar reports for inspiration:

#8436 (comment)
#8436 (comment)

pub fn handle_dead_keys(&self, dead_keys: Vec<Pubkey>) {
if !dead_keys.is_empty() {
for key in &dead_keys {
let mut w_index = self.account_maps.write().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just making sure: w_index is moved inside loop body to mitigate longer write lock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yeah, it should be a pretty fast loop though, so maybe it's not necessary?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I'm not sure too for the perf. difference.. Anyway, I just wanted to confirm this was somewhat intentional or just typo.

(w_account_entry.unwrap(), is_newly_inserted)
}

pub fn handle_dead_keys(&self, dead_keys: Vec<Pubkey>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: how about taking a &[Pubkey]? I donno why clippy didn't warn for this. ;)

Comment on lines 290 to 258
if let Some(account_entry) = w_index.get(key) {
if account_entry.slot_list.read().unwrap().is_empty() {
w_index.remove(key);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: I think we could reduce the lookup from 2 times to once like this:

if let Some(index_entry) = w_index.entry(key) {
    if (...) {
        index_entry.remove();
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slight variation on this using Occupied instead of Some, but updated!

@ryoqun
Copy link
Member

ryoqun commented Oct 16, 2020

2nd review pass is done. I think this is really reaching to the lgtm. Thanks for defending my wave of nits attacks patiently. Well, I'm very tend to be interrupted by other concurrent tasks..

@carllin carllin force-pushed the FixAccounts2 branch 3 times, most recently from f79529f to 66e67f3 Compare October 16, 2020 07:32
@ryoqun
Copy link
Member

ryoqun commented Oct 18, 2020

@ryoqun, I couldn't get the PDF's to scale properly

Thanks for running these perf results, anyway! I created them for future reference.

break-up-accounts-index-before.pdf
break-up-accounts-index-after.pdf

I'm still looking the result. But, in general, there should be no bad perf. degradation and spike reduced a lot for some reason. Maybe the root contributor to it is the reduction of squashing, which I didn't expected but a good outcome. :)

Comment on lines 744 to 747
// Assertion enforced by `accounts_index.get()`, the latest slot
// will not be greater than the given `max_clean_root`
if let Some(max_clean_root) = max_clean_root {
assert!(*slot <= max_clean_root);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits: this assert! is no longer needed?

Copy link
Contributor Author

@carllin carllin Oct 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh bad merge thanks! re-added!

ryoqun
ryoqun previously approved these changes Oct 18, 2020
Copy link
Member

@ryoqun ryoqun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nits!

Thanks for working so hard on AccountsDB.

@carllin carllin merged commit e6b821c into solana-labs:master Oct 22, 2020
@carllin carllin added the v1.4 label Oct 28, 2020
mergify bot pushed a commit that referenced this pull request Oct 28, 2020
Co-authored-by: Carl Lin <carl@solana.com>
(cherry picked from commit e6b821c)

# Conflicts:
#	runtime/src/accounts.rs
#	runtime/src/accounts_db.rs
carllin added a commit that referenced this pull request Oct 28, 2020
Co-authored-by: Carl Lin <carl@solana.com>
carllin added a commit that referenced this pull request Oct 28, 2020
Co-authored-by: Carl Lin <carl@solana.com>
mergify bot added a commit that referenced this pull request Oct 28, 2020
Co-authored-by: Carl Lin <carl@solana.com>

Co-authored-by: carllin <wumu727@gmail.com>
Co-authored-by: Carl Lin <carl@solana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants