Switch accounts storage lock to DashMap #12126

Merged · 9 commits · Oct 14, 2020
41 changes: 41 additions & 0 deletions Cargo.lock

(Generated file; diff not rendered.)

41 changes: 41 additions & 0 deletions programs/bpf/Cargo.lock

(Generated file; diff not rendered.)
1 change: 1 addition & 0 deletions runtime/Cargo.toml
@@ -14,6 +14,7 @@ blake3 = "0.3.6"
 bv = { version = "0.11.1", features = ["serde"] }
 byteorder = "1.3.4"
 bzip2 = "0.3.3"
+dashmap = "3.11.10"
 crossbeam-channel = "0.4"
 dir-diff = "0.3.2"
 flate2 = "1.0.14"
57 changes: 56 additions & 1 deletion runtime/benches/accounts.rs
@@ -2,6 +2,7 @@

 extern crate test;

+use rand::Rng;
 use solana_runtime::{
     accounts::{create_test_accounts, Accounts},
     bank::*,
@@ -11,7 +12,7 @@ use solana_sdk::{
     genesis_config::{create_genesis_config, ClusterType},
     pubkey::Pubkey,
 };
-use std::{path::PathBuf, sync::Arc};
+use std::{collections::HashMap, path::PathBuf, sync::Arc, thread::Builder};
 use test::Bencher;

 fn deposit_many(bank: &Bank, pubkeys: &mut Vec<Pubkey>, num: usize) {
@@ -141,3 +142,57 @@ fn bench_delete_dependencies(bencher: &mut Bencher) {
         accounts.accounts_db.clean_accounts(None);
     });
 }

#[bench]
#[ignore]
fn bench_concurrent_read_write(bencher: &mut Bencher) {
ryoqun (Member) commented:
btw, do you have any idea how single-threaded write throughput changes from std::collections::HashMap to DashMap? I may be worrying too much, but I'd like to confirm how well DashMap does while supporting concurrency via sharding. Maybe it trades off maximum throughput by only a negligible margin?
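(Illustrative only, not code from the PR: a minimal sketch of the single-threaded comparison being asked about here. The key/value types, the count N, and the timing method are all arbitrary choices.)

```rust
use std::collections::HashMap;
use std::time::Instant;

use dashmap::DashMap;

fn main() {
    const N: u64 = 1_000_000;

    // Plain HashMap: no synchronization on the hot path at all.
    let mut map = HashMap::new();
    let start = Instant::now();
    for i in 0..N {
        map.insert(i, i);
    }
    println!("HashMap: {:?} for {} inserts", start.elapsed(), N);

    // DashMap: every insert hashes the key to a shard and takes that
    // shard's lock, even when only one thread is running.
    let dmap: DashMap<u64, u64> = DashMap::new();
    let start = Instant::now();
    for i in 0..N {
        dmap.insert(i, i);
    }
    println!("DashMap: {:?} for {} inserts", start.elapsed(), N);
}
```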

ryoqun (Member), Oct 13, 2020:
To add more context: our basic tenet is to batch operations if we can, and make them concurrent otherwise. So I guess the upper layer hammers AccountsDB, which is optimized for batching, and single-threaded perf is at least moderately related to batching perf. Thus, we're somewhat sensitive to it.
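(An illustrative sketch of that tenet, not code from the repo: the batched style amortizes one lock acquisition over a whole batch, which is the access pattern the RwLock-wrapped storage was serving.)

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Batched writes: one lock/unlock amortized over the whole batch.
fn store_batched(map: &RwLock<HashMap<u64, u64>>, batch: &[(u64, u64)]) {
    let mut guard = map.write().unwrap(); // single write-lock acquisition
    for &(k, v) in batch {
        guard.insert(k, v);
    }
} // lock released once, here

/// Per-operation writes: a lock/unlock for every single item.
fn store_individually(map: &RwLock<HashMap<u64, u64>>, batch: &[(u64, u64)]) {
    for &(k, v) in batch {
        map.write().unwrap().insert(k, v); // lock/unlock per item
    }
}
```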

carllin (Contributor, Author), Oct 13, 2020:

@ryoqun that's a good point. I ran the benchmark here: https://github.com/xacrimon/conc-map-bench, which has three work profiles: exchange (read-write), cache (read-heavy), and rapid-grow (insert-heavy); more details are at that link. The single-thread results on my dev machine (https://console.cloud.google.com/compute/instancesDetail/zones/us-west1-b/instances/carl-dev?project=principal-lane-200702&authuser=1) are below:

== cache
-- MutexStd
25165824 operations across 1 thread(s) in 14.941146454s; time/op = 593ns

-- DashMap
25165824 operations across 1 thread(s) in 15.589596263s; time/op = 619ns
==

== exchange
-- MutexStd
25165824 operations across 1 thread(s) in 20.954264682s; time/op = 831ns

-- DashMap
25165824 operations across 1 thread(s) in 20.875345754s; time/op = 828ns
==

== rapid grow
-- MutexStd
25165824 operations across 1 thread(s) in 20.22593938s; time/op = 802ns

-- DashMap
25165824 operations across 1 thread(s) in 17.456807471s; time/op = 693ns
==

ryoqun (Member), Oct 13, 2020:

@carllin that report is interesting. Thanks for running it and sharing. So DashMap operations seem to be on par with not-batched Mutex<HashMap> operations, as far as I can tell from the code (https://github.com/xacrimon/conc-map-bench/blob/master/src/adapters.rs#L40)? (I'm assuming our workload is more like cache or exchange, not like rapid-grow.)

Ideally, I'd like to see more realistic results that reflect our batched base implementation. Of course, this may be a small part of the overall solana-validator runtime... pardon me for being nit-picky here.

Also, how does bench_concurrent_read_write perform before and after DashMap, with a single-threaded and a multi-threaded writer and no readers? I think this bench is easy enough to cherry-pick onto the merge-base commit.

Also, there is accounts-bench/src/main.rs if you have extra stamina; its AccountsDB preparation step tortures AccountsDB quite a bit :)

What I'm a bit worried about is making sure we don't introduce a silent perf degradation for validators that aren't affected by RPC calls (mainnet-beta validators). Also, I'm assuming DashMap internally locks shards while updating; that means we're locking/unlocking for each read/write operation? In other words, we're moving from batched operation with infrequent locks/unlocks to not-batched operation with frequent locks/unlocks (but much less contention!).

Quite an interesting benchmarking showdown. :)
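(To make the lock-granularity question concrete, a hedged usage sketch, not code from this PR: each DashMap operation locks only the shard its key hashes to, so a writer and readers on different shards proceed in parallel, at the cost of one lock/unlock per operation.)

```rust
use std::sync::Arc;
use std::thread;

use dashmap::DashMap;

fn main() {
    let map: Arc<DashMap<u64, u64>> = Arc::new(DashMap::new());

    // Writer: each insert write-locks only the shard the key hashes to.
    let writer = {
        let map = Arc::clone(&map);
        thread::spawn(move || {
            for i in 0..100_000u64 {
                map.insert(i, i); // shard lock taken and released per call
            }
        })
    };

    // Reader: get() takes a shard read lock; it only contends with the
    // writer when both land on the same shard at the same time.
    let reader = {
        let map = Arc::clone(&map);
        thread::spawn(move || {
            (0..100_000u64).filter(|i| map.get(i).is_some()).count()
        })
    };

    writer.join().unwrap();
    println!("hits: {}", reader.join().unwrap());
}
```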

carllin (Contributor, Author) commented:

@ryoqun, good suggestions. Here are the results I saw on my MacBook Pro:

bench_concurrent_read_write with 1 writer, no readers:

DashMap:
test bench_concurrent_read_write  ... bench:   3,713,260 ns/iter (+/- 679,081)

Master:
test bench_concurrent_read_write  ... bench:   3,773,731 ns/iter (+/- 654,643)

accounts-bench/src/main.rs:

DashMap:
clean: false
Creating 10000 accounts
created 10000 accounts in 4 slots create accounts took 148ms

Master:
clean: false
Creating 10000 accounts
created 10000 accounts in 4 slots create accounts took 145ms

ryoqun (Member) commented:

@carllin Perfect reporting :)

    let num_readers = 5;
    let accounts = Arc::new(Accounts::new(
        vec![
            PathBuf::from(std::env::var("FARF_DIR").unwrap_or_else(|_| "farf".to_string()))
                .join("concurrent_read_write"),
        ],
        &ClusterType::Development,
    ));
    let num_keys = 1000;
    let slot = 0;
    accounts.add_root(slot);
    // Seed the store with num_keys accounts for the reader threads to hit
    let pubkeys: Arc<Vec<_>> = Arc::new(
        (0..num_keys)
            .map(|_| {
                let pubkey = Pubkey::new_rand();
                let account = Account::new(1, 0, &Account::default().owner);
                accounts.store_slow(slot, &pubkey, &account);
                pubkey
            })
            .collect(),
    );

    // Background readers loop forever, loading random seeded accounts
    for _ in 0..num_readers {
        let accounts = accounts.clone();
        let pubkeys = pubkeys.clone();
        Builder::new()
            .name("readers".to_string())
            .spawn(move || {
                let mut rng = rand::thread_rng();
                loop {
                    let i = rng.gen_range(0, num_keys);
                    test::black_box(accounts.load_slow(&HashMap::new(), &pubkeys[i]).unwrap());
                }
            })
            .unwrap();
    }

    let num_new_keys = 1000;
    let new_accounts: Vec<_> = (0..num_new_keys)
        .map(|_| Account::new(1, 0, &Account::default().owner))
        .collect();
    bencher.iter(|| {
        for account in &new_accounts {
            // Write to a different slot than the one being read from. Because
            // a new account pubkey is written every time, each store will
            // compete for the accounts index lock
            accounts.store_slow(slot + 1, &Pubkey::new_rand(), &account);
        }
    })
}
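(The contention that the comment in the loop above describes, as a hypothetical stand-in; the real accounts index is a more involved structure, but the locking shape is the same: storing under a brand-new pubkey must mutate the index, so it takes the exclusive write lock and stalls the reader threads.)

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical stand-in for the accounts index: pubkey -> list of slots.
type Index = RwLock<HashMap<u64, Vec<u64>>>;

// A store under a new key mutates the map, so it needs the write lock,
// blocking all concurrent readers for the duration.
fn store_new_key(index: &Index, pubkey: u64, slot: u64) {
    index.write().unwrap().entry(pubkey).or_default().push(slot);
}

// A load only needs the shared read lock; many readers run in parallel
// until a writer shows up.
fn load(index: &Index, pubkey: u64) -> Option<Vec<u64>> {
    index.read().unwrap().get(&pubkey).cloned()
}
```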
8 changes: 3 additions & 5 deletions runtime/src/accounts.rs
@@ -303,7 +303,6 @@ impl Accounts {
         //PERF: hold the lock to scan for the references, but not to clone the accounts
         //TODO: two locks usually leads to deadlocks, should this be one structure?
         let accounts_index = self.accounts_db.accounts_index.read().unwrap();
-        let storage = self.accounts_db.storage.read().unwrap();

         let fee_config = FeeConfig {
             secp256k1_program_enabled: feature_set
@@ -328,7 +327,7 @@
         };

         let load_res = self.load_tx_accounts(
-            &storage,
+            &self.accounts_db.storage,
             ancestors,
             &accounts_index,
             tx,
@@ -343,7 +342,7 @@
         };

         let load_res = Self::load_loaders(
-            &storage,
+            &self.accounts_db.storage,
             ancestors,
             &accounts_index,
             tx,
@@ -1507,10 +1506,9 @@ mod tests {
         let ancestors = vec![(0, 0)].into_iter().collect();

         let accounts_index = accounts.accounts_db.accounts_index.read().unwrap();
-        let storage = accounts.accounts_db.storage.read().unwrap();
         assert_eq!(
             Accounts::load_executable_accounts(
-                &storage,
+                &accounts.accounts_db.storage,
                 &ancestors,
                 &accounts_index,
                 &Pubkey::new_rand(),
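(The shape of the change these diffs reflect, sketched with simplified placeholder types; the real AccountsDB fields are more elaborate. The storage map drops its outer RwLock, so callers pass &self.accounts_db.storage directly instead of threading a read guard through load_tx_accounts and load_loaders.)

```rust
use std::collections::HashMap;
use std::sync::RwLock;

use dashmap::DashMap;

type Slot = u64;
struct SlotStores; // placeholder for the real per-slot storage entries

// Before: every access went through one map-wide RwLock, and callers
// held the read guard across whole load operations.
struct StorageBefore {
    storage: RwLock<HashMap<Slot, SlotStores>>,
}

// After: DashMap shards the map and locks per shard internally, so no
// outer guard is needed and writers to different slots rarely contend.
struct StorageAfter {
    storage: DashMap<Slot, SlotStores>,
}
```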