Switch accounts storage lock to DashMap #12126
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master   #12126     +/-   ##
=========================================
- Coverage    81.9%    81.9%    -0.1%
=========================================
  Files         360      360
  Lines       84873    84899      +26
=========================================
+ Hits        69549    69556       +7
- Misses      15324    15343      +19
The 4.0.0 release candidates are alleged to resolve this issue.
runtime/src/accounts_db.rs
Outdated
@@ -702,10 +707,9 @@ impl AccountsDB {
// Calculate store counts as if everything was purged
// Then purge if we can
let mut store_counts: HashMap<AppendVecId, (usize, HashSet<Pubkey>)> = HashMap::new();
let storage = self.storage.read().unwrap();
From my understanding, the extra risk that removing this large lock adds is race conditions where other slots in self.storage can now be modified. The three places this can happen:
- https://github.com/solana-labs/solana/blob/master/runtime/src/accounts_db.rs#L970-L974. Shouldn't be possible b/c we hold the shrink_candidate_slots lock, and shrink_all_slots() -> do_shrink_slot_forced() is not called after startup, so the lock should be respected.
- https://github.com/solana-labs/solana/blob/master/runtime/src/accounts_db.rs#L1256-L1259. Can store_with_hashes() -> handle_reclaims_maybe_cleanup() remove a slot that exists in purges.account_infos such that the call below to self.storage.0.get(&slot).unwrap() panics? (See the sketch after this list.)
- https://github.com/solana-labs/solana/blob/master/runtime/src/accounts_db.rs#L1236-L1238. Adding a new storage entry should be ok, as that should be on some future non-rooted slot which shouldn't exist in account_infos.
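A minimal sketch of the concern in the second bullet, using a bare DashMap and hypothetical Slot/SlotStores placeholders rather than the real AccountsDB types: without a global read lock held across both steps, a slot can be removed between collecting purge candidates and looking it back up, so an unwrap()/expect() on the lookup can panic.

```rust
use dashmap::DashMap;
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

// Hypothetical stand-ins for the real per-slot storage types.
type Slot = u64;
type SlotStores = HashMap<usize, String>;

fn main() {
    let storage: Arc<DashMap<Slot, SlotStores>> = Arc::new(DashMap::new());
    storage.insert(42, HashMap::new());

    // Thread A: collected slot 42 as a purge candidate earlier, now looks it up.
    let reader = {
        let storage = Arc::clone(&storage);
        thread::spawn(move || {
            // With no global read lock spanning both steps, this expect() can
            // panic if another thread removed the slot in the meantime.
            let _slot_stores = storage.get(&42).expect("slot was removed concurrently");
        })
    };

    // Thread B: a cleanup path removes the slot concurrently.
    storage.remove(&42);

    let _ = reader.join();
}
```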
- Yeah, I think your understanding is correct.
- Yeah, it can panic!. You are addressing this in Fix rooted accounts cleanup, simplify locking #12194.
- Yeah, this is correct as well.
Awesome, thanks!
Btw, this needs a concurrent btree map because currently AccountsIndex uses a BTreeMap.
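To illustrate why an ordered map matters here (an aside, assuming the index relies on BTreeMap-style ordered keys, as the range() discussion further down suggests): range scans over pubkey-like keys need something like BTreeMap::range, which a hash-sharded map such as DashMap does not provide.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Included, Unbounded};

// Placeholder 32-byte key standing in for a Pubkey.
type Key = [u8; 32];

fn main() {
    let mut index: BTreeMap<Key, u64> = BTreeMap::new();
    index.insert([1u8; 32], 10);
    index.insert([2u8; 32], 20);
    index.insert([200u8; 32], 30);

    // An ordered map can scan a contiguous key interval; DashMap, being
    // hash-sharded, has no equivalent of `range`.
    let start: Key = [2u8; 32];
    for (key, value) in index.range((Included(start), Unbounded)) {
        println!("{} -> {}", key[0], value);
    }
}
```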
@@ -381,7 +386,7 @@ pub struct AccountsDB {
/// Keeps tracks of index into AppendVec on a per slot basis
pub accounts_index: RwLock<AccountsIndex<AccountInfo>>,

pub storage: RwLock<AccountStorage>,
cool :)
runtime/src/accounts_db.rs
Outdated
let mut stores = self.storage.write().unwrap();
let slot_storage = stores.0.entry(slot).or_insert_with(HashMap::new);
let mut slot_storage = self.storage.0.entry(slot).or_insert_with(HashMap::new);
slot_storage.insert(store.id, store_for_index);
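For reference, a standalone sketch of the DashMap entry API used in the new line above (hypothetical keys and values, not the PR's code): entry().or_insert_with() takes the write lock on just the shard containing the key and hands back a RefMut guard to the value.

```rust
use dashmap::DashMap;
use std::collections::HashMap;

fn main() {
    let storage: DashMap<u64, HashMap<usize, String>> = DashMap::new();

    // Locks only the shard containing key 5 and returns a RefMut write guard
    // to the (possibly freshly inserted) per-slot map.
    let mut slot_storage = storage.entry(5).or_insert_with(HashMap::new);
    slot_storage.insert(0, "append_vec_0".to_string());

    // The shard write lock is released when the guard is dropped.
    drop(slot_storage);
    assert_eq!(storage.get(&5).unwrap().len(), 1);
}
```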
yay! :)
runtime/src/accounts_db.rs
Outdated
let stores = self.storage.read().unwrap();

if let Some(slot_stores) = stores.0.get(&slot) {
let slot_stores_guard = self.storage.0.get(&slot);
nit: I think this isn't a lock guard anymore. So, remove the _guard suffix?
heh well technically, I think it's still a guard on the specific DashMap shard: https://docs.rs/dashmap/3.11.10/src/dashmap/lib.rs.html#432-438, i.e. you shouldn't hold it for too long.
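A small standalone illustration of that point (an added example, not from the PR): DashMap::get returns a Ref that holds the read lock on the key's shard for as long as it lives, so it behaves like a guard even though it isn't a whole-map lock guard.

```rust
use dashmap::DashMap;

fn main() {
    let map: DashMap<u64, &str> = DashMap::new();
    map.insert(1, "a");

    // `get` returns Option<Ref<K, V>>; the Ref keeps a read guard on the
    // shard containing key 1 while it is alive.
    if let Some(value) = map.get(&1) {
        println!("value = {}", *value);
        // Mutating a key that hashes to the same shard while this guard is
        // alive can block other threads, or deadlock on this same thread,
        // so the guard should be dropped promptly.
    } // shard read guard released here

    map.insert(2, "b");
}
```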
I see!
I think this is better than #12132 because of its more straightforward approach to the problem. Anyway, I wonder how best to fix the locks for AccountsDB...
@ryoqun I think this PR and #12132 may actually work together 😃 DashMap essentially partitions the HashMap into independently locked shards; #12132 should guarantee that during the accounts scan, a single shard isn't locked up for the entire duration of the scan within that shard.
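To spell out the partitioning idea, a toy sketch (not DashMap's actual implementation): each key hashes to one shard, and only that shard's RwLock is taken for an operation, so readers and writers touching different shards never contend.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

// A toy sharded map illustrating the idea behind DashMap.
struct ShardedMap<V> {
    shards: Vec<RwLock<HashMap<u64, V>>>,
}

impl<V> ShardedMap<V> {
    fn new(num_shards: usize) -> Self {
        Self {
            shards: (0..num_shards).map(|_| RwLock::new(HashMap::new())).collect(),
        }
    }

    // Pick the shard by hashing the key.
    fn shard_for(&self, key: &u64) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.shards.len()
    }

    fn insert(&self, key: u64, value: V) {
        // Writers only contend with other threads touching the same shard.
        self.shards[self.shard_for(&key)].write().unwrap().insert(key, value);
    }

    fn get_cloned(&self, key: &u64) -> Option<V>
    where
        V: Clone,
    {
        self.shards[self.shard_for(key)].read().unwrap().get(key).cloned()
    }
}

fn main() {
    let map = ShardedMap::new(16);
    map.insert(42, "slot data".to_string());
    assert_eq!(map.get_cloned(&42), Some("slot data".to_string()));
}
```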
Unfortunately that one doesn't seem to implement the "range()" function we want from BTreeMap 😢, though it may be an easy addition. I think a sufficient temporary bandaid is to switch
Force-pushed from 7626983 to de8586a.
Sad...
Yeah, that will work. Seems a good bandaid. :)
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Force-pushed from ccc0525 to b499b43.
Force-pushed from 6371371 to e72ce20.
@carllin how does this perform in various benchmarks? There are several standard benchmark problems. Also, could you try to simulate concurrent heavy writes by ReplayStage and heavy reads by RPC?
Force-pushed from ed51a7b to 8479f62.
@ryoqun I added a benchmark simulating heavy single reads from RPC along with writes from Replay. As expected, this case doesn't see a lot of benefit from this change because:
I didn't yet add a benchmark here for simulating the RPC
#[bench]
#[ignore]
fn bench_concurrent_read_write(bencher: &mut Bencher) {
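Only the signature is visible in this hunk; below is a rough sketch of how such a concurrent read/write benchmark might be shaped (a hypothetical body using a bare DashMap rather than AccountsDB, so names and numbers are illustrative only): a background writer keeps mutating while the bench loop measures reads.

```rust
#![feature(test)]
extern crate test;

use dashmap::DashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use test::Bencher;

#[bench]
#[ignore]
fn bench_concurrent_read_write(bencher: &mut Bencher) {
    let map: Arc<DashMap<u64, u64>> = Arc::new(DashMap::new());
    for slot in 0..1_000u64 {
        map.insert(slot, slot);
    }

    // Background writer keeps inserting while the bench loop measures reads.
    let stop = Arc::new(AtomicBool::new(false));
    let writer = {
        let (map, stop) = (Arc::clone(&map), Arc::clone(&stop));
        thread::spawn(move || {
            let mut slot = 1_000u64;
            while !stop.load(Ordering::Relaxed) {
                map.insert(slot, slot);
                slot += 1;
            }
        })
    };

    // Measured section: each point read locks only the shard its key hashes to.
    bencher.iter(|| {
        for slot in 0..1_000u64 {
            test::black_box(map.get(&slot).map(|v| *v));
        }
    });

    stop.store(true, Ordering::Relaxed);
    writer.join().unwrap();
}
```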
btw, do you have any idea of how the single-threaded write throughput changes from std::collections::HashMap to DashMap? I think I'm worrying too much, but I'd rather confirm how well DashMap does while supporting concurrency via sharding. Maybe it's trading off maximum throughput by a negligible margin?
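One quick way to sanity-check that single-threaded question (a hypothetical micro-comparison, not a rigorous benchmark and not the conc-map-bench suite discussed below): time the same insert loop against a plain HashMap and a DashMap on one thread.

```rust
use dashmap::DashMap;
use std::collections::HashMap;
use std::time::Instant;

fn main() {
    const N: u64 = 1_000_000;

    // Plain HashMap: no locking at all on a single thread.
    let start = Instant::now();
    let mut plain: HashMap<u64, u64> = HashMap::new();
    for i in 0..N {
        plain.insert(i, i);
    }
    println!("HashMap: {:?}", start.elapsed());

    // DashMap: every insert still takes and releases one shard lock,
    // even with zero contention, which is the overhead being asked about.
    let start = Instant::now();
    let sharded: DashMap<u64, u64> = DashMap::new();
    for i in 0..N {
        sharded.insert(i, i);
    }
    println!("DashMap: {:?}", start.elapsed());
}
```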
To add more context, our basic tenet is: batch them if we can, make it concurrent otherwise. So, I guess the upper layer is slamming the AccountsDB optimized for batching, and the single-threaded perf is moderately related to the batching perf. Thus, we're somewhat sensitive to it.
@ryoqun that's a good point. I've run the benchmark here: https://github.com/xacrimon/conc-map-bench, which has three different work profiles: exchange (read-write), cache (read-heavy), and rapid-grow (insert-heavy); more details can be found in that link. The results for a single thread on my dev machine (https://console.cloud.google.com/compute/instancesDetail/zones/us-west1-b/instances/carl-dev?project=principal-lane-200702&authuser=1):
== cache
-- MutexStd
25165824 operations across 1 thread(s) in 14.941146454s; time/op = 593ns
-- DashMap
25165824 operations across 1 thread(s) in 15.589596263s; time/op = 619ns
==
== exchange
-- MutexStd
25165824 operations across 1 thread(s) in 20.954264682s; time/op = 831ns
-- DashMap
25165824 operations across 1 thread(s) in 20.875345754s; time/op = 828ns
==
== rapid grow
-- MutexStd
25165824 operations across 1 thread(s) in 20.22593938s; time/op = 802ns
-- DashMap
25165824 operations across 1 thread(s) in 17.456807471s; time/op = 693ns
==
@carllin that report is interesting. Thanks for running it and sharing it. So, DashMap operations seem to be on par with non-batched Mutex<HashMap> operations, as far as I read the code (https://github.com/xacrimon/conc-map-bench/blob/master/src/adapters.rs#L40)? (I'm assuming our workload is more like cache or exchange, not like rapid grow.)
Ideally, I'd like to see more realistic results which reflect our base batched implementation. Of course, maybe this is a small part compared to the overall solana-validator runtime... Pardon me for being nit-picky here.
Also, how does bench_concurrent_read_write perform before and after DashMap with a single/multi-threaded writer and no readers? I think this bench is easy enough to cherry-pick onto the merge base commit.
Also, there is accounts-bench/src/main.rs if you have extra stamina; its AccountsDB preparation step tortures AccountsDB quite a bit :)
What I'm a bit worried about is that we want to ensure we don't introduce silent perf degradation for validators who aren't affected by RPC calls (mainnet-beta validators). Also, I'm assuming DashMap internally locks shards while updating. That means we're locking/unlocking them for each read/write operation? In other words, we're moving from batched operation with infrequent locks/unlocks to non-batched operation with frequent locks/unlocks (but with much less contention!).
Quite interesting benchmarking showdown. :)
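A compact illustration of that batched vs. per-operation contrast (an added sketch with hypothetical update/storage names, not AccountsDB code):

```rust
use dashmap::DashMap;
use std::collections::HashMap;
use std::sync::RwLock;

fn apply_updates(
    batched: &RwLock<HashMap<u64, u64>>,
    sharded: &DashMap<u64, u64>,
    updates: &[(u64, u64)],
) {
    // Batched: one write-lock acquisition amortized over the whole batch,
    // but every reader of the map is blocked for the batch's duration.
    {
        let mut map = batched.write().unwrap();
        for (slot, id) in updates {
            map.insert(*slot, *id);
        }
    }

    // Per-operation: each insert locks and unlocks only the shard its key
    // hashes to, so readers of other shards are never blocked.
    for (slot, id) in updates {
        sharded.insert(*slot, *id);
    }
}

fn main() {
    let batched = RwLock::new(HashMap::new());
    let sharded = DashMap::new();
    apply_updates(&batched, &sharded, &[(1, 10), (2, 20)]);
    assert_eq!(sharded.len(), 2);
}
```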
@ryoqun, good suggestions. Here are the results I saw on my MacBook Pro:
bench_concurrent_read_write with 1 writer, no readers:
DashMap:
test bench_concurrent_read_write ... bench: 3,713,260 ns/iter (+/- 679,081)
Master:
test bench_concurrent_read_write ... bench: 3,773,731 ns/iter (+/- 654,643)
accounts-bench/src/main.rs:
DashMap:
clean: false
Creating 10000 accounts
created 10000 accounts in 4 slots
create accounts took 148ms
Master:
clean: false
Creating 10000 accounts
created 10000 accounts in 4 slots
create accounts took 145ms
@carllin Perfect reporting :)
Almost lgtm (I'd like to see a system-test-level perf report combined with #12126; also, give me a few hours to ponder on this as a final check.)
Special thanks for addressing my bunch of nits quickly, as always. This PR matured pretty quickly because of it. :)
LGTM code-wise with all nits resolved correctly!
Please check the perf concerns I wrote about.
Co-authored-by: Carl Lin <carl@solana.com>
Co-authored-by: Carl Lin <carl@solana.com> (cherry picked from commit f8d338c)
Problem
Accounts scans from RPC hold:
- the account storage lock for the entire duration of the scan, which blocks replay on create_and_insert_store() during account commit.
- the account index lock for the entire duration of the scan, which blocks replay in AccountsDb store() -> update_index() during account commit.
Summary of Changes
Experimenting with switching the global accounts storage lock (part 1 above) to DashMap (https://github.com/xacrimon/dashmap), a concurrent hashmap implemented by sharding the table. This removes the need to hold the global AccountStorage read lock in scan_accounts, which is blocking create_and_insert_store(). A rough sketch of the shape of the change is shown after the TODOs below.
TODO: This can also potentially be expanded to replace the accounts index lock in part 2 above.
TODO: Reason about the correctness of places where I've replaced the large account_storage read locks, specifically around cleaning and shrinking. @ryoqun would really appreciate a review in those areas!
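A rough sketch of the shape of the change, pieced together from the diff hunks quoted above; the real field and type names in accounts_db.rs may differ, and AccountStorageEntry here is only a placeholder.

```rust
use dashmap::DashMap;
use std::collections::HashMap;
use std::sync::Arc;

type Slot = u64;
type AppendVecId = usize;
struct AccountStorageEntry; // placeholder for the real append-vec handle

// Before: a single RwLock guards the whole slot -> stores map, so a long
// read (e.g. scan_accounts) blocks any writer that needs to add a store:
//     pub storage: RwLock<AccountStorage>   // AccountStorage wrapping a HashMap
//
// After: a sharded map; each operation locks only the shard holding the
// touched slot, so a scan no longer blocks create_and_insert_store().
pub struct AccountStorage(pub DashMap<Slot, HashMap<AppendVecId, Arc<AccountStorageEntry>>>);

impl AccountStorage {
    fn insert_store(&self, slot: Slot, id: AppendVecId, store: Arc<AccountStorageEntry>) {
        // Locks only the shard containing `slot`.
        self.0.entry(slot).or_insert_with(HashMap::new).insert(id, store);
    }
}

fn main() {
    let storage = AccountStorage(DashMap::new());
    storage.insert_store(1, 0, Arc::new(AccountStorageEntry));
    assert!(storage.0.get(&1).is_some());
}
```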
Pertinent gotchas with v3 of DashMap: xacrimon/dashmap#74
Other candidates that were considered but not chosen:
Fixes #