Ethstore optimizations #6827
```diff
@@ -18,6 +18,7 @@ use std::collections::{BTreeMap, HashMap};
 use std::mem;
 use std::path::PathBuf;
 use parking_lot::{Mutex, RwLock};
+use std::time::{Instant, Duration};
 
 use crypto::KEY_ITERATIONS;
 use random::Random;
@@ -28,6 +29,8 @@ use presale::PresaleWallet;
 use json::{self, Uuid, OpaqueKeyFile};
 use {import, Error, SimpleSecretStore, SecretStore, SecretVaultRef, StoreAccountRef, Derivation, OpaqueSecret};
 
+const REFRESH_TIME_SEC: u64 = 5;
+
 /// Accounts store.
 pub struct EthStore {
 	store: EthMultiStore,
@@ -245,7 +248,12 @@ pub struct EthMultiStore {
 	// order lock: cache, then vaults
 	cache: RwLock<BTreeMap<StoreAccountRef, Vec<SafeAccount>>>,
 	vaults: Mutex<HashMap<String, Box<VaultKeyDirectory>>>,
-	dir_hash: Mutex<Option<u64>>,
+	timestamp: Mutex<Timestamp>,
 }
 
+struct Timestamp {
+	dir_hash: Option<u64>,
+	last_checked: Instant,
+}
+
 impl EthMultiStore {
@@ -261,20 +269,27 @@ impl EthMultiStore {
 			vaults: Mutex::new(HashMap::new()),
 			iterations: iterations,
 			cache: Default::default(),
-			dir_hash: Default::default(),
+			timestamp: Mutex::new(Timestamp {
+				dir_hash: None,
+				last_checked: Instant::now(),
+			}),
 		};
 		store.reload_accounts()?;
 		Ok(store)
 	}
 
 	fn reload_if_changed(&self) -> Result<(), Error> {
-		let mut last_dir_hash = self.dir_hash.lock();
-		let dir_hash = Some(self.dir.unique_repr()?);
-		if *last_dir_hash == dir_hash {
-			return Ok(())
+		let mut last_timestamp = self.timestamp.lock();
+		let now = Instant::now();
+		if (now - last_timestamp.last_checked) > Duration::from_secs(REFRESH_TIME_SEC) {
+			let dir_hash = Some(self.dir.unique_repr()?);
```
This appears to be creating a disk IO read problem. Every 5 seconds you end up looping over and hashing all files in the keystore; with a large number of files (in my case 150,000), this in the end causes a total server crash. Is there a better way to detect changes in the keystore dir? Possibly only detect a change when a new key is added via personal_newAccount... calls. Or a simple file count?

@AdvancedStyle are you saying your server crashes with the changes from this PR?

Yes. I'm doing more testing and will provide more information shortly. Basically, once I compile and run latest master with this PR, it fixes the peer connections but then crashes the server (with a noticeably huge amount of disk reading). I'll do a bit more testing and see if I can gather some actual error messages from the server crash.

OK, I've been using iostat / iotop / tracefile (https://gitlab.com/ole.tange/tangetools/tree/master/tracefile) to monitor disk IO, and here are the results. With latest master, iostat utilization is quite low (less than 5%), although tracefile shows that parity is constantly accessing the keystore files. That does not cause high disk utilization or a server crash; it only causes peers to drop/time out and syncing to stop.

After PR #6827, iostat utilization goes through the roof (50-100% disk utilization all the time). iostat output sample:

avg-cpu: %user 23.14, %nice 0.00, %system 20.48, %iowait 13.56, %steal 0.27, %idle 42.55

But the strange thing is that tracefile (an strace script) does not see much file access by the process (parity is no longer accessing the keystore files individually); it only appears to be accessing the keystore dir. So why all the disk IO? Sample of tracefile output:

The VM then hangs/crashes after a few minutes with hung_task_timeout_secs, and a bunch of similar hung tasks. Conclusion:

The IO must be just the database IO. Does the machine have an HDD or SSD? How exactly does it crash? Is there a backtrace printed?

SSD. The server runs on a KVM virtual machine (Debian host, Ubuntu 16 guest). The entire virtual machine just locks up: SSH and the virt-manager console become unresponsive, and then it dumps a series of error messages in the virt-manager console similar to: INFO: task kworker/[...] blocked for more than 120 seconds. Note, I'm running parity with params:

I've decided to clone the VM and try deleting and resyncing the chain from 0 to see if that helps. So far it seems to be doing OK (no IO read problems so far); will post back shortly.
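The commenter's question (is there a cheaper change signal than hashing every keystore file?) can be illustrated with a directory fingerprint made of the entry count plus the directory's own modification time. This is a minimal sketch, not ethstore code: `DirSnapshot`, `take_snapshot` and `has_changed` are hypothetical names. It assumes keys are added or removed as whole files, because on typical POSIX filesystems creating, deleting or renaming an entry updates the directory's mtime, while rewriting an existing file in place does not. It still lists the directory once per check (not free with 150,000 entries) but never opens the key files themselves.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical cheap fingerprint of a keystore directory.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DirSnapshot {
    /// Number of directory entries (key files).
    pub entry_count: usize,
    /// mtime of the directory itself; `None` if the platform does not report it.
    pub dir_modified: Option<SystemTime>,
}

/// Reads only the directory listing and the directory's own metadata;
/// the individual key files are never opened or read.
pub fn take_snapshot(dir: &Path) -> io::Result<DirSnapshot> {
    let mut entry_count = 0;
    for entry in fs::read_dir(dir)? {
        entry?; // surface per-entry I/O errors instead of silently skipping them
        entry_count += 1;
    }
    let dir_modified = fs::metadata(dir)?.modified().ok();
    Ok(DirSnapshot { entry_count, dir_modified })
}

/// True when there is no previous snapshot or the directory looks different.
/// Assumption: in-place edits of an existing key file are NOT detected.
pub fn has_changed(previous: &Option<DirSnapshot>, current: &DirSnapshot) -> bool {
    previous.as_ref() != Some(current)
}
```

A caller would keep the previous `Option<DirSnapshot>` where the diff keeps `dir_hash`, and reload accounts only when `has_changed` returns true.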
```diff
+			last_timestamp.last_checked = now;
+			if last_timestamp.dir_hash == dir_hash {
+				return Ok(())
+			}
+			self.reload_accounts()?;
+			last_timestamp.dir_hash = dir_hash;
 		}
-		self.reload_accounts()?;
-		*last_dir_hash = dir_hash;
 		Ok(())
 	}
 
```
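The throttling added to `reload_if_changed` can be isolated from ethstore as follows. This is a sketch under stated assumptions: the closures `current_hash` and `reload` are hypothetical stand-ins for `self.dir.unique_repr()` and `self.reload_accounts()`, errors are dropped, and `now` is passed in so the behavior can be checked without sleeping. As in the diff, the directory is hashed at most once per `refresh` window, and accounts are reloaded only when the hash differs from the previous one.

```rust
use std::time::{Duration, Instant};

/// Same shape as the `Timestamp` struct added by the diff.
pub struct Timestamp {
    pub dir_hash: Option<u64>,
    pub last_checked: Instant,
}

/// Hypothetical, error-free stand-in for `EthMultiStore::reload_if_changed`.
/// `current_hash` plays the role of `self.dir.unique_repr()` and `reload`
/// the role of `self.reload_accounts()`. Returns true if a reload ran.
pub fn reload_if_changed<H, R>(
    ts: &mut Timestamp,
    now: Instant,
    refresh: Duration,
    current_hash: H,
    reload: R,
) -> bool
where
    H: FnOnce() -> u64,
    R: FnOnce(),
{
    // Inside the refresh window: skip the directory scan entirely.
    // Saturating, so a `now` earlier than `last_checked` cannot panic.
    if now.saturating_duration_since(ts.last_checked) <= refresh {
        return false;
    }
    // Window elapsed: hash the directory once and restart the window.
    let dir_hash = Some(current_hash());
    ts.last_checked = now;
    if ts.dir_hash == dir_hash {
        return false; // nothing changed on disk
    }
    reload();
    ts.dir_hash = dir_hash;
    true
}
```

The trade-off is staleness: an account added on disk may stay invisible for up to `refresh` (5 seconds in the diff) in exchange for not hashing the directory on every lookup.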
```diff
@@ -455,11 +470,11 @@ impl SimpleSecretStore for EthMultiStore {
 	}
 
 	fn account_ref(&self, address: &Address) -> Result<StoreAccountRef, Error> {
+		use std::collections::Bound;
 		self.reload_if_changed()?;
-		self.cache.read().keys()
-			.find(|r| &r.address == address)
-			.cloned()
-			.ok_or(Error::InvalidAccount)
+		let cache = self.cache.read();
+		let mut r = cache.range((Bound::Included(*address), Bound::Included(*address)));
+		r.next().ok_or(Error::InvalidAccount).map(|(k, _)| k.clone())
 	}
 
 	fn accounts(&self) -> Result<Vec<StoreAccountRef>, Error> {
```
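The new `account_ref` replaces a linear scan over all cached keys with an ordered range query. `BTreeMap::range` with `Address` bounds only compiles if `StoreAccountRef: Borrow<Address>`, and only finds the right entries if keys sort by address first; neither is visible in these hunks. The sketch below reaches the same O(log n) lookup without `Borrow`, assuming (for illustration only) a hypothetical `AccountRef` key whose derived `Ord` compares `address` before `vault`, and a hypothetical `VaultRef` whose `Root` variant sorts before every named vault.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for ethstore's 20-byte `Address`.
pub type Address = [u8; 20];

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum VaultRef {
    Root, // declared first, so derived Ord puts it before every named vault
    Vault(String),
}

/// Derived `Ord` compares fields in declaration order: address, then vault.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct AccountRef {
    pub address: Address,
    pub vault: VaultRef,
}

/// O(log n): seek to the smallest possible key for `address`, then check
/// that the first key at or after it really has that address.
pub fn account_ref<V>(cache: &BTreeMap<AccountRef, V>, address: &Address) -> Option<AccountRef> {
    let lowest = AccountRef { address: *address, vault: VaultRef::Root };
    cache
        .range(lowest..)
        .next()
        .filter(|(k, _)| &k.address == address)
        .map(|(k, _)| k.clone())
}

/// O(n): the linear scan that the diff removes, kept for comparison.
pub fn account_ref_linear<V>(cache: &BTreeMap<AccountRef, V>, address: &Address) -> Option<AccountRef> {
    cache.keys().find(|r| &r.address == address).cloned()
}
```

If the same address exists in several vaults, both functions return the first key in map order (here the root vault), so they agree.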
I would rather store last_timestamp.valid_till to avoid the subtraction (and a possible overflow in case the time is adjusted by the system):

Instant is monotonic. It is not affected by clock changes and is guaranteed never to decrease.

Fair enough.
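The reviewer's own snippet is not preserved on this page; the sketch below only illustrates the idea of storing a deadline (`valid_till`) instead of the time of the last check, so each call does a comparison rather than a subtraction. `RefreshDeadline` and `needs_refresh` are hypothetical names. Skipping while `now <= valid_till` fires at the same moments as the diff's `(now - last_checked) > refresh` test, since `valid_till` is just `last_checked + refresh`. As the reply notes, `Instant` is monotonic (the std docs guarantee it never goes backwards, barring platform bugs), so either form is safe.

```rust
use std::time::{Duration, Instant};

/// Hypothetical deadline-based variant of the diff's `Timestamp`.
pub struct RefreshDeadline {
    pub dir_hash: Option<u64>,
    /// The cached directory hash is trusted until this instant.
    pub valid_till: Instant,
}

impl RefreshDeadline {
    /// As with the diff's constructor, the first check is due one window after creation.
    pub fn new(now: Instant, refresh: Duration) -> Self {
        RefreshDeadline { dir_hash: None, valid_till: now + refresh }
    }

    /// Returns true, and moves the deadline one window past `now`, when the
    /// directory should be re-hashed. Only a comparison happens on the fast path.
    pub fn needs_refresh(&mut self, now: Instant, refresh: Duration) -> bool {
        if now <= self.valid_till {
            return false;
        }
        self.valid_till = now + refresh;
        true
    }
}
```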