-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: improve console wallet responsiveness #3304
feat: improve console wallet responsiveness #3304
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle I am ok with a cooldown on balance refresh however there is an issue because balance refresh is only triggered on events that would update the balance so what happens if the balance is updated 10 times in quick succession and the last trigger happens within a cooldown period. The balance will never be updated to reflect the change of that last event.
applications/tari_console_wallet/src/ui/state/wallet_event_monitor.rs
Outdated
Show resolved
Hide resolved
let chain_metadata = tip_info | ||
.metadata | ||
.ok_or_else(|| BaseNodeMonitorError::InvalidBaseNodeResponse("Tip info no metadata".to_string())) | ||
.and_then(|metadata| { | ||
ChainMetadata::try_from(metadata).map_err(BaseNodeMonitorError::InvalidBaseNodeResponse) | ||
})?; | ||
// Note: Do not use an RPC ping call to measure latency here, as the time measured for that always follows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So my understanding of the RPC ping message is that it is supposed to be a measure of the round trip response without a database read so it is a better measure of network latency that should be divorced from the business of the base node database. If the metadata call and this are linked then that actually points to a problem on the base node that the RPC Pong response is somehow being blocked by Database activity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really difficult to make sense of this. I could pick this up in the logs, and these may be from the same conversation or not but it is difficult to tell:
CW: 00:15:41.694385500 [wallet::base_node_service::chain_metadata_monitor] TRACE Obtain RPC client 0 ms
BN: 00:15:43.309834400 [comms::rpc] DEBUG [Peer=`ff03297bea5f3f548a85dd72c6`] Got request RequestID=12, Deadline=120s, Flags=(empty), Message=0 byte(s)
CW: 00:16:56.568149800 [comms::rpc::client] TRACE Received response (64 byte(s)) from request #12 (protocol = t/bnwallet/1, method=5) in 75s
CW: 00:16:56.568175700 [wallet::base_node_service::chain_metadata_monitor] TRACE Obtain metadata from client 74873 ms
CW: 00:16:56.568225000 [comms::rpc::client] DEBUG Ping (protocol t/bnwallet/1) sent in 3.00µs
BN: 00:16:58.072925300 [comms::rpc] TRACE RPC TIMING(REQ_ID=3): 'transmit' took 53.64s (SLOW)
BN: 00:16:58.073036500 [comms::rpc] DEBUG [Peer=`ff03297bea5f3f548a85dd72c6` t/bnwallet/1] sending ACK response.
CW: 00:18:23.682751100 [wallet::base_node_service::chain_metadata_monitor] TRACE Obtain latency (ping) from client 87114 ms
BN: 00:18:25.297436900 [comms::rpc] TRACE RPC TIMING(REQ_ID=3): 'transmit' took 85.17s (SLOW)
What stands out is:
- the wallet measures 74.873s for a round trip metadata request; the base node reports a slow transmit of 53.64s
- the wallet measures 87.114s for a round trip ping request; the base node reports a slow transmit of 85.17s
If those timings are linked it would suggest being not blocked by database activity.
c660fd3
to
6a4e072
Compare
6a4e072
to
ca3554f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should try not to create endless toml configuration that are meaningless to anyone that does not have intimate knowledge of the code. If we feel it necessary to tweak values endlessly, we should probably take a step back and think about a better way.
The call to refresh_balance now does not necessarily refresh the balance. If the balance is being refreshed too often by some caller, perhaps the caller should handle that (debounce) correctly or mark the balance as stale so that it can be updated by the next tick (I think the latter is a good option). This is preferable to manually tweaking some values and creating unexpected side (non)effects in refresh_balance.
Otherwise, fine with the removal of ping. This can go in, but we should strive to create systems that work for users, rather than ones that have to be configured by engineers to work.
|
||
let latency = client.ping().await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think we are picking up differences across machines - in my tests the real ping was lower when for e.g. the base node was sending blocks to another base node and/or getting scanned by another wallet etc. .
Whereas, because get_tip_info requires the db, it would be slower in this case. The value presented as latency is not purely network latency and could be misleading. There are, as you'd expect, some cases where network performance takes a hit (and on your and some others machines it seems a lot worse).
I did mention in my original PR that latency for get_tip_info would be more efficient and a better measure of actual base node performance, once we see a truer network latency value (which was instantly better in my case), but happy if we keep in mind the fact that database contention is included in the value presented in the console wallet as latency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... happy if we keep in mind the fact that database contention is included in the value presented in the console wallet as latency.
Agreed, a proper comment in the code highlight this.
ca3554f
to
38f3993
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand why you want to add this, but the implementation is so specific and adds so many extra tracking variables to achieve it that the extra orchestration code muddies up the other code.
If you can find a way to achieve the same goal in a more elegant manner, I'd consider merging it
let balance = self.wallet.output_manager_service.get_balance().await?; | ||
self.data.balance = balance; | ||
self.updated = true; | ||
if self.balance_enquiry_cooldown_start.elapsed() > self.balance_enquiry_cooldown_period { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need for an additional method here, and the names are confusing. I would suggest removing the refresh_balance_check
method and the balance_enquiry_cooldown_infringement
variables as calling the refresh_balance
method twice in quick succession has the same effect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented a balance check debouncer for these purposes - please see the latest commit
// The cooldown period between balance enquiry checks in seconds; requests faster than this will be ignored. | ||
// For specialized wallets processing many batch transactions this setting could be increased to 60 s to retain | ||
// responsiveness of the wallet with slightly delayed balance updates (default: 1) | ||
let key = "wallet.balance_enquiry_cooldown_period"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nobody is ever going to change this to anything other than 1. I would say hardcoding it as 1 second is better in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default is set to 1. The configuration setting is needed for stress testing or for running a large wallet where many events occur that can change the balance, where 1 s is not a good choice. These are average time measurements for the call to the pub async fn get_balance
SQL queries to complete:
- ~15,000 total spent and unspent TXOs: 1 s
- ~125,000 total spent and unspent TXOs: 400 s
- Improved console wallet responsiveness by introducing a cooldown period for calculating the balance, as this is a database call that grows linearly with the amount of UTXOs in the wallet and does not need to be repeated multiple times per second in certain scenarios. - Submit connectivity failed event if the metadata RPC requests time out so that 'Waiting for data...' will be displayed in the wallet instead of the last known good value, as the former state can sometimes continue for more than 30 minutes. - Removed the RPC client ping call as a method when monitoring the base node to calculate latency and instead replaced it with the time measured to obtain metadata via the RPC client. The time measured for the ping call always follows the time it takes for the metadata call and will often take longer than the metadata call.
38f3993
to
32e2399
Compare
32e2399
to
95ee4d4
Compare
} | ||
loop { | ||
tokio::select! { | ||
_ = &mut delay => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the desired behaviour is to cache the result for multiple calls within the cooldown period, this is not being achieved.
What will happen here is that many calls to this method will get slower and slower, blocking the callers.
Wouldn't it be better to change it to something like
select! {
balance_enquiry_events.recv() => {
if Instant::now() < cache_until_time { return cached_value }
// get value
cache_until_time = Instant::now() + self.balance_enquiry_cooldown_period;
return value;
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the desired behaviour is to cache the result for multiple calls within the cooldown period, this is not being achieved.
The desired behaviour is to postpone all calls to update the balance for a set cooldown period of time till the very end of the period (and of course then only perform one calculation) and not update it at the beginning of the period and use the cached value. This will ensure that the last request to update the balance will always be honoured if no more requests arrive after the current cooldown period. If the cooldown period is made much bigger for example increased to 60 s for large busy wallets, then not honouring the last balance update request in the cooldown period is very much noticeable in the TUI with displaying the incorrect balance.
What will happen here is that many calls to this method will get slower and slower, blocking the callers.
If 1000 requests are sent to the debouncer (broadcast
) channel in 1 s then only the last one will be remembered as it has size 1; all prior requests will be dropped. The select loop will only check for a new update request after the delay event, and the channel receive future will timeout if there is none. Thus no blocking calls.
Disconnect the connected peer in the wallet connectivity if there is an RPC error, so that a new connection can be established. Improve console wallet responsiveness as per PR tari-project#3304
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not 100% convinced, but the impact should be low. I think I can craft a simpler debouncer as I stated in the comments, but this can go in for now
Disconnect the connected peer in the wallet connectivity if there is an RPC error, so that a new connection can be established. Improve console wallet responsiveness as per PR tari-project#3304
Description
trigger_balance_refresh
command that will send an event to the debouncer, which in turn will collect all such events and only perform one balance update database query after the end of the cooldown period.]A new base node state change (metadata) event will ensure the last request to calculate balance will be fulfilled.Motivation and Context
Wallets with growing numbers of UTXOs become more and more sluggish due to repeated database calls to calculate the balance; this is much more pronounced when many broadcast monitoring protocols are running.
When the metadata RPC requests time out previous base node state would be displayed; this could carry on for 30 minutes or more with consecutive timeouts where the correct user feedback would be that the wallet is waiting for data.
The time measured for the ping call always follows the time it takes for the metadata call, which is evident as well when the base node gets busy and will often take longer than the metadata call. Having two longish RPC calls with no added value results in more RPC request timeouts than is necessary. See time measurement plots below:
How Has This Been Tested?
System-level testing during a stressed coin split