Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve console wallet responsiveness #3304

Merged

Conversation

hansieodendaal
Copy link
Contributor

@hansieodendaal hansieodendaal commented Sep 6, 2021

Description

  1. Improved console wallet responsiveness by introducing a [edit debouncer with a] cooldown period for calculating the balance, as this is a database call that grows linearly with the amount of UTXOs in the wallet and does not need to be repeated multiple times per second in certain scenarios. [Edit All events that can change the balance and are captured in the wallet event monitor are followed by the trigger_balance_refresh command that will send an event to the debouncer, which in turn will collect all such events and only perform one balance update database query after the end of the cooldown period.] A new base node state change (metadata) event will ensure the last request to calculate balance will be fulfilled.
  2. Submit connectivity failed event if the metadata RPC requests time out so that 'Waiting for data...' will be displayed in the wallet.
  3. Removed the RPC client ping call as a method when monitoring the base node to calculate latency and instead replaced it with the time measured to obtain metadata via the RPC client.

Motivation and Context

  1. Wallets with growing numbers of UTXOs become more and more sluggish due to repeated database calls to calculate the balance; this is much more pronounced when many broadcast monitoring protocols are running.

  2. When the metadata RPC requests time out previous base node state would be displayed; this could carry on for 30 minutes or more with consecutive timeouts where the correct user feedback would be that the wallet is waiting for data.

  3. The time measured for the ping call always follows the time it takes for the metadata call, which is evident as well when the base node gets busy and will often take longer than the metadata call. Having two longish RPC calls with no added value results in more RPC request timeouts than is necessary. See time measurement plots below:

    image
    image
    image
    image
    image

How Has This Been Tested?

System-level testing during a stressed coin split

Copy link
Contributor

@philipr-za philipr-za left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle I am ok with a cooldown on balance refresh however there is an issue because balance refresh is only triggered on events that would update the balance so what happens if the balance is updated 10 times in quick succession and the last trigger happens within a cooldown period. The balance will never be updated to reflect the change of that last event.

applications/tari_console_wallet/src/ui/state/app_state.rs Outdated Show resolved Hide resolved
applications/tari_console_wallet/src/ui/state/app_state.rs Outdated Show resolved Hide resolved
let chain_metadata = tip_info
.metadata
.ok_or_else(|| BaseNodeMonitorError::InvalidBaseNodeResponse("Tip info no metadata".to_string()))
.and_then(|metadata| {
ChainMetadata::try_from(metadata).map_err(BaseNodeMonitorError::InvalidBaseNodeResponse)
})?;
// Note: Do not use an RPC ping call to measure latency here, as the time measured for that always follows
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my understanding of the RPC ping message is that it is supposed to be a measure of the round trip response without a database read so it is a better measure of network latency that should be divorced from the business of the base node database. If the metadata call and this are linked then that actually points to a problem on the base node that the RPC Pong response is somehow being blocked by Database activity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really difficult to make sense of this. I could pick this up in the logs, and these may be from the same conversation or not but it is difficult to tell:

CW:    00:15:41.694385500 [wallet::base_node_service::chain_metadata_monitor] TRACE Obtain RPC client 0 ms
   BN: 00:15:43.309834400 [comms::rpc] DEBUG [Peer=`ff03297bea5f3f548a85dd72c6`] Got request RequestID=12, Deadline=120s, Flags=(empty), Message=0 byte(s) 
CW:    00:16:56.568149800 [comms::rpc::client] TRACE Received response (64 byte(s)) from request #12 (protocol = t/bnwallet/1, method=5) in 75s
CW:    00:16:56.568175700 [wallet::base_node_service::chain_metadata_monitor] TRACE Obtain metadata from client 74873 ms
CW:    00:16:56.568225000 [comms::rpc::client] DEBUG Ping (protocol t/bnwallet/1) sent in 3.00µs
   BN: 00:16:58.072925300 [comms::rpc] TRACE RPC TIMING(REQ_ID=3): 'transmit' took 53.64s (SLOW)
   BN: 00:16:58.073036500 [comms::rpc] DEBUG [Peer=`ff03297bea5f3f548a85dd72c6` t/bnwallet/1] sending ACK response.
CW:    00:18:23.682751100 [wallet::base_node_service::chain_metadata_monitor] TRACE Obtain latency (ping) from client 87114 ms
   BN: 00:18:25.297436900 [comms::rpc] TRACE RPC TIMING(REQ_ID=3): 'transmit' took 85.17s (SLOW) 

What stands out is:

  • the wallet measures 74.873s for a round trip metadata request; the base node reports a slow transmit of 53.64s
  • the wallet measures 87.114s for a round trip ping request; the base node reports a slow transmit of 85.17s

If those timings are linked it would suggest being not blocked by database activity.

common/config/presets/tari_config_example.toml Outdated Show resolved Hide resolved
common/src/configuration/utils.rs Outdated Show resolved Hide resolved
@hansieodendaal hansieodendaal changed the title Improve console wallet responsiveness feat: improve console wallet responsiveness Sep 6, 2021
@hansieodendaal hansieodendaal force-pushed the ho_improve_wallet_response branch 4 times, most recently from c660fd3 to 6a4e072 Compare September 7, 2021 09:26
@hansieodendaal hansieodendaal force-pushed the ho_improve_wallet_response branch from 6a4e072 to ca3554f Compare September 7, 2021 13:03
sdbondi
sdbondi previously approved these changes Sep 8, 2021
Copy link
Member

@sdbondi sdbondi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try not to create endless toml configuration that are meaningless to anyone that does not have intimate knowledge of the code. If we feel it necessary to tweak values endlessly, we should probably take a step back and think about a better way.

The call to refresh_balance now does not necessarily refresh the balance. If the balance is being refreshed too often by some caller, perhaps the caller should handle that (debounce) correctly or mark the balance as stale so that it can be updated by the next tick (I think the latter is a good option). This is preferable to manually tweaking some values and creating unexpected side (non)effects in refresh_balance.

Otherwise, fine with the removal of ping. This can go in, but we should strive to create systems that work for users, rather than ones that have to be configured by engineers to work.

applications/tari_console_wallet/src/ui/state/app_state.rs Outdated Show resolved Hide resolved

let latency = client.ping().await?;
Copy link
Member

@sdbondi sdbondi Sep 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we are picking up differences across machines - in my tests the real ping was lower when for e.g. the base node was sending blocks to another base node and/or getting scanned by another wallet etc. .

Whereas, because get_tip_info requires the db, it would be slower in this case. The value presented as latency is not purely network latency and could be misleading. There are, as you'd expect, some cases where network performance takes a hit (and on your and some others machines it seems a lot worse).

I did mention in my original PR that latency for get_tip_info would be more efficient and a better measure of actual base node performance, once we see a truer network latency value (which was instantly better in my case), but happy if we keep in mind the fact that database contention is included in the value presented in the console wallet as latency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... happy if we keep in mind the fact that database contention is included in the value presented in the console wallet as latency.

Agreed, a proper comment in the code highlight this.

applications/tari_console_wallet/src/ui/state/app_state.rs Outdated Show resolved Hide resolved
applications/tari_console_wallet/src/ui/state/app_state.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@stringhandler stringhandler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why you want to add this, but the implementation is so specific and adds so many extra tracking variables to achieve it that the extra orchestration code muddies up the other code.

If you can find a way to achieve the same goal in a more elegant manner, I'd consider merging it

let balance = self.wallet.output_manager_service.get_balance().await?;
self.data.balance = balance;
self.updated = true;
if self.balance_enquiry_cooldown_start.elapsed() > self.balance_enquiry_cooldown_period {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for an additional method here, and the names are confusing. I would suggest removing the refresh_balance_check method and the balance_enquiry_cooldown_infringement variables as calling the refresh_balance method twice in quick succession has the same effect

Copy link
Contributor Author

@hansieodendaal hansieodendaal Sep 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented a balance check debouncer for these purposes - please see the latest commit

applications/tari_console_wallet/src/ui/state/app_state.rs Outdated Show resolved Hide resolved
applications/tari_console_wallet/src/ui/state/app_state.rs Outdated Show resolved Hide resolved
applications/tari_console_wallet/src/ui/state/app_state.rs Outdated Show resolved Hide resolved
applications/tari_console_wallet/src/ui/state/app_state.rs Outdated Show resolved Hide resolved
base_layer/wallet/src/base_node_service/monitor.rs Outdated Show resolved Hide resolved
// The cooldown period between balance enquiry checks in seconds; requests faster than this will be ignored.
// For specialized wallets processing many batch transactions this setting could be increased to 60 s to retain
// responsiveness of the wallet with slightly delayed balance updates (default: 1)
let key = "wallet.balance_enquiry_cooldown_period";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nobody is ever going to change this to anything other than 1. I would say hardcoding it as 1 second is better in this case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default is set to 1. The configuration setting is needed for stress testing or for running a large wallet where many events occur that can change the balance, where 1 s is not a good choice. These are average time measurements for the call to the pub async fn get_balance SQL queries to complete:

  • ~15,000 total spent and unspent TXOs: 1 s
  • ~125,000 total spent and unspent TXOs: 400 s

- Improved console wallet responsiveness by introducing a cooldown period
  for calculating the balance, as this is a database call that grows
  linearly with the amount of UTXOs in the wallet and does not need to be
  repeated multiple times per second in certain scenarios.
- Submit connectivity failed event if the metadata RPC requests time out
  so that 'Waiting for data...' will be displayed in the wallet instead of
  the last known good value, as the former state can sometimes continue for
  more than 30 minutes.
- Removed the RPC client ping call as a method when monitoring the base
  node to calculate latency and instead replaced it with the time measured
  to obtain metadata via the RPC client. The time measured for the ping
  call always follows the time it takes for the metadata call and will
  often take longer than the metadata call.
@hansieodendaal hansieodendaal force-pushed the ho_improve_wallet_response branch from 38f3993 to 32e2399 Compare September 20, 2021 16:44
@hansieodendaal hansieodendaal added the P-do_not_merge Process - Not ready for merging label Sep 21, 2021
@hansieodendaal hansieodendaal force-pushed the ho_improve_wallet_response branch from 32e2399 to 95ee4d4 Compare September 21, 2021 08:21
@hansieodendaal hansieodendaal removed the P-do_not_merge Process - Not ready for merging label Sep 21, 2021
}
loop {
tokio::select! {
_ = &mut delay => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the desired behaviour is to cache the result for multiple calls within the cooldown period, this is not being achieved.

What will happen here is that many calls to this method will get slower and slower, blocking the callers.

Wouldn't it be better to change it to something like


select! {
  balance_enquiry_events.recv() =>  {
   if  Instant::now() < cache_until_time { return cached_value }
  // get value
   cache_until_time = Instant::now() + self.balance_enquiry_cooldown_period;
   return value;
 }
  
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the desired behaviour is to cache the result for multiple calls within the cooldown period, this is not being achieved.

The desired behaviour is to postpone all calls to update the balance for a set cooldown period of time till the very end of the period (and of course then only perform one calculation) and not update it at the beginning of the period and use the cached value. This will ensure that the last request to update the balance will always be honoured if no more requests arrive after the current cooldown period. If the cooldown period is made much bigger for example increased to 60 s for large busy wallets, then not honouring the last balance update request in the cooldown period is very much noticeable in the TUI with displaying the incorrect balance.

What will happen here is that many calls to this method will get slower and slower, blocking the callers.

If 1000 requests are sent to the debouncer (broadcast) channel in 1 s then only the last one will be remembered as it has size 1; all prior requests will be dropped. The select loop will only check for a new update request after the delay event, and the channel receive future will timeout if there is none. Thus no blocking calls.

hansieodendaal added a commit to hansieodendaal/tari that referenced this pull request Sep 21, 2021
Disconnect the connected peer in the wallet connectivity if there is an RPC error,
so that a new connection can be established.

Improve console wallet responsiveness

as per PR tari-project#3304
Copy link
Collaborator

@stringhandler stringhandler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% convinced, but the impact should be low. I think I can craft a simpler debouncer as I stated in the comments, but this can go in for now

hansieodendaal added a commit to hansieodendaal/tari that referenced this pull request Sep 21, 2021
Disconnect the connected peer in the wallet connectivity if there is an RPC error,
so that a new connection can be established.

Improve console wallet responsiveness

as per PR tari-project#3304
@aviator-app aviator-app bot merged commit 73017a4 into tari-project:development Sep 21, 2021
@hansieodendaal hansieodendaal deleted the ho_improve_wallet_response branch October 14, 2021 05:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants