feat: improve console wallet responsiveness #3304

hansieodendaal · 2021-09-06T08:48:22Z

Description

Improved console wallet responsiveness by introducing a [edit debouncer with a] cooldown period for calculating the balance, as this is a database call that grows linearly with the amount of UTXOs in the wallet and does not need to be repeated multiple times per second in certain scenarios. [Edit All events that can change the balance and are captured in the wallet event monitor are followed by the trigger_balance_refresh command that will send an event to the debouncer, which in turn will collect all such events and only perform one balance update database query after the end of the cooldown period.] ~~A new base node state change (metadata) event will ensure the last request to calculate balance will be fulfilled.~~
Submit connectivity failed event if the metadata RPC requests time out so that 'Waiting for data...' will be displayed in the wallet.
Removed the RPC client ping call as a method when monitoring the base node to calculate latency and instead replaced it with the time measured to obtain metadata via the RPC client.

Motivation and Context

Wallets with growing numbers of UTXOs become more and more sluggish due to repeated database calls to calculate the balance; this is much more pronounced when many broadcast monitoring protocols are running.
When the metadata RPC requests time out previous base node state would be displayed; this could carry on for 30 minutes or more with consecutive timeouts where the correct user feedback would be that the wallet is waiting for data.
The time measured for the ping call always follows the time it takes for the metadata call, which is evident as well when the base node gets busy and will often take longer than the metadata call. Having two longish RPC calls with no added value results in more RPC request timeouts than is necessary. See time measurement plots below:

How Has This Been Tested?

System-level testing during a stressed coin split

base_layer/wallet/src/base_node_service/monitor.rs

philipr-za

In principle I am ok with a cooldown on balance refresh however there is an issue because balance refresh is only triggered on events that would update the balance so what happens if the balance is updated 10 times in quick succession and the last trigger happens within a cooldown period. The balance will never be updated to reflect the change of that last event.

applications/tari_console_wallet/src/ui/state/app_state.rs

applications/tari_console_wallet/src/ui/state/wallet_event_monitor.rs

philipr-za · 2021-09-06T09:09:21Z

base_layer/wallet/src/base_node_service/monitor.rs

            let chain_metadata = tip_info
                .metadata
                .ok_or_else(|| BaseNodeMonitorError::InvalidBaseNodeResponse("Tip info no metadata".to_string()))
                .and_then(|metadata| {
                    ChainMetadata::try_from(metadata).map_err(BaseNodeMonitorError::InvalidBaseNodeResponse)
                })?;
+            // Note: Do not use an RPC ping call to measure latency here, as the time measured for that always follows


So my understanding of the RPC ping message is that it is supposed to be a measure of the round trip response without a database read so it is a better measure of network latency that should be divorced from the business of the base node database. If the metadata call and this are linked then that actually points to a problem on the base node that the RPC Pong response is somehow being blocked by Database activity.

Really difficult to make sense of this. I could pick this up in the logs, and these may be from the same conversation or not but it is difficult to tell:

CW: 00:15:41.694385500 [wallet::base_node_service::chain_metadata_monitor] TRACE Obtain RPC client 0 ms BN: 00:15:43.309834400 [comms::rpc] DEBUG [Peer=`ff03297bea5f3f548a85dd72c6`] Got request RequestID=12, Deadline=120s, Flags=(empty), Message=0 byte(s) CW: 00:16:56.568149800 [comms::rpc::client] TRACE Received response (64 byte(s)) from request #12 (protocol = t/bnwallet/1, method=5) in 75s CW: 00:16:56.568175700 [wallet::base_node_service::chain_metadata_monitor] TRACE Obtain metadata from client 74873 ms CW: 00:16:56.568225000 [comms::rpc::client] DEBUG Ping (protocol t/bnwallet/1) sent in 3.00µs BN: 00:16:58.072925300 [comms::rpc] TRACE RPC TIMING(REQ_ID=3): 'transmit' took 53.64s (SLOW) BN: 00:16:58.073036500 [comms::rpc] DEBUG [Peer=`ff03297bea5f3f548a85dd72c6` t/bnwallet/1] sending ACK response. CW: 00:18:23.682751100 [wallet::base_node_service::chain_metadata_monitor] TRACE Obtain latency (ping) from client 87114 ms BN: 00:18:25.297436900 [comms::rpc] TRACE RPC TIMING(REQ_ID=3): 'transmit' took 85.17s (SLOW)

What stands out is:

the wallet measures 74.873s for a round trip metadata request; the base node reports a slow transmit of 53.64s

the wallet measures 87.114s for a round trip ping request; the base node reports a slow transmit of 85.17s

If those timings are linked it would suggest being not blocked by database activity.

common/config/presets/tari_config_example.toml

common/src/configuration/utils.rs

common/config/presets/tari_config_example.toml

common/config/presets/tari_igor_config.toml

sdbondi

We should try not to create endless toml configuration that are meaningless to anyone that does not have intimate knowledge of the code. If we feel it necessary to tweak values endlessly, we should probably take a step back and think about a better way.

The call to refresh_balance now does not necessarily refresh the balance. If the balance is being refreshed too often by some caller, perhaps the caller should handle that (debounce) correctly or mark the balance as stale so that it can be updated by the next tick (I think the latter is a good option). This is preferable to manually tweaking some values and creating unexpected side (non)effects in refresh_balance.

Otherwise, fine with the removal of ping. This can go in, but we should strive to create systems that work for users, rather than ones that have to be configured by engineers to work.

applications/tari_console_wallet/src/ui/state/app_state.rs

sdbondi · 2021-09-08T04:45:02Z

base_layer/wallet/src/base_node_service/monitor.rs


-            let latency = client.ping().await?;


Think we are picking up differences across machines - in my tests the real ping was lower when for e.g. the base node was sending blocks to another base node and/or getting scanned by another wallet etc. .

Whereas, because get_tip_info requires the db, it would be slower in this case. The value presented as latency is not purely network latency and could be misleading. There are, as you'd expect, some cases where network performance takes a hit (and on your and some others machines it seems a lot worse).

I did mention in my original PR that latency for get_tip_info would be more efficient and a better measure of actual base node performance, once we see a truer network latency value (which was instantly better in my case), but happy if we keep in mind the fact that database contention is included in the value presented in the console wallet as latency.

... happy if we keep in mind the fact that database contention is included in the value presented in the console wallet as latency.

Agreed, a proper comment in the code highlight this.

applications/tari_console_wallet/src/ui/state/app_state.rs

stringhandler

I understand why you want to add this, but the implementation is so specific and adds so many extra tracking variables to achieve it that the extra orchestration code muddies up the other code.

If you can find a way to achieve the same goal in a more elegant manner, I'd consider merging it

stringhandler · 2021-09-14T09:11:13Z

applications/tari_console_wallet/src/ui/state/app_state.rs

-        let balance = self.wallet.output_manager_service.get_balance().await?;
-        self.data.balance = balance;
-        self.updated = true;
+        if self.balance_enquiry_cooldown_start.elapsed() > self.balance_enquiry_cooldown_period {


There is no need for an additional method here, and the names are confusing. I would suggest removing the refresh_balance_check method and the balance_enquiry_cooldown_infringement variables as calling the refresh_balance method twice in quick succession has the same effect

Implemented a balance check debouncer for these purposes - please see the latest commit

applications/tari_console_wallet/src/ui/state/app_state.rs

base_layer/wallet/src/base_node_service/monitor.rs

stringhandler · 2021-09-14T09:14:53Z

common/src/configuration/global.rs

+    // The cooldown period between balance enquiry checks in seconds; requests faster than this will be ignored.
+    // For specialized wallets processing many batch transactions this setting could be increased to 60 s to retain
+    // responsiveness of the wallet with slightly delayed balance updates (default: 1)
+    let key = "wallet.balance_enquiry_cooldown_period";


nobody is ever going to change this to anything other than 1. I would say hardcoding it as 1 second is better in this case

The default is set to 1. The configuration setting is needed for stress testing or for running a large wallet where many events occur that can change the balance, where 1 s is not a good choice. These are average time measurements for the call to the pub async fn get_balance SQL queries to complete:

~15,000 total spent and unspent TXOs: 1 s

~125,000 total spent and unspent TXOs: 400 s

- Improved console wallet responsiveness by introducing a cooldown period for calculating the balance, as this is a database call that grows linearly with the amount of UTXOs in the wallet and does not need to be repeated multiple times per second in certain scenarios. - Submit connectivity failed event if the metadata RPC requests time out so that 'Waiting for data...' will be displayed in the wallet instead of the last known good value, as the former state can sometimes continue for more than 30 minutes. - Removed the RPC client ping call as a method when monitoring the base node to calculate latency and instead replaced it with the time measured to obtain metadata via the RPC client. The time measured for the ping call always follows the time it takes for the metadata call and will often take longer than the metadata call.

stringhandler · 2021-09-21T10:30:36Z

applications/tari_console_wallet/src/ui/state/debouncer.rs

+        }
+        loop {
+            tokio::select! {
+                _ = &mut delay => {


If the desired behaviour is to cache the result for multiple calls within the cooldown period, this is not being achieved.

What will happen here is that many calls to this method will get slower and slower, blocking the callers.

Wouldn't it be better to change it to something like

select! { balance_enquiry_events.recv() => { if Instant::now() < cache_until_time { return cached_value } // get value cache_until_time = Instant::now() + self.balance_enquiry_cooldown_period; return value; } }

If the desired behaviour is to cache the result for multiple calls within the cooldown period, this is not being achieved.

The desired behaviour is to postpone all calls to update the balance for a set cooldown period of time till the very end of the period (and of course then only perform one calculation) and not update it at the beginning of the period and use the cached value. This will ensure that the last request to update the balance will always be honoured if no more requests arrive after the current cooldown period. If the cooldown period is made much bigger for example increased to 60 s for large busy wallets, then not honouring the last balance update request in the cooldown period is very much noticeable in the TUI with displaying the incorrect balance.

What will happen here is that many calls to this method will get slower and slower, blocking the callers.

If 1000 requests are sent to the debouncer (broadcast) channel in 1 s then only the last one will be remembered as it has size 1; all prior requests will be dropped. The select loop will only check for a new update request after the delay event, and the channel receive future will timeout if there is none. Thus no blocking calls.

Disconnect the connected peer in the wallet connectivity if there is an RPC error, so that a new connection can be established. Improve console wallet responsiveness as per PR tari-project#3304

stringhandler

Not 100% convinced, but the impact should be low. I think I can craft a simpler debouncer as I stated in the comments, but this can go in for now

Disconnect the connected peer in the wallet connectivity if there is an RPC error, so that a new connection can be established. Improve console wallet responsiveness as per PR tari-project#3304

hansieodendaal requested review from sdbondi, philipr-za and stringhandler September 6, 2021 08:48

sdbondi reviewed Sep 6, 2021

View reviewed changes

base_layer/wallet/src/base_node_service/monitor.rs Outdated Show resolved Hide resolved

philipr-za reviewed Sep 6, 2021

View reviewed changes

hansieodendaal changed the title ~~Improve console wallet responsiveness~~ feat: improve console wallet responsiveness Sep 6, 2021

hansieodendaal force-pushed the ho_improve_wallet_response branch 4 times, most recently from c660fd3 to 6a4e072 Compare September 7, 2021 09:26

hansieodendaal requested a review from philipr-za September 7, 2021 09:27

philipr-za reviewed Sep 7, 2021

View reviewed changes

common/config/presets/tari_config_example.toml Outdated Show resolved Hide resolved

common/config/presets/tari_igor_config.toml Outdated Show resolved Hide resolved

hansieodendaal force-pushed the ho_improve_wallet_response branch from 6a4e072 to ca3554f Compare September 7, 2021 13:03

sdbondi previously approved these changes Sep 8, 2021

View reviewed changes

hansieodendaal dismissed sdbondi’s stale review via 38f3993 September 13, 2021 09:39

hansieodendaal force-pushed the ho_improve_wallet_response branch from ca3554f to 38f3993 Compare September 13, 2021 09:39

stringhandler requested changes Sep 14, 2021

View reviewed changes

hansieodendaal added 3 commits September 20, 2021 18:43

review comments

b354ee9

review comments

d4f118a

hansieodendaal force-pushed the ho_improve_wallet_response branch from 38f3993 to 32e2399 Compare September 20, 2021 16:44

hansieodendaal added the P-do_not_merge Process - Not ready for merging label Sep 21, 2021

review comments - balance enquiry debouncer

95ee4d4

hansieodendaal force-pushed the ho_improve_wallet_response branch from 32e2399 to 95ee4d4 Compare September 21, 2021 08:21

hansieodendaal removed the P-do_not_merge Process - Not ready for merging label Sep 21, 2021

hansieodendaal requested a review from stringhandler September 21, 2021 08:51

stringhandler reviewed Sep 21, 2021

View reviewed changes

stringhandler approved these changes Sep 21, 2021

View reviewed changes

stringhandler added the mq-approved label Sep 21, 2021

Merge branch 'development' into ho_improve_wallet_response

5bd3b9e

aviator-app bot merged commit 73017a4 into tari-project:development Sep 21, 2021

stringhandler mentioned this pull request Sep 21, 2021

refactor(wallet): simpler debounce mechanism #3375

Closed

hansieodendaal deleted the ho_improve_wallet_response branch October 14, 2021 05:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: improve console wallet responsiveness #3304

feat: improve console wallet responsiveness #3304

hansieodendaal commented Sep 6, 2021 •

edited

Loading

philipr-za left a comment

philipr-za Sep 6, 2021

hansieodendaal Sep 6, 2021

sdbondi left a comment •

edited

Loading

sdbondi Sep 8, 2021 •

edited

Loading

hansieodendaal Sep 13, 2021

stringhandler left a comment

stringhandler Sep 14, 2021

hansieodendaal Sep 21, 2021 •

edited

Loading

stringhandler Sep 14, 2021

hansieodendaal Sep 21, 2021

stringhandler Sep 21, 2021

hansieodendaal Sep 21, 2021

stringhandler left a comment

feat: improve console wallet responsiveness #3304

feat: improve console wallet responsiveness #3304

Conversation

hansieodendaal commented Sep 6, 2021 • edited Loading

Description

Motivation and Context

How Has This Been Tested?

philipr-za left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sdbondi left a comment • edited Loading

Choose a reason for hiding this comment

sdbondi Sep 8, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stringhandler left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hansieodendaal Sep 21, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stringhandler left a comment

Choose a reason for hiding this comment

hansieodendaal commented Sep 6, 2021 •

edited

Loading

sdbondi left a comment •

edited

Loading

sdbondi Sep 8, 2021 •

edited

Loading

hansieodendaal Sep 21, 2021 •

edited

Loading