services/horizon/internal/db2/history: Optimize query for reaping lookup tables #5393

tamirms · 2024-07-18T10:01:20Z

PR Checklist

PR Structure

This PR has reasonably narrow scope (if not, break it down into smaller PRs).
This PR avoids mixing refactoring changes with feature changes (split into two PRs
otherwise).
This PR's title starts with name of package that is most changed in the PR, ex.
services/friendbot, or all or doc if the changes are broad or impact many
packages.

Thoroughness

This PR adds tests for the most critical parts of the new functionality or fixes.
I've updated any docs (developer docs, .md
files, etc... affected by this change). Take a look in the docs folder for a given service,
like this one.

Release planning

I've updated the relevant CHANGELOG (here for Horizon) if
needed with deprecations, added features, breaking changes, and DB schema changes.
I've decided if this PR requires a new major/minor version according to
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.

What

Previously the sql queries to fined orphaned history lookup table rows looked like:

SELECT id
FROM   (SELECT id,
               (SELECT 1
                FROM   history_effects
                WHERE  history_account_id = hcb.id
                LIMIT  1) AS c0,
               (SELECT 1
                FROM   history_operation_participants
                WHERE  history_account_id = hcb.id
                LIMIT  1) AS c1,
               (SELECT 1
                FROM   history_trades
                WHERE  base_account_id = hcb.id
                LIMIT  1) AS c2,
               (SELECT 1
                FROM   history_trades
                WHERE  counter_account_id = hcb.id
                LIMIT  1) AS c3,
               (SELECT 1
                FROM   history_transaction_participants
                WHERE  history_account_id = hcb.id
                LIMIT  1) AS c4,
               1          AS cx
        FROM   history_accounts hcb
        WHERE  id >= 3183878982
        ORDER  BY id
        LIMIT  1000) AS sub
WHERE  c0 IS NULL
       AND c1 IS NULL
       AND c2 IS NULL
       AND c3 IS NULL
       AND c4 IS NULL
       AND 1 = 1

The idea behind this query is to iterate over rows from the history_accounts table in batches of 1000. For each row in the batch we check if the account id is referenced in any of the history tables. If there are no references then we know we have identified an orphaned row which can be deleted.

Here is the explain analyze output on that query:

                                                                                            QUERY PLAN                                                                                             
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on sub  (cost=0.43..3559.46 rows=1 width=8) (actual time=93780.856..93780.858 rows=0 loops=1)
   Filter: ((sub.c0 IS NULL) AND (sub.c1 IS NULL) AND (sub.c2 IS NULL) AND (sub.c3 IS NULL) AND (sub.c4 IS NULL))
   Rows Removed by Filter: 1000
   ->  Limit  (cost=0.43..3549.46 rows=1000 width=32) (actual time=8.656..93780.428 rows=1000 loops=1)
         ->  Index Only Scan using index_history_accounts_on_id on history_accounts hcb  (cost=0.43..11756323.12 rows=3312542 width=32) (actual time=8.655..93780.174 rows=1000 loops=1)
               Index Cond: (id >= '3183878982'::bigint)
               Heap Fetches: 5
               SubPlan 1
                 ->  Limit  (cost=0.71..0.77 rows=1 width=4) (actual time=46.704..46.704 rows=1 loops=1000)
                       ->  Index Only Scan using hist_e_id on history_effects  (cost=0.71..46809.94 rows=775981 width=4) (actual time=46.701..46.701 rows=1 loops=1000)
                             Index Cond: (history_account_id = hcb.id)
                             Heap Fetches: 84730
               SubPlan 2
                 ->  Limit  (cost=0.71..0.74 rows=1 width=4) (actual time=42.713..42.713 rows=1 loops=1000)
                       ->  Index Only Scan using hist_op_p_id on history_operation_participants  (cost=0.71..18835.15 rows=516039 width=4) (actual time=42.710..42.710 rows=1 loops=1000)
                             Index Cond: (history_account_id = hcb.id)
                             Heap Fetches: 125718
               SubPlan 3
                 ->  Limit  (cost=0.57..0.61 rows=1 width=4) (actual time=0.038..0.038 rows=0 loops=1000)
                       ->  Index Only Scan using htrd_by_base_account on history_trades  (cost=0.57..6013.19 rows=200418 width=4) (actual time=0.037..0.037 rows=0 loops=1000)
                             Index Cond: (base_account_id = hcb.id)
                             Heap Fetches: 0
               SubPlan 4
                 ->  Limit  (cost=0.57..0.61 rows=1 width=4) (actual time=0.038..0.038 rows=0 loops=1000)
                       ->  Index Only Scan using htrd_by_counter_account on history_trades history_trades_1  (cost=0.57..5581.26 rows=186019 width=4) (actual time=0.037..0.037 rows=0 loops=1000)
                             Index Cond: (counter_account_id = hcb.id)
                             Heap Fetches: 3
               SubPlan 5
                 ->  Limit  (cost=0.70..0.79 rows=1 width=4) (actual time=4.278..4.278 rows=1 loops=1000)
                       ->  Index Only Scan using hist_tx_p_id on history_transaction_participants  (cost=0.70..11259.16 rows=135159 width=4) (actual time=4.276..4.276 rows=1 loops=1000)
                             Index Cond: (history_account_id = hcb.id)
                             Heap Fetches: 1853
 Planning Time: 50.173 ms
 Execution Time: 93780.903 ms
(34 rows)

This PR changes the query to use LATERAL joins:

SELECT    e1.id
FROM      (
                   SELECT   id
                   FROM     history_accounts
                   WHERE    id >= 4632128084
                   ORDER BY id limit 1000) e1
LEFT JOIN lateral
          (
                 SELECT 1 AS row
                 FROM   history_transaction_participants
                 WHERE  history_transaction_participants.history_account_id = e1.id limit 1) e2
ON        true
LEFT JOIN lateral
          (
                 SELECT 1 AS row
                 FROM   history_effects
                 WHERE  history_effects.history_account_id = e1.id limit 1) e3
ON        true
LEFT JOIN lateral
          (
                 SELECT 1 AS row
                 FROM   history_operation_participants
                 WHERE  history_operation_participants.history_account_id = e1.id limit 1) e4
ON        true
LEFT JOIN lateral
          (
                 SELECT 1 AS row
                 FROM   history_trades
                 WHERE  history_trades.base_account_id = e1.id limit 1) e5
ON        true
LEFT JOIN lateral
          (
                 SELECT 1 AS row
                 FROM   history_trades
                 WHERE  history_trades.counter_account_id = e1.id limit 1) e6
ON        true
WHERE     e2.row IS NULL
AND       e3.row IS NULL
AND       e4.row IS NULL
AND       e5.row IS NULL
AND       e6.row IS NULL

The new query is often faster because if it detects that an account id is referenced in one history table it aborts the search and does not query the remaining history tables in the subsequent lateral joins.

                                                                                              QUERY PLAN                                                                                              
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=3.70..864.09 rows=1 width=8) (actual time=10615.471..10615.475 rows=0 loops=1)
   Filter: ((1) IS NULL)
   ->  Nested Loop Left Join  (cost=3.12..863.47 rows=1 width=8) (actual time=10615.470..10615.473 rows=0 loops=1)
         Filter: ((1) IS NULL)
         ->  Nested Loop Left Join  (cost=2.55..862.84 rows=1 width=8) (actual time=10615.470..10615.472 rows=0 loops=1)
               Filter: ((1) IS NULL)
               ->  Nested Loop Left Join  (cost=1.84..862.08 rows=1 width=8) (actual time=10615.470..10615.472 rows=0 loops=1)
                     Filter: ((1) IS NULL)
                     ->  Nested Loop Left Join  (cost=1.14..858.14 rows=5 width=8) (actual time=10615.469..10615.471 rows=0 loops=1)
                           Filter: ((1) IS NULL)
                           Rows Removed by Filter: 1000
                           ->  Limit  (cost=0.43..39.84 rows=1000 width=8) (actual time=1.183..2.864 rows=1000 loops=1)
                                 ->  Index Only Scan using index_history_accounts_on_id on history_accounts  (cost=0.43..122615.46 rows=3111337 width=8) (actual time=1.182..2.733 rows=1000 loops=1)
                                       Index Cond: (id >= '4632128084'::bigint)
                                       Heap Fetches: 25
                           ->  Limit  (cost=0.70..0.79 rows=1 width=4) (actual time=10.612..10.612 rows=1 loops=1000)
                                 ->  Index Only Scan using hist_tx_p_id on history_transaction_participants  (cost=0.70..11259.16 rows=135159 width=4) (actual time=10.609..10.609 rows=1 loops=1000)
                                       Index Cond: (history_account_id = history_accounts.id)
                                       Heap Fetches: 7358
                     ->  Limit  (cost=0.71..0.77 rows=1 width=4) (never executed)
                           ->  Index Only Scan using hist_e_id on history_effects  (cost=0.71..46809.94 rows=775981 width=4) (never executed)
                                 Index Cond: (history_account_id = history_accounts.id)
                                 Heap Fetches: 0
               ->  Limit  (cost=0.71..0.74 rows=1 width=4) (never executed)
                     ->  Index Only Scan using hist_op_p_id on history_operation_participants  (cost=0.71..18835.15 rows=516039 width=4) (never executed)
                           Index Cond: (history_account_id = history_accounts.id)
                           Heap Fetches: 0
         ->  Limit  (cost=0.57..0.61 rows=1 width=4) (never executed)
               ->  Index Only Scan using htrd_by_base_account on history_trades  (cost=0.57..6013.19 rows=200418 width=4) (never executed)
                     Index Cond: (base_account_id = history_accounts.id)
                     Heap Fetches: 0
   ->  Limit  (cost=0.57..0.61 rows=1 width=4) (never executed)
         ->  Index Only Scan using htrd_by_counter_account on history_trades history_trades_1  (cost=0.57..5581.26 rows=186019 width=4) (never executed)
               Index Cond: (counter_account_id = history_accounts.id)
               Heap Fetches: 0
 Planning Time: 0.246 ms
 Execution Time: 10615.520 ms
(37 rows)

This change was deployed on staging on 7/18 at approximately 11am UTC. We can see that the duration of reaping lookup tables has decreased significantly:

We also observed a decrease in the number of timeout errors when reaping lookup tables:

Known limitations

[N/A]

mollykarcher · 2024-07-23T15:38:59Z

Nice results! I have some other suggestions for things we could maybe test out to optimize further, but these shouldn't block merging this as is, as it's still a step forward.

Using a CTE to store the intermediate results of that top-level select/from might give us some gains. So by that I mean testing out changing it from

SELECT    e1.id
FROM      (
     SELECT   id
     FROM     history_accounts
     WHERE    id >= 4632128084
     ORDER BY id limit 1000
) e1

to

WITH ha_batch AS (
       SELECT   id
       FROM     history_accounts
       WHERE    id >= 4632128084
       ORDER BY id limit 1000
) SELECT e1.id FROM ha_batch e1

The index scan on hist_tx_p_id on history_transaction_participants is still using a crazy amount of heap fetches. My guess is because this looks like it's not just an index on history_account_id, it's a unique, multicolumn index. I'm not sure if it would be feasible to add a migration that adds an index to this column, and we might even want to to sanity-check/benchmark it against our full history instances if we decide to do it, but that would almost certainly be our biggest win still left to gain here.
Looks like there might be some minor index bloat on the index_history_accounts_on_id (on history_accounts) index as well, based on that query plan. I would probably ignore this for now because it's not major and the CTE change might render it moot anyway. But if it doesn't take too long, we could sanity-check that with a REINDEX and then decide from there if the gains would be worth more tuning.

tamirms · 2024-07-23T15:52:24Z

@mollykarcher

My guess is because this looks like it's not just an index on history_account_id, it's a unique, multicolumn index.

hist_tx_p_id is in fact a unique multicolumn index:

horizon=> \d+ history_transaction_participants
                             Table "public.history_transaction_participants"
         Column         |  Type  | Collation | Nullable | Default | Storage | Stats target | Description 
------------------------+--------+-----------+----------+---------+---------+--------------+-------------
 history_transaction_id | bigint |           | not null |         | plain   |              | 
 history_account_id     | bigint |           | not null |         | plain   |              | 
Indexes:
    "hist_tx_p_id" UNIQUE, btree (history_account_id, history_transaction_id)
    "htp_by_htid" btree (history_transaction_id)
Access method: heap

I thought that multicolumn indexes could still be efficient as long as we use queries with equality constraints on the leading column (which is the case for our query on history_transaction_participants). Is that not the case?

mollykarcher · 2024-07-23T17:23:28Z

I thought that multicolumn indexes could still be efficient as long as we use queries with equality constraints on the leading column (which is the case for our query on history_transaction_participants). Is that not the case?

I think that's probably true to some degree, and maybe what they even purport in the docs. But indexes in general are really finicky. A multicolumn index is optimized for queries that utilize both columns in the index together in the same query (and in the exact same order of appearance in the query). As you said, since history_account_id is the left-most prefix of the index, it should still have some performance benefits (as opposed to queries on history_transaction_id, for which it would not help at all). But I think a single-column index would undoubtedly still provide better performance and more efficient lookups. There's a chance I could be totally off base (cause again indexes and query planner is super finicky), but to me this is a more obvious, low-hanging-fruit type situation than my other 2 suggestions.

Unrelated to this, but another thought after reading the docs you linked on LATERAL JOIN. Tbh I have actually never used/heard of LATERAL JOIN before this, but it sounds quite similar to EXISTS, which I believe should short-circuit in the same way you're describing. That said, I have absolutely 0 context on how it would compare performance-wise (if at all) so feel free to ignore this. But I only bring it up because I personally find reading queries with EXISTS to be a little more intuitive. So something like:

WITH ... 
SELECT e1.id FROM ha_batch e1
WHERE NOT EXISTS (SELECT 1 FROM history_transaction_participants ....)
AND NOT EXISTS (SELECT 1 from history_effects ...)
AND NOT EXISTS ...

tamirms · 2024-07-26T16:46:10Z

@mollykarcher adding the index on the history_account_id column of the history_transaction_participants table proved to be extremely effective. The peak reaping duration reduced to just ~500 ms with the index:

Before adding the index, the peak reaping duration was at 2 seconds:

However, it took approximately 1.5 hours to add the index on staging (which has 1 year of history). But the biggest concern is that the index occupies 82 GB of disk space. Ironically the size of the history_accounts table on staging is 1612 MB (there are only ~4 million rows in history_accounts). It doesn't make sense to add an 82 GB index so we can reap a 1.6GB table more efficiently, so I have opted to forgo adding the history_account_id index.

tamirms · 2024-07-26T17:33:56Z

I have updated the PR to store the reap offsets in the horizon DB instead of in-memory. Thus allows reaping of lookup tables to resume from where it left off after horizon restarts

tamirms marked this pull request as ready for review July 23, 2024 09:13

tamirms requested a review from a team July 23, 2024 09:14

2opremio approved these changes Jul 23, 2024

View reviewed changes

tamirms added 5 commits July 26, 2024 12:23

Optimize query for reaping lookup tables

aa34343

use NOT EXISTS

f338004

add JSON tags for logging

d81c37d

record offsets in db

e86edf4

fix query

93f7ae9

tamirms force-pushed the reap-query branch from 775c8fd to 93f7ae9 Compare July 26, 2024 11:24

tamirms added 2 commits July 26, 2024 12:41

fix tests

57ed334

fix govet

6a49102

tamirms force-pushed the reap-query branch from c1fdd25 to 6a49102 Compare July 26, 2024 14:11

fix logging

7ae9b3f

tamirms added 2 commits July 26, 2024 18:17

make batch size configurable

d1f33c6

fix go vet

388ba6b

tamirms merged commit ecd28b6 into stellar:master Jul 29, 2024
23 checks passed

tamirms deleted the reap-query branch July 29, 2024 09:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

services/horizon/internal/db2/history: Optimize query for reaping lookup tables #5393

services/horizon/internal/db2/history: Optimize query for reaping lookup tables #5393

tamirms commented Jul 18, 2024 •

edited

Loading

mollykarcher commented Jul 23, 2024 •

edited

Loading

tamirms commented Jul 23, 2024

mollykarcher commented Jul 23, 2024

tamirms commented Jul 26, 2024

tamirms commented Jul 26, 2024

services/horizon/internal/db2/history: Optimize query for reaping lookup tables #5393

services/horizon/internal/db2/history: Optimize query for reaping lookup tables #5393

Conversation

tamirms commented Jul 18, 2024 • edited Loading

PR Structure

Thoroughness

Release planning

What

Known limitations

mollykarcher commented Jul 23, 2024 • edited Loading

tamirms commented Jul 23, 2024

mollykarcher commented Jul 23, 2024

tamirms commented Jul 26, 2024

tamirms commented Jul 26, 2024

tamirms commented Jul 18, 2024 •

edited

Loading

mollykarcher commented Jul 23, 2024 •

edited

Loading