Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

services/horizon/internal/db2/history: Fix account transactions query #5229

Merged
merged 1 commit into from
Mar 5, 2024

Conversation

tamirms
Copy link
Contributor

@tamirms tamirms commented Mar 4, 2024

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

The transactions for account query is one the most common queries which appears in the slow query logs and in the Top SQL dashboard in the RDS performance insights page.

SELECT
    
ht.id, ht.transaction_hash, ht.ledger_sequence, ht.application_order, ht.account, ht.account_muxed, ht.account_sequence, ht.max_fee, COALESCE(ht.fee_charged, ht.max_fee) as fee_charged, ht.operation_count, ht.tx_envelope, ht.tx_result, ht.tx_meta, ht.tx_fee_meta, ht.created_at, ht.updated_at, COALESCE(ht.successful, true) as successful, ht.signatures, ht.memo_type, ht.memo, ht.time_bounds, ht.ledger_bounds, ht.min_account_sequence, ht.min_account_sequence_age, ht.min_account_sequence_ledger_gap, ht.extra_signers, hl.closed_at AS ledger_close_time, ht.inner_transaction_hash, ht.fee_account, ht.fee_account_muxed, ht.new_max_fee, ht.inner_signatures

FROM history_transactions ht
LEFT JOIN history_ledgers hl ON ht.ledger_sequence = hl.sequence
JOIN history_transaction_participants htp ON htp.history_transaction_id = ht.id
WHERE htp.history_account_id = $1 AND ht.id > $2 AND (ht.successful = true OR ht.successful IS NULL)
ORDER BY ht.id asc LIMIT 10

The output from explain analyze is:

                                                                                         QUERY PLAN                                                                                         
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=32.89..233583.48 rows=8 width=2297) (actual time=21213.562..21588.159 rows=5 loops=1)
   ->  Nested Loop Left Join  (cost=32.89..233583.48 rows=8 width=2297) (actual time=21213.561..21588.155 rows=5 loops=1)
         ->  Merge Join  (cost=32.33..233563.11 rows=8 width=2289) (actual time=21213.531..21588.064 rows=5 loops=1)
               Merge Cond: (ht.id = htp.history_transaction_id)
               ->  Index Scan using hs_transaction_by_id on history_transactions ht  (cost=0.58..71163.40 rows=168386 width=2289) (actual time=0.025..1746.134 rows=305174 loops=1)
                     Index Cond: (id > '217362973387501568'::bigint)
                     Filter: (successful OR (successful IS NULL))
                     Rows Removed by Filter: 1248208
               ->  Index Only Scan using hist_tx_p_id on history_transaction_participants htp  (cost=0.71..165349.94 rows=305950 width=8) (actual time=1.417..19806.698 rows=33250 loops=1)
                     Index Cond: (history_account_id = '4231259336'::bigint)
                     Heap Fetches: 33271
         ->  Index Scan using index_history_ledgers_on_sequence on history_ledgers hl  (cost=0.56..2.55 rows=1 width=12) (actual time=0.011..0.011 rows=1 loops=5)
               Index Cond: (sequence = ht.ledger_sequence)
 Planning Time: 92.901 ms
 Execution Time: 21588.304 ms
(15 rows)

The problem is the combination of:

  • filtering transactions by account
  • filtering transactions with id > cursor

It has to use two separate indexes for this query:
hs_transaction_by_id to filter transactions with id > cursor:
"hs_transaction_by_id" UNIQUE, btree (id)

hist_tx_p_id to filter transactions by account:
"hist_tx_p_id" UNIQUE, btree (history_account_id, history_transaction_id)

But if you look closely at hist_tx_p_id you will see that index already includes the transaction id.

The problem is in the where clause:

WHERE htp.history_account_id = $1 AND ht.id > $2 AND (ht.successful = true OR ht.successful IS NULL)

If it is rewritten to:

WHERE htp.history_account_id = $1 AND htp.history_transaction_id > $2 AND (ht.successful = true OR ht.successful IS NULL)

that will make the query use just one index to handle both the account and transaction id filter

With the new where clause the query is extremely fast!

EXPLAIN ANALYZE
SELECT
    ht.id, ht.transaction_hash, ht.ledger_sequence, ht.application_order, ht.account, ht.account_muxed, ht.account_sequence, ht.max_fee, COALESCE(ht.fee_charged, ht.max_fee) as fee_charged, ht.operation_count, ht.tx_envelope, ht.tx_result, ht.tx_meta, ht.tx_fee_meta, ht.created_at, ht.updated_at, COALESCE(ht.successful, true) as successful, ht.signatures, ht.memo_type, ht.memo, ht.time_bounds, ht.ledger_bounds, ht.min_account_sequence, ht.min_account_sequence_age, ht.min_account_sequence_ledger_gap, ht.extra_signers, hl.closed_at AS ledger_close_time, ht.inner_transaction_hash, ht.fee_account, ht.fee_account_muxed, ht.new_max_fee, ht.inner_signatures
FROM history_transactions ht
LEFT JOIN history_ledgers hl ON ht.ledger_sequence = hl.sequence
JOIN history_transaction_participants htp ON htp.history_transaction_id = ht.id
WHERE htp.history_account_id = 4231259336 AND htp.history_transaction_id > 217362973387501568 AND (ht.successful = true OR ht.successful IS NULL)
ORDER BY ht.id asc LIMIT 10;
                                                                                 QUERY PLAN                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.86..81.94 rows=10 width=2297) (actual time=0.078..0.184 rows=5 loops=1)
   ->  Nested Loop Left Join  (cost=1.86..121.99 rows=15 width=2297) (actual time=0.078..0.182 rows=5 loops=1)
         ->  Nested Loop  (cost=1.29..83.90 rows=15 width=2289) (actual time=0.064..0.142 rows=5 loops=1)
               ->  Index Only Scan using hist_tx_p_id on history_transaction_participants htp  (cost=0.71..16.30 rows=26 width=8) (actual time=0.045..0.087 rows=5 loops=1)
                     Index Cond: ((history_account_id = '4231259336'::bigint) AND (history_transaction_id > '217362973387501568'::bigint))
                     Heap Fetches: 5
               ->  Index Scan using hs_transaction_by_id on history_transactions ht  (cost=0.58..2.60 rows=1 width=2289) (actual time=0.010..0.010 rows=1 loops=5)
                     Index Cond: (id = htp.history_transaction_id)
                     Filter: (successful OR (successful IS NULL))
         ->  Index Scan using index_history_ledgers_on_sequence on history_ledgers hl  (cost=0.56..2.54 rows=1 width=12) (actual time=0.007..0.007 rows=1 loops=5)
               Index Cond: (sequence = ht.ledger_sequence)
 Planning Time: 1.832 ms
 Execution Time: 0.247 ms
(13 rows)

After deploying this change to staging and we can observe a dramatic reduction in scans to the hs_transaction_by_id and hist_tx_p_id indexes:

Screenshot 2024-03-04 at 10 45 23 AM

We can also see a similar decline in database load on the AWS RDS performance insights dashboard when we specifically filter for the account transactions query:

Screenshot 2024-03-04 at 11 01 08 AM

Known limitations

[N/A]

@tamirms tamirms marked this pull request as ready for review March 4, 2024 10:30
@tamirms tamirms requested a review from a team March 4, 2024 11:02
@2opremio
Copy link
Contributor

2opremio commented Mar 4, 2024

Awesome results! :)

@tamirms tamirms merged commit 1379989 into stellar:master Mar 5, 2024
29 checks passed
@tamirms tamirms deleted the fix-account-transactions-query branch March 5, 2024 07:36
sreuland pushed a commit that referenced this pull request Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants