Schedule B slow SQL (65 to 800+ times faster by adding a new index: idx_sched_b_2019_2020_disb_dt_sub_id) #4531
Comments
Case 2, same solution: 3 min to 0.9 sec (65 times faster). Without the change, the execution plan does not use the index; after the new index is added, the plan uses it (Index Scan using idx_sched_b_2019_2020_disb_dt_sub_id on fec_fitem_sched_b_2019_2020).
Related to #4491
Thanks for your work on this, @dzhang-fec, this looks good so far. There is one query that seems slower than before and might need another index:
Examples: https://docs.google.com/spreadsheets/d/1J5QK4mw7HXbMnaDV_ClCGWEgbbI3VZZhXDqmjIFwBqI/edit#gid=0
@lbeaufort the above SQL slowness issue is different. We need to add another index. Can I deal with this new issue in #4533?
@dzhang-fec a new ticket is fine with me! I think after that we can monitor in production and add as needed. Is the best course of action to add a new migration file for the composite index in this issue, and to remove any unneeded COALESCE indexes?
@dzhang-fec I see you're way ahead of me with this PR #4534, thank you! My only remaining question then is whether we should remove any of the COALESCE indexes.
@dzhang-fec and I discussed this and we're going to leave the indexes in place for now out of caution.
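If it later becomes useful to see whether the existing COALESCE indexes are still being used before dropping them, the standard statistics view can show per-index scan counts. A minimal sketch, using only built-in PostgreSQL catalogs (the schema filter is an assumption, not something the thread specifies):

-- List indexes in the disclosure schema, least-used first;
-- idx_scan = 0 over a long window suggests a drop candidate.
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE schemaname = 'disclosure'
ORDER BY idx_scan ASC;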
I am not sure where to put my long investigation with test cases, so I put the link here just in case.
Completion criteria
Related tickets:
#2791
#4491
Query from @lbeaufort
Problem: too slow (times out after 5+ minutes)
Suggestion and analysis:
We suggest adding a new index whose column order matches the ORDER BY (disb_dt DESC, sub_id DESC), so the descending sort can be satisfied by an index scan:
CREATE INDEX idx_sched_b_2019_2020_disb_dt_sub_id
ON disclosure.fec_fitem_sched_b_2019_2020 USING btree
(disb_dt DESC NULLS FIRST, sub_id DESC NULLS FIRST)
TABLESPACE pg_default;
" -> Index Scan using idx_sched_b_2019_2020_disb_dt_sub_id on fec_fitem_sched_b_2019_2020 (cost=0.57..31662387.50 rows=70824809 width=1353)"
Result: 14 min to 0.6 sec (800+ times faster).
Tests: the new index is in place on dev; stage does not have it, for further verification.
Note: we need to add these indexes to all 23 partitions (one way to generate that DDL is sketched below).
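A minimal sketch of generating the per-partition DDL with a PL/pgSQL DO block. The fec_fitem_sched_b_YYYY_YYYY naming pattern and the 1976-2020 cycle bounds are assumptions inferred from the partition named above (23 two-year cycles); adjust to the actual partition list:

-- Hypothetical sketch: emit one CREATE INDEX per two-year partition.
-- Assumes partitions are named disclosure.fec_fitem_sched_b_<odd>_<even>
-- spanning 1975_1976 through 2019_2020.
DO $$
DECLARE
    yr int;
BEGIN
    FOR yr IN SELECT generate_series(1976, 2020, 2) LOOP
        EXECUTE format(
            'CREATE INDEX IF NOT EXISTS idx_sched_b_%s_%s_disb_dt_sub_id '
            'ON disclosure.fec_fitem_sched_b_%s_%s USING btree '
            '(disb_dt DESC NULLS FIRST, sub_id DESC NULLS FIRST)',
            yr - 1, yr, yr - 1, yr
        );
    END LOOP;
END $$;

The query in question: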
SELECT *
FROM disclosure.fec_fitem_sched_b
    LEFT OUTER JOIN ofec_committee_history_mv AS ofec_committee_history_mv_1
        ON disclosure.fec_fitem_sched_b.cmte_id = ofec_committee_history_mv_1.committee_id
        AND disclosure.fec_fitem_sched_b.two_year_transaction_period = ofec_committee_history_mv_1.cycle
    LEFT OUTER JOIN ofec_committee_history_mv AS ofec_committee_history_mv_2
        ON disclosure.fec_fitem_sched_b.clean_recipient_cmte_id = ofec_committee_history_mv_2.committee_id
        AND disclosure.fec_fitem_sched_b.two_year_transaction_period = ofec_committee_history_mv_2.cycle
WHERE disclosure.fec_fitem_sched_b.two_year_transaction_period IN (2020)
AND disclosure.fec_fitem_sched_b.disb_dt >= '1/1/2019'
AND disclosure.fec_fitem_sched_b.disb_dt <= '12/31/2020'
ORDER BY disclosure.fec_fitem_sched_b.disb_dt DESC, disclosure.fec_fitem_sched_b.sub_id DESC
LIMIT 20
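The plans below look like pgAdmin output. To reproduce one yourself, something like the following should work; this is a suggestion, not necessarily how the originals were captured, and the joins are omitted here since the scan-and-sort on the partition dominates the cost:

-- EXPLAIN ANALYZE actually executes the query (on stage this ran
-- ~15 minutes, so consider plain EXPLAIN first).
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM disclosure.fec_fitem_sched_b
WHERE disclosure.fec_fitem_sched_b.two_year_transaction_period IN (2020)
  AND disclosure.fec_fitem_sched_b.disb_dt >= '1/1/2019'
  AND disclosure.fec_fitem_sched_b.disb_dt <= '12/31/2020'
ORDER BY disclosure.fec_fitem_sched_b.disb_dt DESC, disclosure.fec_fitem_sched_b.sub_id DESC
LIMIT 20;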
-------old: at stage (without the new index)-------
Execution time: Total query runtime: 14 min 44 secs. 20 rows affected.
Execution plan:
"Limit (cost=52247515.16..52247515.21 rows=20 width=2852)"
" -> Sort (cost=52247515.16..52424579.40 rows=70825696 width=2852)"
" Sort Key: fec_fitem_sched_b.disb_dt DESC, fec_fitem_sched_b.sub_id DESC"
" -> Gather (cost=41824.15..50362868.86 rows=70825696 width=2852)"
" Workers Planned: 2"
" -> Hash Left Join (cost=40824.15..43279299.26 rows=29510707 width=2852)"
" Hash Cond: ((fec_fitem_sched_b.two_year_transaction_period = ofec_committee_history_mv_2.cycle) AND ((fec_fitem_sched_b.clean_recipient_cmte_id)::text = (ofec_committee_history_mv_2.committee_id)::text))"
" -> Hash Left Join (cost=20412.08..22462791.97 rows=29510707 width=2078)"
" Hash Cond: ((fec_fitem_sched_b.two_year_transaction_period = ofec_committee_history_mv_1.cycle) AND ((fec_fitem_sched_b.cmte_id)::text = (ofec_committee_history_mv_1.committee_id)::text))"
" -> Append (cost=0.00..7051870.60 rows=29510707 width=1324)"
" -> Parallel Seq Scan on fec_fitem_sched_b (cost=0.00..0.00 rows=1 width=5322)"
" Filter: ((disb_dt >= '2019-01-01 00:00:00'::timestamp without time zone) AND (disb_dt <= '2020-12-31 00:00:00'::timestamp without time zone) AND (two_year_transaction_period = '2020'::numeric))"
" -> Parallel Seq Scan on fec_fitem_sched_b_2019_2020 (cost=0.00..7051870.60 rows=29510706 width=1324)"
" Filter: ((disb_dt >= '2019-01-01 00:00:00'::timestamp without time zone) AND (disb_dt <= '2020-12-31 00:00:00'::timestamp without time zone) AND (two_year_transaction_period = '2020'::numeric))"
" -> Hash (cost=18569.55..18569.55 rows=16635 width=754)"
" -> Bitmap Heap Scan on ofec_committee_history_mv ofec_committee_history_mv_1 (cost=387.34..18569.55 rows=16635 width=754)"
" Recheck Cond: (cycle = '2020'::numeric)"
" -> Bitmap Index Scan on idx_ofec_committee_history_mv_cycle_committee_id (cost=0.00..383.18 rows=16635 width=0)"
" Index Cond: (cycle = '2020'::numeric)"
" -> Hash (cost=18569.55..18569.55 rows=16635 width=754)"
" -> Bitmap Heap Scan on ofec_committee_history_mv ofec_committee_history_mv_2 (cost=387.34..18569.55 rows=16635 width=754)"
" Recheck Cond: (cycle = '2020'::numeric)"
" -> Bitmap Index Scan on idx_ofec_committee_history_mv_cycle_committee_id (cost=0.00..383.18 rows=16635 width=0)"
" Index Cond: (cycle = '2020'::numeric)"
In the old plan, the planner sequentially scans the partition and sorts an estimated ~70M rows before applying the LIMIT, which accounts for the 14-minute runtime.
-------new: at dev, after adding the new index-------
---add a new index:
CREATE INDEX idx_sched_b_2019_2020_disb_dt_sub_id
ON disclosure.fec_fitem_sched_b_2019_2020 USING btree
(disb_dt DESC NULLS FIRST, sub_id DESC NULLS FIRST)
TABLESPACE pg_default;
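For the production migration discussed above, one option (a standard PostgreSQL feature, not something the thread specifies) is to build the index without blocking writes on the live table:

-- Optional variant for a live table; CREATE INDEX CONCURRENTLY
-- cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_sched_b_2019_2020_disb_dt_sub_id
    ON disclosure.fec_fitem_sched_b_2019_2020 USING btree
    (disb_dt DESC NULLS FIRST, sub_id DESC NULLS FIRST)
    TABLESPACE pg_default;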
---execution time: 600 ms
---execution plan: now uses the new index
"Limit (cost=1.43..79.60 rows=20 width=2879)"
" -> Nested Loop Left Join (cost=1.43..276842326.65 rows=70824810 width=2879)"
" -> Nested Loop Left Join (cost=1.01..154517950.13 rows=70824810 width=2106)"
" -> Merge Append (cost=0.59..32193573.60 rows=70824810 width=1353)"
" Sort Key: fec_fitem_sched_b.disb_dt DESC, fec_fitem_sched_b.sub_id DESC"
" -> Sort (cost=0.01..0.02 rows=1 width=5322)"
" Sort Key: fec_fitem_sched_b.disb_dt DESC, fec_fitem_sched_b.sub_id DESC"
" -> Seq Scan on fec_fitem_sched_b (cost=0.00..0.00 rows=1 width=5322)"
" Filter: ((disb_dt >= '2019-01-01 00:00:00'::timestamp without time zone) AND (disb_dt <= '2020-12-31 00:00:00'::timestamp without time zone) AND (two_year_transaction_period = '2020'::numeric))"
" -> Index Scan using idx_sched_b_2019_2020_disb_dt_sub_id on fec_fitem_sched_b_2019_2020 (cost=0.57..31662387.50 rows=70824809 width=1353)"
" Index Cond: ((disb_dt >= '2019-01-01 00:00:00'::timestamp without time zone) AND (disb_dt <= '2020-12-31 00:00:00'::timestamp without time zone))"
" Filter: (two_year_transaction_period = '2020'::numeric)"
" -> Index Scan using idx_ofec_committee_history_mv_cycle_committee_id on ofec_committee_history_mv ofec_committee_history_mv_1 (cost=0.42..1.72 rows=1 width=753)"
" Index Cond: ((fec_fitem_sched_b.two_year_transaction_period = cycle) AND (cycle = '2020'::numeric) AND ((fec_fitem_sched_b.cmte_id)::text = (committee_id)::text))"
" -> Index Scan using idx_ofec_committee_history_mv_cycle_committee_id on ofec_committee_history_mv ofec_committee_history_mv_2 (cost=0.42..1.72 rows=1 width=753)"
" Index Cond: ((fec_fitem_sched_b.two_year_transaction_period = cycle) AND (cycle = '2020'::numeric) AND ((fec_fitem_sched_b.clean_recipient_cmte_id)::text = (committee_id)::text))"