-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use "UNION ALL+limit" for full 45 years SA search by using "two_year_transaction_period" #4414
Comments
@lbeaufort found a good one to do the join at end and only one time to reduce the code size. little bit of slower, but it is less than 1 time. SELECT anon_2.anon_1_disclosure_fec_fitem_sched_a_cmte_id, anon_2.anon_1_disclosure_fec_fitem_sched_a_rpt_yr, anon_2.anon_1_disclosure_fec_fitem_sched_a_rpt_tp, anon_2.anon_1_disclosure_fec_fitem_sched_a_image_num, anon_2.anon_1_disclosure_fec_fitem_sched_a_line_num, anon_2.anon_1_disclosure_fec_fitem_sched_a_tran_id, anon_2.anon_1_disclosure_fec_fitem_sched_a_file_num, anon_2.anon_1_disclosure_fec_fitem_sched_a_cmte_nm, anon_2.anon_1_disclosure_fec_fitem_sched_a_entity_tp, anon_2.anon_1_disclosure_fec_fitem_sched_a_entity_tp_desc, anon_2.anon_1_disclosure_fec_fitem_sched_a_contbr_id, anon_2.anon_1_disclosure_fec_fitem_sched_a_contbr_prefix, anon_2.anon_1_disclosure_fec_fitem_sched_a_contbr_nm, anon_2.anon_1_disclosure_fec_fitem_sched_a_cmte_tp, anon_2.anon_1_disclosure_fec_fitem_sched_a_org_tp, anon_2.anon_1_disclosure_fec_fitem_sched_a_cmte_dsgn, anon_2.anon_1_disclosure_fec_fitem_sched_a_contbr_nm_first, anon_2.anon_1_disclosure_fec_fitem_sched_a_contbr_m_nm, anon_2.anon_1_disclosure_fec_fitem_sched_a_contbr_nm_last, anon_2.anon_1_disclosure_fec_fitem_sched_a_contbr_suffix, anon_2.anon_1_disclosure_fec_fitem_sched_a_contbr_st1, anon_2.anon_1_disclosure_fec_fitem_sched_a_contbr_st2, anon_2.anon_1_disclosure_fec_fitem_sched_a_contbr_city, anon_2.anon_1_disclosure_fec_fitem_sched_a_contbr_st, anon_2.anon_1_disclosure_fec_fitem_sched_a_contbr_zip, anon_2.anon_1_disclosure_fec_fitem_sched_a_contbr_employer, anon_2.anon_1_disclosure_fec_fitem_sched_a_contbr_occupation, anon_2.anon_1_disclosure_fec_fitem_sched_a_clean_contbr_id, anon_2.anon_1_disclosure_fec_fitem_sched_a_receipt_tp, anon_2.anon_1_disclosure_fec_fitem_sched_a_receipt_tp_desc, anon_2.anon_1_disclosure_fec_fitem_sched_a_receipt_desc, anon_2.anon_1_disclosure_fec_fitem_sched_a_memo_cd, anon_2.anon_1_disclosure_fec_fitem_sched_a_memo_cd_desc, anon_2.anon_1_disclosure_fec_fitem_sched_a_contb_receipt_dt, anon_2.anon_1_disclosure_fec_fitem_sched_a_contb_receipt_amt, anon_2.anon_1_disclosure_fec_fitem_sched_a_contb_aggregate_ytd, anon_2.anon_1_disclosure_fec_fitem_sched_a_cand_id, anon_2.anon_1_disclosure_fec_fitem_sched_a_cand_nm, anon_2.anon_1_disclosure_fec_fitem_sched_a_cand_nm_first, anon_2.anon_1_disclosure_fec_fitem_sched_a_cand_nm_last, anon_2.anon_1_disclosure_fec_fitem_sched_a_cand_m_nm, anon_2.anon_1_disclosure_fec_fitem_sched_a_cand_prefix, anon_2.anon_1_disclosure_fec_fitem_sched_a_cand_suffix, anon_2.anon_1_disclosure_fec_fitem_sched_a_cand_office, anon_2.anon_1_disclosure_fec_fitem_sched_a_cand_office_desc, anon_2.anon_1_disclosure_fec_fitem_sched_a_cand_office_st, anon_2.anon_1_disclosure_fec_fitem_sched_a_cand_office_st_desc, anon_2.anon_1_disclosure_fec_fitem_sched_a_cand_office_district, anon_2.anon_1_disclosure_fec_fitem_sched_a_conduit_cmte_id, anon_2.anon_1_disclosure_fec_fitem_sched_a_conduit_cmte_nm, anon_2.anon_1_disclosure_fec_fitem_sched_a_conduit_cmte_st1, anon_2.anon_1_disclosure_fec_fitem_sched_a_conduit_cmte_st2, anon_2.anon_1_disclosure_fec_fitem_sched_a_conduit_cmte_city, anon_2.anon_1_disclosure_fec_fitem_sched_a_conduit_cmte_st, anon_2.anon_1_disclosure_fec_fitem_sched_a_conduit_cmte_zip, anon_2.anon_1_disclosure_fec_fitem_sched_a_donor_cmte_nm, anon_2.anon_1_disclosure_fec_fitem_sched_a_national_cmte_nonfed__1, anon_2.anon_1_disclosure_fec_fitem_sched_a_election_tp, anon_2.anon_1_disclosure_fec_fitem_sched_a_election_tp_desc, anon_2.anon_1_disclosure_fec_fitem_sched_a_fec_election_tp_desc, anon_2.anon_1_disclosure_fec_fitem_sched_a_fec_election_yr, anon_2.anon_1_disclosure_fec_fitem_sched_a_action_cd, anon_2.anon_1_disclosure_fec_fitem_sched_a_action_cd_desc, anon_2.anon_1_disclosure_fec_fitem_sched_a_schedule_type_desc, anon_2.anon_1_disclosure_fec_fitem_sched_a_pg_date, anon_2.anon_1_disclosure_fec_fitem_sched_a_orig_sub_id, anon_2.anon_1_disclosure_fec_fitem_sched_a_back_ref_tran_id, anon_2.anon_1_disclosure_fec_fitem_sched_a_back_ref_sched_nm, anon_2.anon_1_disclosure_fec_fitem_sched_a_filing_form, anon_2.anon_1_disclosure_fec_fitem_sched_a_link_id, anon_2.anon_1_disclosure_fec_fitem_sched_a_contributor_name_text, anon_2.anon_1_disclosure_fec_fitem_sched_a_contributor_employer__2, anon_2.anon_1_disclosure_fec_fitem_sched_a_contributor_occupatio_3, anon_2.anon_1_disclosure_fec_fitem_sched_a_is_individual, anon_2.anon_1_disclosure_fec_fitem_sched_a_memo_text, anon_2.anon_1_disclosure_fec_fitem_sched_a_two_year_transaction__4, anon_2.anon_1_disclosure_fec_fitem_sched_a_schedule_type, anon_2.anon_1_disclosure_fec_fitem_sched_a_increased_limit, anon_2.anon_1_disclosure_fec_fitem_sched_a_sub_id, anon_2.anon_1_disclosure_fec_fitem_sched_a_pdf_url, anon_2.anon_1_disclosure_fec_fitem_sched_a_line_number_label, ofec_committee_history_mv_1.idx, ofec_committee_history_mv_1.name, ofec_committee_history_mv_1.committee_id, ofec_committee_history_mv_1.cycles, ofec_committee_history_mv_1.treasurer_name, ofec_committee_history_mv_1.treasurer_text, ofec_committee_history_mv_1.committee_type, ofec_committee_history_mv_1.committee_type_full, ofec_committee_history_mv_1.filing_frequency, ofec_committee_history_mv_1.designation, ofec_committee_history_mv_1.designation_full, ofec_committee_history_mv_1.organization_type, ofec_committee_history_mv_1.organization_type_full, ofec_committee_history_mv_1.affiliated_committee_name, ofec_committee_history_mv_1.party, ofec_committee_history_mv_1.party_full, ofec_committee_history_mv_1.state, ofec_committee_history_mv_1.street_1, ofec_committee_history_mv_1.street_2, ofec_committee_history_mv_1.city, ofec_committee_history_mv_1.state_full, ofec_committee_history_mv_1.zip, ofec_committee_history_mv_1.candidate_ids, ofec_committee_history_mv_1.cycle, ofec_committee_history_mv_1.cycles_has_financial, ofec_committee_history_mv_1.last_cycle_has_financial, ofec_committee_history_mv_1.cycles_has_activity, ofec_committee_history_mv_1.last_cycle_has_activity, ofec_committee_history_mv_1.is_active, ofec_committee_history_mv_2.idx, ofec_committee_history_mv_2.name, ofec_committee_history_mv_2.committee_id, ofec_committee_history_mv_2.cycles, ofec_committee_history_mv_2.treasurer_name, ofec_committee_history_mv_2.treasurer_text, ofec_committee_history_mv_2.committee_type, ofec_committee_history_mv_2.committee_type_full, ofec_committee_history_mv_2.filing_frequency, ofec_committee_history_mv_2.designation, ofec_committee_history_mv_2.designation_full, ofec_committee_history_mv_2.organization_type, ofec_committee_history_mv_2.organization_type_full, ofec_committee_history_mv_2.affiliated_committee_name, ofec_committee_history_mv_2.party, ofec_committee_history_mv_2.party_full, ofec_committee_history_mv_2.state, ofec_committee_history_mv_2.street_1, ofec_committee_history_mv_2.street_2, ofec_committee_history_mv_2.city, ofec_committee_history_mv_2.state_full, ofec_committee_history_mv_2.zip, ofec_committee_history_mv_2.candidate_ids, ofec_committee_history_mv_2.cycle, ofec_committee_history_mv_2.cycles_has_financial, ofec_committee_history_mv_2.last_cycle_has_financial, ofec_committee_history_mv_2.cycles_has_activity, ofec_committee_history_mv_2.last_cycle_has_activity, ofec_committee_history_mv_2.is_active |
Execution plan "before change", cost is 31550196.55. about 400 times slower than "after" one) |
Execution plan "after" code change, cost about 22533.85 |
@dzhang-fec would we get a similar benefit if we appended the |
@lbeaufort Yes, they are very similar. Basically, we want use And
|
======union all sql test result (filter by contributor_name) 1)old sql (current sql): 2)new sql:(union all) SELECT * FROM disclosure.fec_fitem_sched_a LEFT OUTER JOIN ofec_committee_history_mv AS ofec_committee_history_mv_1 ON disclosure.fec_fitem_sched_a.cmte_id = ofec_committee_history_mv_1.committee_id AND disclosure.fec_fitem_sched_a.two_year_transaction_period = ofec_committee_history_mv_1.cycle LEFT OUTER JOIN ofec_committee_history_mv AS ofec_committee_history_mv_2 ON disclosure.fec_fitem_sched_a.clean_contbr_id = ofec_committee_history_mv_2.committee_id AND disclosure.fec_fitem_sched_a.two_year_transaction_period = ofec_committee_history_mv_2.cycle test result: -----------day1------------ dev: stage: d1:9am: 1 () 2min 48 sec --old sql -----------day2------------ d2:9am: 1 () 2min 24 sec --old sql stage: new1:9am 29 secs, cache 0.5sec.--new sql d2: 9am: 1) 2min 55sec --old sql
|
Locust testing had very variable results for the same setup, so relying more in UNION ALL |
Summary:
Code change for "All year Search" to "use partitions cluster index and "limit"" , in case of system crash by accumulated long sqls and even no any result to clients of 19 mins sql run. new sql 36 secs
Completion criteria:
Problem:
Why we need to change this?
This is the top 1 slow sqls at production:
and we saw cpu spikes.
The exact sql code is
SELECT * FROM disclosure.fec_fitem_sched_a LEFT OUTER JOIN ofec_committee_history_mv AS ofec_committee_history_mv_1 ON disclosure.fec_fitem_sched_a.cmte_id = ofec_committee_history_mv_1.committee_id AND disclosure.fec_fitem_sched_a.two_year_transaction_period = ofec_committee_history_mv_1.cycle LEFT OUTER JOIN ofec_committee_history_mv AS ofec_committee_history_mv_2 ON disclosure.fec_fitem_sched_a.clean_contbr_id = ofec_committee_history_mv_2.committee_id AND disclosure.fec_fitem_sched_a.two_year_transaction_period = ofec_committee_history_mv_2.cycle
WHERE disclosure.fec_fitem_sched_a.is_individual = true
AND disclosure.fec_fitem_sched_a.contributor_name_text @@ to_tsquery('ActBlue:*')
ORDER BY disclosure.fec_fitem_sched_a.contb_receipt_dt DESC, disclosure.fec_fitem_sched_a.sub_id DESC
LIMIT 30
Analysis and Suggestions:
Analysis:
The issue is searching for "all" yeas from 1975 to 2020 of schedule_a and the sql is not using our index at all. full table scan 332,757,433 rows. We have the partitions from 1975 (23 partitions now). I will be much better to use the fec_fitem_sched_a.two_year_transaction_period index as we are using for others sqls.
There is only 30 rows return but current it has to scan 332millions rows.
There is some ideas on this, but it may take more time before the election of this year.
Suggestions:
This is to use the cluster index of the partitions we have now.
2)use "UNION all" to union all of the years, to keep the same sql return.
3)Code suggestion as below (but it needs code change).
Thanks for the review.
The text was updated successfully, but these errors were encountered: