Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable parquet pushdown_filter by default #12524

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Sep 18, 2024

Draft as I am:

  • Run performance tests
  • Get CI to pass

Which issue does this PR close?

Closes #3463

Rationale for this change

Now that #4028 is completed thanks to @itsjunetime in #12135 we do not re-apply the filter twice it may be possible to enable filter pushdown for ParquetExec
in all cases and get faster performance.

What changes are included in this PR?

  1. This PR turns the option on by default

Are these changes tested?

Yes, functionally by CI

Are there any user-facing changes?

Performance:
(RUNNING)

@github-actions github-actions bot added the common Related to common crate label Sep 18, 2024
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Sep 18, 2024
@alamb
Copy link
Contributor Author

alamb commented Sep 18, 2024

This is a fascinating result. Som of the queries become crazy fast (the ones with selective predicates) but some become slower

Query1

SELECT COUNT(*) FROM hits WHERE "AdvEngineID" <> 0;

Query 20:

SELECT COUNT(*) FROM hits WHERE "URL" LIKE '%google%';
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ alamb_enable_pushdown_default ┃           Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.65ms │                        0.65ms │        no change │
│ QQuery 1     │    70.27ms │                        0.77ms │   +91.54x faster │
│ QQuery 2     │   124.58ms │                      125.58ms │        no change │
│ QQuery 3     │   130.86ms │                      131.07ms │        no change │
│ QQuery 4     │   994.72ms │                      984.82ms │        no change │
│ QQuery 5     │  1098.90ms │                     1069.84ms │        no change │
│ QQuery 6     │    68.81ms │                       68.55ms │        no change │
│ QQuery 7     │    82.52ms │                       93.66ms │     1.14x slower │
│ QQuery 8     │  1465.85ms │                     1482.08ms │        no change │
│ QQuery 9     │  1392.38ms │                     1404.72ms │        no change │
│ QQuery 10    │   467.05ms │                      469.56ms │        no change │
│ QQuery 11    │   507.18ms │                      522.15ms │        no change │
│ QQuery 12    │  1239.13ms │                     1569.34ms │     1.27x slower │
│ QQuery 13    │  2273.55ms │                     2773.38ms │     1.22x slower │
│ QQuery 14    │  1658.67ms │                     2064.14ms │     1.24x slower │
│ QQuery 15    │  1140.34ms │                     1128.32ms │        no change │
│ QQuery 16    │  3061.76ms │                     3004.23ms │        no change │
│ QQuery 17    │  2842.35ms │                     2779.97ms │        no change │
│ QQuery 18    │  5902.65ms │                     5873.68ms │        no change │
│ QQuery 19    │   121.98ms │                      123.86ms │        no change │
│ QQuery 20    │  1653.74ms │                        0.84ms │ +1965.08x faster │
│ QQuery 21    │  2111.03ms │                     1870.85ms │    +1.13x faster │
│ QQuery 22    │  5051.49ms │                     4519.72ms │    +1.12x faster │
│ QQuery 23    │ 11855.82ms │                     4572.32ms │    +2.59x faster │
│ QQuery 24    │   810.32ms │                     1247.04ms │     1.54x slower │
│ QQuery 25    │   713.56ms │                     1043.16ms │     1.46x slower │
│ QQuery 26    │   873.50ms │                     1284.79ms │     1.47x slower │
│ QQuery 27    │  2550.62ms │                     3498.56ms │     1.37x slower │
│ QQuery 28    │ 16098.85ms │                    17196.12ms │     1.07x slower │
│ QQuery 29    │   568.92ms │                      570.81ms │        no change │
│ QQuery 30    │  1297.52ms │                     1591.46ms │     1.23x slower │
│ QQuery 31    │  1380.48ms │                     1636.01ms │     1.19x slower │
│ QQuery 32    │  4894.61ms │                     4815.56ms │        no change │
│ QQuery 33    │  5290.89ms │                     5220.16ms │        no change │
│ QQuery 34    │  5370.64ms │                     5251.96ms │        no change │
│ QQuery 35    │  1861.53ms │                     1881.09ms │        no change │
│ QQuery 36    │   314.36ms │                      330.03ms │        no change │
│ QQuery 37    │   221.61ms │                      242.99ms │     1.10x slower │
│ QQuery 38    │   202.01ms │                      130.08ms │    +1.55x faster │
│ QQuery 39    │  1064.41ms │                      880.11ms │    +1.21x faster │
│ QQuery 40    │    91.54ms │                      115.38ms │     1.26x slower │
│ QQuery 41    │    83.22ms │                      103.32ms │     1.24x slower │
│ QQuery 42    │    95.78ms │                      102.88ms │     1.07x slower │
└──────────────┴────────────┴───────────────────────────────┴──────────────────┘

I need to spend some time looking at the queries that got slower in more detail

TPCH also shows many queries getting slower:

--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ alamb_enable_pushdown_default ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  252.62ms │                      195.92ms │ +1.29x faster │
│ QQuery 2     │  131.31ms │                      137.69ms │     no change │
│ QQuery 3     │  138.74ms │                      308.70ms │  2.23x slower │
│ QQuery 4     │   92.29ms │                      169.34ms │  1.83x slower │
│ QQuery 5     │  180.88ms │                      241.49ms │  1.34x slower │
│ QQuery 6     │   60.26ms │                       79.72ms │  1.32x slower │
│ QQuery 7     │  223.37ms │                      262.70ms │  1.18x slower │
│ QQuery 8     │  165.38ms │                      344.30ms │  2.08x slower │
│ QQuery 9     │  259.49ms │                      293.68ms │  1.13x slower │
│ QQuery 10    │  244.94ms │                      332.83ms │  1.36x slower │
│ QQuery 11    │  104.23ms │                      102.57ms │     no change │
│ QQuery 12    │  135.08ms │                      246.39ms │  1.82x slower │
│ QQuery 13    │  304.56ms │                      209.53ms │ +1.45x faster │
│ QQuery 14    │   97.08ms │                       99.63ms │     no change │
│ QQuery 15    │  128.14ms │                      146.55ms │  1.14x slower │
│ QQuery 16    │   85.69ms │                       80.65ms │ +1.06x faster │
│ QQuery 17    │  246.47ms │                      256.35ms │     no change │
│ QQuery 18    │  344.79ms │                      317.44ms │ +1.09x faster │
│ QQuery 19    │  162.78ms │                      132.63ms │ +1.23x faster │
│ QQuery 20    │  145.63ms │                      190.68ms │  1.31x slower │
│ QQuery 21    │  276.08ms │                      477.99ms │  1.73x slower │
│ QQuery 22    │   65.63ms │                       97.64ms │  1.49x slower │
└──────────────┴───────────┴───────────────────────────────┴───────────────┘

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable parquet filter pushdown by default
1 participant