Correctness integration test for parquet filter pushdown #3976

Merged
merged 6 commits into from
Nov 4, 2022


alamb
Contributor

@alamb alamb commented Oct 26, 2022

Which issue does this PR close?

Part of #3463

Rationale for this change

I want to avoid subtle wrong-result bugs related to applying predicates during parquet scans. This test has also already found at least four bugs (see list below).

What changes are included in this PR?

  1. Add a new parquet_predicate_pushdown integration test
  2. Refactor common code out of parquet_predicate_pushdown performance test

In case you are curious, and want to play around, here is the generated parquet file: data.zip

This test creates a parquet file and then

  1. Applies the same filter with various parquet filter options
  2. Verifies the results are the same
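
The two steps above can be sketched as a small differential test. This is only an illustration of the idea, not the code in this PR: it uses an in-memory table instead of a parquet file, and `ScanOptions`, `Row`, `scan`, `is_backend`, `is_503` and `check_all_options_agree` are hypothetical names (the two option fields mirror the `pushdown_filters` and `reorder_filters` flags that appear in the test output).

```rust
/// Hypothetical stand-in for the scan options the test varies.
#[derive(Debug, Clone, Copy)]
struct ScanOptions {
    pushdown_filters: bool,
    reorder_filters: bool,
}

/// One row of the illustrative in-memory "file".
#[derive(Debug, Clone, PartialEq)]
struct Row {
    container: &'static str,
    response_status: u16,
}

/// A single conjunct of the filter (the conjuncts are ANDed together).
type Predicate = fn(&Row) -> bool;

fn is_backend(row: &Row) -> bool {
    row.container == "backend_container_0"
}

fn is_503(row: &Row) -> bool {
    row.response_status == 503
}

fn sample_rows() -> Vec<Row> {
    vec![
        Row { container: "backend_container_0", response_status: 200 },
        Row { container: "backend_container_0", response_status: 503 },
        Row { container: "frontend_container_0", response_status: 503 },
        Row { container: "frontend_container_1", response_status: 429 },
    ]
}

/// Applies the conjuncts to `rows` under the given options.
/// Every code path is expected to return exactly the same rows.
fn scan(rows: &[Row], conjuncts: &[Predicate], opts: ScanOptions) -> Vec<Row> {
    let mut order: Vec<Predicate> = conjuncts.to_vec();
    if opts.reorder_filters {
        // Evaluate the conjuncts in a different order.
        order.reverse();
    }
    let mut out = Vec::new();
    if opts.pushdown_filters {
        // "Pushdown": decide row by row while scanning.
        for row in rows {
            if order.iter().all(|p| p(row)) {
                out.push(row.clone());
            }
        }
    } else {
        // No pushdown: materialize everything, then filter afterwards.
        let all: Vec<Row> = rows.to_vec();
        for row in all {
            if order.iter().all(|p| p(&row)) {
                out.push(row);
            }
        }
    }
    out
}

/// Runs the same filter under every option combination and checks that
/// each result equals the no-pushdown baseline.
fn check_all_options_agree(rows: &[Row], conjuncts: &[Predicate]) -> Vec<Row> {
    let baseline = scan(
        rows,
        conjuncts,
        ScanOptions { pushdown_filters: false, reorder_filters: false },
    );
    for opts in [
        ScanOptions { pushdown_filters: true, reorder_filters: false },
        ScanOptions { pushdown_filters: true, reorder_filters: true },
    ] {
        let got = scan(rows, conjuncts, opts);
        assert_eq!(baseline, got, "results differ for {:?}", opts);
    }
    baseline
}
```

Calling `check_all_options_agree(&sample_rows(), &[is_backend, is_503])` panics if any option combination disagrees with the baseline, which is the property the integration test checks against the real parquet reader.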

Are there any user-facing changes?

No

Future items

I will file a ticket for some additional testing

  • Do more testing / verification that this filtering is being tested across pages (needs Add Page Row Count Limit arrow-rs#2941 from arrow)
  • Add additional coverage of parquet data layouts (so the rows fall in different layouts)
  • Add more expressions (especially ones that have various mixes of data overlap)
  • Test with subsets of columns (where the predicates need to work on files that don't have all the columns)

Draft until:

@github-actions github-actions bot added the core Core DataFusion crate label Oct 26, 2022
@alamb alamb force-pushed the alamb/predicate_pushdown_tests branch 2 times, most recently from 2ca7a31 to 1faf508 Compare October 28, 2022 16:11
@alamb alamb changed the title Add integration test for parquet predicate pushdown Add correctness integration test for parquet predicate pushdown Oct 28, 2022
@@ -263,251 +182,15 @@ fn gen_data(
scale_factor: f32,
page_size: Option<usize>,
row_group_size: Option<usize>,
) -> Result<(ObjectStoreUrl, ObjectMeta)> {
Contributor Author

All this code was moved into parquet-test-utils/src/lib.rs so I could reuse it

@alamb alamb changed the title Add correctness integration test for parquet predicate pushdown Correctness integration test for parquet predicate pushdown Oct 31, 2022
@alamb alamb changed the title Correctness integration test for parquet predicate pushdown Correctness integration test for parquet filter pushdown Oct 31, 2022
@alamb alamb force-pushed the alamb/predicate_pushdown_tests branch from 3ab17f3 to 43af7a9 Compare October 31, 2022 13:07
@alamb alamb force-pushed the alamb/predicate_pushdown_tests branch 5 times, most recently from 653a8ef to 644bdf5 Compare October 31, 2022 14:29
@alamb
Contributor Author

alamb commented Oct 31, 2022

Ok, I am pretty happy with the state of pushdown predicates with this test -- I think we are go for launch

@alamb alamb force-pushed the alamb/predicate_pushdown_tests branch from 585c85b to c1b42f2 Compare November 1, 2022 13:38
.await
}

#[cfg(not(target_family = "windows"))]
Contributor Author

@alamb alamb Nov 1, 2022

I am skipping Windows for now, as these tests fail with some sort of path issue that I don't have time to debug. Since there is no logic difference between Linux, Windows, and macOS, there is no reason this also needs to run on Windows.

Example failure:
https://github.com/apache/arrow-datafusion/actions/runs/3369837748/jobs/5589984001

---- basic_conjunction stdout ----
Writing test data to "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\.tmpDGlfus\\data.parquet"
Generated test dataset with 53819 rows
thread 'basic_conjunction' panicked at 'Error writing data: IO error: The filename, directory name, or volume label syntax is incorrect. (os error 123)', datafusion\core\tests\parquet_filter_pushdown.rs:66:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@alamb alamb marked this pull request as ready for review November 1, 2022 20:57
@alamb
Contributor Author

alamb commented Nov 1, 2022

This is finally ready for review! cc @Ted-Jiang @tustvold and @thinkharderdev

Contributor

@tustvold tustvold left a comment

I'm approving, but I think we should fix the temporary file handling prior to merge. I already have severe memory issues running the DataFusion tests on my laptop; leaking large files will make this exponentially worse.

Perhaps we could merge the cases into a single test to avoid needing to use lazy_static?


// Only create the parquet file once as it is fairly large
lazy_static! {
static ref TEMPDIR: TempDir = TempDir::new().unwrap();
Contributor

I could be mistaken, but I believe static destructors are not run, so this will leak the file. This is likely not ideal, particularly on Linux machines where /tmp is typically backed by memory.


assert_eq!(no_pushdown, pushdown_and_reordering);

// page index filtering is not correct:
Contributor

I think this can be enabled now?

Contributor Author

@alamb alamb Nov 3, 2022

I agree -- done in #4062

@alamb
Contributor Author

alamb commented Nov 3, 2022

> I could be mistaken, but I believe static destructors are not run, so this will leak the file. This is likely not ideal, particularly on Linux machines where /tmp is typically backed by memory.

I think you are correct. I added be746ac to drop the temp file while still creating it only once. Can you take a look?

I tested this code by running

cargo test -p datafusion --test parquet_filter_pushdown -- --nocapture

And you can see the file is created once:

...
Writing test data to "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
...

And then cleaned up afterwards

$ ls /var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet
ls: /var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet: No such file or directory
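
One way to get this "create once, clean up afterwards" behavior can be sketched with the standard library alone. This is an assumption-laden illustration, not necessarily what be746ac does: `TestData`, `SHARED` and `shared_test_data` are hypothetical names, and a small placeholder file stands in for the generated parquet data. The static holds only a `Weak` reference, so cleanup does not rely on a static destructor; the file is removed when the last `Arc` held by a test is dropped.

```rust
use std::fs;
use std::path::PathBuf;
use std::sync::{Arc, Mutex, Weak};

/// Hypothetical guard that owns the test data file and deletes it on drop.
struct TestData {
    path: PathBuf,
}

impl Drop for TestData {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored in this sketch.
        let _ = fs::remove_file(&self.path);
    }
}

// Only a Weak reference lives in the static, so the static itself never
// keeps the file alive (destructors of statics are not run at exit).
static SHARED: Mutex<Weak<TestData>> = Mutex::new(Weak::new());

/// Returns the shared test data, generating the file only if no other
/// caller currently holds it.
fn shared_test_data() -> Arc<TestData> {
    let mut slot = SHARED.lock().unwrap();
    if let Some(existing) = slot.upgrade() {
        // Another caller still holds the data: reuse the existing file.
        return existing;
    }
    // Placeholder for the expensive data generation step.
    let path = std::env::temp_dir()
        .join(format!("pushdown_sketch_{}.bin", std::process::id()));
    fs::write(&path, b"placeholder for generated test data")
        .expect("failed to write test data");
    let data = Arc::new(TestData { path });
    *slot = Arc::downgrade(&data);
    data
}
```

The trade-off in this sketch is that the file is regenerated if every holder drops its `Arc` before the next caller arrives, so it only guarantees a single creation while callers overlap, for example when tests run concurrently.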
Full logs:
running 12 tests
Writing test data to "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
Generated test dataset with 53819 rows
Completed generating test data in 2.498437474s
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter container != Utf8("backend_container_0")
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") OR pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0")
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter response_status = UInt16(429)
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter request_method = Utf8("POST") AND response_status = UInt16(503)
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter response_status > UInt16(0)
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
Using pre-existing file
  reading with filter request_method = Utf8("GET")
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter request_method != Utf8("GET") OR response_status = UInt16(400) OR service = Utf8("backend")
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter request_method != Utf8("GET")
Filter: container != Utf8("backend_container_0"), total records: 53819, after filter: 15963, selectivty: 0.29660528809528236
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter container != Utf8("backend_container_0")
Filter: container = Utf8("backend_container_0") OR pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 39982, selectivty: 0.7428974897341087
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") OR pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
Filter: container = Utf8("backend_container_0"), total records: 53819, after filter: 37856, selectivty: 0.7033947119047177
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0")
Filter: request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
Filter: container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 3052, selectivty: 0.05670859733551348
Filter: response_status = UInt16(429), total records: 53819, after filter: 0, selectivty: 0
  pushdown_rows_filtered: 0
  pushdown_rows_filtered: 0
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter response_status = UInt16(429)
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
Filter: request_method = Utf8("POST") AND response_status = UInt16(503), total records: 53819, after filter: 1729, selectivty: 0.0321262007841097
Filter: response_status = UInt16(429), total records: 53819, after filter: 0, selectivty: 0
Filter: response_status > UInt16(0), total records: 53819, after filter: 53819, selectivty: 1
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
  pushdown_rows_filtered: 0
  pushdown_rows_filtered: 0
  pushdown_rows_filtered: 53819
Filter: request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter response_status = UInt16(429)
Filter: request_method = Utf8("GET"), total records: 53819, after filter: 8886, selectivty: 0.16510897638380498
Filter: request_method != Utf8("GET") OR response_status = UInt16(400) OR service = Utf8("backend"), total records: 53819, after filter: 53819, selectivty: 1
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter response_status > UInt16(0)
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter request_method = Utf8("POST") AND response_status = UInt16(503)
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter request_method = Utf8("GET")
  pushdown_rows_filtered: 0
Filter: request_method != Utf8("GET"), total records: 53819, after filter: 44933, selectivty: 0.834891023616195
Filter: response_status = UInt16(429), total records: 53819, after filter: 0, selectivty: 0
  pushdown_rows_filtered: 0
  pushdown_rows_filtered: 53819
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter request_method != Utf8("GET") OR response_status = UInt16(400) OR service = Utf8("backend")
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
Filter: container != Utf8("backend_container_0"), total records: 53819, after filter: 15963, selectivty: 0.29660528809528236
Filter: request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
Filter: container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 3052, selectivty: 0.05670859733551348
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter request_method != Utf8("GET")
  pushdown_rows_filtered: 53731
  pushdown_rows_filtered: 50767
test everything ...   pushdown_rows_filtered: 37856
ok
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
Filter: container = Utf8("backend_container_0"), total records: 53819, after filter: 37856, selectivty: 0.7033947119047177
Filter: container = Utf8("backend_container_0") OR pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 39982, selectivty: 0.7428974897341087
  pushdown_rows_filtered: 15963
  pushdown_rows_filtered: 13837
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter container != Utf8("backend_container_0")
Filter: request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
Filter: request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0")
Filter: container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 3052, selectivty: 0.05670859733551348
  pushdown_rows_filtered: 50767
Filter: request_method = Utf8("POST") AND response_status = UInt16(503), total records: 53819, after filter: 1729, selectivty: 0.0321262007841097
  pushdown_rows_filtered: 52090
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0")
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter request_method = Utf8("POST") AND response_status = UInt16(503)
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0")
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") OR pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
Filter: request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0")
Filter: response_status > UInt16(0), total records: 53819, after filter: 53819, selectivty: 1
  pushdown_rows_filtered: 0
Filter: container != Utf8("backend_container_0"), total records: 53819, after filter: 15963, selectivty: 0.29660528809528236
  pushdown_rows_filtered: 37856
Filter: request_method = Utf8("GET"), total records: 53819, after filter: 8886, selectivty: 0.16510897638380498
  pushdown_rows_filtered: 44933
test dict_non_selective ... ok
Filter: request_method != Utf8("GET") OR response_status = UInt16(400) OR service = Utf8("backend"), total records: 53819, after filter: 53819, selectivty: 1
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter request_method = Utf8("GET")
Filter: request_method = Utf8("POST") AND response_status = UInt16(503), total records: 53819, after filter: 1729, selectivty: 0.0321262007841097
  pushdown_rows_filtered: 52090
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter response_status = UInt16(503) AND request_method = Utf8("POST")
Filter: request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0")
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter response_status > UInt16(0)
Filter: pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0"), total records: 53819, after filter: 3052, selectivty: 0.05670859733551348
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0")
Filter: container = Utf8("backend_container_0"), total records: 53819, after filter: 37856, selectivty: 0.7033947119047177
  pushdown_rows_filtered: 15963
Filter: request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0")
Filter: container = Utf8("backend_container_0") OR pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 39982, selectivty: 0.7428974897341087
  pushdown_rows_filtered: 13837
Filter: request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0")
Filter: request_method != Utf8("GET"), total records: 53819, after filter: 44933, selectivty: 0.834891023616195
  pushdown_rows_filtered: 8886
Filter: pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0"), total records: 53819, after filter: 3052, selectivty: 0.05670859733551348
  pushdown_rows_filtered: 50767
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter request_method != Utf8("GET") OR response_status = UInt16(400) OR service = Utf8("backend")
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0")
Filter: request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0")
test dict_selective ... ok
Filter: request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
test dict_disjunction ... ok
Filter: pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0"), total records: 53819, after filter: 3052, selectivty: 0.05670859733551348
  pushdown_rows_filtered: 50767
test dict_conjunction ... ok
Filter: response_status = UInt16(503) AND request_method = Utf8("POST"), total records: 53819, after filter: 1729, selectivty: 0.0321262007841097
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter response_status = UInt16(503) AND request_method = Utf8("POST")
Filter: request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter request_method != Utf8("GET")
Filter: request_method = Utf8("GET"), total records: 53819, after filter: 8886, selectivty: 0.16510897638380498
  pushdown_rows_filtered: 44933
Filter: response_status > UInt16(0), total records: 53819, after filter: 53819, selectivty: 1
  pushdown_rows_filtered: 0
test selective ... ok
Filter: container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
Filter: request_method != Utf8("GET") OR response_status = UInt16(400) OR service = Utf8("backend"), total records: 53819, after filter: 53819, selectivty: 1
  pushdown_rows_filtered: 0
Filter: response_status = UInt16(503) AND request_method = Utf8("POST"), total records: 53819, after filter: 1729, selectivty: 0.0321262007841097
  pushdown_rows_filtered: 52090
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter response_status = UInt16(503) AND request_method = Utf8("POST")
test nothing ... ok
Filter: container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
Filter: container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
Filter: container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg")
test dict_disjunction3 ... ok
Filter: container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000)
Filter: response_status = UInt16(503) AND request_method = Utf8("POST"), total records: 53819, after filter: 1729, selectivty: 0.0321262007841097
  pushdown_rows_filtered: 52090
test basic_conjunction ... ok
Filter: container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000) AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000)
Filter: request_method != Utf8("GET"), total records: 53819, after filter: 44933, selectivty: 0.834891023616195
  pushdown_rows_filtered: 8886
test non_selective ... ok
Filter: container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000)
Filter: container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000)
Filter: container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000)
Filter: container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000)
Filter: container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0")
Filter: container = Utf8("backend_container_0") AND pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0")
Filter: pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0")
Filter: pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0")
Filter: pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0")
Filter: pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0")
Filter: pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000)
Filter: pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND request_bytes > Int32(2000000000) AND container = Utf8("backend_container_0"), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Using pre-existing file
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: false, reorder_filters: false, enable_page_index: false }
  reading with filter pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000)
Filter: pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000)
Filter: pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 0
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: false, enable_page_index: false }
  reading with filter pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000)
Filter: pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000)
Filter: pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
Querying "/var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet"
  scan options: ParquetScanOptions { pushdown_filters: true, reorder_filters: true, enable_page_index: false }
  reading with filter pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000)
Filter: pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
test dict_very_selective ... ok
Filter: pod = Utf8("aqcathnxqsphdhgjtgvxsfyiwbmhlmg") AND container = Utf8("backend_container_0") AND request_bytes > Int32(2000000000), total records: 53819, after filter: 88, selectivty: 0.001635110277039707
  pushdown_rows_filtered: 53731
test dict_very_selective2 ... ok

test result: ok. 12 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 6.65s
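
The figures in each `Filter:` line of the log above are related by simple arithmetic: the selectivity is `after filter / total records`, and in the runs where `pushdown_rows_filtered` is non-zero it equals `total records - after filter`. A minimal sketch of that arithmetic follows; the function names are illustrative and are not taken from the test harness.

```rust
/// Fraction of rows that survive the filter (the value the log prints as "selectivty").
fn selectivity(total_records: u64, after_filter: u64) -> f64 {
    after_filter as f64 / total_records as f64
}

/// Number of rows the filter removes; this matches `pushdown_rows_filtered`
/// in the log lines where that counter is non-zero.
fn rows_removed(total_records: u64, after_filter: u64) -> u64 {
    total_records - after_filter
}
```

For example, with 53819 total records and 8886 rows after the filter, `rows_removed` gives 44933, the value logged for `request_method = Utf8("GET")`.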

@tustvold
Contributor

tustvold commented Nov 3, 2022

The use of the weak pointer is effectively making an assumption about test execution order, namely that more than one test is executed in parallel and that tests are run in module order. I'm not hugely comfortable with this.

What is the hesitance to just bundle them into a single test? We don't really need the additional runner parallelism (on my machine at least I have to throttle the parallelism to avoid running out of memory).

@alamb
Contributor Author

alamb commented Nov 3, 2022

> The use of the weak pointer is effectively making an assumption about test execution order, namely that more than one test is executed in parallel and that tests are run in module order. I'm not hugely comfortable with this.
>
> What is the hesitance to just bundle them into a single test? We don't really need the additional runner parallelism (on my machine at least I have to throttle the parallelism to avoid running out of memory).

For me it was a matter of running the tests more quickly when more than one core is available: running them serially takes a while, and running them concurrently was faster. I can make a single (single-threaded) test if you prefer, but that seems like it would slow down CI 🤔

@alamb
Contributor Author

alamb commented Nov 3, 2022

> The use of the weak pointer is effectively making an assumption about test execution order, namely that more than one test is executed in parallel and that tests are run in module order. I'm not hugely comfortable with this.

Right -- the weak pointer is an optimization for this case, to avoid recreating the same file multiple times
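
For readers unfamiliar with the pattern under discussion, here is a minimal sketch of sharing a lazily created fixture through a weak pointer. It is not the code from this PR: `TestFile`, `shared_file`, and the `CREATED` counter are hypothetical names, and the real test generates a parquet file rather than a struct holding a path.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock, Weak};

/// Hypothetical stand-in for the generated parquet test file.
struct TestFile {
    path: String,
}

/// Counts how many times the fixture was (re)created; for illustration only.
static CREATED: AtomicUsize = AtomicUsize::new(0);

/// Global slot holding only a weak reference, so the fixture is dropped
/// as soon as the last test holding an `Arc` to it finishes.
fn slot() -> &'static Mutex<Weak<TestFile>> {
    static SLOT: OnceLock<Mutex<Weak<TestFile>>> = OnceLock::new();
    SLOT.get_or_init(|| Mutex::new(Weak::new()))
}

/// Returns the live fixture if some other test still holds it,
/// otherwise creates a fresh one and remembers it weakly.
fn shared_file() -> Arc<TestFile> {
    let mut guard = slot().lock().unwrap();
    if let Some(existing) = guard.upgrade() {
        return existing;
    }
    CREATED.fetch_add(1, Ordering::SeqCst);
    let created = Arc::new(TestFile {
        path: "data.parquet".to_string(), // placeholder path
    });
    *guard = Arc::downgrade(&created);
    created
}
```

The fixture is reused only while at least one caller still holds it, which is why the benefit depends on tests overlapping in time: if tests run one after another, each call finds a dead weak pointer and recreates the file.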

@tustvold
Contributor

tustvold commented Nov 3, 2022

> but that seems to be like it reduces the speed at which the CI will run

I sincerely doubt that it will be the tall pole when running tests, especially when compilation is so slow. I'd vote for simple first, and if it is an issue then try to be clever

@alamb
Contributor Author

alamb commented Nov 3, 2022

> I sincerely doubt that it will be the tall pole when running tests, especially when compilation is so slow. I'd vote for simple first, and if it is an issue then try to be clever

Right -- I think it is on the order of a few more seconds on my machine

Given you feel so strongly about it I will change the test to be serial

@alamb
Contributor Author

alamb commented Nov 3, 2022

😭 running these tests serially takes more than 3x longer on my laptop.

Oh well, it is the price of progress I suppose. We can optimize the tests later if/when the people writing them get frustrated with how long they take

Multi-threaded time on my laptop:

test result: ok. 12 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 4.94s

Single threaded time:

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 16.62s
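
The single-threaded variant reports `1 passed` because the individual cases are bundled into one test that runs them in sequence against one fixture. A rough sketch of that shape, under the assumption that each case can be expressed as a function over the fixture path; `Case` and `run_serially` are hypothetical names, not the PR's actual types:

```rust
/// One named check. The names can mirror the test names in the log;
/// the check bodies here are left to the caller and are hypothetical.
struct Case {
    name: &'static str,
    check: fn(&str) -> Result<(), String>,
}

/// Runs every case in order against the same fixture, stopping at the
/// first failure and reporting which case failed.
fn run_serially(fixture: &str, cases: &[Case]) -> Result<usize, String> {
    for case in cases {
        (case.check)(fixture).map_err(|e| format!("{} failed: {}", case.name, e))?;
    }
    Ok(cases.len())
}
```

Because the fixture is created once and passed in, no shared global state is needed, at the cost of losing the test runner's parallelism.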

@alamb alamb merged commit 695cedc into apache:master Nov 4, 2022
@ursabot

ursabot commented Nov 4, 2022

Benchmark runs are scheduled for baseline = 60f3ef6 and contender = 695cedc. 695cedc is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Dandandan pushed a commit to yuuch/arrow-datafusion that referenced this pull request Nov 5, 2022
* parquet filter pushdown correctness tests

* Do not run tests on windows

* Drop shared file after tests are over

* Rework to be single threaded
Ted-Jiang pushed a commit to Ted-Jiang/arrow-datafusion that referenced this pull request Nov 5, 2022
* parquet filter pushdown correctness tests

* Do not run tests on windows

* Drop shared file after tests are over

* Rework to be single threaded
@alamb alamb deleted the alamb/predicate_pushdown_tests branch August 8, 2023 20:14