Correctness integration test for parquet filter pushdown #3976
Conversation
(force-pushed from 2ca7a31 to 1faf508)
```
@@ -263,251 +182,15 @@ fn gen_data(
    scale_factor: f32,
    page_size: Option<usize>,
    row_group_size: Option<usize>,
) -> Result<(ObjectStoreUrl, ObjectMeta)> {
```
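The real `gen_data` lives in the DataFusion test utilities and writes a Parquet file with configurable page and row-group sizes. As a rough, dependency-free illustration of the same shape (all names and behavior here are hypothetical, not the PR's actual code), a generator whose output size scales with a `scale_factor` and whose writes are chunked to mimic row-group boundaries might look like:

```rust
use std::fs::File;
use std::io::{BufWriter, Write};
use std::path::{Path, PathBuf};

// Hypothetical, simplified stand-in for the PR's `gen_data` helper.
// The real function writes Parquet and returns an
// `ObjectStoreUrl`/`ObjectMeta` pair; this sketch writes CSV and
// returns the path plus the number of rows written.
fn gen_data(
    dir: &Path,
    scale_factor: f32,
    row_group_size: Option<usize>,
) -> std::io::Result<(PathBuf, usize)> {
    let n_rows = (1000.0 * scale_factor) as usize;
    let path = dir.join("data.csv");
    let mut w = BufWriter::new(File::create(&path)?);
    // Chunk writes to mimic row-group boundaries in the real helper.
    let chunk = row_group_size.unwrap_or(n_rows.max(1));
    for start in (0..n_rows).step_by(chunk) {
        for i in start..(start + chunk).min(n_rows) {
            writeln!(w, "{},{}", i, i % 10)?;
        }
        w.flush()?; // flush at each "row group" boundary
    }
    Ok((path, n_rows))
}
```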
All this code was moved into parquet-test-utils/src/lib.rs so I could reuse it.
(force-pushed from 3ab17f3 to 43af7a9)
(force-pushed from 653a8ef to 644bdf5)
OK, I am pretty happy with the state of pushdown predicates with this test -- I think we are go for launch.
(force-pushed from 3176d02 to 585c85b)
(force-pushed from 585c85b to c1b42f2)
```
        .await
}

#[cfg(not(target_family = "windows"))]
```
I am skipping Windows for now as these tests fail with some sort of path issue I don't have time to debug. Since there is no logic difference between Linux/Windows/Mac, there is no reason this also needs to run on Windows.
Example failure:
https://github.com/apache/arrow-datafusion/actions/runs/3369837748/jobs/5589984001
```
---- basic_conjunction stdout ----
Writing test data to "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\.tmpDGlfus\\data.parquet"
Generated test dataset with 53819 rows
thread 'basic_conjunction' panicked at 'Error writing data: IO error: The filename, directory name, or volume label syntax is incorrect. (os error 123)', datafusion\core\tests\parquet_filter_pushdown.rs:66:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
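The `#[cfg(not(target_family = "windows"))]` gate above uses Rust's conditional compilation. A minimal sketch of the same pattern (the functions here are illustrative, not part of the PR): `cfg` removes an item at compile time, while the `cfg!` macro evaluates to a compile-time boolean usable in ordinary code.

```rust
// Gate an item per target family, as the PR does for the pushdown tests.
#[cfg(not(target_family = "windows"))]
fn path_separator() -> char {
    '/'
}

#[cfg(target_family = "windows")]
fn path_separator() -> char {
    '\\'
}

// `cfg!` is handy when you want a runtime-visible flag instead of
// conditionally compiled items.
fn is_windows() -> bool {
    cfg!(target_family = "windows")
}
```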
This is finally ready for review! cc @Ted-Jiang @tustvold and @thinkharderdev
I'm approving, but I think we should fix the temporary file handling prior to merge. I already have severe memory issues running the DataFusion tests on my laptop, leaking large files will make this exponentially worse.
Perhaps we could merge the cases into a single test to avoid needing to use lazy_static?
```rust
// Only create the parquet file once as it is fairly large
lazy_static! {
    static ref TEMPDIR: TempDir = TempDir::new().unwrap();
```
I could be mistaken, but I believe static destructors are not run, so this will leak the file. This is likely not ideal, particularly on Linux machines where /tmp is typically backed by memory.
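The concern is that `Drop` never runs for `'static` items, so a `TempDir` held in a `lazy_static!` never deletes its directory. A std-only sketch of the difference (the `TempFile` guard and file names are hypothetical; `Box::leak` stands in for a `static` item):

```rust
use std::fs::{self, File};
use std::path::PathBuf;

// A guard that deletes its file on drop, like `tempfile::TempDir`
// deletes its directory.
struct TempFile {
    path: PathBuf,
}

impl Drop for TempFile {
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.path);
    }
}

fn make_temp(name: &str) -> TempFile {
    let path = std::env::temp_dir().join(name);
    File::create(&path).unwrap();
    TempFile { path }
}

fn demo() -> (bool, bool) {
    // Scoped value: Drop runs, the file is removed.
    let scoped = make_temp("scoped_demo.tmp");
    let scoped_path = scoped.path.clone();
    drop(scoped);
    let scoped_gone = !scoped_path.exists();

    // `Box::leak` gives the value a 'static lifetime, mimicking a
    // `lazy_static!` item: its destructor never runs, so the file
    // would survive the process unless deleted out of band.
    let leaked: &'static TempFile = Box::leak(Box::new(make_temp("leaked_demo.tmp")));
    let leaked_exists = leaked.path.exists();
    // Clean up manually so this sketch does not itself litter /tmp.
    let _ = fs::remove_file(&leaked.path);
    (scoped_gone, leaked_exists)
}
```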
```rust
    assert_eq!(no_pushdown, pushdown_and_reordering);

    // page index filtering is not correct:
```
I think this can be enabled now?
I agree -- done in #4062
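The `assert_eq!(no_pushdown, pushdown_and_reordering)` pattern above is a differential test: run the same predicate through an unoptimized and an optimized path and require identical results. A self-contained sketch of the idea, comparing a naive scan against a scan that prunes whole "row groups" by min/max statistics (all names here are illustrative, not DataFusion APIs):

```rust
// Naive path: scan every value and apply the range predicate.
fn scan_naive(groups: &[Vec<i64>], lo: i64, hi: i64) -> Vec<i64> {
    groups
        .iter()
        .flatten()
        .copied()
        .filter(|v| (lo..=hi).contains(v))
        .collect()
}

// "Optimized" path: skip whole groups whose min/max stats cannot
// match (mimicking Parquet row-group pruning), then filter survivors.
fn scan_pruned(groups: &[Vec<i64>], lo: i64, hi: i64) -> Vec<i64> {
    let mut out = Vec::new();
    for g in groups {
        let (min, max) = match (g.iter().min(), g.iter().max()) {
            (Some(&min), Some(&max)) => (min, max),
            _ => continue, // empty group
        };
        if max < lo || min > hi {
            continue; // stats prove no row can match
        }
        out.extend(g.iter().copied().filter(|v| (lo..=hi).contains(v)));
    }
    out
}
```

A differential test then asserts both paths agree on the same input, which is exactly what catches pruning bugs without needing a hand-written expected answer.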
I think you are correct. I added be746ac to drop the temp file while still only creating it once. Can you give it a look? I tested this code locally: you can see the file is created once and then cleaned up afterwards:

```
$ ls /var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet
ls: /var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/.tmpeaNruZ/data.parquet: No such file or directory
```

Full logs:
The use of the weak pointer is effectively making an assumption about test execution order, namely that more than one test is executed in parallel and tests are run in module order. I'm not hugely comfortable about this. What is the hesitation to just bundle them into a single test? We don't really need the additional runner parallelism (on my machine at least I have to throttle the parallelism to avoid running out of memory).
For me it was a matter of running the tests more quickly when there is more than one core available -- it takes a while to run them serially, and running them concurrently made it faster. I can make a single (single-threaded) test if you prefer, but that seems like it would reduce the speed at which the CI will run 🤔
Right -- the weak pointer is an optimization for this case, to avoid recreating the same file multiple times.
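The caching pattern under discussion can be sketched with std alone (the cached `Vec<u8>` stands in for the large parquet file; names are hypothetical): a `Weak` in a static lets concurrently running tests share one expensive value while any of them holds it, but rebuilds it if all holders have dropped -- which is also where the test-ordering assumption comes from.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, Weak};

// Shared cache: a Weak reference that does not keep the value alive.
static CACHE: Mutex<Option<Weak<Vec<u8>>>> = Mutex::new(None);
// Count how many times the expensive value was (re)built.
static BUILDS: AtomicUsize = AtomicUsize::new(0);

fn expensive_build() -> Vec<u8> {
    BUILDS.fetch_add(1, Ordering::SeqCst);
    vec![0u8; 1024] // stand-in for writing the large parquet file
}

fn get_or_create() -> Arc<Vec<u8>> {
    let mut guard = CACHE.lock().unwrap();
    if let Some(weak) = guard.as_ref() {
        if let Some(strong) = weak.upgrade() {
            return strong; // still alive somewhere: reuse it
        }
    }
    // Nobody holds it (or it was never built): build and cache a Weak.
    let fresh = Arc::new(expensive_build());
    *guard = Some(Arc::downgrade(&fresh));
    fresh
}
```

If tests overlap, every caller upgrades the same `Arc` and the build runs once; if they run strictly one after another, each test finds the `Weak` dead and rebuilds, losing the optimization.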
I sincerely doubt that it will be the tall pole when running tests, especially when compilation is so slow. I'd vote for simple first, and if it is an issue then try to be clever.
Right -- I think it is on the order of a few more seconds on my machine. Given you feel so strongly about it, I will change the test to be serial.
😭 Running these tests serially takes more than 3x longer on my laptop. Oh well, it is the price of progress I suppose. We can optimize the tests later if/when the people writing them get frustrated with the lack of time.

Multi-threaded time on my laptop:

Single-threaded time:
Benchmark runs are scheduled for baseline = 60f3ef6 and contender = 695cedc. 695cedc is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
* parquet filter pushdown correctness tests
* Do not run tests on windows
* Drop shared file after tests are over
* Rework to be single threaded
Which issue does this PR close?
Part of #3463
Rationale for this change
I want to avoid tricky wrong results bugs related to applying predicates during the parquet scans. It also found at least four bugs (see list below).
What changes are included in this PR?
* `parquet_predicate_pushdown` integration test
* `parquet_predicate_pushdown` performance test

In case you are curious, and want to play around, here is the generated parquet file: data.zip
This test creates a parquet file and then
Are there any user-facing changes?
No
Future items
I will file a ticket for some additional testing
Draft until:
* `parquet_predicate_pushdown` benchmark
* `parquet-test-util` crate #4042