Possible bug in Parquet pruning code? #8309
Comments
Added to #8227. I agree this looks like a bug in the pruning code.
I don't know if this helps, but the min and max values seem to be incorrect in the parquet file (or maybe I have a bug in bdt).
Never mind, it is a bug in bdt.
Here is the corrected version:
This still seems incorrect. I would expect min of
I used parquet-tools meta:
alamb@MacBook-Pro-8:~/Downloads$ parquet-tools meta a7546b6b206d882e928a1325f8cbcce4.parquet
file: file:/Users/alamb/Downloads/a7546b6b206d882e928a1325f8cbcce4.parquet
creator: UrbanLogiq
extra: ARROW:schema = /////+gAAAAQAAAAAAAKAA4ADAALAAQACgAAABQAAAAAAAABBAAKAAwAAAAIAAQACgAAAAgAAAAIAAAAAAAAAAMAAAB8AAAAPAAAAAQAAACg////GAAAACAAAAAAAAACHAAAAAgADAAEAAsACAAAACAAAAAAAAABAAAAAAMAAABhZHQA1P///xQAAAAMAAAAAAAABQwAAAAAAAAAxP///wkAAABkaXJlY3Rpb24AAAAQABQAEAAAAA8ABAAAAAgAEAAAABgAAAAMAAAAAAAABRAAAAAAAAAABAAEAAQAAAAKAAAAdWxfbm9kZV9pZAAA
file schema: arrow_schema
--------------------------------------------------------------------------------
ul_node_id: REQUIRED BINARY L:STRING R:0 D:0
direction: REQUIRED BINARY L:STRING R:0 D:0
adt: REQUIRED INT32 R:0 D:0
row group 1: RC:301 TS:3384 OFFSET:4
--------------------------------------------------------------------------------
ul_node_id: BINARY ZSTD DO:4 FPO:1796 SZ:2143/3187/1.49 VC:301 ENC:RLE_DICTIONARY,RLE,PLAIN ST:[min: /ehIvdei+UGfkQ4Gy5fr1w==, max: zThqpswvY6fa3VHF4BKWfw==, num_nulls not defined]
direction: BINARY ZSTD DO:2243 FPO:2311 SZ:195/177/0.91 VC:301 ENC:RLE_DICTIONARY,RLE,PLAIN ST:[min: Merged, max: Outgoing, num_nulls not defined]
adt: INT32 ZSTD DO:2500 FPO:3159 SZ:1046/1503/1.44 VC:301 ENC:RLE_DICTIONARY,RLE,PLAIN ST:[min: 15, max: 23116, num_nulls not defined]
Describe the bug
I'm running into a weird issue with a datafusion query on a parquet file where if I select with a condition testing certain values, I get no results back.
To Reproduce
For example:
I've checked whether there's trailing whitespace in the values "Incoming" and "Two Way" in the table, and there isn't. pandas seems to be able to do similar queries just fine:
The parquet file above is attached.
a7546b6b206d882e928a1325f8cbcce4.parquet.zip
Expected behavior
The query should return values.
Additional context
Andy Grove demonstrated that disabling Parquet pruning causes the query to return the correct values: https://the-asf.slack.com/archives/C01QUFS30TD/p1700671664714069
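For anyone following along, here is a minimal sketch of why min/max pruning could drop these rows. It is not DataFusion's actual pruning code; the helper names `may_contain` and `surviving_literals` are made up for illustration. It assumes the pruner skips a row group whenever the literal in an equality predicate falls outside the column's [min, max] statistics, using byte-wise comparison. The min/max values are the ones parquet-tools reports for the direction column in the comments.

```python
# Hypothetical helpers for illustration only; not DataFusion's implementation.

def may_contain(value: bytes, stat_min: bytes, stat_max: bytes) -> bool:
    """Return True if a row group with these min/max stats could hold `value`."""
    return stat_min <= value <= stat_max


# Statistics reported by parquet-tools for the `direction` column.
DIRECTION_MIN = b"Merged"
DIRECTION_MAX = b"Outgoing"


def surviving_literals(literals):
    """Keep only the literals for which the row group would NOT be pruned."""
    return [
        v for v in literals
        if may_contain(v.encode("utf-8"), DIRECTION_MIN, DIRECTION_MAX)
    ]


candidates = ["Incoming", "Merged", "Outgoing", "Two Way"]
kept = surviving_literals(candidates)
print(kept)  # ['Merged', 'Outgoing']
```

Under that assumption, "Incoming" sorts before "Merged" and "Two Way" sorts after "Outgoing", so a predicate on either value would skip the only row group and return nothing. That would be consistent with the query returning rows once pruning is disabled.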