Parquet: omit min/max for interval columns when writing stats #5147
Makes sense to me
What ColumnOrder are we currently writing for these columns?
I'm not sure, actually. I tried running this test in arrow_writer/mod.rs on master branch:

    #[test]
    fn test_123() {
        let a = Int32Array::from(vec![1, 2, 3, 4, 5]);
        let b = IntervalDayTimeArray::from(vec![0; 5]);
        let batch = RecordBatch::try_from_iter(vec![
            ("a", Arc::new(a) as ArrayRef),
            ("b", Arc::new(b) as ArrayRef),
        ])
        .unwrap();
        let mut buf = Vec::with_capacity(1024);
        let mut writer = ArrowWriter::try_new(&mut buf, batch.schema(), None).unwrap();
        writer.write(&batch).unwrap();
        writer.close().unwrap();
        let bytes = Bytes::from(buf);
        let options = ReadOptionsBuilder::new().with_page_index().build();
        let reader = SerializedFileReader::new_with_options(bytes, options).unwrap();
        dbg!(reader.metadata().file_metadata().column_orders());
    }

Running:

    arrow-rs$ cargo test -p parquet --lib arrow::arrow_writer::tests::test_123 -- --nocapture --exact
    Blocking waiting for file lock on build directory
    Compiling parquet v49.0.0 (/home/jeffrey/Code/arrow-rs/parquet)
    Finished test [unoptimized + debuginfo] target(s) in 11.49s
    Running unittests src/lib.rs (/media/jeffrey/1tb_860evo_ssd/.cargo_target_cache/debug/deps/parquet-a4f7a499e85a325c)
    running 1 test
    [parquet/src/arrow/arrow_writer/mod.rs:2760] reader.metadata().file_metadata().column_orders() = None
    test arrow::arrow_writer::tests::test_123 ... ok
    test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 667 filtered out; finished in 0.00s

Even when I change it to only write the Int32Array, it is still none. Not sure if I'm doing something wrong here?
I noticed this: arrow-rs/parquet/src/file/writer.rs Lines 326 to 336 in 6d4b8bb
Looks like it might be a separate issue: implementing the writing of ColumnOrder.
Raised #5152 for the column order issue
Which issue does this PR close?
Closes #5145
Rationale for this change
What changes are included in this PR?
Add extra checks before calculating min/max for chunks/pages, to ignore Interval columns
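As a rough illustration of the kind of check described above, here is a minimal, self-contained sketch. It does not use the parquet crate: `ColumnKind` and `min_max` are hypothetical names invented for this example, and the actual change in this PR may be structured quite differently. The idea is simply to return no min/max for a column type whose sort order is undefined, such as Interval.

```rust
// Hypothetical, simplified stand-ins for illustration only;
// these are NOT the parquet crate's types or functions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnKind {
    Int32,
    Interval,
}

/// Returns `Some((min, max))` for orderable columns, and `None` when
/// statistics should be omitted (Interval columns or empty input).
fn min_max(kind: ColumnKind, values: &[i32]) -> Option<(i32, i32)> {
    // Interval has no defined sort order, so skip min/max entirely.
    if kind == ColumnKind::Interval {
        return None;
    }
    let min = *values.iter().min()?;
    let max = *values.iter().max()?;
    Some((min, max))
}

fn main() {
    let values = [3, 1, 2];
    println!("{:?}", min_max(ColumnKind::Int32, &values));
    println!("{:?}", min_max(ColumnKind::Interval, &values));
}
```

Returning `None` rather than a placeholder range means a reader never sees min/max values that could be misinterpreted under some assumed ordering.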
Are there any user-facing changes?