-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix AvroReader: Add union resolving for nested struct arrays #12686
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this contribution @JonasDev1 -- I started the CI and I left a comment.
@@ -1643,6 +1643,84 @@ mod test { | |||
assert_batches_eq!(expected, &[batch]); | |||
} | |||
|
|||
#[test] | |||
fn test_avro_nullable_struct_array() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran this test without the changes in this PR and it still passes 🤔 Thus I don't think it covers the code change
andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ git diff
diff --git a/datafusion/core/src/datasource/avro_to_arrow/arrow_array_reader.rs b/datafusion/core/src/datasource/avro_to_arrow/arrow_array_reader.rs
index 24bc4a61c..8735eac5f 100644
--- a/datafusion/core/src/datasource/avro_to_arrow/arrow_array_reader.rs
+++ b/datafusion/core/src/datasource/avro_to_arrow/arrow_array_reader.rs
@@ -573,7 +573,7 @@ impl<'a, R: Read> AvroArrowArrayReader<'a, R> {
// extract list values, with non-lists converted to Value::Null
let array_item_count = rows
.iter()
- .map(|row| match maybe_resolve_union(row) {
+ .map(|row| match row {
Value::Array(values) => values.len(),
_ => 1,
})
And then I ran
andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ cargo test --lib -p datafusion --all-features -- avro_to_arrow
Finished `test` profile [unoptimized + debuginfo] target(s) in 0.12s
Running unittests src/lib.rs (target/debug/deps/datafusion-3bc79b97e7a0ad1b)
running 16 tests
test datasource::avro_to_arrow::schema::test::test_invalid_avro_schema ... ok
test datasource::avro_to_arrow::schema::test::test_non_record_schema ... ok
test datasource::avro_to_arrow::schema::test::test_alias ... ok
test datasource::avro_to_arrow::schema::test::test_plain_types_schema ... ok
test datasource::avro_to_arrow::schema::test::test_external_props ... ok
test datasource::avro_to_arrow::schema::test::test_nested_schema ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_avro_read_nested_list ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_time_avro_milliseconds ... ok
test datasource::avro_to_arrow::reader::tests::test_avro_basic ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_avro_read_list ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_avro_iterator ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_avro_nullable_struct ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_avro_nullable_struct_array ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_complex_list ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_deep_nullable_struct ... ok
test datasource::avro_to_arrow::arrow_array_reader::test::test_complex_struct ... ok
test result: ok. 16 passed; 0 failed; 0 ignored; 0 measured; 729 filtered out; finished in 0.00s
andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comment. Yes the test rather checks the happy path. I will add a test for the index error tomorrow and update the formating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the test. It now fails without the change:
thread 'datasource::avro_to_arrow::arrow_array_reader::test::test_avro_nullable_struct_array' panicked at /Users/JONSchmi/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-52.2.0/src/util/bit_util.rs:55:5:
index out of bounds: the len is 1 but the index is 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed -- I double checked and it does fail!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @JonasDev1
This looks great to me.
Since I had this PR checkout anyways to test it, I also took the liberty to push a commit that merges with main and fixes |
Which issue does this PR close?
Resolves #12682
What changes are included in this PR?
Nullable values are passed as unions and this requires a resolving of the union to have the correct null_buffer size.
Are these changes tested?
Added a new test for this issue