ARROW-11269: [Rust] [Parquet] Preserve timezone in int96 reader #9253
The Int96 timestamp reader was not using the specialised timestamp builder that takes the timezone as a parameter. This change switches it to the builder that preserves timezones. I tested this change with the test file provided in the JIRA. It looks like we don't have a way of writing int96 from the Arrow writer, so there isn't an easy way to add a test case.
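To illustrate what the reader has to do, here is a minimal, std-only sketch (not the parquet or arrow crate's code) that decodes Int96 values to nanoseconds since the Unix epoch and carries the timezone along with the values instead of dropping it. It assumes the common Int96 timestamp layout: nanoseconds within the day as a little-endian 64-bit integer in bytes 0..8, followed by the Julian day number as a little-endian 32-bit integer in bytes 8..12. TimestampNanosColumn, int96_to_nanos and read_int96_column are hypothetical names.

```rust
/// Julian day number of the Unix epoch, 1970-01-01.
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Hypothetical column type: nanosecond timestamps plus the timezone taken
/// from the target Arrow data type.
#[derive(Debug, PartialEq)]
struct TimestampNanosColumn {
    values: Vec<Option<i64>>,
    timezone: Option<String>,
}

/// Decode one Int96 value to nanoseconds since the Unix epoch.
/// Bytes 0..8: nanoseconds within the day (little-endian).
/// Bytes 8..12: Julian day number (little-endian).
/// Returns None if the instant does not fit in an i64 of nanoseconds.
fn int96_to_nanos(bytes: [u8; 12]) -> Option<i64> {
    let mut nanos_buf = [0u8; 8];
    nanos_buf.copy_from_slice(&bytes[0..8]);
    let mut day_buf = [0u8; 4];
    day_buf.copy_from_slice(&bytes[8..12]);
    let nanos_of_day = i64::from_le_bytes(nanos_buf);
    let julian_day = i64::from(u32::from_le_bytes(day_buf));
    (julian_day - JULIAN_DAY_OF_EPOCH)
        .checked_mul(NANOS_PER_DAY)?
        .checked_add(nanos_of_day)
}

/// Convert optional Int96 values into a column, passing the timezone through.
/// In this sketch an out-of-range value becomes null; a real reader might
/// report an error instead.
fn read_int96_column(raw: &[Option<[u8; 12]>], timezone: Option<String>) -> TimestampNanosColumn {
    let values = raw.iter().copied().map(|v| v.and_then(int96_to_nanos)).collect();
    TimestampNanosColumn { values, timezone }
}
```

The point of the change is the last function's signature: the timezone travels with the values rather than being lost when the array is finished.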
Codecov Report
Coverage Diff     master    #9253    +/-
Coverage          81.61%    81.61%
Files                215       215
Lines              51867     51896    +29
Hits               42329     42357    +28
Misses              9538      9539     +1
LGTM. It would be nice to have a test case, but I understand the difficulty.
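One possible way around the missing Int96 writer in tests is to build the 12-byte values by hand. The helper below is hypothetical (not part of the parquet crate) and assumes the usual Int96 timestamp layout: nanoseconds within the day as a little-endian i64 in bytes 0..8 and the Julian day number as a little-endian u32 in bytes 8..12. The encoded bytes would still have to be written into a Parquet file through some lower-level writer path, which is assumed here and not shown.

```rust
/// Julian day number of the Unix epoch, 1970-01-01.
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const NANOS_PER_DAY: i64 = 86_400_000_000_000;

/// Hypothetical test helper: encode nanoseconds since the Unix epoch as an
/// Int96 timestamp. Euclidean division keeps the nanos-of-day non-negative,
/// so instants before 1970 fall on the previous Julian day.
fn nanos_to_int96(nanos: i64) -> [u8; 12] {
    let days = nanos.div_euclid(NANOS_PER_DAY);
    let nanos_of_day = nanos.rem_euclid(NANOS_PER_DAY);
    // Any i64 nanosecond value lands well within the u32 Julian day range.
    let julian_day = (days + JULIAN_DAY_OF_EPOCH) as u32;
    let mut out = [0u8; 12];
    out[0..8].copy_from_slice(&nanos_of_day.to_le_bytes());
    out[8..12].copy_from_slice(&julian_day.to_le_bytes());
    out
}
```

A test could then compare what the reader returns against the nanosecond values the fixture was generated from.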
Diff under review:
  }

- Ok(builder.finish())
+ Ok(TimestampNanosecondArray::from_opt_vec(
Are we going to change the other converters to use this pattern as well? Also, I'm not sure what the performance of this new approach looks like; it seems to need extra memory for the intermediate Vec?
We've found the builder pattern to be slower than allocating Vecs. There's a FromIterator implementation for PrimitiveArray, but no equivalent for TimestampArray::from_opt_vec. I've filed ARROW-11312 to address this.
For the other field types, we don't use an array builder; we use ArrayData::builder(timestamp_with_timezone) instead, so they don't suffer from the same limitation.
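To make the trade-off in this thread concrete, here is a std-only sketch contrasting an append-one-value-at-a-time builder with one-shot construction from an already collected Vec<Option<i64>>, which is the shape of the from_opt_vec call in the diff. TimestampArray and TimestampBuilder here are hypothetical simplified stand-ins, not the arrow crate's types; the real ones manage Arrow buffers and a null bitmap. The sketch measures nothing; it only shows where allocations happen: the Vec path pays for the intermediate Vec (the cost raised above) but sizes its output buffers exactly once, and it can take the timezone at construction time.

```rust
/// Hypothetical, simplified timestamp array: a values buffer, a validity
/// mask (true = non-null) and an optional timezone.
#[derive(Debug, PartialEq)]
struct TimestampArray {
    values: Vec<i64>,
    validity: Vec<bool>,
    timezone: Option<String>,
}

impl TimestampArray {
    /// One-shot construction from values the caller has already collected.
    /// Both output buffers are allocated once with their exact final length.
    fn from_opt_vec(data: Vec<Option<i64>>, timezone: Option<String>) -> Self {
        let mut values = Vec::with_capacity(data.len());
        let mut validity = Vec::with_capacity(data.len());
        for v in data {
            values.push(v.unwrap_or(0)); // a null still occupies a value slot
            validity.push(v.is_some());
        }
        TimestampArray { values, validity, timezone }
    }
}

/// Hypothetical incremental builder: one call per value, buffers grow on
/// demand. It has no timezone parameter, mirroring the limitation this PR
/// works around.
struct TimestampBuilder {
    values: Vec<i64>,
    validity: Vec<bool>,
}

impl TimestampBuilder {
    fn new() -> Self {
        TimestampBuilder { values: Vec::new(), validity: Vec::new() }
    }

    fn append_option(&mut self, v: Option<i64>) {
        self.values.push(v.unwrap_or(0));
        self.validity.push(v.is_some());
    }

    fn finish(self) -> TimestampArray {
        TimestampArray { values: self.values, validity: self.validity, timezone: None }
    }
}
```

Both paths produce the same values and validity; only the Vec path can attach the timezone.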
Cool. Good to know. Thanks.
Thanks @sunchao. Is it worthwhile to support writing int96 with the Arrow writers? It's deprecated, and version 2.6.0 of the format introduces TIMESTAMP_NANOS, so once we support 2.6.0, users will be able to write those timestamps.
@nevi-me yes, agreed. I think we shouldn't support writing int96 from Arrow, for the reasons you listed.
I guess this will be merged after 3.0 is released?
Commit message (body identical to the PR description above):
Closes #9253 from nevi-me/ARROW-11269
Authored-by: Neville Dipale <nevilledips@gmail.com>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>