Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-11269: [Rust] [Parquet] Preserve timezone in int96 reader #9253

Closed
wants to merge 1 commit into from

Conversation

nevi-me
Copy link
Contributor

@nevi-me nevi-me commented Jan 18, 2021

The Int96 timestamp was not using the specialised timestamp builder that takes the timezone as a paramenter.
This changes that to use the builder that preserves timezones.

I tested this change with the test file provided in the JIRA.
It looks like we don't have a way of writing int96 from the arrow writer, so there isn't an easy way to add a testcase.

The Int96 timestamp was not using the specialised timestamp builder that takes the timezone as a paramenter.
This changes that to use the builder that preserves timezones.

I tested this change with the test file provided in the JIRA.
It looks like we don't have a way of writing int96 from the arrow writer, so there isn't an easy way to add a testcase.
@github-actions
Copy link

@nevi-me
Copy link
Contributor Author

nevi-me commented Jan 18, 2021

CC @maxburke @mcassels

@nevi-me nevi-me requested review from sunchao and andygrove January 18, 2021 22:44
@codecov-io
Copy link

Codecov Report

Merging #9253 (5938e70) into master (1393188) will increase coverage by 0.00%.
The diff coverage is 95.12%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #9253   +/-   ##
=======================================
  Coverage   81.61%   81.61%           
=======================================
  Files         215      215           
  Lines       51867    51896   +29     
=======================================
+ Hits        42329    42357   +28     
- Misses       9538     9539    +1     
Impacted Files Coverage Δ
rust/parquet/src/arrow/array_reader.rs 71.54% <83.33%> (+0.14%) ⬆️
rust/parquet/src/arrow/converter.rs 73.04% <83.33%> (ø)
rust/parquet/src/arrow/schema.rs 91.66% <100.00%> (+0.16%) ⬆️
rust/parquet/src/encodings/encoding.rs 94.86% <0.00%> (-0.20%) ⬇️
rust/arrow/src/array/transform/fixed_binary.rs 84.21% <0.00%> (+5.26%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 69a9a1c...5938e70. Read the comment docs.

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - it will be nice to a have a test case but I understand the difficulty.

}

Ok(builder.finish())
Ok(TimestampNanosecondArray::from_opt_vec(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to change other converters to use this pattern as well? Also I'm not sure what the performance looks like with this new approach though - seems it needs to allocate extra memory for the intermediate Vec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've found the builder pattern to be slower than allocating vecs. There's FromIter for PrimitiveArray, but no equivalent for TimestampArray::from_opt_vec. I've filed ARROW-11312 to address this.

On the other field types, we don't use Array builder there, but use ArrayData::builder(timestamp_with_timezone). So they don't suffer from the same limitation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Good to know. Thanks.

@nevi-me
Copy link
Contributor Author

nevi-me commented Jan 19, 2021

Tnanks @sunchao, is it worthwhile to support writing int96 with the arrow writers? It's deprecated, and version 2.6.0 of the format introduces TIMESTAMP_NANOS, so when we support 2.6.0, users will have the ability to write those timestamps.

@sunchao
Copy link
Member

sunchao commented Jan 19, 2021

@nevi-me yes agreed - I think we shouldn't support writing to int96 from arrow for the reasons you listed.

@maxburke
Copy link
Contributor

I guess this will be merged after 3.0 is released?

@nevi-me nevi-me closed this in 555643a Jan 20, 2021
kszucs pushed a commit that referenced this pull request Jan 25, 2021
The Int96 timestamp was not using the specialised timestamp builder that takes the timezone as a paramenter.
This changes that to use the builder that preserves timezones.

I tested this change with the test file provided in the JIRA.
It looks like we don't have a way of writing int96 from the arrow writer, so there isn't an easy way to add a testcase.

Closes #9253 from nevi-me/ARROW-11269

Authored-by: Neville Dipale <nevilledips@gmail.com>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
The Int96 timestamp was not using the specialised timestamp builder that takes the timezone as a paramenter.
This changes that to use the builder that preserves timezones.

I tested this change with the test file provided in the JIRA.
It looks like we don't have a way of writing int96 from the arrow writer, so there isn't an easy way to add a testcase.

Closes apache#9253 from nevi-me/ARROW-11269

Authored-by: Neville Dipale <nevilledips@gmail.com>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
The Int96 timestamp was not using the specialised timestamp builder that takes the timezone as a paramenter.
This changes that to use the builder that preserves timezones.

I tested this change with the test file provided in the JIRA.
It looks like we don't have a way of writing int96 from the arrow writer, so there isn't an easy way to add a testcase.

Closes apache#9253 from nevi-me/ARROW-11269

Authored-by: Neville Dipale <nevilledips@gmail.com>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants