ARROW-11265: [Rust] Made bool not ArrowNativeType #9212
Codecov Report
@@           Coverage Diff           @@
##           master    #9212   +/-   ##
=======================================
  Coverage   81.61%   81.61%
=======================================
  Files         215      215
  Lines       51867    51866    -1
=======================================
  Hits        42329    42329
+ Misses       9538     9537    -1
@@ -449,12 +449,10 @@ impl ArrowJsonBatch {
    for i in 0..col.len() {
        if col.is_null(i) {
            validity.push(1);
This is unrelated, but unless I'm reading this incorrectly (likely am), I would have thought that we'd `validity.push(0)` here for null values.
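To make the question above concrete, here is a minimal sketch of the convention the reviewer seems to have in mind, assuming the JSON validity vector follows the same rule as Arrow's validity bitmap (1 marks a valid slot, 0 marks a null). The helper `validity_from_options` is hypothetical and is not part of the code under review.

```rust
/// Hypothetical helper: builds a validity vector with one entry per slot.
/// Assumes the Arrow convention that 1 means "valid" and 0 means "null".
pub fn validity_from_options<T>(values: &[Option<T>]) -> Vec<u8> {
    let mut validity = Vec::with_capacity(values.len());
    for value in values {
        if value.is_none() {
            // Null slot: push 0, which is what the comment above expected.
            validity.push(0);
        } else {
            validity.push(1);
        }
    }
    validity
}
```

Under that assumption, a column holding a value, a null, and another value would produce the vector 1, 0, 1.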
Thanks for the detailed explanation @jorgecarleitao
I apologize for the delay in merging Rust PRs -- the 3.0 release is being finalized now and we are planning to minimize entropy by postponing the merging of changes not critical for the release until the process is complete. I hope the process is complete in the next few days. There is more discussion on the mailing list.
I did not see this, but had the same idea at around the same time :) This now also matches the distinction between PrimitiveArray and BooleanArray. Another motivation for this change is that this could make
/// Trait expressing a Rust type that has the same in-memory representation
/// as Arrow. This includes `i16`, `f32`, but excludes `bool` (which in arrow is represented in bits).
/// In little endian machines, types that implement [`ArrowNativeType`] can be memcopied to arrow buffers
Does the in-memory format require little-endian? I'd expect conversion to happen when reading data into memory, which should make memcpy work on both LE and BE machines. Assuming absence of other endianness bugs of course.
It does not require it but is little-endian by default.
I recall someone wanting to add big-endian support to the C++ impl.
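The endianness point in this thread can be illustrated with a small sketch, assuming a buffer that is meant to hold little-endian `i32` values. The function names `to_le_buffer` and `to_native_buffer` are hypothetical and only serve the illustration; `to_native_buffer` stands in for what a plain memcpy of the values would yield.

```rust
/// Hypothetical helper: writes values as little-endian bytes on any host.
pub fn to_le_buffer(values: &[i32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 4);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Hypothetical helper: writes values in host byte order, which is the
/// layout a plain memcpy of the slice would produce.
pub fn to_native_buffer(values: &[i32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 4);
    for v in values {
        out.extend_from_slice(&v.to_ne_bytes());
    }
    out
}
```

On a little-endian host the two functions return identical bytes, which is why the doc comment can say the values may be copied as is. On a big-endian host they differ unless the buffer is itself defined as big-endian.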
LGTM, this has always been a source of corner cases (bools being bitpacked) so nice to clean it up.
Thank you @jorgecarleitao
/// In little endian machines, types that implement [`ArrowNativeType`] can be memcopied to arrow buffers
/// as is.
pub trait ArrowNativeType:
    fmt::Debug + Send + Sync + Copy + PartialOrd + FromStr + Default + JsonSerializable
This is a nice approach (breaking out the `JsonSerializable` part) 👍
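The shape of that split can be sketched with two simplified, hypothetical traits. `JsonText` stands in for `JsonSerializable` and `NativeLayout` stands in for `ArrowNativeType`; neither is the crate's real definition, and the method names are assumptions made for this illustration only.

```rust
/// Hypothetical stand-in for `JsonSerializable`: covers serialization only.
pub trait JsonText {
    fn to_json_text(&self) -> String;
}

/// Hypothetical stand-in for `ArrowNativeType`: types whose Rust layout
/// matches the Arrow layout. It requires `JsonText`, mirroring the supertrait
/// bound in the diff, but `JsonText` does not require it.
pub trait NativeLayout: Copy + JsonText {
    fn append_le_bytes(self, out: &mut Vec<u8>);
}

impl JsonText for bool {
    fn to_json_text(&self) -> String {
        self.to_string()
    }
}

impl JsonText for i32 {
    fn to_json_text(&self) -> String {
        self.to_string()
    }
}

// Only `i32` gets the layout trait; `bool` deliberately does not.
impl NativeLayout for i32 {
    fn append_le_bytes(self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.to_le_bytes());
    }
}

/// Works for anything serializable, including `bool`.
pub fn json_array<T: JsonText>(values: &[T]) -> String {
    let items: Vec<String> = values.iter().map(|v| v.to_json_text()).collect();
    format!("[{}]", items.join(","))
}

/// Works only for layout-compatible types; `raw_bytes(&[true])` would not compile.
pub fn raw_bytes<T: NativeLayout>(values: &[T]) -> Vec<u8> {
    let mut out = Vec::new();
    for v in values {
        v.append_le_bytes(&mut out);
    }
    out
}
```

With this arrangement, `json_array` accepts both `bool` and `i32` slices, while the compiler rejects any attempt to pass `bool` to `raw_bytes`, which is the kind of mistake the PR sets out to rule out.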
I merged this branch locally into master and re-ran the tests. Things looked good.
This PR removes the risk of boolean values being converted to bytes via `ToByteSlice` by explicitly restricting `ArrowNativeType` to types whose in-memory representation in Rust matches the in-memory representation in Arrow. `bool` in Rust is a byte and in Arrow it is a bit.

Overall, the direction of this PR is to have each trait represent one aspect of the type. In this case, `ArrowNativeType` is currently:

* a type that has the same in-memory representation (`ToByteSlice` is implemented for it)
* a JSON-serializable type
* something that can be cast to/from `usize`

This poses a problem because:

1. bools are serializable, not castable to `usize`, and have a different memory representation
2. fixed-size integers (iX, uX) are serializable, castable to `usize`, and have the same memory representation
3. floating-point types (f32, f64) are serializable, not castable to `usize`, and have the same memory representation

However, they all implement `ArrowNativeType`.

This PR focuses on splitting out the JSON-serializable part of it.

Closes #9212 from jorgecarleitao/fix_trait

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
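The byte-versus-bit mismatch described in this message can be shown with a short sketch. It assumes least-significant-bit-first packing within each byte, and the helpers `pack_bools` and `get_bit` are hypothetical names introduced for the illustration, not functions from the crate.

```rust
/// Hypothetical helper: packs one bool per bit, least-significant bit first.
/// A Rust `bool` occupies a whole byte, so copying a `&[bool]` byte for byte
/// would not yield this layout.
pub fn pack_bools(values: &[bool]) -> Vec<u8> {
    // Round the bit count up to a whole number of bytes.
    let mut out = vec![0u8; (values.len() + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        if v {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}

/// Hypothetical helper: reads bit `i` back out of a packed buffer.
pub fn get_bit(bytes: &[u8], i: usize) -> bool {
    bytes[i / 8] & (1 << (i % 8)) != 0
}
```

Nine booleans therefore occupy two bytes when packed, compared with nine bytes as a Rust slice, so any trait that promises an identical in-memory representation cannot honestly include `bool`.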