-
Notifications
You must be signed in to change notification settings - Fork 796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support cast decimal to signed numeric #1073
support cast decimal to signed numeric #1073
Conversation
…d decimal to signed numeric type support decimal to unsigned numeric
d17180f
to
070bb40
Compare
@alamb PTAL |
Codecov Report
@@ Coverage Diff @@
## master #1073 +/- ##
==========================================
+ Coverage 82.27% 82.31% +0.03%
==========================================
Files 168 168
Lines 49281 49376 +95
==========================================
+ Hits 40547 40643 +96
+ Misses 8734 8733 -1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @liukun4515 -- looks good
I also recommend we add some Decimal cases to get_all_types
in https://github.com/liukun4515/arrow-rs/blob/support_decimal_to_signed_numeric/arrow/src/compute/kernels/cast.rs#L4350
Perhaps as a follow on PR
// check the overflow | ||
// For example: Decimal(128,10,0) as i8 | ||
// 128 is out of range i8 | ||
if v <= max_bound && v >= min_bound { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
let div: i128 = 10_i128.pow(*$SCALE as u32); | ||
let min_bound = ($NATIVE_TYPE::MIN) as i128; | ||
let max_bound = ($NATIVE_TYPE::MAX) as i128; | ||
for i in 0..array.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW as a performance optimization in the future, we might be able to avoid the use of a builder
and create the arrays directly
So something like
let new_array: Int8Array = array.iter()
.map(|v| v.map(|v| {
let v = v / div;
if v <= max_bound && v >= min_bound {
...
}
}).collect()?;
Or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add follow-up issue to track this enhancement.
#1083
} | ||
|
||
// TODO remove this function if the decimal array has the creator function | ||
fn create_decimal_array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracked by #1009
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember this
arrow/src/compute/kernels/cast.rs
Outdated
use num::traits::Pow; | ||
|
||
macro_rules! generate_cast_test_case { | ||
($INPUT_ARRAY: expr, $INPUT_ARRAY_TYPE: expr, $OUTPUT_TYPE_ARRAY: ident, $OUTPUT_TYPE: expr, $OUTPUT_VALUES: expr) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could avoid having to pass in $INPUT_ARRAY_TYPE
by using the data_type()
function
like
let input_array_type = $INPUT_ARRAY.data_type()
vec![Some(1_i64), Some(2_i64), Some(3_i64), None, Some(5_i64)] | ||
); | ||
|
||
// overflow test: out of range of max i8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Float32Array, | ||
&DataType::Float32, | ||
vec![ | ||
Some(1.25_f32), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
let div: i128 = 10_i128.pow(*$SCALE as u32); | ||
let min_bound = ($NATIVE_TYPE::MIN) as i128; | ||
let max_bound = ($NATIVE_TYPE::MAX) as i128; | ||
for i in 0..array.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I think the use of Builders is not as fast as constructing arrays using FromIter
as the bounds are checked on each element
In the following case, if we could structure the code in the following way
let output_array: $OUTPUT_ARRAY_TYPE = array
.iter()
.map(|v| {
v.map(|v| {
// scale the value v
})
}).collect()
It may be significantly faster.
Perhaps a good follow on PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
I will do this in the follow-up PR.
Maybe I need also to add some micro benchmark to get the result of performance improvement.
It appears this PR has a clippy error as well now |
@@ -108,6 +112,8 @@ pub fn can_cast_types(from_type: &DataType, to_type: &DataType) -> bool { | |||
| Dictionary(_, _), | |||
Null, | |||
) => true, | |||
(Decimal(_, _), _) => false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All data types expect above signed numeric data type, the result will be false with decimal.
None, | ||
Some(525), | ||
Some(112345678), | ||
Some(112345679), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this case is used to test the float32 excessive precision
Some(325), | ||
None, | ||
Some(525), | ||
Some(112345678901234568), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case is used to test float64 excessive precision
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @liukun4515
* add cast test macro function; refactor other type to decimal type; add decimal to signed numeric type support decimal to unsigned numeric * address the comments and fix the clippy
Which issue does this PR close?
part of #1043
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?