Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support cast decimal to signed numeric #1073

Merged
merged 2 commits into from
Dec 22, 2021

Conversation

liukun4515
Copy link
Contributor

Which issue does this PR close?

part of #1043

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 21, 2021
…d decimal to signed numeric type

support decimal to unsigned numeric
@liukun4515 liukun4515 force-pushed the support_decimal_to_signed_numeric branch from d17180f to 070bb40 Compare December 21, 2021 13:38
@liukun4515 liukun4515 marked this pull request as ready for review December 21, 2021 13:39
@liukun4515
Copy link
Contributor Author

@alamb PTAL

@codecov-commenter
Copy link

Codecov Report

Merging #1073 (070bb40) into master (f3e452c) will increase coverage by 0.03%.
The diff coverage is 98.43%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1073      +/-   ##
==========================================
+ Coverage   82.27%   82.31%   +0.03%     
==========================================
  Files         168      168              
  Lines       49281    49376      +95     
==========================================
+ Hits        40547    40643      +96     
+ Misses       8734     8733       -1     
Impacted Files Coverage Δ
arrow/src/compute/kernels/cast.rs 95.13% <98.43%> (+0.14%) ⬆️
parquet/src/encodings/rle.rs 92.70% <0.00%> (ø)
arrow/src/array/transform/mod.rs 84.86% <0.00%> (+0.13%) ⬆️
parquet_derive/src/parquet_field.rs 66.43% <0.00%> (+0.45%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f3e452c...070bb40. Read the comment docs.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @liukun4515 -- looks good

I also recommend we add some Decimal cases to get_all_types in https://github.com/liukun4515/arrow-rs/blob/support_decimal_to_signed_numeric/arrow/src/compute/kernels/cast.rs#L4350

Perhaps as a follow on PR

// check the overflow
// For example: Decimal(128,10,0) as i8
// 128 is out of range i8
if v <= max_bound && v >= min_bound {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

let div: i128 = 10_i128.pow(*$SCALE as u32);
let min_bound = ($NATIVE_TYPE::MIN) as i128;
let max_bound = ($NATIVE_TYPE::MAX) as i128;
for i in 0..array.len() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW as a performance optimization in the future, we might be able to avoid the use of a builder and create the arrays directly

So something like

let new_array: Int8Array = array.iter()
.map(|v| v.map(|v| {
let v = v / div;
if v <= max_bound && v >= min_bound {
...
}
}).collect()?;


Or something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add follow-up issue to track this enhancement.
#1083

}

// TODO remove this function if the decimal array has the creator function
fn create_decimal_array(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracked by #1009

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember this

use num::traits::Pow;

macro_rules! generate_cast_test_case {
($INPUT_ARRAY: expr, $INPUT_ARRAY_TYPE: expr, $OUTPUT_TYPE_ARRAY: ident, $OUTPUT_TYPE: expr, $OUTPUT_VALUES: expr) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could avoid having to pass in $INPUT_ARRAY_TYPE by using the data_type() function

like

let input_array_type = $INPUT_ARRAY.data_type()

vec![Some(1_i64), Some(2_i64), Some(3_i64), None, Some(5_i64)]
);

// overflow test: out of range of max i8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Float32Array,
&DataType::Float32,
vec![
Some(1.25_f32),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

let div: i128 = 10_i128.pow(*$SCALE as u32);
let min_bound = ($NATIVE_TYPE::MIN) as i128;
let max_bound = ($NATIVE_TYPE::MAX) as i128;
for i in 0..array.len() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think the use of Builders is not as fast as constructing arrays using FromIter as the bounds are checked on each element

In the following case, if we could structure the code in the following way

let output_array: $OUTPUT_ARRAY_TYPE = array
  .iter()
  .map(|v| {
    v.map(|v| { 
    // scale the value v
    })
  }).collect()

It may be significantly faster.

Perhaps a good follow on PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.
I will do this in the follow-up PR.
Maybe I need also to add some micro benchmark to get the result of performance improvement.

@alamb
Copy link
Contributor

alamb commented Dec 21, 2021

It appears this PR has a clippy error as well now

@@ -108,6 +112,8 @@ pub fn can_cast_types(from_type: &DataType, to_type: &DataType) -> bool {
| Dictionary(_, _),
Null,
) => true,
(Decimal(_, _), _) => false,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All data types expect above signed numeric data type, the result will be false with decimal.

None,
Some(525),
Some(112345678),
Some(112345679),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this case is used to test the float32 excessive precision

Some(325),
None,
Some(525),
Some(112345678901234568),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is used to test float64 excessive precision

@houqp houqp added the enhancement Any new improvement worthy of a entry in the changelog label Dec 22, 2021
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @liukun4515

@alamb alamb merged commit 8f41a07 into apache:master Dec 22, 2021
alamb pushed a commit that referenced this pull request Dec 22, 2021
* add cast test macro function; refactor other type to decimal type; add decimal to signed numeric type
support decimal to unsigned numeric

* address the comments and fix the clippy
alamb added a commit that referenced this pull request Dec 22, 2021
* add cast test macro function; refactor other type to decimal type; add decimal to signed numeric type
support decimal to unsigned numeric

* address the comments and fix the clippy

Co-authored-by: Kun Liu <liukun@apache.org>
@liukun4515
Copy link
Contributor Author

thanks for your review.
@houqp @alamb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants