ARROW-11221: [Rust] DF Implement GROUP BY support for Float32/Float64 #9175

Closed
wants to merge 1 commit into from

Contributor

@ovr ovr commented Jan 12, 2021

Rust's standard library doesn't implement Eq or Hash for the f32/f64 types, which is why I am using an external crate called ordered-float that implements these traits. It's better to use an external library than to implement our own inside this repository.
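
To illustrate why a wrapper is needed at all: `f64` only implements `PartialEq` (because `NaN != NaN`), so it cannot be a `HashMap` key directly. The sketch below is a minimal, standard-library-only illustration of the idea. `HashableF64` and `count_by_float` are hypothetical names invented for this example; they are not the ordered-float API and not the code in this PR. The choice to treat all NaNs as one group and to fold `-0.0` into `+0.0` is an assumption of the sketch.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper (NOT the ordered-float type): gives f64 the
/// total equality and hashing that HashMap keys require.
#[derive(Debug, Clone, Copy)]
pub struct HashableF64(pub f64);

impl HashableF64 {
    /// Canonical bit pattern (an assumption of this sketch): every NaN
    /// collapses to one value and -0.0 is folded into +0.0, so keys that
    /// compare equal also hash identically.
    fn key_bits(self) -> u64 {
        if self.0.is_nan() {
            f64::NAN.to_bits()
        } else if self.0 == 0.0 {
            0.0f64.to_bits()
        } else {
            self.0.to_bits()
        }
    }
}

impl PartialEq for HashableF64 {
    fn eq(&self, other: &Self) -> bool {
        self.key_bits() == other.key_bits()
    }
}

// Sound because equality on the canonical bits is reflexive, even for NaN.
impl Eq for HashableF64 {}

impl Hash for HashableF64 {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.key_bits().hash(state);
    }
}

/// Counts rows per distinct float value, roughly what
/// `SELECT c, COUNT(*) ... GROUP BY c` has to do internally.
pub fn count_by_float(values: &[f64]) -> HashMap<HashableF64, usize> {
    let mut groups: HashMap<HashableF64, usize> = HashMap::new();
    for &v in values {
        *groups.entry(HashableF64(v)).or_insert(0) += 1;
    }
    groups
}
```

Calling `count_by_float(&[1.5, 1.5, 2.0])` yields two groups, with a count of 2 for `1.5`. A maintained crate covers the same ground (plus ordering) without the project having to own these edge cases.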


codecov-io commented Jan 12, 2021

Codecov Report

Merging #9175 (f8da961) into master (6da7718) will increase coverage by 0.02%.
The diff coverage is 85.91%.

@@            Coverage Diff             @@
##           master    #9175      +/-   ##
==========================================
+ Coverage   81.55%   81.58%   +0.02%     
==========================================
  Files         215      215              
  Lines       51600    51716     +116     
==========================================
+ Hits        42084    42191     +107     
- Misses       9516     9525       +9     

| Impacted Files | Coverage Δ |
|---|---|
| rust/arrow/src/datatypes.rs | 78.59% <0.00%> (-0.16%) ⬇️ |
| rust/datafusion/src/physical_plan/hash_join.rs | 84.43% <71.42%> (+0.16%) ⬆️ |
| rust/datafusion/src/physical_plan/group_scalar.rs | 70.83% <77.77%> (+2.97%) ⬆️ |
| ...ust/datafusion/src/physical_plan/hash_aggregate.rs | 84.71% <81.81%> (+0.05%) ⬆️ |
| rust/datafusion/tests/sql.rs | 99.83% <100.00%> (+<0.01%) ⬆️ |
| rust/parquet/src/encodings/encoding.rs | 95.43% <0.00%> (+0.19%) ⬆️ |
| rust/datafusion/src/scalar.rs | 57.76% <0.00%> (+0.39%) ⬆️ |
| rust/datafusion/src/physical_plan/common.rs | 78.78% <0.00%> (+13.57%) ⬆️ |


@mqy
Contributor

mqy commented Jan 12, 2021

Not sure, but I think it should be OK to use the ordered-float crate:

  • it has only one Rust file, https://github.com/reem/rust-ordered-float/blob/master/src/lib.rs, of about 800 lines.
  • the num_traits crate we are already using does not provide the Ord and Eq traits.

Member

@andygrove andygrove left a comment

LGTM. Thanks @ovr

@alamb Do you think we should make this a feature due to the added dependency?

Contributor

@alamb alamb left a comment

I agree with @mqy's assessment that this is a relatively small new dependency (and it doesn't bring its own large dependency stack: https://github.com/reem/rust-ordered-float/blob/master/Cargo.toml), so it should be fine.

@ovr ovr force-pushed the issue-11221 branch 2 times, most recently from 2769a30 to bb098a9 Compare January 12, 2021 22:03
@@ -776,6 +785,14 @@ pub(crate) fn create_group_by_values(
     for i in 0..group_by_keys.len() {
         let col = &group_by_keys[i];
         match col.data_type() {
             DataType::Float32 => {
Contributor Author

@ovr ovr Jan 12, 2021

I've updated the PR; I had forgotten about this place. I am now testing it with a real DB example.
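
For readers following the hunk: it adds a float arm to a per-type `match` that builds group-by values row by row. The shape of that plumbing can be sketched roughly as below. This is a simplified, standard-library-only illustration: `Column`, `GroupKey`, `key_for_row` and `count_groups` are hypothetical names for this example and are not DataFusion's types or functions. Storing floats as raw bits is an assumption of the sketch; unlike a canonicalizing wrapper, it keeps `-0.0` and `+0.0` (and different NaN payloads) in separate groups.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a typed Arrow array.
pub enum Column {
    Int64(Vec<i64>),
    Float64(Vec<f64>),
}

/// Hypothetical hashable group key. Floats are stored as raw bits so the
/// enum can derive Eq and Hash (assumption: no NaN / signed-zero folding).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum GroupKey {
    Int64(i64),
    Float64Bits(u64),
}

/// Builds the composite key for one row by matching on each column's type,
/// one match arm per supported data type.
fn key_for_row(columns: &[Column], row: usize) -> Vec<GroupKey> {
    columns
        .iter()
        .map(|col| match col {
            Column::Int64(values) => GroupKey::Int64(values[row]),
            Column::Float64(values) => GroupKey::Float64Bits(values[row].to_bits()),
        })
        .collect()
}

/// Counts rows per distinct combination of group-by column values.
pub fn count_groups(columns: &[Column], num_rows: usize) -> HashMap<Vec<GroupKey>, usize> {
    let mut counts: HashMap<Vec<GroupKey>, usize> = HashMap::new();
    for row in 0..num_rows {
        *counts.entry(key_for_row(columns, row)).or_insert(0) += 1;
    }
    counts
}
```

A data type without a match arm simply cannot be grouped on, which is why a missed call site like this one matters.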

Contributor

I suggest adding an end-to-end SQL test in https://github.com/apache/arrow/blob/master/rust/datafusion/tests/sql.rs to make sure the plumbing is all hooked up correctly.

Contributor Author

@alamb I've added tests, but I found a strange bug, probably with count:

012c1ac#diff-4b06103cf2132b1ab297fbe8cd42622ecbe1109ea26df7d8b358fa36d739549cR357

Thanks

@ovr ovr force-pushed the issue-11221 branch 2 times, most recently from 8bd7523 to 012c1ac Compare January 13, 2021 13:41
Contributor

@alamb alamb left a comment

This looks great -- thanks again @ovr

@alamb alamb closed this in 1393188 Jan 14, 2021
@ovr ovr deleted the issue-11221 branch May 24, 2021 15:06
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
Closes apache#9175 from ovr/issue-11221

Authored-by: Dmitry Patsura <zaets28rus@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>