
Count distinct floats #252

Merged
merged 4 commits into from
May 4, 2021

@pjmore (Contributor) commented May 4, 2021

Which issue does this PR close?

Closes #199.

What changes are included in this PR?

Modified the try_from_array method on ScalarValue and added tests to physical_plan/distinct_expressions.rs.

The tests cover every floating-point edge case I could think of: NaN, positive and negative infinity, and subnormal numbers. If I missed any, I'm happy to add tests for them.
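For context on why floats needed special handling: Rust's f64 implements neither Eq nor Hash, and under IEEE 754 NaN != NaN, so float values cannot be dropped straight into a HashSet for COUNT(DISTINCT ...). The sketch below is a minimal, hypothetical illustration of one common workaround, not the code changed in this PR. It keys a HashSet on a canonicalized bit pattern, under two stated assumptions: every NaN payload counts as one distinct value, and -0.0 and 0.0 count as the same value. The helper names canonical_bits and count_distinct_f64 are made up for this example.

```rust
use std::collections::HashSet;

/// Hypothetical helper: map an f64 to a hashable u64 key.
/// Assumptions: all NaNs collapse to one canonical NaN, and
/// -0.0 collapses to 0.0 (matching IEEE 754 `==` for zeros).
/// Infinities and subnormals keep their own bit patterns.
fn canonical_bits(x: f64) -> u64 {
    if x.is_nan() {
        f64::NAN.to_bits()
    } else if x == 0.0 {
        // true for both 0.0 and -0.0
        0.0f64.to_bits()
    } else {
        x.to_bits()
    }
}

/// Hypothetical helper: COUNT(DISTINCT x) over nullable f64 values.
/// NULLs (None) are skipped, as SQL COUNT does.
fn count_distinct_f64(values: &[Option<f64>]) -> usize {
    let mut seen: HashSet<u64> = HashSet::new();
    // `&Option<f64>` iterates over zero or one `&f64`, so flatten skips None
    for v in values.iter().flatten() {
        seen.insert(canonical_bits(*v));
    }
    seen.len()
}

fn main() {
    // Half the smallest normal f64 is a (nonzero) subnormal number.
    let subnormal = f64::MIN_POSITIVE / 2.0;
    let values = [
        Some(1.23), Some(1.23),
        Some(f64::NAN), Some(-f64::NAN),
        Some(f64::INFINITY), Some(f64::NEG_INFINITY),
        Some(0.0), Some(-0.0),
        Some(subnormal), Some(subnormal),
        None,
    ];
    // prints 6: {1.23, NaN, +inf, -inf, 0, subnormal}
    println!("{}", count_distinct_f64(&values));
}
```

Whether NaNs and signed zeros should collapse like this is a semantic choice; a bit-exact key (plain to_bits) would instead count each NaN payload and -0.0 separately.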

Are there any user-facing changes?

No

@Dandandan (Contributor) left a comment

Looks good, thanks @pjmore!

@codecov-commenter

Codecov Report

Merging #252 (774d7b4) into master (e271e4d) will increase coverage by 0.00%.
The diff coverage is 70.00%.


@@           Coverage Diff           @@
##           master     #252   +/-   ##
=======================================
  Coverage   76.80%   76.81%           
=======================================
  Files         133      133           
  Lines       23284    23294   +10     
=======================================
+ Hits        17884    17894   +10     
  Misses       5400     5400           
| Impacted Files | Coverage Δ |
|---|---|
| datafusion/src/scalar.rs | 54.36% <0.00%> (+0.22%) ⬆️ |
| ...tafusion/src/physical_plan/distinct_expressions.rs | 90.80% <87.50%> (-0.11%) ⬇️ |
| datafusion/src/physical_plan/group_scalar.rs | 58.82% <0.00%> (+1.17%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e271e4d...774d7b4.

@alamb (Contributor) left a comment

I tried it out locally and it works great. Thank you @pjmore!

alamb@ip-10-0-0-124:~/Software/arrow-datafusion$  echo "foo,1.23" > /tmp/foo.csv
alamb@ip-10-0-0-124:~/Software/arrow-datafusion$ cargo run -p datafusion-cli
    Finished dev [unoptimized + debuginfo] target(s) in 0.13s
     Running `target/debug/datafusion-cli`
> CREATE EXTERNAL TABLE t (a varchar, b float) STORED AS CSV LOCATION '/tmp/foo.csv';
0 rows in set. Query took 0 seconds.

> select count(distinct a) from t;
+-------------------+
| COUNT(DISTINCT a) |
+-------------------+
| 1                 |
+-------------------+
1 rows in set. Query took 0 seconds.
> select count(distinct b) from t;
+-------------------+
| COUNT(DISTINCT b) |
+-------------------+
| 1                 |
+-------------------+
1 rows in set. Query took 0 seconds.

@Dandandan merged commit cb0a4a9 into apache:master on May 4, 2021
@Dandandan (Contributor) commented

Thanks again @pjmore!

@alamb (Contributor) commented May 4, 2021

🎉

@houqp added the labels datafusion (Changes in the datafusion crate) and enhancement (New feature or request) on Jul 29, 2021
Linked issue: COUNT DISTINCT does not support for Float64