Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function to create ArrayRef from an iterator of ScalarValues #381

Merged
merged 4 commits into from
May 24, 2021

Conversation

alamb
Copy link
Contributor

@alamb alamb commented May 21, 2021

Which issue does this PR close?

This is part of #363 where I am trying to evaluate pruning predicates on record batches that represent the min/max values of statistics.

Rationale for this change

The proposed interface for providing min/max values is as ScalarValues and I need a way to take a bunch of ScalarValues and turn them into an Array.

You can see how it is used in #380

What changes are included in this PR?

  1. Add ScalarValue::iter_to_array, tests for same

Are there any user-facing changes?

ScalarValue::iter_to_array is now available

@jorgecarleitao
Copy link
Member

Shouldn't we use the Builder API to create an array of the values from the statistics instead of stats -> Vec<Scalar> -> Array?

@alamb
Copy link
Contributor Author

alamb commented May 21, 2021

Shouldn't we use the Builder API to create an array of the values from the statistics instead of stats -> Vec -> Array?

@jorgecarleitao could you please elaborate on what you have in mind here? The reason ScalarValue got involved was that I proposed returning Min/Max values as ScalarValue (in a trait that I can implement for things outside of DataFusion) in #380.

@jorgecarleitao
Copy link
Member

ah, sorry, I misunderstand. The idea is to generalize over the parquet min/max. Yeap, makes total sense 👍; sorry for the noise

Copy link
Contributor

@Dandandan Dandandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great 👍 it should also be very useful for code around aggregations

datafusion/src/scalar.rs Outdated Show resolved Hide resolved
datafusion/src/scalar.rs Outdated Show resolved Hide resolved
@jimexist
Copy link
Member

this is definitely helpful while implementing #375 , I was this close of trying to implement this by myself :-)

let mut scalars = scalars.into_iter().peekable();

// figure out the type based on the first element
let data_type = match scalars.peek() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to have an iter of a mixture of null and present values? in that case the first value can be null

Copy link
Member

@jimexist jimexist May 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never mind - even if for null the data type info is present.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah -- this confused me a little -- it is convenient that ScalarValue is always typed even when it is None

@alamb
Copy link
Contributor Author

alamb commented May 23, 2021

I'll plan to merge this when the CI is green

@codecov-commenter
Copy link

codecov-commenter commented May 23, 2021

Codecov Report

Merging #381 (b2f771e) into master (8b31714) will increase coverage by 0.01%.
The diff coverage is 72.56%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #381      +/-   ##
==========================================
+ Coverage   74.88%   74.90%   +0.01%     
==========================================
  Files         146      146              
  Lines       24368    24481     +113     
==========================================
+ Hits        18249    18338      +89     
- Misses       6119     6143      +24     
Impacted Files Coverage Δ
datafusion/src/scalar.rs 58.66% <72.56%> (+4.95%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8b31714...b2f771e. Read the comment docs.

datafusion/src/scalar.rs Outdated Show resolved Hide resolved
@Dandandan
Copy link
Contributor

Dandandan commented May 23, 2021

@alamb there is still a typo in the code

alamb and others added 4 commits May 24, 2021 08:56
@alamb alamb merged commit 9fdc4fe into apache:master May 24, 2021
@alamb alamb deleted the alamb/iter_to_array branch May 24, 2021 13:33
@houqp houqp added api change Changes the API exposed to users of the crate datafusion Changes in the datafusion crate enhancement New feature or request and removed api change Changes the API exposed to users of the crate labels Jul 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants