Return Vec<bool> from PredicateBuilder rather than an Fn
#370
pub fn build_pruning_predicate(
&self,
row_group_metadata: &[RowGroupMetaData],
) -> Box<dyn Fn(&RowGroupMetaData, usize) -> bool> {
statistics: &[RowGroupMetaData],
The main change in this PR is this function's signature. The rest of the changes are fallout from that.
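To illustrate the shape of the signature change, here is a minimal sketch rather than the actual DataFusion code. `RowGroupStats`, `build_predicate_fn`, and `build_predicate_vec` are hypothetical stand-ins (the real code works on `RowGroupMetaData`), and the equality predicate is only an assumed example.

```rust
// Hypothetical stand-in for parquet's RowGroupMetaData: one column's min/max.
struct RowGroupStats {
    min: i64,
    max: i64,
}

// Old shape: return a boxed closure that the caller invokes once per row group.
fn build_predicate_fn(target: i64) -> Box<dyn Fn(&RowGroupStats, usize) -> bool> {
    // Keep a row group only if `target` could fall inside its [min, max] range.
    Box::new(move |rg: &RowGroupStats, _i: usize| rg.min <= target && target <= rg.max)
}

// New shape: evaluate eagerly and return one bool per input statistics entry.
fn build_predicate_vec(statistics: &[RowGroupStats], target: i64) -> Vec<bool> {
    statistics
        .iter()
        .map(|rg| rg.min <= target && target <= rg.max)
        .collect()
}
```

Both shapes give the same keep/skip decision per row group. The `Vec<bool>` version does not tie the result to a particular metadata type, which is what makes it easier to generalize later.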
Ok(values) => values,
// stats filter array could not be built
// return a closure which will not filter out any row groups
_ => return Box::new(|_r, _i| true),
The existing code also (silently) ignores errors, so we continue that tradition in this PR.
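A minimal sketch of that fallback, under stated assumptions: `prune` and `row_groups_to_read` are hypothetical names, and the "missing statistics" error is invented purely for illustration. The idea is that a failed pruning step keeps every row group rather than propagating the error.

```rust
// Hypothetical fallible pruning step: one bool per row group, or an error
// if any row group has no usable max statistic.
fn prune(maxes: &[Option<i64>], threshold: i64) -> Result<Vec<bool>, String> {
    maxes
        .iter()
        .map(|m| match m {
            // Keep the row group only if it may contain a value > threshold.
            Some(max) => Ok(*max > threshold),
            None => Err("missing statistics".to_string()),
        })
        .collect()
}

// Caller-side fallback: on error, do not filter out any row groups.
fn row_groups_to_read(maxes: &[Option<i64>], threshold: i64) -> Vec<bool> {
    match prune(maxes, threshold) {
        Ok(values) => values,
        // Pruning is only an optimization, so reading everything stays correct.
        Err(_) => vec![true; maxes.len()],
    }
}
```

Keeping every row group on error is the conservative choice: it can cost extra I/O but never drops rows that the query needs.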
9741d87 to c8bea51
fyi @yordan-pavlov and @returnString
LGTM. Makes sense, and I agree that it is a simpler approach. The number of record batches should be small anyway, so it is fine to allocate a Vec<bool> here.
Great work and thanks a lot, @alamb!
Co-authored-by: Jorge Leitao <jorgecarleitao@gmail.com>
Codecov Report
@@ Coverage Diff @@
## master #370 +/- ##
==========================================
- Coverage 74.94% 74.93% -0.01%
==========================================
Files 146 146
Lines 24314 24318 +4
==========================================
+ Hits 18221 18223 +2
- Misses 6093 6095 +2
@alamb thank you for making this more generic so that it can be useful in more cases; I do like how the error case (where a
(BTW @yordan-pavlov the more I work with this code the cooler I think it (and its algorithm) is. Thank you for contributing it in the first place)
Which issue does this PR close?
re #363
Rationale for this change
As explained in #363, the high-level goal is to make the parquet row group pruning logic generic over any type of min/max statistics (not just parquet metadata).
What changes are included in this PR?
Return a bool for each of the input statistics
… (ParquetFileReader) into the parquet.rs module
Return errors from build_pruning_predicate rather than silently ignoring them (though they are still silently ignored in parquet.rs as before)
Are there any user-facing changes?
No change in parquet functionality is intended in this PR
Sequence:
My next PR will change the input of the PruningPredicateBuilder to be generic.
I am trying to do this in a few small PRs to reduce review burden; here is how I plan for them to connect together:
Planned changes:
PruningStatistics trait (forthcoming PR)