Pushdown RowFilter in ParquetExec #3380
Conversation
@Ted-Jiang @tustvold @alamb FYI. Would love any feedback you guys have on this one :)
I plan to review this, time permitting, tomorrow. Very exciting 😁
Codecov Report
@@ Coverage Diff @@
## master #3380 +/- ##
==========================================
+ Coverage 85.70% 85.73% +0.03%
==========================================
Files 298 299 +1
Lines 54961 55170 +209
==========================================
+ Hits 47103 47302 +199
- Misses 7858 7868 +10
@thinkharderdev Cool! 👍 I will review this today!
_schema: &Schema,
_metadata: &ParquetMetaData,
) -> Result<bool> {
// TODO How do we know this?
If the file has a page index, it will be stored at ParquetMetaData.page_indexes.boundary_order. I think without a page index there is no way to find out 🤔
Parquet doesn't have a way to express multi-column sort orders that I am aware of. I thought DataFusion had some plumbing for this, but I think it doesn't extend into the physical scan operators.
metadata: &ParquetMetaData,
) -> Result<Option<FilterCandidate>> {
let expr = self.expr.clone();
let expr = expr.rewrite(&mut self)?;
Why do we need to call this function? 🤔
So this will rewrite Column expressions for columns not present in the current parquet file to Expr::Literal(ScalarValue::Null). Basically we want to build the PhysicalExpr to only deal with the projected columns required for the filter (rather than the entire file), so we need to basically simulate what the SchemaAdapter would do.
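Roughly, the rewrite does something like the sketch below (a hypothetical helper, not the exact code in this PR, which does this inside its ExprRewriter impl; it assumes the Expr, Column and ScalarValue shapes of the DataFusion version this PR targets):

use arrow::datatypes::Schema;
use datafusion_common::ScalarValue;
use datafusion_expr::Expr;

// Sketch only: a column reference that is missing from this file's schema is
// replaced by a NULL literal, so the filter only needs columns that exist in
// this particular parquet file.
fn rewrite_if_missing(expr: Expr, file_schema: &Schema) -> Expr {
    match expr {
        Expr::Column(col) if file_schema.field_with_name(&col.name).is_err() => {
            Expr::Literal(ScalarValue::Null)
        }
        other => other,
    }
}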
@thinkharderdev Agree! I remember each distinct filter will apply to the projected col with ... One thing I want to mention: when applying filter pushdown to parquet, some ... After using this row_filter I think it could be a ...
Yes! This is, I think, the next phase. Once we can push down exact filters to the scan we can represent that in the ...
This is looking really cool, sorry it has taken so long to review.
Perhaps we could have an option or something to not re-order predicates, and this would allow the TableProvider
and/or physical optimizer to use additional information to rewrite the predicates in a specific order? To give a concrete example of where this might be useful, in IOx the parquet files are sorted lexicographically, and it therefore makes sense for predicates to be evaluated in this order.
> eventually may want to do cost estimation at a higher level to determine the optimal grouping.
Yeah, I suspect this will benefit from being handled at a higher-level than the individual ParquetExec operator
| DataType::LargeList(_)
| DataType::Struct(_)
| DataType::Union(_, _, _)
| DataType::Dictionary(_, _)
I think we should include predicates on dictionaries
.with_batch_size(batch_size)
.with_row_groups(row_groups)
.build()?;
let stream = match row_filter {
It might be nicer to write this as
if let Some(filter) = row_filter {
builder = builder.with_row_filter(filter)
}
Or something
filters.push(Box::new(filter));
}

other_candidates.sort_by_key(|c| c.required_bytes);
Perhaps we could move this ahead of the partition, so that it is also used to sort the indexed expressions?
}
}

impl<'a> ExprRewriter for FilterCandidateBuilder<'a> {
This is a nice use of the existing plumbing for walking expressions 👍
Ok(None)
} else {
let required_bytes =
size_of_columns(&self.required_columns, self.file_schema, metadata)?;
Why not pass required_column_indices here instead?
.map(|v| v.into_array(batch.num_rows()))
{
Ok(array) => {
if let Some(mask) = array.as_any().downcast_ref::<BooleanArray>() {
I think the logic in parquet already handles this, and so we can skip doing this here
@thinkharderdev do you plan on making this change in this PR or a follow on?
I think @tustvold was referring to the prep_null_mask call, which is removed. Unless I'm missing something, this should be good to go.
let candidates: Vec<FilterCandidate> = predicates
.into_iter()
.flat_map(|expr| {
if let Ok(candidate) =
This could be written as expr.ok().then(|| ...), up to you.
expr: Expr,
file_schema: &'a Schema,
table_schema: &'a Schema,
required_columns: HashSet<Column>,
I don't think this is necessary, in addition to required_column_indices
Yeah, I had this one first and then needed to add the indices for something else, but we don't need both.
TLDR I really like this PR @thinkharderdev -- the structure is very cool.
We probably need to think about how to test it a bit more before turning it on by default.
Perhaps we could add a flag to the parquet reader exec to enable / disable the entire "push predicates down to the parquet scan" optimization, disabled by default (rather than just the reorder_predicates
flag) and merge the PR in.
Then we can test in various places (like I could test with the IOx tests) and when we are confident it is ready to be enabled, we can turn it on by default.
(as a total aside, I love the name of this branch in coralogix https://github.com/coralogix/arrow-datafusion/tree/ludicrous-speed)
projection: Vec<usize>,
}

/// Helper to build a `FilterCandidate`. This will do several things
❤️
/// 3. Rewrite column expressions in the predicate which reference columns not in the particular file schema.
/// This is relevant in the case where we have determined the table schema by merging all individual file schemas
/// and any given file may or may not contain all columns in the merged schema. If a particular column is not present
/// we replace the column expression with a literal expression that produces a null value.
We do very similar rewrites in IOx; 👍
if candidates.is_empty() {
Ok(None)
} else if reorder_predicates {
First of all, I think the idea of reordering predicates very close to the scan where we have the most data (rather than passing them down) is very good. 👍
I think it may help readability to articulate clearly somewhere (maybe the top of this file) what the intended filter ordering criteria are.
Some properties I have seen in heuristics like this that might be worth thinking about:
- "single column filters" (aka ones that can be evaluated by decoding a single column and maybe can be evaluated by special case logic in the reader) -- especially since the parquet reader will have to re-decode duplicated columns
- single column filters on columns with the fewest RLE runs (meaning they will be the cheapest to evaluate) -- which may be already handled by looking at encoded size.
- earlier in the sort order (as you have here).
The more I think about it, though, the more I like the idea of using size_of_columns / row_count as a proxy for sort order / speed of evaluation.
Always evaluating single column predicates first may also be a valuable idea to explore.
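A tiny self-contained sketch of that proxy (a simplified stand-in struct, not the PR's FilterCandidate; the row_count field here is an assumption):

struct Candidate {
    required_bytes: usize,
    row_count: usize,
}

// Encoded bytes per row as a rough "cost to decode and evaluate" proxy:
// the cheapest predicates run first so later ones see fewer rows.
fn sort_by_estimated_cost(candidates: &mut [Candidate]) {
    candidates.sort_by_key(|c| c.required_bytes / c.row_count.max(1));
}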
Yeah, everything else I could come up with I think ended up being pretty well-approximated by the column size. What will be interesting to figure out from an API design perspective is how to allow users to specify hints. There may be invariants or regularities in the dataset that aren't really expressed in any metadata that could still be relevant to ordering.
I think the API in this PR is pretty good in terms of hints:
- Option 1 (default): Allow the implementation to reorder based on heuristics
- Option 2: Do not reorder predicates and apply them in the order specified
One limitation of Option 2 is that the order can't be specified for different parquet files, but if that is an important use case we could design something more sophisticated later.
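Purely as an illustration, the two options could be spelled out as a knob like the sketch below (hypothetical; the PR itself exposes this through the reorder_predicates flag rather than an enum):

#[derive(Debug, Clone, Copy)]
enum PredicateOrdering {
    // Option 1 (default): let the scan reorder predicates using heuristics
    // such as the encoded size of the referenced columns.
    Heuristic,
    // Option 2: evaluate predicates exactly in the order supplied by the
    // TableProvider / physical optimizer.
    AsProvided,
}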
}
}

fn is_primitive_field(field: &Field) -> bool {
🤔 eventually this is probably a good function to move to arrow -- so it doesn't get out of date if new types are added
There are similar examples like https://docs.rs/arrow/22.0.0/arrow/datatypes/enum.DataType.html#method.is_numeric already
Filed apache/arrow-rs#2704
/// Take combined filter (multiple boolean expressions ANDed together)
/// and break down into distinct filters. This should be the inverse of
/// `datafusion_expr::expr_fn::combine_filters`
fn disjoin_filters(combined_expr: Expr) -> Vec<Expr> {
This is also quite common -- I recommend putting it in datafusion_expr::expr_fn::uncombine_filters or something.
I bet there are several copies of this code in the datafusion codebase already (we certainly have some in IOx 😆).
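For illustration, a self-contained sketch of what such an uncombine/disjoin helper does, using a simplified stand-in enum rather than DataFusion's real Expr type:

#[derive(Debug, Clone)]
enum SimpleExpr {
    And(Box<SimpleExpr>, Box<SimpleExpr>),
    Predicate(&'static str),
}

// Recursively split a chain of ANDs back into the individual predicates;
// the inverse of combining filters with AND.
fn uncombine(expr: SimpleExpr, out: &mut Vec<SimpleExpr>) {
    match expr {
        SimpleExpr::And(left, right) => {
            uncombine(*left, out);
            uncombine(*right, out);
        }
        leaf => out.push(leaf),
    }
}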
@thinkharderdev please let me know when you think this PR is ready for final review / merge (it still says "RFC" in the title so I am not quite sure)
I think it's good now (pending any other feedback/comments). I can work on the benchmarks in a follow-up PR.
Will review carefully tomorrow
I think this PR can be merged in as is and we can keep iterating on it subsequently.
Really nice @thinkharderdev and kudos to the rest of the predicate pushdown team @Ted-Jiang and @tustvold (sorry if I have forgotten others)
It would be nice to file a follow on issue (I can do so if you like) listing the steps we felt were necessary to turn on predicate pushdown by default. In my mind this is primarily about testing (functional and performance).
@@ -839,6 +893,7 @@ mod tests {
projection: Option<Vec<usize>>,
schema: Option<SchemaRef>,
predicate: Option<Expr>,
pushdown_predicate: bool,
Maybe in a follow on PR we can adjust the tests to do the same scan both with and without pushdown
.await
.unwrap();

// This does not look correct since the "c2" values in the result do not in fact match the predicate `c2 == 0`
This comment seems out of place / out of date, as the predicate is c2 == 1 and the actual rows returned are in fact where c2 = 1.
Yeah, lazy copy pasta :)
for candidate in indexed_candidates {
let filter =
DatafusionArrowPredicate::try_new(candidate, file_schema, metadata)?;
Can this ever return an error due to some sort of unsupported predicate? If so, perhaps we could log the error (in debug!) and continue.
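Something along these lines, as a fragment mirroring the loop quoted above (the exact error-handling shape is just a suggestion, not code from this PR):

for candidate in indexed_candidates {
    match DatafusionArrowPredicate::try_new(candidate, file_schema, metadata) {
        Ok(filter) => filters.push(Box::new(filter)),
        // Skip a predicate we cannot turn into a row filter rather than
        // failing the whole scan.
        Err(e) => debug!("skipping row-filter predicate: {}", e),
    }
}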
It shouldn't ever fail unless there is a bug (i.e. there is no expected failure case). I think if this fails then the query will almost certainly fail somewhere else.
@@ -429,6 +429,40 @@ pub fn combine_filters(filters: &[Expr]) -> Option<Expr> {
Some(combined_filter)
}

/// Take combined filter (multiple boolean expressions ANDed together)
❤️
Since this feature defaults to off, I think it would be ok to merge in prior to the 12.0.0 release candidate but I defer to @andygrove on that one
Thanks! I can clean up the last few things mentioned in your review this afternoon/evening. I'll also create a follow-up ticket with the remaining items. In general I think the follow-ons are: ...
I'm not sure what's going on with the checks. The errors look like it is building a different version of the code....
GitHub Actions merges master when running tests, try merging master into this branch perhaps?
I collected follow on work and other related items I could find into a tracking ticket / epic in #3462 -- it is a great story / project.
🚀
Benchmark runs are scheduled for baseline = af0d50a and contender = 9fbee1a. 9fbee1a is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
assert_predicates(actual, vec![col("a"), col("b")]);

let expr = col("a").and(col("b")).or(col("c"));
We should add another test case for this. The expr is a and b or c, which will be handled like binary(a and b, or, c). But I think the above expr can be converted to c or a and b, and will get an incorrect result.
Maybe I am wrong
Could you be more specific @liukun4515 -- I am not sure if you are pointing out a potential bug. This line makes the predicate (a AND b) OR c I believe. What do you mean by c or a and b?
> Could you be more specific @liukun4515 -- I am not sure if you are pointing out a potential bug. This line makes the predicate (a AND b) OR c I believe. What do you mean by c or a and b?

Sorry for that, I think I was wrong about this.
Ok(RewriteRecursion::Continue)
}

fn mutate(&mut self, expr: Expr) -> Result<Expr> {
@alamb @thinkharderdev
I found an issue caused by this rewrite in my work on #3396.
If we have two parquet files, one with c1,c2 and the other with c1,c3, and the filter is c2 = 1, then it will produce the expr NULL = 1 for the parquet file with the c1,c3 columns.
After I remove the binary type coercion in the physical phase, the binary physical expr NULL = INT(1) can't be created, because we no longer support type coercion when creating the physical expr.
We need to make sure the NULL value has the right data type.
We need to get the data type from the table schema.
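A sketch of that fix (a hypothetical helper, not code from this PR): look up the missing column's type in the table schema and emit a typed null literal, so an expression like c2 = 1 can still be planned without relying on type coercion.

use arrow::datatypes::Schema;
use datafusion_common::ScalarValue;
use datafusion_expr::Expr;

fn typed_null_for_missing_column(name: &str, table_schema: &Schema) -> Option<Expr> {
    let field = table_schema.field_with_name(name).ok()?;
    // ScalarValue::try_from(&DataType) yields a null ScalarValue of that type.
    let null = ScalarValue::try_from(field.data_type()).ok()?;
    Some(Expr::Literal(null))
}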
.then(|| pruning_predicate.as_ref().map(|p| p.logical_expr()))
.flatten()
{
if let Ok(Some(filter)) = build_row_filter(
If there is an error when building the row filter, why not propagate the error?
Which issue does this PR close?
Closes #3360
Rationale for this change
What changes are included in this PR?
Putting this out there for comment since it's at a point where I am more or less happy with the design and all the existing tests pass.
The key points for this PR:
- Build a RowFilter from a filter Expr pushed down to the ParquetExec.
- In the ParquetExec, build a RowFilter (if we can) and construct our ParquetRecordBatchStream with it.

To build a RowFilter we need to go through a few steps:
- Take the filter Expr (which has been combined into a single predicate) and tear it back apart into separate predicates which are ANDed together.
- Rewrite each Expr to replace any columns not present in the file schema with a null literal (to handle merged schemas without involving the SchemaAdapter).
- Each of these becomes a FilterCandidate, and we can sort the candidates by evaluation cost so we can apply the "cheap" filters first.
- Wrap each candidate in a DatafusionArrowPredicate and build a RowFilter from the whole lot of them.

TODOs:
- ... ParquetMetadata :)

A separate conceptual question is around optimizing the number of distinct filters. In this design we simply assume that we want to break the filter into as many distinct predicates as we can, but I'm not sure that is always the case given that this forces serial evaluation of the filters. I can imagine many cases where it would be better to group predicates together for evaluation. I didn't want to make the initial implementation too complicated so I punted on that for now, but eventually may want to do cost estimation at a higher level to determine the optimal grouping.
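The final assembly step then looks roughly like the fragment below (mirroring the diff quoted earlier in this conversation; RowFilter is the parquet crate type, DatafusionArrowPredicate is the type added by this PR):

let mut filters: Vec<Box<dyn ArrowPredicate>> = vec![];
for candidate in candidates {
    let filter = DatafusionArrowPredicate::try_new(candidate, file_schema, metadata)?;
    filters.push(Box::new(filter));
}
let row_filter = RowFilter::new(filters);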
Are there any user-facing changes?