Optimize filter executor in pull-based executor #4421

xudong963 · 2022-11-29T10:22:35Z

Which issue does this PR close?

No

Rationale for this change

If the Selection (filter) operator produces no rows for a batch, it can pull the next RecordBatch from its input right away instead of returning an empty batch.

What changes are included in this PR?

Add a loop to the filter executor.
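To illustrate the idea, here is a minimal synchronous sketch of such a loop. It is not the code from this PR: `Batch` and `FilterIter` are hypothetical stand-ins (a `Vec<i32>` instead of a `RecordBatch`, a plain `Iterator` instead of an async stream), used only to show the control flow of "filter, and if nothing is left, pull again".

```rust
// Hypothetical stand-in for a RecordBatch: just a vector of values.
type Batch = Vec<i32>;

/// Pull-based filter operator over an input iterator of batches.
/// (Illustrative only; not DataFusion's actual filter executor.)
struct FilterIter<I, P> {
    input: I,
    predicate: P,
}

impl<I, P> Iterator for FilterIter<I, P>
where
    I: Iterator<Item = Batch>,
    P: Fn(&i32) -> bool,
{
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        loop {
            // Pull the next batch; `?` ends iteration once the input is exhausted.
            let batch = self.input.next()?;
            let filtered: Batch = batch
                .into_iter()
                .filter(|v| (self.predicate)(v))
                .collect();
            if !filtered.is_empty() {
                return Some(filtered);
            }
            // Every row was filtered out: loop and pull the next batch
            // instead of handing an empty batch to the caller.
        }
    }
}
```

With this shape, a caller that pulls from `FilterIter` never sees an empty batch; batches whose rows are all rejected are consumed inside the loop.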

Are these changes tested?

Covered by existing tests

Are there any user-facing changes?

No

xudong963 · 2022-11-29T10:27:04Z

I think a more elegant way would be to skip the current iteration and move directly to the next one.

For example:

trait A {
    fn execute(&self) -> Result<Stream>;
}


struct B {
    input: C,
}

struct C {
    input: D,
}

struct D {

}

// B, C, and D all implement trait A and trait Stream
// Each execute method calls its input's execute method


async fn main() {
    let b = B::new();
    let mut data_stream = b.execute().unwrap();
    while let Some(batch) = data_stream.next().await {
        ...
    }
}

impl Stream for C {
    type Item = ..;

    fn poll_next(
        mut self: std::pin::Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Self::Item>> {
        ...
        if !predicate(value) {
            // Skip current iteration
        }
    }
}

But I haven't found a proper way to implement it; the Stream trait doesn't seem to provide a related API. Any thoughts? @tustvold
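One way to get the "skip" behavior without a dedicated API is to loop inside `poll_next` itself: when a batch filters down to nothing, poll the input again rather than returning. The sketch below uses only the standard library and makes several assumptions: `BatchStream` is a hypothetical trait with the same shape as `Stream::poll_next`, `Batch` is a `Vec<i32>` rather than a `RecordBatch`, and the predicate (keep even values) is hard-coded.

```rust
use std::collections::VecDeque;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for a RecordBatch.
type Batch = Vec<i32>;

/// Hypothetical minimal trait shaped like `Stream::poll_next`.
trait BatchStream {
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Batch>>;
}

/// Toy source that yields queued batches and is always ready.
struct Source(VecDeque<Batch>);

impl BatchStream for Source {
    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<Batch>> {
        Poll::Ready(self.0.pop_front())
    }
}

/// Filter that keeps even values and never emits an empty batch.
struct EvenFilter<S> {
    input: S,
}

impl<S: BatchStream + Unpin> BatchStream for EvenFilter<S> {
    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Batch>> {
        loop {
            match Pin::new(&mut self.input).poll_next(cx) {
                Poll::Ready(Some(batch)) => {
                    let kept: Batch = batch.into_iter().filter(|v| v % 2 == 0).collect();
                    if !kept.is_empty() {
                        return Poll::Ready(Some(kept));
                    }
                    // Empty result: "skip" by polling the input again.
                }
                // Input is exhausted or not ready yet: pass that through unchanged.
                other => return other,
            }
        }
    }
}

/// Waker that does nothing; enough to drive the always-ready toy source.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Drains a stream until it ends (or returns `Pending`, which `Source` never does).
fn drain<S: BatchStream + Unpin>(mut stream: S) -> Vec<Batch> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut out = Vec::new();
    while let Poll::Ready(Some(batch)) = Pin::new(&mut stream).poll_next(&mut cx) {
        out.push(batch);
    }
    out
}
```

The important detail is that `Pending` from the input is returned as-is, so the loop only spins while the input keeps producing batches that filter down to nothing.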

datafusion/core/src/physical_plan/filter.rs

alamb

Seems like a reasonable change to me. Thanks @xudong963

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Ted-Jiang

Nice find! 👍

xudong963 · 2022-11-30T11:38:27Z

Thanks for your review!

ursabot · 2022-11-30T11:41:45Z

Benchmark runs are scheduled for baseline = fdc83e8 and contender = 522a2a4. 522a2a4 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

optimzie filter executor

70bac4d

github-actions bot added the core Core DataFusion crate label Nov 29, 2022

xudong963 requested a review from tustvold November 29, 2022 10:27

alamb changed the title ~~Optimzie filter executor in pull-based executor~~ Optimize filter executor in pull-based executor Nov 29, 2022

alamb reviewed Nov 29, 2022

datafusion/core/src/physical_plan/filter.rs

alamb approved these changes Nov 29, 2022

Update datafusion/core/src/physical_plan/filter.rs

0549c5f

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Ted-Jiang approved these changes Nov 30, 2022

xudong963 merged commit 522a2a4 into apache:master Nov 30, 2022

xudong963 deleted the filter_executor branch November 30, 2022 11:38
