
reimplement limit_push_down to remove global-state, enhance optimize and simplify code. #4276

Merged: 4 commits into apache:master on Nov 22, 2022

Conversation

@jackwener (Member) commented Nov 18, 2022:

Which issue does this PR close?

Part of #4267.
Closes #4263

In this PR, I reimplement this rule.

  • Use pattern matching to find the subtree we want to match, then push the limit down (see the sketch after the next list).
  • If there is no match, continue optimizing top-down.

Original implementation:

  • It used a global-state Ancestor to record limit information for the whole tree, which was error-prone and complex.
  • It had to traverse the whole tree once just to collect that state.
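To make this concrete, here is a minimal sketch of the pattern-matching approach (my illustration, not the PR's actual code; `LogicalPlan` below is a simplified stand-in for DataFusion's plan type):

```rust
// Illustrative only: a simplified plan type, not DataFusion's.
enum LogicalPlan {
    Limit { skip: usize, fetch: Option<usize>, input: Box<LogicalPlan> },
    TableScan { table: String, fetch: Option<usize> },
    Other(Box<LogicalPlan>),
}

fn push_down_limit(plan: LogicalPlan) -> LogicalPlan {
    match plan {
        // Pattern: a Limit directly above a TableScan. Push the limit
        // into the scan, but keep the Limit node because the scan's
        // `fetch` is only a hint (see the review discussion below).
        LogicalPlan::Limit { skip, fetch: Some(fetch), input } => match *input {
            LogicalPlan::TableScan { table, .. } => LogicalPlan::Limit {
                skip,
                fetch: Some(fetch),
                input: Box::new(LogicalPlan::TableScan {
                    table,
                    // the scan must yield skip + fetch rows so the Limit
                    // above can discard the first `skip` of them
                    fetch: Some(skip + fetch),
                }),
            },
            // No match: recurse top-down into the child.
            other => LogicalPlan::Limit {
                skip,
                fetch: Some(fetch),
                input: Box::new(push_down_limit(other)),
            },
        },
        LogicalPlan::Other(child) => {
            LogicalPlan::Other(Box::new(push_down_limit(*child)))
        }
        other => other,
    }
}
```

Matching the subtree locally like this removes the need for any global ancestor state: each call only sees the node it is currently rewriting.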


@github-actions bot added the optimizer (Optimizer rules) label on Nov 18, 2022.
Comment on lines +428 to 400
\n Limit: skip=10, fetch=1000\
\n TableScan: test, fetch=1010";
jackwener (Member, Author):

Regarding Limit over TableScan: I'm not sure whether the TableScan can apply the limit precisely, so I keep the Limit node.

If the TableScan limit were guaranteed to be exact, we could remove this Limit for the Limit-over-TableScan case.
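For concreteness (my own working from the snippet above, not a claim from the PR): with `Limit: skip=10, fetch=1000`, the scan is asked for `skip + fetch = 1010` rows; the Limit above then drops the first 10 and returns at most 1000. Since the scan's `fetch` is only best-effort, the Limit node stays to enforce the exact bound.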

Contributor:

I don't think we currently expect the TableScan to return exactly (at most) the limit number of rows, AFAIK.
To make that possible, we would need to put more information into the table-scan trait, just as we do with statistics.
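For context, a paraphrased sketch of what such a hint looks like (a hypothetical, simplified trait; the linked `datasource.rs` has the real signature, which takes more parameters):

```rust
/// Placeholder for whatever plan/stream a scan produces.
struct ScanPlan;

/// Hypothetical, simplified table-scan trait for illustration.
trait SimpleTableProvider {
    /// `limit`: if set, the provider may stop early once it has produced
    /// enough rows, but it is not required to return exactly `limit`
    /// rows -- which is why a Limit node above the scan is still needed.
    fn scan(&self, limit: Option<usize>) -> ScanPlan;
}
```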

Contributor:

https://github.com/apache/arrow-datafusion/blob/8c02485cfc3d2c48bc48a557db58e3e5a0f75777/datafusion/core/src/datasource/datasource.rs#L66-L70

This implies that we in fact still need the Limit node even when the limit has been pushed down into the scan.

@Dandandan (Contributor) left a comment:

Looks like a good improvement

@alamb (Contributor) left a comment:

I didn't review the code but I reviewed all the plan changes in the tests and they looked good to me. Thank you @jackwener

FYI @ming535 who originally contributed this code in #2638


\n Limit: skip=0, fetch=10\
\n Limit: skip=10, fetch=10\
\n TableScan: test, fetch=20";
let expected = "Limit: skip=10, fetch=10\
Contributor:

Those plans look much nicer
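Checking the merge arithmetic in the plan above (my own working, not from the PR): the inner `Limit: skip=10, fetch=10` skips 10 rows and keeps at most 10; the outer `Limit: skip=0, fetch=10` then skips nothing and keeps at most 10, so the composition is exactly `skip=10, fetch=10`, and the scan must supply `skip + fetch = 20` rows, hence `TableScan: test, fetch=20`.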

@jackwener force-pushed the reimplement_limit_push_down branch 3 times, most recently from 303452a to ad78bf1, on November 19, 2022.
let expected = "Limit: skip=1000, fetch=0\
\n TableScan: test, fetch=1000";

assert_optimized_plan_eq(&plan, expected)
jackwener (Member, Author):

The original tests were missing cases that confirm limits merge correctly and that overflow is avoided. I added more unit tests to check these.
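To illustrate the overflow concern, here is a minimal sketch of merging two stacked limits with saturating arithmetic (a hypothetical `Limit` struct and `merge_limits` helper, not the PR's actual code):

```rust
/// Hypothetical, simplified Limit node for illustration.
#[derive(Debug, PartialEq)]
struct Limit {
    skip: usize,
    fetch: Option<usize>,
}

/// Merge an `outer` Limit applied on top of an `inner` Limit.
fn merge_limits(outer: &Limit, inner: &Limit) -> Limit {
    // Rows that survive the inner fetch after the outer skip; saturating
    // subtraction keeps `skip > fetch` from overflowing.
    let inner_remaining = inner.fetch.map(|f| f.saturating_sub(outer.skip));
    let fetch = match (inner_remaining, outer.fetch) {
        (Some(a), Some(b)) => Some(a.min(b)),
        (Some(a), None) => Some(a),
        (None, b) => b,
    };
    Limit { skip: inner.skip + outer.skip, fetch }
}

#[test]
fn merge_saturates_instead_of_overflowing() {
    // Outer skip consumes the entire inner fetch: fetch saturates to 0,
    // yielding the `Limit: skip=1000, fetch=0` shape seen in the test above.
    let merged = merge_limits(
        &Limit { skip: 1000, fetch: Some(0) },
        &Limit { skip: 0, fetch: Some(1000) },
    );
    assert_eq!(merged, Limit { skip: 1000, fetch: Some(0) });
}
```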

@jackwener jackwener changed the title Reimplement limit_push_down reimplement limit_push_down to remove global-state, enhance optimize and simplify code. Nov 22, 2022
@liukun4515 (Contributor) commented:

@jackwener Thanks

@liukun4515 liukun4515 merged commit 1bcb333 into apache:master Nov 22, 2022
@ursabot commented Nov 22, 2022:

Benchmark runs are scheduled for baseline = afe2333 and contender = 1bcb333 (the master commit associated with this PR). Results will be available as each benchmark run completes. All Conbench comparison runs were skipped: benchmarking of arrow-datafusion commits is not supported on the available runners (ec2-t3-xlarge-us-east-2, test-mac-arm, ursa-i9-9960x, ursa-thinkcentre-m75q).

@jackwener jackwener deleted the reimplement_limit_push_down branch November 24, 2022 02:48