Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Limit/TopN/Sort/DistinctLimit node if it's source is a scalar #441

Merged
merged 6 commits into from
May 17, 2019

Conversation

Praveen2112
Copy link
Member

No description provided.

@cla-bot cla-bot bot added the cla-signed label Mar 11, 2019
@Praveen2112
Copy link
Member Author

@martint , @findepi , @kokosing Can you please review this ?

@Praveen2112
Copy link
Member Author

@findepi Have updated it. Please let me know if there are any more changes.

Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also DistinctLimitNode

public class PruneLimitOverScalar
implements Rule<LimitNode>
{
private static final Pattern<LimitNode> PATTERN = limit().matching(limitNode -> limitNode.getCount() != 0L);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -64,6 +64,7 @@
import io.prestosql.sql.planner.iterative.rule.PruneJoinChildrenColumns;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update commit message, here in other commits


/**
* Remove TopN node when source is scalar
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove comments that do not add value to the code, the name PruneTopNOverScalar says the same.

p.limit(
10,
p.aggregation(
(Consumer<PlanBuilder.AggregationBuilder>) aggregationBuilder -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.limit(
10,
p.aggregation(
(Consumer<PlanBuilder.AggregationBuilder>) aggregationBuilder -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

p.limit(
0,
p.aggregation(
(Consumer<PlanBuilder.AggregationBuilder>) aggregationBuilder -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here

import static io.prestosql.sql.planner.plan.Patterns.limit;

/**
* Remove Limit node when source is scalar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"... when the subplan is guaranteed to produce fewer rows than the limit"

/**
* Remove Limit node when source is scalar
*/
public class PruneLimitOverScalar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call this "RemoveRedundantLimit"

@Praveen2112 Praveen2112 force-pushed the scalar_query_simplification branch 2 times, most recently from 53d2872 to 38f4a2a Compare March 12, 2019 12:46
@Praveen2112 Praveen2112 changed the title Remove Limit/TopN/Sort node if it's source is a scalar Remove Limit/TopN/Sort/DistinctLimit node if it's source is a scalar Mar 12, 2019
@Praveen2112
Copy link
Member Author

@findepi , @martint , @kokosing Have updated the code. Please let me know if there is any changes.

public class RemoveRedundantLimit
implements Rule<LimitNode>
{
private static final Pattern<LimitNode> PATTERN = limit().with(count().matching(limitCount -> limitCount != 0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is wrong with limitCount == 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since both RemoveRedundantLimit and EvaluateZeroLimit rules are in the same optimizer , we don't want LimitNode with 0 count to be removed if its source is scalar as it might not fire EvaluateZeroLimit rule

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could merge these two rules together, they are very simple and they both remove redundant limit.
Another option is add third function which replaces any plan that isAtMost(node, context.getLookup(), 0) with Values. Then order would not bother.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is add third function which replaces any plan that isAtMost(node, context.getLookup(), 0) with Values. Then order would not bother.

I like that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kokosing , @martint Have merged both of the two rules

Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments. Also, can you add some tests similar to those in presto-main/src/test/java/io/prestosql/sql/query ?

public Result apply(TopNNode node, Captures captures, Context context)
{
if (isAtMost(node.getSource(), context.getLookup(), node.getCount())) {
return Result.ofPlanNode(node.getSource());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TopN cannot be removed blindly. The ordering matters. You can replace it with a SortNode in this case.

It's only safe to remove if the row count is guaranteed to be 1.

import static io.prestosql.sql.planner.optimizations.QueryCardinalityUtil.isScalar;
import static io.prestosql.sql.planner.plan.Patterns.sort;

public class PruneSortOverScalar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to RemoveSingleRowSort. "scalar" is no a proper classification for a subquery -- it's a feature of the context in which it's used. I.e., "in a place that expects a scalar value". (The "isScalar" method in QueryCardinalityUtil is misnamed)

import static io.prestosql.sql.planner.plan.Patterns.DistinctLimit.limit;
import static io.prestosql.sql.planner.plan.Patterns.distinctLimit;

public class PruneDistinctLimitOverScalar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to RemoveSingleRowDistinctLimit

@Praveen2112 Praveen2112 force-pushed the scalar_query_simplification branch 2 times, most recently from 5b1c546 to 3ec0abb Compare March 13, 2019 09:43
public class RemoveRedundantLimit
implements Rule<LimitNode>
{
private static final Pattern<LimitNode> PATTERN = limit().with(count().matching(limitCount -> limitCount != 0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could merge these two rules together, they are very simple and they both remove redundant limit.
Another option is add third function which replaces any plan that isAtMost(node, context.getLookup(), 0) with Values. Then order would not bother.

anyNot(
LimitNode.class,
anyTree(
tableScan("orders")))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another approach would be to use something similar to io.prestosql.sql.planner.TestLogicalPlanner#assertPlanContainsNoApplyOrAnyJoin where you would check that there is no Limit or TopN in the plan. Plan assertion is a bit simpler that way. Reasoning of anyNot that is wrapping anyTree might be not trivial, and I am not sure if the pattern is correct.

@@ -79,7 +79,7 @@ public void testUnsupportedSubqueriesWithCoercions()
{
// coercion from subquery symbol type to correlation type
assertions.assertFails(
"select (select count(*) from (values 1) t(a) where t.a=t2.b limit 1) from (values 1.0) t2(b)",
"select (select count(*) from (values 1) t(a) where t.a=t2.b GROUP BY t.a limit 1) from (values 1.0) t2(b)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please upper case SQL keywords, here in other tests below as separate commit before this change.

// explicit LIMIT in subquery
assertQueryFails(
"SELECT (SELECT count(*) FROM (VALUES (7,1)) t(orderkey, value) WHERE orderkey = corr_key LIMIT 1) FROM (values 7) t(corr_key)",
"SELECT (SELECT count(*) FROM (VALUES (7,1)) t(orderkey, value) WHERE orderkey = corr_key GROUP BY value LIMIT 1) FROM (values 7) t(corr_key)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, you have just extended support for correlated subqueries a bit ;)

{
assertPlan(
"SELECT count(*) FROM orders ORDER BY 1",
output(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments

{
assertPlan(
"SELECT count(*) FROM orders ORDER BY 1 LIMIT 10",
output(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comments

public class RemoveSingleRowDistinctLimit
implements Rule<DistinctLimitNode>
{
private static final Pattern<DistinctLimitNode> PATTERN = distinctLimit().with(limit().matching(limit -> limit != 0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you extend this rule to support regular MarkDistinctNode as well

assertPlan(
"SELECT distinct(c) FROM (SELECT count(*) as c FROM orders GROUP BY orderkey) LIMIT 10",
output(
node(DistinctLimitNode.class,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DisitinctLimitNode was not pruned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the subplan here is not a scalar

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I overlooked GROUP BY.

public Result apply(DistinctLimitNode node, Captures captures, Context context)
{
if (isScalar(node.getSource(), context.getLookup())) {
return Result.ofPlanNode(node.getSource());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are losing output symbols here. See hashSymbol in DistinctLimitNode. Also notice that DistinctLimitNode::getOutputSymbolsreturn distinctSymbols which might be different than node.getSource().getOutputSymbols().

I wonder why test didn't find that already, so please make sure that there is test coverage for that. Can you please run your test from TestLogicalPlanner with coverage or debugging to see if you rule was triggered?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but the hashSymbol in DistinctLimitNode will be added only if we set optimize_hash_generation as true and IIRC it will be added in HashGenerationOptimizer which will be invoked after this optimizer. So we can safely assume that hashSymbol will be empty.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then please verify in rule that hashSymbol is empty. Also verify that DistinctLimitNode::getOutputSymbols are same as node.getSource().getOutputSymbols().

@Praveen2112 Praveen2112 force-pushed the scalar_query_simplification branch 2 times, most recently from 6bd6771 to fd9e2a9 Compare March 14, 2019 03:16
{
assertFalse(
searchFrom(plan("SELECT distinct(c) FROM (SELECT count(*) as c FROM orders) LIMIT 10", LogicalPlanner.Stage.OPTIMIZED).getRoot())
.where(isInstanceOfAny(DistinctLimitNode.class))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also check there is no MarkDistinctNode. You could also extract a method from this and reuse in assertion below, like:

assertFalse(planContainsDistinctNode("SELECT distinct(c) FROM (SELECT count(*) as c FROM orders) LIMIT 10");
assertTrue(planContainsDistinctNode("SELECT distinct(c) FROM (SELECT count(*) as c FROM orders GROUP BY orderkey) LIMIT 10"));

Please do the same for TopN and Sort.

assertPlan(
"SELECT distinct(c) FROM (SELECT count(*) as c FROM orders GROUP BY orderkey) LIMIT 10",
output(
node(DistinctLimitNode.class,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I overlooked GROUP BY.

searchFrom(plan("SELECT count(*) FROM orders ORDER BY 1", LogicalPlanner.Stage.OPTIMIZED).getRoot())
.where(isInstanceOfAny(SortNode.class))
.matches(),
"Unexpected node for the above query");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unexpected node for the above query -> format("Unexpected sort node for query: '%s'", query). To the same for all below and above.

@martint martint self-requested a review April 22, 2019 22:07
@Praveen2112 Praveen2112 force-pushed the scalar_query_simplification branch 2 times, most recently from 582e861 to 354c49f Compare April 29, 2019 14:10
Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Praveen2112, I think there are still some comments that need to be addressed.

searchFrom(plan(query, LogicalPlanner.Stage.OPTIMIZED).getRoot())
.where(isInstanceOfAny(SortNode.class))
.matches(),
format("Unexpected sor node for query: '%s'", query));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: "sor"

{
String query = "SELECT count(*) FROM orders LIMIT 10";
assertFalse(
searchFrom(plan(query, LogicalPlanner.Stage.OPTIMIZED).getRoot())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static import for OPTIMIZED

@@ -79,7 +79,7 @@ public void testUnsupportedSubqueriesWithCoercions()
{
// coercion FROM subquery symbol type to correlation type
assertions.assertFails(
"SELECT (SELECT count(*) FROM (VALUES 1) t(a) WHERE t.a=t2.b LIMIT 1) FROM (VALUES 1.0) t2(b)",
"SELECT (SELECT count(*) from (VALUES 1) t(a) where t.a=t2.b GROUP BY t.a LIMIT 1) from (VALUES 1.0) t2(b)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should belong to previous commit

limit(10,
anyTree(
tableScan("orders")))));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also test SELECT * FROM (VALUES 1,2,3,4,5,6) LIMIT 10?

"SELECT (SELECT t.a FROM (VALUES 1, 2) t(a) WHERE t.a=t2.b LIMIT 2) FROM (VALUES 1) t2(b)",
"VALUES 1");
// cannot enforce limit is less than cardinality of correlated subquery
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// cannot enforce LIMIT on correlated subquery

@Override
public Result apply(TopNNode node, Captures captures, Context context)
{
if (isScalar(node.getSource(), context.getLookup())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't you use isAtMost here?

public Result apply(TopNNode node, Captures captures, Context context)
{
if (isScalar(node.getSource(), context.getLookup())) {
return Result.ofPlanNode(new SortNode(context.getIdAllocator().getNextId(), node.getSource(), node.getOrderingScheme()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but inlining this here would save us one iterative optimizer loop. Also I think you could handle the case with limit = 0 here.

However, as you pointed it could be a matter of taste. Up to you.

@Override
public Result apply(SortNode node, Captures captures, Context context)
{
if (isScalar(node.getSource(), context.getLookup())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also handle the case where cardinality is 0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump

@@ -94,6 +94,7 @@
import io.prestosql.sql.planner.iterative.rule.RemoveFullSample;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, DistinctLimit with limit higher than cardinality of its source node can be rewritten to DistinctNode

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will merge that feature to RemoveSingleRowDistinctLimit

@Praveen2112 Praveen2112 force-pushed the scalar_query_simplification branch 2 times, most recently from febdab8 to e4d4d1b Compare May 5, 2019 05:26
@@ -449,7 +451,7 @@ public void testCorrelatedSubqueries()
{
assertPlan(
"SELECT orderkey FROM orders WHERE 3 = (SELECT orderkey)",
LogicalPlanner.Stage.OPTIMIZED,
OPTIMIZED,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This could be extracted as separate commit.

@@ -42,7 +42,6 @@
import io.prestosql.sql.planner.iterative.rule.DetermineSemiJoinDistributionType;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message

Prune unnecessary TopNNode

Replace TopN node
1. With a Sort node when the subplan is guaranteed to produce fewer rows than N
2. With it's source node when the subplan produces single row
3. With a Values node when N is 0

if (isScalar(node.getSource(), context.getLookup())) {
return Result.ofPlanNode(node.getSource());
}
else if (isAtMost(node.getSource(), context.getLookup(), node.getCount())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: else is redundant

@@ -906,4 +906,35 @@ public void testRemoveSingleRowSort()
anyTree(
tableScan("orders")))));
}

public void testRemoveRedundantTopN()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testRedundantTopNRemoval?

output(
node(ValuesNode.class)));

query = "SELECT * FROM (VALUES 1,2,3,4,5,6) AS t1 ORDER BY 1 LIMIT 10";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please extract each test case as separate test method?

node(AggregationNode.class,
node(ValuesNode.class)));

tester().assertThat(new RemoveRedundantTopN())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please extract each test case as separate test method?

@Override
public Result apply(SortNode node, Captures captures, Context context)
{
if (isScalar(node.getSource(), context.getLookup())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump

.matches(),
format("Unexpected sort node for query: '%s'", query));

query = "SELECT orderkey, count(*) FROM orders GROUP BY orderkey ORDER BY 1";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please extract each test case as separate test method?

return Result.ofPlanNode(node.getSource());
}
else if (isAtMost(node.getSource(), context.getLookup(), node.getLimit())) {
return Result.ofPlanNode(new AggregationNode(node.getId(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why replacing Distinct with Aggregation is better? Shouldn't you use regular DistinctNode here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using MarkDistinct node requires an additional FilterNode and ProjectNode so used the AggregatioNode with no aggregation functions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's ok. MarkDistinct serves a different purpose. @kokosing, there's no explicit DistinctNode -- it's planned as an GROUP BY with no aggregation functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

anyTree(
tableScan("orders")))));

query = "SELECT distinct(id) FROM (VALUES 1, 2, 3, 4, 5, 6) as t1 (id) LIMIT 10";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please extract each test case as separate test method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have added each test method for the optimizer we implemented. So should we write each pattern of queries as separate method ?

@martint martint self-requested a review May 6, 2019 23:08
fgwang7w added a commit to fgwang7w/presto that referenced this pull request Dec 14, 2020
…n is know to single row or less rows than requested

Cherry-pick of trinodb/trino#441

Co-authored-by: praveenkrishna <praveenkrishna@tutanota.com>
fgwang7w added a commit to fgwang7w/presto that referenced this pull request Dec 28, 2020
In addition to the Cherry-pick for removing redundant Limit/TopN/Sort/DistinctLimit,
there are a few more rules added to replace any input that is zero-TopN/DistinctLimit/Limit

Cherry-pick of trinodb/trino#441

Co-authored-by: praveenkrishna <praveenkrishna@tutanota.com>
rschlussel pushed a commit to prestodb/presto that referenced this pull request Dec 28, 2020
In addition to the Cherry-pick for removing redundant Limit/TopN/Sort/DistinctLimit,
there are a few more rules added to replace any input that is zero-TopN/DistinctLimit/Limit

Cherry-pick of trinodb/trino#441

Co-authored-by: praveenkrishna <praveenkrishna@tutanota.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants