-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize distinct from semi join #8092
Conversation
bd57243
to
28fb6ba
Compare
Updated to use in transaction Rule Assert #7979 |
* - Source | ||
* - FilteringSource | ||
* - ProjectNode | ||
* - AggregationNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see distinct node here, do you mean it is under the aggregation?
@Override | ||
public Optional<PlanNode> apply(PlanNode node, Lookup lookup, PlanNodeIdAllocator idAllocator, SymbolAllocator symbolAllocator, Session session) | ||
{ | ||
if (node instanceof SemiJoinNode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implement getPattern
method. Also instead of nested IF
blocks use a short-circuit IF
s
if (!cond1) {
return Optional.empty()
}
if (!cond2) {
return Optional.empty()
}
if (!cond3) {
return Optional.empty()
}
return actualRewrite;
if (node instanceof AggregationNode) { | ||
AggregationNode aggregationNode = (AggregationNode) node; | ||
if (isDistinctNode(aggregationNode)) { | ||
return Optional.of(lookup.resolve(aggregationNode.getSource())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you do not need to resolve this node if you are not trying to cast it to actual node type
return Optional.empty(); | ||
} | ||
|
||
private Optional<PlanNode> rewriteUpstream(PlanNode node, Lookup lookup) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use PlanNodeSearcher
in order to do such plan rewrites
"IN (SELECT distinct custkey FROM customer)", | ||
anyTree( | ||
semiJoin("Source", "Filter", "Output", | ||
anyTree(tableScan("orders", ImmutableMap.of("Source", "custkey"))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put tableScan
in separate line
} | ||
|
||
@Test | ||
public void testDoesNotFire() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put some information why it does not fire into the test name
p.project(Assignments.of(filteringSourceKey, expression("x")), | ||
p.aggregation(ab -> ab.step(AggregationNode.Step.SINGLE) | ||
.groupingSets(ImmutableList.of(ImmutableList.of(filteringSourceKey))) | ||
.source(p.tableScan("customer", ImmutableMap.of("custkey_1", "custkey"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of usin nested lists, you could use addGroupingSet
method
@@ -199,20 +207,50 @@ public ApplyNode apply(Assignments subqueryAssignments, List<Symbol> correlation | |||
return new ApplyNode(idAllocator.getNextId(), input, subquery, subqueryAssignments, correlation); | |||
} | |||
|
|||
public SemiJoinNode semiJoin(PlanNode source, PlanNode filteringSource, Symbol sourceJoinSymbol, Symbol filteringSourceJoinSymbol, Symbol semiJoinOutput) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put symbols before plan nodes
@@ -390,6 +390,12 @@ private synchronized CatalogMetadata getTransactionCatalogMetadata(ConnectorId c | |||
CatalogMetadata catalogMetadata = this.catalogMetadata.get(connectorId); | |||
if (catalogMetadata == null) { | |||
Catalog catalog = catalogsByConnectorId.get(connectorId); | |||
if (catalog == null) { | |||
// For rule tester, getConnectorId has never been called because the plan was generated outside this transaction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if need to change the product because the testing infrastructure code. Maybe we should testing code instead to not have this issue?
@@ -42,11 +47,23 @@ public void test() | |||
Symbol filteringSourceKey = p.symbol("custkey_1", BigintType.BIGINT); | |||
Symbol outputKey = p.symbol("orderkey", BigintType.BIGINT); | |||
return p.semiJoin( | |||
p.tableScan("orders", ImmutableMap.of("custkey", "custkey")), | |||
p.tableScan( | |||
new TableHandle( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe plan builder could have a method, p.tableHandle("local", "local", "orders", TINY_SCALE_FACTOR)
where you could create transactions
@hellium01 are you still planning to work on this? |
Sorry, didn't see the last comment and lost the context since it is a while back. Let me take a look. |
This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions! |
Fixes #7781