Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize distinct from semi join #8092

Closed

Conversation

hellium01
Copy link
Contributor

@hellium01 hellium01 commented May 19, 2017

Fixes #7781

@hellium01 hellium01 force-pushed the OptimizeDistinctFromSemiJoin branch from bd57243 to 28fb6ba Compare May 19, 2017 23:32
@hellium01 hellium01 requested a review from martint May 19, 2017 23:34
@hellium01
Copy link
Contributor Author

hellium01 commented May 19, 2017

Updated to use in transaction Rule Assert #7979

@kokosing kokosing self-assigned this Jun 20, 2017
* - Source
* - FilteringSource
* - ProjectNode
* - AggregationNode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see distinct node here, do you mean it is under the aggregation?

@Override
public Optional<PlanNode> apply(PlanNode node, Lookup lookup, PlanNodeIdAllocator idAllocator, SymbolAllocator symbolAllocator, Session session)
{
if (node instanceof SemiJoinNode) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implement getPattern method. Also instead of nested IF blocks use a short-circuit IFs

if (!cond1) {
  return Optional.empty()
}
if (!cond2) {
  return Optional.empty()
}
if (!cond3) {
  return Optional.empty()
}

return actualRewrite;

if (node instanceof AggregationNode) {
AggregationNode aggregationNode = (AggregationNode) node;
if (isDistinctNode(aggregationNode)) {
return Optional.of(lookup.resolve(aggregationNode.getSource()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you do not need to resolve this node if you are not trying to cast it to actual node type

return Optional.empty();
}

private Optional<PlanNode> rewriteUpstream(PlanNode node, Lookup lookup)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use PlanNodeSearcher in order to do such plan rewrites

"IN (SELECT distinct custkey FROM customer)",
anyTree(
semiJoin("Source", "Filter", "Output",
anyTree(tableScan("orders", ImmutableMap.of("Source", "custkey"))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put tableScan in separate line

}

@Test
public void testDoesNotFire()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please put some information why it does not fire into the test name

p.project(Assignments.of(filteringSourceKey, expression("x")),
p.aggregation(ab -> ab.step(AggregationNode.Step.SINGLE)
.groupingSets(ImmutableList.of(ImmutableList.of(filteringSourceKey)))
.source(p.tableScan("customer", ImmutableMap.of("custkey_1", "custkey")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of usin nested lists, you could use addGroupingSet method

@@ -199,20 +207,50 @@ public ApplyNode apply(Assignments subqueryAssignments, List<Symbol> correlation
return new ApplyNode(idAllocator.getNextId(), input, subquery, subqueryAssignments, correlation);
}

public SemiJoinNode semiJoin(PlanNode source, PlanNode filteringSource, Symbol sourceJoinSymbol, Symbol filteringSourceJoinSymbol, Symbol semiJoinOutput)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put symbols before plan nodes

@@ -390,6 +390,12 @@ private synchronized CatalogMetadata getTransactionCatalogMetadata(ConnectorId c
CatalogMetadata catalogMetadata = this.catalogMetadata.get(connectorId);
if (catalogMetadata == null) {
Catalog catalog = catalogsByConnectorId.get(connectorId);
if (catalog == null) {
// For rule tester, getConnectorId has never been called because the plan was generated outside this transaction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if need to change the product because the testing infrastructure code. Maybe we should testing code instead to not have this issue?

@@ -42,11 +47,23 @@ public void test()
Symbol filteringSourceKey = p.symbol("custkey_1", BigintType.BIGINT);
Symbol outputKey = p.symbol("orderkey", BigintType.BIGINT);
return p.semiJoin(
p.tableScan("orders", ImmutableMap.of("custkey", "custkey")),
p.tableScan(
new TableHandle(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe plan builder could have a method, p.tableHandle("local", "local", "orders", TINY_SCALE_FACTOR) where you could create transactions

@findepi
Copy link
Contributor

findepi commented Nov 13, 2018

@hellium01 are you still planning to work on this?

@hellium01
Copy link
Contributor Author

Sorry, didn't see the last comment and lost the context since it is a while back. Let me take a look.

@stale
Copy link

stale bot commented May 12, 2019

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions!

@stale stale bot added the stale label May 12, 2019
@stale stale bot closed this May 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove DISTINCT inside IN subquery expression
4 participants