Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid exchange for Connectors that support multiple layouts #21366

Open
16pierre opened this issue Nov 10, 2023 · 0 comments
Open

Avoid exchange for Connectors that support multiple layouts #21366

16pierre opened this issue Nov 10, 2023 · 0 comments
Assignees

Comments

@16pierre
Copy link

16pierre commented Nov 10, 2023

Some connectors support a wide range of table layouts / on-demand partitioning.
For such connectors, we can skip the initial exchange that wraps TableScanNode when it's not partitioned properly for its parent nodes (for example a join)

As far as I can tell, this isn't trivial to solve in Presto currently.

Some similar logic was added to avoid exchanges for Hive partitionings that are compatible. See this PR. Unfortunately, this logic isn't flexible enough, i.e. the initial partitioning columns needs to match the desired exchange partitioning.

Another related primitive is the multi-layout support in Connector's metadata: connectors are able to provide different layouts depending on requested columns, unfortunately this logic in PickTableLayout & Metadata#GetLayout doesn't tell the connector which partitioning is desired, therefore the connector can't pick the right partitioning to avoid an exchange...

TableLayoutResult layout = metadata.getLayout(
session,
tableHandle,
Constraint.alwaysTrue(),
Optional.of(tableScanNode.getOutputVariables().stream()
.map(variable -> tableScanNode.getAssignments().get(variable))
.collect(toImmutableSet())));

It looks like there's been quite a bit of discussion on such topics, for example #12674 suggested a path forward was to allow the Connectors to participate in the planning process instead of adding smart layouts APIs. I guess that's what ConnectorPlanOptimizerProvider solved.

I guess I could write a ConnectorPlanOptimizer that detects which nodes trigger exchanges and select the right partitioning to rewrite its TableScanNodes, but this comes with a couple problems:

  • it feels expensive to write and it feels like we're duplicating the AddExchanges logic
  • I'm not sure it would work in all cases, if I read ConnectorPlanOptimizer javadocs correctly, ConnectorPlanOptimizer won't reach nodes whose children involve different connectors, for example the JoinNode for query some_connector JOIN other_connector wouldn't get processed by a ConnectorPlanOptimizer. Here, even though different connectors are involved, we can still pick the proper partitioning to avoid the exchange on the TableScanNode.

Did I miss something here ? What would be the idiomatic way to solve this problem ? Can we extend the Connector API and add callbacks that rewrite the partitioning scheme in PickTableLayout / AddExchanges ?

@tdcmeehan tdcmeehan self-assigned this Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants