Remove tableLayoutHandle #12674

Merged 8 commits on Apr 30, 2019

Contributor

@hellium01 hellium01 commented Apr 15, 2019

This PR removes tableLayoutHandle completely from PlanNodes and the metadata APIs.

For background and future work, see the following comments:

@highker
Contributor

highker commented Apr 18, 2019

Is this ready for review?

@hellium01
Contributor Author

I thought it was OK, but it seems there is a merge conflict now... Let me rebase it first.

@hellium01 hellium01 force-pushed the RefactorPickLayout branch 8 times, most recently from 945d64f to 9c9a07d Compare April 19, 2019 08:28
@hellium01
Contributor Author

hellium01 commented Apr 19, 2019

All the problems are fixed; it is now ready for review.

Adding @wenleix / @andrii, as this affects TableHandle, getAlternativeTableLayoutHandle, and createTemporaryTable.

Contributor

@wenleix wenleix left a comment

  • "Rename listTableLayouts method" . Looks good (see also trinodb/trino@d7cde8c )
    Why rename to pushPredicateIntoTableScan instead of pushFilterIntoTableScan ?

  • "Add missing null checks on TableLayoutResult constructor" . Looks good. (See also trinodb/trino@c68109c)

  • "Remove support for multiple layouts" . Looks good. (See also trinodb/trino@801423b).
    Please remove unrelated changes in HiveIntegrationSmokeTest.

@wenleix wenleix self-assigned this Apr 20, 2019
Contributor

@highker highker left a comment

This is a nice patch. Reviewed the first 10 commits and skimmed through the rest. Some API changes need more discussion.

@highker
Contributor

highker commented Apr 21, 2019

It might be worth splitting everything related to PickTableLayout into a separate PR, given @mbasmanova is working on it. There is no logical conflict, but it is better to avoid conflicts as early as possible.

Contributor

@wenleix wenleix left a comment

  • "Simplify unconditional PickLayout" and "Inline pushPredicateIntoTableScan" Looks good.

  • "Add tableLayoutHandle and transactionHandle to tableHandle"

Generally looks good. This looks like part of trinodb/trino@47ce7e6, but split so that it only adds TableLayoutHandle and TransactionHandle to TableHandle.

Since we don't yet remove TableLayoutHandle from TableScanNode and other places, there is an interesting question: for objects that can hold both a TableHandle and a TableLayoutHandle, should we check that their layouts are the same? Or maybe we can squash these split commits before merging so this is no longer a problem? :)

Also, I would use "TableLayoutHandle", "TransactionHandle" and "TableHandle"

@mbasmanova
Contributor

Looks reasonable. I have a question though. pushPredicateIntoTableScan uses the Metadata.getLayout API:

Optional<TableLayoutResult> getLayout(Session session, TableHandle tableHandle, Constraint<ColumnHandle> constraint, Optional<Set<ColumnHandle>> desiredColumns);

which uses the following connector API:

    /**
     * Return a list of table layouts that satisfy the given constraint.
     * <p>
     * For each layout, connectors must return an "unenforced constraint" representing the part of the constraint summary that isn't guaranteed by the layout.
     */
    List<ConnectorTableLayoutResult> getTableLayouts(
            ConnectorSession session,
            ConnectorTableHandle table,
            Constraint<ColumnHandle> constraint,
            Optional<Set<ColumnHandle>> desiredColumns);

Is it correct to assume that the pushed-down predicate is encoded in ConnectorTableLayoutHandle? If so, do you expect to keep it that way, or do you envision changes? If you expect changes, would you describe what they might be?

@hellium01
Contributor Author

hellium01 commented Apr 22, 2019

We should generally hide ConnectorTableLayoutHandle inside ConnectorTableHandle and use ConnectorTableHandle in all metadata API calls. The engine only needs to know that the ConnectorTableHandle is updated after the pushdown. It is up to the connector to decide whether to store the layout in a sub data structure, and what to store there.

But it is debatable whether we want to go that far, since all we gain is code cleanliness.
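
The idea of hiding the layout inside the table handle could look roughly like the sketch below. Every name in it (HandleSketch, ConnectorHandle, ExampleTableHandle, pushPredicate) is hypothetical and chosen for illustration; none of them is the actual Presto SPI, and the predicate is modeled as a plain string to keep the example short.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch; these are not the real Presto SPI types.
final class HandleSketch {
    // Opaque, connector-owned handle. The engine passes it around without inspecting it.
    interface ConnectorHandle {}

    // Example connector handle that keeps the pushed-down predicate as a private detail.
    static final class ExampleTableHandle implements ConnectorHandle {
        private final String tableName;
        private final Optional<String> pushedPredicate; // stand-in for a layout handle

        ExampleTableHandle(String tableName, Optional<String> pushedPredicate) {
            this.tableName = Objects.requireNonNull(tableName, "tableName is null");
            this.pushedPredicate = Objects.requireNonNull(pushedPredicate, "pushedPredicate is null");
        }

        String getTableName() {
            return tableName;
        }

        Optional<String> getPushedPredicate() {
            return pushedPredicate;
        }
    }

    // Connector-side pushdown: returns a new handle instead of a separate layout handle.
    static ConnectorHandle pushPredicate(ConnectorHandle handle, String predicate) {
        ExampleTableHandle table = (ExampleTableHandle) handle;
        String combined = table.getPushedPredicate()
                .map(existing -> "(" + existing + ") AND (" + predicate + ")")
                .orElse(predicate);
        return new ExampleTableHandle(table.getTableName(), Optional.of(combined));
    }

    public static void main(String[] args) {
        ConnectorHandle before = new ExampleTableHandle("orders", Optional.empty());
        ConnectorHandle after = pushPredicate(before, "orderkey > 10");
        // The engine only observes that it now holds a different handle.
        System.out.println(before != after);
        System.out.println(((ExampleTableHandle) after).getPushedPredicate().orElse("<none>"));
    }
}
```

In this sketch the engine-facing type carries no layout accessor at all, which is the "code cleanliness" gain mentioned above; whether that is worth the API churn is the open question.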

If we allow connectors to participate in query planning, none of these getLayout calls will be necessary. A connector basically just needs a rule (or a default rule, for connectors that do not want to implement one themselves) that takes in a subtree plus a constrained trait set and returns a subtree plus a provided trait set. However, we need to come up with a plan to support both the old connector API and the new connector rule-based planning. Today, we use a lot of specialized data structures to work around the problem that connectors cannot rewrite the query plan (MetadataDelete is a good example). It will be a long way to go if we want to move everything over.
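
One possible shape for such a connector rule is sketched below, purely as an illustration. The types (ConnectorRuleSketch, Subtree, RuleResult, ConnectorRule) and the string-based traits are assumptions made for this example; no such interface exists in this PR.

```java
import java.util.Set;

// Hypothetical sketch of rule-based connector planning; not an existing Presto API.
final class ConnectorRuleSketch {
    // Minimal stand-in for a plan subtree.
    static final class Subtree {
        final String description;

        Subtree(String description) {
            this.description = description;
        }
    }

    // What a rule hands back: a (possibly rewritten) subtree and the traits it provides.
    static final class RuleResult {
        final Subtree subtree;
        final Set<String> providedTraits;

        RuleResult(Subtree subtree, Set<String> providedTraits) {
            this.subtree = subtree;
            this.providedTraits = Set.copyOf(providedTraits);
        }
    }

    // Takes a subtree plus the required traits, returns a subtree plus the provided traits.
    interface ConnectorRule {
        RuleResult apply(Subtree subtree, Set<String> requiredTraits);
    }

    // Default rule for connectors that do not implement their own: no rewrite, no traits.
    static final ConnectorRule DEFAULT_RULE =
            (subtree, requiredTraits) -> new RuleResult(subtree, Set.of());

    // Example connector rule that can serve a pre-sorted scan when sorting is required.
    static final ConnectorRule SORTED_SCAN_RULE = (subtree, requiredTraits) -> {
        if (requiredTraits.contains("sorted(orderkey)")) {
            Subtree rewritten = new Subtree(subtree.description + " [pre-sorted scan]");
            return new RuleResult(rewritten, Set.of("sorted(orderkey)"));
        }
        return DEFAULT_RULE.apply(subtree, requiredTraits);
    };

    public static void main(String[] args) {
        RuleResult result = SORTED_SCAN_RULE.apply(
                new Subtree("TableScan(orders)"), Set.of("sorted(orderkey)"));
        System.out.println(result.subtree.description + " provides " + result.providedTraits);
    }
}
```

The default rule is what would let connectors written against the old API keep working while others opt in to rewriting the plan themselves.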

Contributor

@wenleix wenleix left a comment

For now I am skipping "Use tableHandle in getInfo" through "Remove TableLayoutHandle in TableScanNode", since they look like a continuation of the backport of trinodb/trino@47ce7e6.

The following 3 commits look reasonable:

  • "Rename TableLayout to TableProperties"
  • "Rename ConnectorTableLayout to ConnectoLayoutProperties"
  • "Rename TableLayout to TableProperties in metadata"

Although I feel the third commit should be squashed to the first commit :)

I also agree with @highker 's comment on #12674 (comment) (do we really need to alter the name?) -- or maybe rename it as TableLayoutProperties ?

Contributor Author

@hellium01 hellium01 left a comment

Thanks for reviewing. I will address these comments and, as synced offline, will remove the renaming part.

Contributor

@wenleix wenleix left a comment

"Use tableHandle in getInfo"

Looks good. Note that the semantics are now interesting:

  • What does it mean when TableHandle.getLayout is empty?
  • Should MetadataManager.getLayout always return a layout?

Let's discuss in person and add necessary comments. Otherwise it will be difficult for us to maintain the code (and the open source community :) )


Update:

What does it mean when TableHandle.getLayout is empty?

Now ......

In the future empty should mean the default access path.
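
The proposed future semantics (an empty layout means the default access path) could be expressed along these lines. This is a minimal sketch with hypothetical names; the TableHandle here is a simplified stand-in, not the class changed in this PR, and it says nothing about the current behavior, which the comment above leaves unspecified.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch of "empty layout means default access path".
final class LayoutSemanticsSketch {
    // Simplified stand-in for a table handle that may or may not carry a layout.
    static final class TableHandle {
        private final String table;
        private final Optional<String> layout;

        TableHandle(String table, Optional<String> layout) {
            this.table = Objects.requireNonNull(table, "table is null");
            this.layout = Objects.requireNonNull(layout, "layout is null");
        }

        String getTable() {
            return table;
        }

        Optional<String> getLayout() {
            return layout;
        }
    }

    // Under the proposed semantics, callers never treat an empty layout as an error.
    static String describeAccessPath(TableHandle handle) {
        return handle.getLayout()
                .map(layout -> "layout " + layout)
                .orElse("default access path");
    }

    public static void main(String[] args) {
        System.out.println(describeAccessPath(new TableHandle("orders", Optional.empty())));
        System.out.println(describeAccessPath(new TableHandle("orders", Optional.of("bucketed"))));
    }
}
```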

@hellium01 hellium01 force-pushed the RefactorPickLayout branch 2 times, most recently from d990f5e to 4e1a6c2 Compare April 25, 2019 09:22
Contributor

@wenleix wenleix left a comment

"Use tableHandle in metadataDelete" Looks good modulo one nit.

Contributor

@wenleix wenleix left a comment

"Use tableHandle in getLayout" . Looks good with a few small questions.

Contributor

@wenleix wenleix left a comment

(Removed since the comment was somehow not published; see the next comment instead)

Contributor

@wenleix wenleix left a comment

"Use tableHandle in getAlternativeTableHandle". Looks good.

@wenleix
Contributor

wenleix commented Apr 25, 2019

  • "Remove TableLayoutHandle in TableScanNode" Looks good.
  • "Use tableHandle in TableProperties". The commit message needs to be updated since we don't have TableProperties anymore :)

@wenleix
Contributor

wenleix commented Apr 26, 2019

Per offline discussion, let's try to squash from "Add tableLayoutHandle and transactionHandle to tableHandle" to "Remove TableLayoutHandle in TableScanNode". Remember to have a backup before squash :)

Contributor Author

@hellium01 hellium01 left a comment

Thanks for reviewing, comments have been addressed.

@@ -74,7 +74,7 @@

     Optional<TableHandle> getTableHandleForStatisticsCollection(Session session, QualifiedObjectName tableName, Map<String, Object> analyzeProperties);

-    List<TableLayoutResult> getLayouts(Session session, TableHandle tableHandle, Constraint<ColumnHandle> constraint, Optional<Set<ColumnHandle>> desiredColumns);
+    Optional<TableLayoutResult> getLayout(Session session, TableHandle tableHandle, Constraint<ColumnHandle> constraint, Optional<Set<ColumnHandle>> desiredColumns);
Contributor Author

Updated in previous commit ("Remove support for multiple layouts").

@hellium01 hellium01 force-pushed the RefactorPickLayout branch 3 times, most recently from 210128f to fc1261b Compare April 27, 2019 00:49
Contributor

@wenleix wenleix left a comment

Some nits. Will continue review.

Contributor

@wenleix wenleix left a comment

Looks great. All comments are minor. Thanks for cleaning it up!

Let's quickly go through the comments marked with "talk in person".

Minor comments about commit message:

  • "Hide TableLayout from engine"
    Remember to wrap the commit message to 72 characters :)

  • "Use TableHandle in TableLayout"
    Technically TableHandle is not used in TableLayout -- TableLayout now contains the information to construct a TableHandle, but it doesn't store TableHandle as a field.
    So what about "Remove TableLayoutHandle from TableLayout" ?

                 session,
                 node.getTable(),
                 constraint,
                 Optional.of(node.getOutputSymbols().stream()
                         .map(node.getAssignments()::get)
                         .collect(toImmutableSet())));

-        if (layouts.isEmpty()) {
-            return ImmutableList.of(new ValuesNode(idAllocator.getNextId(), node.getOutputSymbols(), ImmutableList.of()));
+        if (layout.getLayout().getPredicate().isNone()) {
Contributor

I am a bit confused about the logic in lines 301-303, although I understand the semantics are correct since it's doing the same thing as before. Let's quickly discuss in person :)

It looks like TableLayout.getPredicate().isNone() indicates that the scan result is empty? (So an empty ValuesNode is used to replace the TableScanNode.)

Contributor Author

Yes, even before we push down, we eliminate unnecessary TableScan nodes. An example is reading from a view where the intersection of the view's predicate and the query's predicate yields an empty result.
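
The view example can be illustrated with a small, self-contained sketch. It models a single-column predicate as a closed integer range and the plan as a string; the names (EmptyScanSketch, Range, planScan) are hypothetical, and the real code works on TupleDomain-style predicates and plan nodes rather than these stand-ins.

```java
import java.util.Optional;

// Hypothetical sketch of replacing a scan with an empty Values node.
final class EmptyScanSketch {
    // Closed range [low, high] on one column; stand-in for a predicate domain.
    static final class Range {
        final long low;
        final long high;

        Range(long low, long high) {
            if (low > high) {
                throw new IllegalArgumentException("low must not exceed high");
            }
            this.low = low;
            this.high = high;
        }

        // Empty result when the ranges do not overlap, analogous to isNone().
        Optional<Range> intersect(Range other) {
            long lo = Math.max(low, other.low);
            long hi = Math.min(high, other.high);
            return lo <= hi ? Optional.of(new Range(lo, hi)) : Optional.empty();
        }
    }

    // Plans a scan, or an empty Values node when no row can satisfy both predicates.
    static String planScan(String table, Range viewPredicate, Range queryPredicate) {
        return viewPredicate.intersect(queryPredicate)
                .map(range -> "TableScan(" + table + ", [" + range.low + ", " + range.high + "])")
                .orElse("Values()");
    }

    public static void main(String[] args) {
        // View restricts the column to [0, 10]; the query asks for [20, 30].
        System.out.println(planScan("orders", new Range(0, 10), new Range(20, 30)));
        // Overlapping predicates keep the scan with the narrowed range.
        System.out.println(planScan("orders", new Range(0, 10), new Range(5, 30)));
    }
}
```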

             return context.defaultRewrite(node);
         }
-        return new MetadataDeleteNode(idAllocator.getNextId(), delete.get().getTarget(), Iterables.getOnlyElement(node.getOutputSymbols()), tableScanNode.getLayout().get());
+        // delete target is always the table in source tableScanNode, see BeginTableWrite for details.
Contributor

I thought I understood the code, but I don't quite understand this comment. Let's quickly discuss in person :)

martint and others added 5 commits April 29, 2019 13:23
More accurately, constructs a set of alternate plans with the
filter pushed into the table scan based on available table layouts.

Currently, multiple layouts are used for streaming aggregation/pre-sorted
layout selection for connectors that support it. We should have a better
way to support this functionality once we start to allow connectors to
participate in planning. We now expect connectors to return at least one
layout.

It's just selecting a layout for the raw table scan, so
no need to go through the logic for pushing a predicate etc.
@hellium01 hellium01 force-pushed the RefactorPickLayout branch 2 times, most recently from 5d4b78f to 2f22bff Compare April 29, 2019 21:22
This commit hides TableLayout from the engine. Since the connector will
rewrite the sub plan and use a new TableHandle to represent the
currently provided view of the data set, layout is no longer
a useful concept.
7 participants