
Indicate progress only if limit is applied #618

Merged: 1 commit, Apr 12, 2019
@@ -356,10 +356,12 @@ public Optional<LimitApplicationResult<ConnectorTableHandle>> applyLimit(Connect
 {
     MemoryTableHandle table = (MemoryTableHandle) handle;

-    if (!table.getLimit().isPresent() || limit < table.getLimit().getAsLong()) {
-        table = new MemoryTableHandle(table.getId(), OptionalLong.of(limit));
+    if (table.getLimit().isPresent() && table.getLimit().getAsLong() <= limit) {
Member commented:
It is strange that when the given limit is supported we return Optional.empty(), just like in the case where the limit is not supported.

It is not a practical case, but when you have subsequent limits, only the first of them would be pushed down to the connector. However, this example gets more practical in the case of predicate pushdown.

IMO we should handle that in optimizer (engine) code, instead of requiring all connectors to have an if like that.

Member Author replied:

> It is strange that when the given limit is supported we return Optional.empty(), just like in the case where the limit is not supported.

Optional.empty() just means "applying the provided limit has no effect (so don't try to do it again)". I don't think we need to distinguish between supported and not supported. It makes for awkward implementations that never return Optional.empty() and indicate no progress with a special flag in the object they return.

In the long term (when connectors can provide custom Rule instances), this will be replaced by a Rule. Connectors would indicate they support limit pushdown by providing a rule, and the rule will return Rule.Result.empty() to indicate it had no effect or couldn't be applied in that particular case.

> It is not a practical case, but when you have subsequent limits, only the first of them would be pushed down to the connector. However, this example gets more practical in the case of predicate pushdown.

Subsequent limits will be pushed if they are smaller than the current limit. If, for some reason, you end up with a larger limit on top of a table scan that has a smaller limit applied, the right way to fix this is for the engine to eliminate the Limit after determining that the max number of rows produced by the TableScan is going to be smaller, similar to how we do it in #441.

> IMO we should handle that in optimizer (engine) code, instead of requiring all connectors to have an if like that.

How do you suggest we do that? There has to be a way for the connector to signal that "applying the provided limit has no effect (so don't try to do it again)", or the optimizer will loop forever. An alternative is to introduce machinery like we have for Rule (patterns, etc.), but that would be a lot of work and would probably require duplicating a lot of code due to limitations of what can go in the SPI.
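A minimal sketch of why that signal matters, with stand-in types (Handle here stands in for MemoryTableHandle; the loop is a simplification of the optimizer's behavior, not the actual engine code): the optimizer reapplies the pushdown until the connector reports no progress, and Optional.empty() is what makes the loop terminate.

```java
import java.util.Optional;
import java.util.OptionalLong;

public class PushdownLoopSketch
{
    // Stand-in for MemoryTableHandle: remembers the limit already applied
    record Handle(long id, OptionalLong limit) {}

    // Mirrors the fixed applyLimit: report no progress when the new limit has no effect
    static Optional<Handle> applyLimit(Handle table, long limit)
    {
        if (table.limit().isPresent() && table.limit().getAsLong() <= limit) {
            return Optional.empty(); // no progress: tells the optimizer to stop retrying
        }
        return Optional.of(new Handle(table.id(), OptionalLong.of(limit)));
    }

    public static void main(String[] args)
    {
        Handle handle = new Handle(1, OptionalLong.empty());
        long limit = 10;
        // Simplified optimizer loop: keep firing the "rule" until no progress is reported.
        // Without the Optional.empty() signal, this loop would never terminate.
        int iterations = 0;
        while (true) {
            Optional<Handle> result = applyLimit(handle, limit);
            if (result.isEmpty()) {
                break;
            }
            handle = result.get();
            iterations++;
        }
        System.out.println(iterations);     // 1: the limit is applied exactly once
        System.out.println(handle.limit()); // OptionalLong[10]
    }
}
```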

Member replied:

> How do you suggest we do that?

I think the rule could get the table property before and after applyLimit. Then it could derive that the returned limit is the same as before. However, currently there is no such thing as a LimitProperty, so it might just not be worth it.

Member replied:

I agree that it would be nice if the engine could handle this; otherwise it's more difficult and error-prone to write connectors. But it sounds like we don't have a feasible way to do that.

Member replied:

LimitProperty sounds good to me.

Consider an example where the connector knows a lot about its tables: things like table cardinality and partitioning. For a table with a lower cardinality than the limit, the connector would need to return Optional.empty() (saying no progress) on the first rule application, but to the engine it looks like the connector is not capable of pushing down the limit. For LIMIT this is not a big deal, but in the case of partitioning or other SQL fragments it might affect query latency.

In my opinion, if we are going to push down some property to the connector, we need an ability for the connector to say what the actual properties are. Then there is no need to even call applyLimit when the required properties are already satisfied by the actual properties.

> An alternative is to introduce machinery like we have for Rule (patterns, etc.) [...]

We do not need such machinery. We can handle this manually in Rule code, like we do for patterns that are too complicated for pattern matching.
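A rough sketch of the properties-based alternative being proposed, assuming a hypothetical "max row count" property that connectors could expose (no such LimitProperty exists in the SPI today; the names below are invented for illustration): the engine would compare the connector-reported property against the required limit and skip applyLimit entirely when it is already satisfied.

```java
import java.util.OptionalLong;

public class PropertyCheckSketch
{
    // Hypothetical "actual properties" a connector could expose;
    // this type does not exist in the SPI and is only a sketch of the proposal.
    record TableProperties(OptionalLong maxRowCount) {}

    // Engine-side check: skip calling applyLimit when the table already
    // produces no more rows than the requested limit.
    static boolean limitAlreadySatisfied(TableProperties properties, long limit)
    {
        return properties.maxRowCount().isPresent()
                && properties.maxRowCount().getAsLong() <= limit;
    }

    public static void main(String[] args)
    {
        TableProperties smallTable = new TableProperties(OptionalLong.of(7));
        System.out.println(limitAlreadySatisfied(smallTable, 10)); // true: no pushdown call needed
        System.out.println(limitAlreadySatisfied(smallTable, 5));  // false: applyLimit is worth calling
    }
}
```

With a check like this in the engine, the connector would never see a limit it has already satisfied, which is exactly the "if" the reviewer wants to remove from every connector.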

Member replied:

> I agree that it would be nice if the engine could handle this [...]. But it sounds like we don't have a feasible way to do that.

Let's have a test. It might not catch all of the cases, but it still may catch some.

Member Author replied:

LimitProperty, or rather "max row count" or "max cardinality", would work for the case where the limit is guaranteed by the connector. However, if the connector can't guarantee a limit, such as when pushing the limit to a parallel database, it wouldn't be able to describe the "max cardinality" of the derived table, so the engine would have no way of knowing whether it's safe to call applyLimit again.

The other downside of relying on that property is that it requires connectors to implement two seemingly disconnected APIs.
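That guarantee distinction is what the boolean in the diff's LimitApplicationResult carries. A hedged sketch with stand-in types (LimitResult stands in for the SPI's LimitApplicationResult; the parallel-database connector is hypothetical): pushing the limit to each worker still reduces data movement, but the connector cannot promise an exact row count, so it reports the limit as not guaranteed and the engine keeps its own Limit operator on top of the scan.

```java
import java.util.Optional;

public class LimitGuaranteeSketch
{
    // Stand-in for LimitApplicationResult: the new handle plus whether
    // the connector guarantees at most `limit` rows will be produced
    record LimitResult(Object handle, boolean limitGuaranteed) {}

    // Hypothetical connector backed by a parallel database: each worker
    // applies LIMIT independently, so up to (workers * limit) rows can
    // come back, and the limit cannot be guaranteed.
    static Optional<LimitResult> applyLimit(Object handle, long limit)
    {
        // The pushdown is still worthwhile (less data transferred), but the
        // engine must keep its own Limit on top because the count is inexact.
        return Optional.of(new LimitResult(handle, false));
    }

    public static void main(String[] args)
    {
        LimitResult result = applyLimit(new Object(), 10).orElseThrow();
        System.out.println(result.limitGuaranteed()); // false: engine keeps the final Limit
    }
}
```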

Member Author replied:

> Let's have a test. It might not catch all of the cases, but it still may catch some.

There's already a test that would catch this if connectors implemented it wrong: AbstractTestIntegrationSmokeTest.testLimit().

+        return Optional.empty();
+    }

-    return Optional.of(new LimitApplicationResult<>(table, true));
+    return Optional.of(new LimitApplicationResult<>(
+            new MemoryTableHandle(table.getId(), OptionalLong.of(limit)),
+            true));
}
}