Remove futile sort operations in sub queries #759
Comments
It makes sense. However, instead of removing sort I would add |
Ah... adding LIMIT is incorrect. Please go ahead with your original approach and remove the sort. |
One case that I missed previously: if the subquery has a LIMIT on top of the ORDER BY result, then we cannot remove the sort. |
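To make the LIMIT caveat concrete, here is a rough sketch of a plan rewrite that drops a Sort only when nothing above it consumes the ordering. The `PlanNode` model, the node kinds, and `RemoveInnerSorts` are all hypothetical and heavily simplified; they are not Presto's actual planner classes, and the rule is deliberately conservative (any Sort directly feeding a Limit is kept).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified plan model for illustration; not Presto's PlanNode hierarchy.
final class PlanNode {
    enum Kind { TABLE_SCAN, SORT, LIMIT, PROJECT, AGGREGATE }

    final Kind kind;
    final List<PlanNode> sources;

    PlanNode(Kind kind, PlanNode... sources) {
        this.kind = kind;
        this.sources = List.of(sources);
    }

    // Renders the plan as a nested string, e.g. "LIMIT(SORT(TABLE_SCAN))".
    @Override
    public String toString() {
        if (sources.isEmpty()) {
            return kind.name();
        }
        List<String> parts = new ArrayList<>();
        for (PlanNode source : sources) {
            parts.add(source.toString());
        }
        return kind.name() + "(" + String.join(", ", parts) + ")";
    }
}

final class RemoveInnerSorts {
    // Assumption: the ordering produced by the root of the plan is observable by the client.
    static PlanNode rewrite(PlanNode root) {
        return rewrite(root, true);
    }

    private static PlanNode rewrite(PlanNode node, boolean orderObserved) {
        if (node.kind == PlanNode.Kind.SORT && !orderObserved) {
            // Nothing above depends on this ordering: drop the Sort and keep rewriting its input.
            return rewrite(node.sources.get(0), false);
        }
        boolean childOrderObserved;
        switch (node.kind) {
            case LIMIT:
                childOrderObserved = true;           // LIMIT picks rows based on its input order
                break;
            case PROJECT:
                childOrderObserved = orderObserved;  // row-by-row operator, passes ordering through
                break;
            default:
                childOrderObserved = false;          // a kept SORT re-orders; AGGREGATE ignores order
        }
        PlanNode[] newSources = new PlanNode[node.sources.size()];
        for (int i = 0; i < newSources.length; i++) {
            newSources[i] = rewrite(node.sources.get(i), childOrderObserved);
        }
        return new PlanNode(node.kind, newSources);
    }
}
```

With this sketch, `AGGREGATE(SORT(TABLE_SCAN))` loses its Sort, while `AGGREGATE(LIMIT(SORT(TABLE_SCAN)))` keeps it, which is the case described in the comment above.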
We can go with a rule-based optimizer (as the other visitor-based optimizers might move to a similar rule-based one), and these rules have to be placed after the merging of |
I'm copying a comment I made in #807 so that it doesn't get lost, as it applies to the general approach to how we solve this.

The fundamental problem is that there's currently no way for the engine to reason about whether a sort is required (i.e., was it added by the planner to enforce some physical organization, was it a result of the user typing ORDER BY in their query in a place where it matters, or was it the result of the user typing ORDER BY in a place where it doesn't matter).

The SQL spec says that ORDER BY is relevant only for the immediate query expression that contains it. Since ORDER BY logically evaluates before FETCH FIRST (LIMIT) and OFFSET, those two operations in a query expression are sensitive to the ORDER BY. If the ORDER BY were in a subquery, they wouldn't be. So:

SELECT * FROM t ORDER BY c LIMIT 1

and

SELECT * FROM (
    SELECT * FROM t ORDER BY c
)
LIMIT 1

have the same query plan:

yet the Sort is required in the first one but not in the second one.

I think a better approach for now, instead of attempting to implement this as an optimization rule, is for the planner to skip adding sort nodes in those cases altogether. The analyzer could tag the ORDER BY as irrelevant and the planner would just ignore it. To ease the transition for the cases where people are relying on that behavior, we should:
|
Here's a prototype of the approach I described above: #818 |
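For readers following along, the "tag it in the analyzer, skip it in the planner" idea can be sketched roughly as follows. This is not the code from #818; `Query`, `SketchPlanner`, and `orderByIsRelevant` are hypothetical names, the query model is a toy, and OFFSET / FETCH FIRST are left out for brevity. The assumption encoded here is that an ORDER BY matters only in the outermost query or when a LIMIT in the same query expression depends on it.

```java
// Hypothetical toy query model: SELECT * FROM <table | subquery> [ORDER BY col] [LIMIT n].
final class Query {
    final String table;    // set when FROM is a base table, otherwise null
    final Query subquery;  // set when FROM is a subquery, otherwise null
    final String orderBy;  // ORDER BY column, or null when absent
    final Integer limit;   // LIMIT value, or null when absent

    private Query(String table, Query subquery, String orderBy, Integer limit) {
        this.table = table;
        this.subquery = subquery;
        this.orderBy = orderBy;
        this.limit = limit;
    }

    static Query fromTable(String table, String orderBy, Integer limit) {
        return new Query(table, null, orderBy, limit);
    }

    static Query fromSubquery(Query subquery, String orderBy, Integer limit) {
        return new Query(null, subquery, orderBy, limit);
    }
}

final class SketchPlanner {
    // "Analyzer" step: decide whether the ORDER BY of this query expression can affect the result.
    static boolean orderByIsRelevant(Query query, boolean outermost) {
        return query.orderBy != null && (outermost || query.limit != null);
    }

    static String plan(Query query) {
        return plan(query, true);
    }

    // "Planner" step: emit a Sort node only when the ORDER BY was judged relevant.
    private static String plan(Query query, boolean outermost) {
        String plan = query.subquery != null
                ? plan(query.subquery, false)
                : "Scan[" + query.table + "]";
        if (orderByIsRelevant(query, outermost)) {
            plan = "Sort[" + query.orderBy + "](" + plan + ")";
        }
        if (query.limit != null) {
            plan = "Limit[" + query.limit + "](" + plan + ")";
        }
        return plan;
    }
}
```

Under these assumptions, `SELECT * FROM t ORDER BY c LIMIT 1` plans to `Limit[1](Sort[c](Scan[t]))`, while `SELECT * FROM (SELECT * FROM t ORDER BY c) LIMIT 1` plans to `Limit[1](Scan[t])`, so the two queries no longer share a plan.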
Add an optimization rule to remove unnecessary SORT in inner queries
A sort operation in an inner query is unnecessary and results in extra exchanges and sorts.
We should write an optimizer which removes these inner sorts. This seems slightly similar to #551, prestodb/presto#7781.
@martint @electrum @Praveen2112 If this makes sense, I can pick this up.