-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove redundant check #23776
Remove redundant check #23776
Conversation
Grouping keys are always part of output symbols
31cc80a
to
0af76a5
Compare
@@ -233,8 +232,7 @@ public boolean producesDistinctRows() | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is probably wrong, e.g: when there are multiple grouping sets.
Also, this check is too probably too strict, e.g: with regards to hash symbols. cc @Praveen2112
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is correct, but is relying on the assumption that could in theory change at some point in the future. I can't think of a way to write a unit test that would catch that change in assumptions (an assert groupingSets.getGroupingKeys().containsAll(outputs)
might work, except asserts are disallowed by the style guide), but it's probably worth adding adding an inline comment to clarify the assumption.
Approving in advance since the requested comment change is minor. I agree with your comment though that the check should also inspect the grouping set count for correctness.
I really think we should use plan (derived) properties for this rather then explicitly query |
I see the argument for using derived properties, but the performance for property derivations is extremely slow. I don't think it's practical to use it in their current form as part of an iterative optimizer rule. |
Have you investigate why it's the case? The advantage of property derivation is that it can represent properties of subset of symbols, e.g: when hash symbol is not present or subset of symbols was already distinct before aggregation |
Property derivations traverses the whole sub-tree underneath the given node, so if you do in an iterative optimizer you end up with exponential runtime very quickly. |
I don't think visiting subtreee is the biggest problem here. Did you check if it's a problem with metadata fetching? |
grouping keys are always part of output symbols