Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove redundant check #23776

Merged
merged 1 commit into from
Oct 16, 2024
Merged

Remove redundant check #23776

merged 1 commit into from
Oct 16, 2024

Conversation

sopel39
Copy link
Member

@sopel39 sopel39 commented Oct 14, 2024

grouping keys are always part of output symbols

        ImmutableList.Builder<Symbol> outputs = ImmutableList.builder();
        outputs.addAll(groupingSets.getGroupingKeys());

Grouping keys are always part of output symbols
@@ -233,8 +232,7 @@ public boolean producesDistinctRows()
{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is probably wrong, e.g: when there are multiple grouping sets.
Also, this check is too probably too strict, e.g: with regards to hash symbols. cc @Praveen2112

Copy link
Member

@pettyjamesm pettyjamesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is correct, but is relying on the assumption that could in theory change at some point in the future. I can't think of a way to write a unit test that would catch that change in assumptions (an assert groupingSets.getGroupingKeys().containsAll(outputs) might work, except asserts are disallowed by the style guide), but it's probably worth adding adding an inline comment to clarify the assumption.

Approving in advance since the requested comment change is minor. I agree with your comment though that the check should also inspect the grouping set count for correctness.

@sopel39
Copy link
Member Author

sopel39 commented Oct 16, 2024

This change is correct, but is relying on the assumption that could in theory change at some point in the future. I can't think of a way to write a unit test that would catch that change in assumptions (an assert groupingSets.getGroupingKeys().containsAll(outputs) might work, except asserts are disallowed by the style guide), but it's probably worth adding adding an inline comment to clarify the assumption.

I really think we should use plan (derived) properties for this rather then explicitly query AggregationNode

@sopel39 sopel39 merged commit bf536a8 into trinodb:master Oct 16, 2024
92 checks passed
@sopel39 sopel39 deleted the ks/remove_redundant branch October 16, 2024 15:53
@github-actions github-actions bot added this to the 462 milestone Oct 16, 2024
@pettyjamesm
Copy link
Member

I really think we should use plan (derived) properties for this rather then explicitly query AggregationNode

I see the argument for using derived properties, but the performance for property derivations is extremely slow. I don't think it's practical to use it in their current form as part of an iterative optimizer rule.

@sopel39
Copy link
Member Author

sopel39 commented Oct 18, 2024

I see the argument for using derived properties, but the performance for property derivations is extremely slow. I don't think it's practical to use it in their current form as part of an iterative optimizer rule.

Have you investigate why it's the case? The advantage of property derivation is that it can represent properties of subset of symbols, e.g: when hash symbol is not present or subset of symbols was already distinct before aggregation

@pettyjamesm
Copy link
Member

Property derivations traverses the whole sub-tree underneath the given node, so if you do in an iterative optimizer you end up with exponential runtime very quickly.

@sopel39
Copy link
Member Author

sopel39 commented Oct 21, 2024

Property derivations traverses the whole sub-tree underneath the given node, so if you do in an iterative optimizer you end up with exponential runtime very quickly.

I don't think visiting subtreee is the biggest problem here. Did you check if it's a problem with metadata fetching?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants