Support Iceberg OPTIMIZE with WHERE casting timestamp_tz column to a date #12918
...trino-main/src/main/java/io/trino/sql/planner/iterative/rule/PushPredicateIntoTableScan.java
* {@link TimestampWithTimeZoneType}. Such unwrap would not be monotonic. Within Iceberg, we know
* that {@link TimestampWithTimeZoneType} is always in UTC zone (point in time, with no time zone information),
For future: I wonder if we can communicate a constraint on a type from connector to engine so standard optimizer can handle that.
Yes, we could do that as part of io.trino.spi.connector.ColumnMetadata.
Or, we could use a dedicated type: #2273
A type may be better because this limitation -- storing point in time only, without time zone -- isn't really specific to Iceberg.
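The Javadoc quoted at the top of this thread relies on the cast to date being monotonic only when every value is a plain point in time in UTC. The following is a minimal sketch of that reasoning using `java.time` rather than Trino's types. The class and method names (`UtcDateCastUnwrap`, `castToDate`, `lowerBoundForDateGreaterThan`) are hypothetical and are not taken from the PR.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class UtcDateCastUnwrap
{
    // Models CAST(ts AS date): the calendar date the instant falls on in the value's zone.
    static LocalDate castToDate(Instant pointInTime, ZoneId zone)
    {
        return pointInTime.atZone(zone).toLocalDate();
    }

    // Assumption: the column only stores UTC points in time.
    // Then CAST(ts AS date) > date holds exactly when ts >= midnight UTC of the next day.
    static Instant lowerBoundForDateGreaterThan(LocalDate date)
    {
        return date.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant();
    }

    static boolean matchesUnwrapped(Instant pointInTime, LocalDate date)
    {
        return !pointInTime.isBefore(lowerBoundForDateGreaterThan(date));
    }

    public static void main(String[] args)
    {
        LocalDate constant = LocalDate.of(2022, 6, 20);
        Instant[] samples = {
                Instant.parse("2022-06-20T23:59:59.999Z"),
                Instant.parse("2022-06-21T00:00:00Z")};
        for (Instant ts : samples) {
            boolean original = castToDate(ts, ZoneOffset.UTC).isAfter(constant);
            boolean unwrapped = matchesUnwrapped(ts, constant);
            System.out.println(ts + " original=" + original + " unwrapped=" + unwrapped);
        }

        // With per-value zones the cast is not monotonic: earlier < later,
        // yet the earlier instant maps to the later calendar date.
        Instant earlier = Instant.parse("2022-06-20T20:00:00Z");
        Instant later = Instant.parse("2022-06-20T21:00:00Z");
        System.out.println(castToDate(earlier, ZoneId.of("Asia/Tokyo")));
        System.out.println(castToDate(later, ZoneOffset.UTC));
    }
}
```

When all values share the UTC zone, the original predicate and the range predicate agree on both sides of the day boundary, which is what makes the rewrite safe for pruning. The last two lines show why the same rewrite cannot be applied when each value may carry its own zone.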
9be3f52 to 412e967
Map<ColumnHandle, Domain> enforcedDomains = enforcedDomainsBuilder.buildOrThrow();
checkArgument(
enforcedDomains.size() + unenforcedDomains.size() == predicateDomains.size(),
"Enforced tuple domain cannot be determined. Connector returned an unenforced TupleDomain %s that contains columns not in predicate %s.",
cc @martint
@@ -437,9 +437,8 @@ public static TupleDomain<ColumnHandle> computeEnforced(TupleDomain<ColumnHandle
if (unenforcedDomains.containsKey(predicateColumnHandle)) {
Domain unenforcedDomain = unenforcedDomains.get(predicateColumnHandle);
checkArgument(
predicateDomain.equals(unenforcedDomain),
"Enforced tuple domain cannot be determined. The connector is expected to enforce the respective domain entirely on none, some, or all of the column. " +
cc @martint
412e967 to 1ad18f7
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/ConstraintExtractor.java
When `IcebergMetadata.applyFilter` is invoked with a `Constraint` that has no useful summary (`TupleDomain`) and carries only an expression or a functional predicate, we can short-circuit the method execution.
Also improve wording.
`unwrapTimestampToDateCast` assumes the target type is `DATE` (and is invoked only when it is), so passing `targetType` is redundant.
The check dates back to the time when `TupleDomain` was the only information passed to the connector in `ConnectorMetadata.applyFilter`. Its purpose was to ensure the connector does not erroneously return new column constraints (`Domain` objects) in `ConstraintApplicationResult.remainingFilter`. With expression-based pushdown, the check is no longer valid. A connector may be able to translate `Constraint.expression` (or part thereof) into a `Domain` / `TupleDomain` and then enforce it, or return such a simplified representation as a remaining `TupleDomain` (`ConstraintApplicationResult.remainingFilter`).
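To illustrate the relaxed behavior described in this commit message, here is a simplified sketch, not the actual Trino implementation. Domains are modeled as opaque strings keyed by column name instead of `Domain` objects keyed by `ColumnHandle`, and the class name `EnforcedDomainSketch` is hypothetical. It assumes that the per-column equality check from the snippet quoted earlier in the review is kept, while columns appearing only in the remaining filter are tolerated.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EnforcedDomainSketch
{
    // predicate: domains the engine offered; unenforced: domains the connector handed back.
    static Map<String, String> computeEnforced(Map<String, String> predicate, Map<String, String> unenforced)
    {
        Map<String, String> enforced = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : predicate.entrySet()) {
            String column = entry.getKey();
            String predicateDomain = entry.getValue();
            if (unenforced.containsKey(column)) {
                // A column is expected to be enforced entirely or not at all.
                if (!predicateDomain.equals(unenforced.get(column))) {
                    throw new IllegalArgumentException("Domain for column " + column + " is only partially enforced");
                }
                continue; // handed back unchanged, so not enforced by the connector
            }
            enforced.put(column, predicateDomain);
        }
        // Columns that appear only in `unenforced` are not rejected here: with
        // expression-based pushdown the connector may have derived them from an expression.
        return enforced;
    }

    public static void main(String[] args)
    {
        Map<String, String> predicate = Map.of("a", "a > 10");
        // Derived by the connector from an expression; column "d" was never in the predicate.
        Map<String, String> unenforced = Map.of("d", "d >= DATE '2022-06-21'");
        System.out.println(computeEnforced(predicate, unenforced)); // prints {a=a > 10}
    }
}
```

Under the older size-based check, the call in `main` would have failed because `d` is not part of the predicate. In this sketch it simply does not contribute to the enforced set.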
1ad18f7 to d6f2da1
(just rebased to resolve a conflict with #12911)
@findepi Just a quick question:
-- regardless of session zone
ALTER TABLE iceberg_table EXECUTE optimize WHERE CAST(c_timestamp_tz AS date) > a_date_constant
This wouldn't work for Iceberg time partitions, since c_timestamp_tz could be in any time zone, right? I thought we had discussed making it look like:
-- regardless of session zone
ALTER TABLE iceberg_table EXECUTE optimize WHERE CAST(c_timestamp_tz AT TIME ZONE 'UTC' AS date) > a_date_constant
to force the same time zone? Side note: that being said, it looks like if we just do
in Iceberg,
Yep, this is already supported, but may be less intuitive to users. Fortunately, it's or, not xor: people can choose whichever they prefer.
I think we should document this so users know how to leverage this...
Yes, definitely prefer
Follow-up to #12795
Among other things, that PR added support for
This PR adds support for
Further enhances #7905
Fixes #12362