fix: use int64 instead of uint64 for PrecisionTimestamp(Tz) literal value #668
…alue to allow timestamps to refer to time before epoch BREAKING CHANGE: PrecisionTimestamp(Tz) literal's value is now int64 instead of uint64
Yikes, good catch 😅
I think we should probably update the content here so that the valid range of timestamps is very clear. Right now it is much more vague than for other types.
Would you have a proposal? Happy to include it in this PR if so; otherwise I’d suggest doing it as a follow-up to get this breaking change in as soon as possible. (Though I see I need to update the page you linked for uint->int, thanks!)
@westonpace @jacques-n et al, any further comments, or would this be good to go? (I'd like to get it into the next release if possible.)
I seem to recall this being intentional. The PR to introduce precision timestamps originally stated
@@ -41,8 +41,8 @@ Compound type classes are type classes that need to be configured by means of a
 | NSTRUCT<N:T1,...,N:Tn> | **Pseudo-type**: A struct that maps unique names to value types. Each name is a UTF-8-encoded string. Each value can have a distinct type. Note that NSTRUCT is actually a pseudo-type, because Substrait's core type system is based entirely on ordinal positions, not named fields. Nonetheless, when working with systems outside Substrait, names are important. | n/a |
 | LIST<T> | A list of values of type T. The list can be between [0..2,147,483,647] values in length. | `repeated Literal`, all types matching T |
 | MAP<K, V> | An unordered list of type K keys with type V values. Keys may be repeated. While the key type could be nullable, keys may not be null. | `repeated KeyValue` (in turn two `Literal`s), all key types matching K and all value types matching V |
-| PRECISIONTIMESTAMP<P> | A timestamp with fractional second precision (P, number of digits) 0 <= P <= 9. Does not include timezone information and can thus not be unambiguously mapped to a moment on the timeline without context. Similar to naive datetime in Python. | `uint64` microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 (in an unspecified timezone) |
-| PRECISIONTIMESTAMPTZ<P> | A timezone-aware timestamp, with fractional second precision (P, number of digits) 0 <= P <= 9. Similar to aware datetime in Python. | `uint64` microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 UTC |
+| PRECISIONTIMESTAMP<P> | A timestamp with fractional second precision (P, number of digits) 0 <= P <= 9. Does not include timezone information and can thus not be unambiguously mapped to a moment on the timeline without context. Similar to naive datetime in Python. | `int64` seconds, milliseconds, microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 (in an unspecified timezone) |
Given the discussion on ranges and P, I'm struggling with this description. How do I express a literal if P = 1, 2, 4, 5, 7 or 8? I'm going to approve this PR, but we should open a follow-up to address the inconsistency here. Either:
P can only be 0, 3, 6 or 9
OR
P can be 0..9 and we need additional specification around literal data.
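To make the second option concrete, here is a minimal Python sketch of one possible reading, in which the `int64` value counts units of 10^-P seconds for any P in 0..9. This interpretation is an assumption for illustration only; it is not what the table in this PR states, and the helper name `decode_precision_timestamp` is hypothetical.

```python
def decode_precision_timestamp(value: int, precision: int) -> tuple[int, int]:
    """Split a literal into (whole seconds, nanoseconds) relative to the epoch.

    ASSUMPTION: `value` counts units of 10**-precision seconds. This is one
    possible interpretation, not something the spec text in this PR defines.
    """
    if not 0 <= precision <= 9:
        raise ValueError("precision must satisfy 0 <= P <= 9")
    if not -(2**63) <= value < 2**63:
        raise ValueError("value does not fit in int64")
    # divmod floors toward negative infinity, so the fractional part is
    # always non-negative, even for values before the epoch.
    seconds, fraction = divmod(value, 10**precision)
    nanos = fraction * 10 ** (9 - precision)
    return seconds, nanos


# P = 1 means tenths of a second: -15 tenths is 1.5 s before the epoch,
# which is -2 whole seconds plus 500,000,000 ns.
print(decode_precision_timestamp(-15, 1))
# P = 3 is the familiar millisecond case.
print(decode_precision_timestamp(1_500, 3))
```

Under this reading, the values P = 1, 2, 4, 5, 7 and 8 need no special casing; the alternative of restricting P to 0, 3, 6 or 9 would instead reject them up front.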
This allows timestamps to refer to time before epoch, and aligns with other systems (Spark, DataFusion/Arrow, DuckDB, Postgres, Parquet at least)
BREAKING CHANGE: PrecisionTimestamp(Tz) literal's value is now int64 instead of uint64
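As a small illustration of why the unsigned type is a problem, the Python sketch below converts a moment one second before the epoch to microseconds and checks whether it fits a signed and an unsigned 64-bit integer. It uses only the standard library; the helper names `to_micros` and `fits` are made up for this example and are not part of Substrait.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(dt: datetime) -> int:
    """Microseconds since the epoch; negative for moments before it."""
    return (dt - EPOCH) // timedelta(microseconds=1)


def fits(fmt: str, value: int) -> bool:
    """Return True if `value` can be packed with the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


before_epoch = datetime(1969, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
micros = to_micros(before_epoch)

print(micros)              # one second before the epoch, in microseconds
print(fits("<q", micros))  # "<q" is a signed 64-bit integer (int64)
print(fits("<Q", micros))  # "<Q" is an unsigned 64-bit integer (uint64)
```

The negative value packs as `int64` but is rejected as `uint64`, which is the limitation this change removes.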
In #659 I created this `PrecisionTimestamp` message to include the precision, but unfortunately copied the value's type as-is, not realizing the unsignedness is a problem.