Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use int64 instead of uint64 for PrecisionTimestamp(Tz) literal value #668

Merged
merged 2 commits into from
Jul 28, 2024

Conversation

Blizzara
Copy link
Contributor

This allows timestamps to refer to time before epoch, and aligns with other systems (Spark, DataFusion/Arrow, DuckDB, Postgres, Parquet at least)

BREAKING CHANGE: PrecisionTimestamp(Tz) literal's value is now int64 instead of uint64

In #659 I created this PrecisionTimestamp message to include the precision, but unfortunately copied the value's type as-is, not realizing the unsignedness is a problem.

…alue to allow timestamps to refer to time before epoch

BREAKING CHANGE: PrecisionTimestamp(Tz) literal's value is now int64 instead of uint64
westonpace
westonpace previously approved these changes Jul 23, 2024
Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, good catch 😅

@jacques-n
Copy link
Contributor

I think we should probably update content here:
https://substrait.io/types/type_classes/

So that it is very clear the valid range of timestamps. Right now it is much more vague than other types.

@Blizzara
Copy link
Contributor Author

I think we should probably update content here: https://substrait.io/types/type_classes/

So that it is very clear the valid range of timestamps. Right now it is much more vague than other types.

Would you have a proposal? Happy to include in this PR if so, otherwise I’d suggest doing it as followup to get this break in as soon as possible.

(Though I see I need to update the page you linked for uint->int, thanks!)

@Blizzara
Copy link
Contributor Author

@westonpace @jacques-n et al, any further comments or would this be good to go? (I'd like to get it in next release if possible..)

@westonpace
Copy link
Member

So that it is very clear the valid range of timestamps. Right now it is much more vague than other types.

I seem to recall this being intentional. The PR to introduce precision timestamps originally stated A naive timestamp within [1000-01-01 00:00:00.000000000..9999-12-31 23:59:59.999999999] but this is not possible with nanosecond resolution on systems that store timestamps as 8 byte integers (some store them as a pair of 8 byte integers for second / subsecond). Rather than make the range based on the precision we decided to just be vague about the range.

@@ -41,8 +41,8 @@ Compound type classes are type classes that need to be configured by means of a
| NSTRUCT<N:T1,...,N:Tn> | **Pseudo-type**: A struct that maps unique names to value types. Each name is a UTF-8-encoded string. Each value can have a distinct type. Note that NSTRUCT is actually a pseudo-type, because Substrait's core type system is based entirely on ordinal positions, not named fields. Nonetheless, when working with systems outside Substrait, names are important. | n/a
| LIST<T> | A list of values of type T. The list can be between [0..2,147,483,647] values in length. | `repeated Literal`, all types matching T
| MAP<K, V> | An unordered list of type K keys with type V values. Keys may be repeated. While the key type could be nullable, keys may not be null. | `repeated KeyValue` (in turn two `Literal`s), all key types matching K and all value types matching V
| PRECISIONTIMESTAMP&lt;P&gt; | A timestamp with fractional second precision (P, number of digits) 0 <= P <= 9. Does not include timezone information and can thus not be unambiguously mapped to a moment on the timeline without context. Similar to naive datetime in Python. | `uint64` microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 (in an unspecified timezone)
| PRECISIONTIMESTAMPTZ&lt;P&gt; | A timezone-aware timestamp, with fractional second precision (P, number of digits) 0 <= P <= 9. Similar to aware datetime in Python. | `uint64` microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 UTC
| PRECISIONTIMESTAMP&lt;P&gt; | A timestamp with fractional second precision (P, number of digits) 0 <= P <= 9. Does not include timezone information and can thus not be unambiguously mapped to a moment on the timeline without context. Similar to naive datetime in Python. | `int64` seconds, milliseconds, microseconds or nanoseconds since 1970-01-01 00:00:00.000000000 (in an unspecified timezone)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the discussion on ranges, I'm and P, I'm struggling with this description. How do I do a literal if P = 1,2,4,5,7 or 8? I'm going to approve this PR but we should open a follow-up to address the inconsistency here. Either:

P can only be 0,3,6 or 9
OR
P can be 0..9 and we need additional specification around literal data.

@jacques-n jacques-n merged commit da3c74e into substrait-io:main Jul 28, 2024
13 checks passed
@Blizzara Blizzara deleted the avo/precision-timestamp-int64 branch July 29, 2024 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants