feat: include precision parameter in timestamp types #594

Merged on Feb 22, 2024 (14 commits)
49 changes: 47 additions & 2 deletions extensions/functions_datetime.yaml
@@ -9,7 +9,7 @@ scalar_functions:
* ISO_YEAR Return the ISO 8601 week-numbering year. First week of an ISO year has the majority (4 or more) of
its days in January.
* US_YEAR Return the US epidemiological year. First week of US epidemiological year has the majority (4 or more)
of its days in January. Last week of US epidemiological year has the year's last Wednesday in it. US
epidemiological week starts on Sunday.
* QUARTER Return the number of the quarter within the year. January 1 through March 31 map to the first quarter,
April 1 through June 30 map to the second quarter, etc.
@@ -32,6 +32,7 @@ scalar_functions:
* SECOND Return the second (0-59).
* MILLISECOND Return number of milliseconds since the last full second.
* MICROSECOND Return number of microseconds since the last full millisecond.
* NANOSECOND Return number of nanoseconds since the last full microsecond.
* SUBSECOND Return number of microseconds since the last full second of the given timestamp.
* UNIX_TIME Return number of seconds that have elapsed since 1970-01-01 00:00:00 UTC, ignoring leap seconds.
* TIMEZONE_OFFSET Return number of seconds of timezone offset to UTC.
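
The sub-second components above nest inside one another, which is easy to misread. The following Python sketch shows one possible reading, assuming the timestamp is available as nanoseconds since the UNIX epoch; the function name `subsecond_components` is hypothetical and not part of the extension.

```python
def subsecond_components(nanos_since_epoch: int) -> dict:
    """Split the sub-second part of a timestamp into the documented components.

    Assumption: the input is an integer count of nanoseconds since the UNIX epoch.
    """
    # Nanoseconds elapsed since the last full second.
    nanos_in_second = nanos_since_epoch % 1_000_000_000
    return {
        # Milliseconds since the last full second.
        "MILLISECOND": nanos_in_second // 1_000_000,
        # Microseconds since the last full millisecond.
        "MICROSECOND": (nanos_in_second // 1_000) % 1_000,
        # Nanoseconds since the last full microsecond.
        "NANOSECOND": nanos_in_second % 1_000,
        # Microseconds since the last full second.
        "SUBSECOND": nanos_in_second // 1_000,
    }


parts = subsecond_components(1_700_000_000_123_456_789)
print(parts)
```

For a fractional part of `.123456789` seconds, this reading gives MILLISECOND 123, MICROSECOND 456, NANOSECOND 789, and SUBSECOND 123456.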
@@ -57,7 +58,7 @@ scalar_functions:
* MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK return values in range 0-52

The indexing option must be specified when the component is QUARTER, MONTH, DAY, DAY_OF_YEAR,
MONDAY_DAY_OF_WEEK, SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, or US_WEEK. The
indexing option cannot be specified when the component is YEAR, ISO_YEAR, US_YEAR, HOUR, MINUTE, SECOND,
MILLISECOND, MICROSECOND, SUBSECOND, UNIX_TIME, or TIMEZONE_OFFSET.
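
The indexing rule above can be sketched as a small validator. This is a minimal Python sketch, not part of the extension: `check_indexing` is a hypothetical helper, and treating NANOSECOND as a component that takes no indexing option is an assumption based on the overloads in this diff, since the prose above does not list it.

```python
# Components that require the indexing option, per the description above.
INDEXED = {
    "QUARTER", "MONTH", "DAY", "DAY_OF_YEAR", "MONDAY_DAY_OF_WEEK",
    "SUNDAY_DAY_OF_WEEK", "MONDAY_WEEK", "SUNDAY_WEEK", "ISO_WEEK", "US_WEEK",
}

# Components that must not carry the indexing option.
# Assumption: NANOSECOND belongs here, matching the overloads in this diff.
UNINDEXED = {
    "YEAR", "ISO_YEAR", "US_YEAR", "HOUR", "MINUTE", "SECOND", "MILLISECOND",
    "MICROSECOND", "NANOSECOND", "SUBSECOND", "UNIX_TIME", "TIMEZONE_OFFSET",
}


def check_indexing(component, indexing=None):
    """Raise ValueError if the indexing option is wrongly present or absent."""
    if component in INDEXED:
        if indexing not in ("ONE", "ZERO"):
            raise ValueError(f"{component} requires indexing ONE or ZERO")
    elif component in UNINDEXED:
        if indexing is not None:
            raise ValueError(f"{component} does not accept an indexing option")
    else:
        raise ValueError(f"unknown component: {component}")


check_indexing("MONTH", "ONE")   # accepted: indexing is required and given
check_indexing("NANOSECOND")     # accepted: no indexing option expected
```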

@@ -76,6 +77,17 @@ scalar_functions:
description: Timezone string from IANA tzdb.
value: string
return: i64
- args:
- name: component
options: [ YEAR, ISO_YEAR, US_YEAR, HOUR, MINUTE, SECOND,
MILLISECOND, MICROSECOND, NANOSECOND, SUBSECOND, UNIX_TIME, TIMEZONE_OFFSET ]
description: The part of the value to extract.
- name: x
value: precision_timestamp_tz<P1>
- name: timezone
description: Timezone string from IANA tzdb.
value: string
return: i64
- args:
- name: component
options: [ YEAR, ISO_YEAR, US_YEAR, HOUR, MINUTE, SECOND,
@@ -84,6 +96,14 @@ scalar_functions:
- name: x
value: timestamp
return: i64
- args:
- name: component
options: [ YEAR, ISO_YEAR, US_YEAR, HOUR, MINUTE, SECOND,
MILLISECOND, MICROSECOND, NANOSECOND, SUBSECOND, UNIX_TIME ]
description: The part of the value to extract.
- name: x
value: precision_timestamp<P1>
return: i64
- args:
- name: component
options: [ YEAR, ISO_YEAR, US_YEAR, UNIX_TIME ]
@@ -112,6 +132,20 @@ scalar_functions:
description: Timezone string from IANA tzdb.
value: string
return: i64
- args:
- name: component
options: [ QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK,
SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, US_WEEK ]
description: The part of the value to extract.
- name: indexing
options: [ ONE, ZERO ]
description: Start counting from 1 or 0.
- name: x
value: precision_timestamp_tz<P1>
- name: timezone
description: Timezone string from IANA tzdb.
value: string
return: i64
- args:
- name: component
options: [ QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK,
@@ -123,6 +157,17 @@ scalar_functions:
- name: x
value: timestamp
return: i64
- args:
- name: component
options: [ QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK,
SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, US_WEEK ]
description: The part of the value to extract.
- name: indexing
options: [ ONE, ZERO ]
description: Start counting from 1 or 0.
- name: x
value: precision_timestamp<P1>
return: i64
- args:
- name: component
options: [ QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK,
10 changes: 8 additions & 2 deletions proto/substrait/algebra.proto
@@ -796,7 +796,8 @@ message Expression {
string string = 12;
bytes binary = 13;
// Timestamp in units of microseconds since the UNIX epoch.
int64 timestamp = 14;
// Deprecated in favor of `precision_timestamp`
int64 timestamp = 14 [deprecated = true];
// Date in units of days since the UNIX epoch.
int32 date = 16;
// Time in units of microseconds past midnight
@@ -807,10 +808,15 @@ message Expression {
VarChar var_char = 22;
bytes fixed_binary = 23;
Decimal decimal = 24;
// If the precision is 6 or less then this is the microseconds since the UNIX epoch
// If the precision is more than 6 then this is the nanoseconds since the UNIX epoch
uint64 precision_timestamp = 34;
uint64 precision_timestamp_tz = 35;
Review comment (Contributor):

Should these have been custom types that include the precision, like Decimal/VarChar below? Otherwise, how is the consumer meant to infer the precision? The uint64 presumably holds only the timestamp value itself.

Struct struct = 25;
Map map = 26;
// Timestamp in units of microseconds since the UNIX epoch.
int64 timestamp_tz = 27;
// Deprecated in favor of `precision_timestamp_tz`
int64 timestamp_tz = 27 [deprecated = true];
bytes uuid = 28;
Type null = 29; // a typed null literal
List list = 30;
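
The comment on `precision_timestamp` in this hunk says the raw value is in microseconds when the precision is 6 or less and in nanoseconds otherwise. A minimal Python sketch of that rule follows. `decode_precision_timestamp` is a hypothetical helper, and the precision is passed in separately because the literal field as defined here carries only the integer value.

```python
from datetime import datetime, timedelta

# Naive epoch; for precision_timestamp_tz the result would be read as UTC.
EPOCH = datetime(1970, 1, 1)


def decode_precision_timestamp(value: int, precision: int):
    """Interpret a raw literal value following the comment in algebra.proto.

    Returns (datetime at microsecond resolution, leftover nanoseconds).
    Assumption: the caller already knows the precision of the literal's type.
    """
    if precision <= 6:
        # Value is microseconds since the UNIX epoch.
        micros, extra_nanos = value, 0
    else:
        # Value is nanoseconds since the UNIX epoch; datetime only holds
        # microseconds, so the remainder is returned separately.
        micros, extra_nanos = divmod(value, 1_000)
    return EPOCH + timedelta(microseconds=micros), extra_nanos


print(decode_precision_timestamp(1_000_000, 6))
print(decode_precision_timestamp(1_000_000_123, 9))
```

Both calls decode to one second past the epoch; the second also returns 123 leftover nanoseconds.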
20 changes: 18 additions & 2 deletions proto/substrait/parameterized_types.proto
@@ -21,18 +21,22 @@ message ParameterizedType {
Type.FP64 fp64 = 11;
Type.String string = 12;
Type.Binary binary = 13;
Type.Timestamp timestamp = 14;
// Deprecated in favor of `ParameterizedPrecisionTimestamp precision_timestamp`
Type.Timestamp timestamp = 14 [deprecated = true];
Type.Date date = 16;
Type.Time time = 17;
Type.IntervalYear interval_year = 19;
Type.IntervalDay interval_day = 20;
Type.TimestampTZ timestamp_tz = 29;
// Deprecated in favor of `ParameterizedPrecisionTimestampTZ precision_timestamp_tz`
Type.TimestampTZ timestamp_tz = 29 [deprecated = true];
Type.UUID uuid = 32;

ParameterizedFixedChar fixed_char = 21;
ParameterizedVarChar varchar = 22;
ParameterizedFixedBinary fixed_binary = 23;
ParameterizedDecimal decimal = 24;
ParameterizedPrecisionTimestamp precision_timestamp = 34;
ParameterizedPrecisionTimestampTZ precision_timestamp_tz = 35;

ParameterizedStruct struct = 25;
ParameterizedList list = 27;
@@ -88,6 +92,18 @@ message ParameterizedType {
Type.Nullability nullability = 4;
}

message ParameterizedPrecisionTimestamp {
IntegerOption precision = 1;
uint32 variation_pointer = 2;
Type.Nullability nullability = 3;
}

message ParameterizedPrecisionTimestampTZ {
IntegerOption precision = 1;
uint32 variation_pointer = 2;
Type.Nullability nullability = 3;
}

message ParameterizedStruct {
repeated ParameterizedType types = 1;
uint32 variation_pointer = 2;
22 changes: 20 additions & 2 deletions proto/substrait/type.proto
@@ -21,18 +21,22 @@ message Type {
FP64 fp64 = 11;
String string = 12;
Binary binary = 13;
Timestamp timestamp = 14;
// Deprecated in favor of `PrecisionTimestamp precision_timestamp`
Timestamp timestamp = 14 [deprecated = true];
Date date = 16;
Time time = 17;
IntervalYear interval_year = 19;
IntervalDay interval_day = 20;
TimestampTZ timestamp_tz = 29;
// Deprecated in favor of `PrecisionTimestampTZ precision_timestamp_tz`
TimestampTZ timestamp_tz = 29 [deprecated = true];
UUID uuid = 32;

FixedChar fixed_char = 21;
VarChar varchar = 22;
FixedBinary fixed_binary = 23;
Decimal decimal = 24;
PrecisionTimestamp precision_timestamp = 33;
PrecisionTimestampTZ precision_timestamp_tz = 34;

Struct struct = 25;
List list = 27;
@@ -159,6 +163,20 @@ message Type {
Nullability nullability = 4;
}

message PrecisionTimestamp {
// Defaults to 6
int32 precision = 1;
uint32 type_variation_reference = 2;
Nullability nullability = 3;
}

message PrecisionTimestampTZ {
// Defaults to 6
int32 precision = 1;
uint32 type_variation_reference = 2;
Nullability nullability = 3;
}

message Struct {
repeated Type types = 1;
uint32 type_variation_reference = 2;
20 changes: 18 additions & 2 deletions proto/substrait/type_expressions.proto
@@ -21,18 +21,22 @@ message DerivationExpression {
Type.FP64 fp64 = 11;
Type.String string = 12;
Type.Binary binary = 13;
Type.Timestamp timestamp = 14;
// Deprecated in favor of `ExpressionPrecisionTimestamp precision_timestamp`
Type.Timestamp timestamp = 14 [deprecated = true];
Type.Date date = 16;
Type.Time time = 17;
Type.IntervalYear interval_year = 19;
Type.IntervalDay interval_day = 20;
Type.TimestampTZ timestamp_tz = 29;
// Deprecated in favor of `ExpressionPrecisionTimestampTZ precision_timestamp_tz`
Type.TimestampTZ timestamp_tz = 29 [deprecated = true];
Type.UUID uuid = 32;

ExpressionFixedChar fixed_char = 21;
ExpressionVarChar varchar = 22;
ExpressionFixedBinary fixed_binary = 23;
ExpressionDecimal decimal = 24;
ExpressionPrecisionTimestamp precision_timestamp = 40;
ExpressionPrecisionTimestampTZ precision_timestamp_tz = 41;

ExpressionStruct struct = 25;
ExpressionList list = 27;
@@ -80,6 +84,18 @@ message DerivationExpression {
Type.Nullability nullability = 4;
}

message ExpressionPrecisionTimestamp {
DerivationExpression precision = 1;
uint32 variation_pointer = 2;
Type.Nullability nullability = 3;
}

message ExpressionPrecisionTimestampTZ {
DerivationExpression precision = 1;
uint32 variation_pointer = 2;
Type.Nullability nullability = 3;
}

message ExpressionStruct {
repeated DerivationExpression types = 1;
uint32 variation_pointer = 2;
58 changes: 30 additions & 28 deletions site/docs/extensions/index.md
@@ -58,34 +58,36 @@ Rather than using a full data type representation, the input argument types (`sh

Every compound function signature must be unique. If two function implementations in a YAML file would generate the same compound function signature, then the YAML file is invalid and behavior is undefined.

| Argument Type | Signature Name |
| -------------------------- | -------------- |
| Required Enumeration | req |
| i8 | i8 |
| i16 | i16 |
| i32 | i32 |
| i64 | i64 |
| fp32 | fp32 |
| fp64 | fp64 |
| string | str |
| binary | vbin |
| boolean | bool |
| timestamp | ts |
| timestamp_tz | tstz |
| date | date |
| time | time |
| interval_year | iyear |
| interval_day | iday |
| uuid | uuid |
| fixedchar&lt;N&gt; | fchar |
| varchar&lt;N&gt; | vchar |
| fixedbinary&lt;N&gt; | fbin |
| decimal&lt;P,S&gt; | dec |
| struct&lt;T1,T2,...,TN&gt; | struct |
| list&lt;T&gt; | list |
| map&lt;K,V&gt; | map |
| any[\d]? | any |
| user defined type | u!name |
| Argument Type | Signature Name |
|---------------------------------|----------------|
| Required Enumeration | req |
| i8 | i8 |
| i16 | i16 |
| i32 | i32 |
| i64 | i64 |
| fp32 | fp32 |
| fp64 | fp64 |
| string | str |
| binary | vbin |
| boolean | bool |
| timestamp | ts |
| timestamp_tz | tstz |
| date | date |
| time | time |
| interval_year | iyear |
| interval_day | iday |
| uuid | uuid |
| fixedchar&lt;N&gt; | fchar |
| varchar&lt;N&gt; | vchar |
| fixedbinary&lt;N&gt; | fbin |
| decimal&lt;P,S&gt; | dec |
| precision_timestamp&lt;P&gt; | pts |
| precision_timestamp_tz&lt;P&gt; | ptstz |
| struct&lt;T1,T2,...,TN&gt; | struct |
| list&lt;T&gt; | list |
| map&lt;K,V&gt; | map |
| any[\d]? | any |
| user defined type | u!name |
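
As an illustration of how the table is used, the Python sketch below builds a compound signature from a function name and its argument types. The `name:arg_arg` layout (function name, colon, signature names joined by underscores) is an assumption here because the surrounding prose is truncated in this diff. `short_name` and `compound_signature` are hypothetical helpers, and only part of the table is reproduced.

```python
import re

# Subset of the table above: argument type -> signature name.
SIGNATURE_NAMES = {
    "req": "req", "i8": "i8", "i16": "i16", "i32": "i32", "i64": "i64",
    "fp32": "fp32", "fp64": "fp64", "string": "str", "binary": "vbin",
    "boolean": "bool", "timestamp": "ts", "timestamp_tz": "tstz",
    "date": "date", "time": "time", "decimal": "dec",
    "precision_timestamp": "pts", "precision_timestamp_tz": "ptstz",
    "struct": "struct", "list": "list", "map": "map",
}


def short_name(arg_type: str) -> str:
    """Map an argument type such as 'precision_timestamp<P1>' to its signature name."""
    if arg_type.startswith("u!"):
        return arg_type  # user defined types keep their u!name form
    # Drop type parameters and any nullability marker.
    base = re.split(r"[<?]", arg_type, maxsplit=1)[0].strip().lower()
    if re.fullmatch(r"any\d?", base):
        return "any"
    return SIGNATURE_NAMES[base]


def compound_signature(function_name: str, arg_types: list) -> str:
    """Join the function name and argument signature names (assumed layout)."""
    return f"{function_name}:{'_'.join(short_name(t) for t in arg_types)}"


print(compound_signature("extract", ["req", "precision_timestamp_tz<P1>", "string"]))
```

Under these assumptions the call prints `extract:req_ptstz_str`. Because type parameters are dropped, two overloads that differ only in precision would map to the same compound signature.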

#### Examples
