Use session TimeZone for timestamp_tz column #8680

yanqinghe · 2021-07-27T15:25:46Z

when query timestamp_tz column data from iceberg table, I found that Trino uses the UTC time zone to process these columns, which increases the cost of using queries.
For example:

select ts for table;

specify time zone

select ts from ts AT TIME ZONE 'Asia/Shanghai' from table;

Can we use session time zone to process these fields with time zone so that users do not need to make additional time zone conversion when querying?

There are places in the processing logic of Parquet/ORC file where the time zone can be modified. I don't know whether this is reasonable.

I hope to get better suggestions. Welcome to discuss

findepi · 2021-07-28T13:03:23Z

Iceberg's timestamptz is a point in time (without zone).

Mapping to session zone would have representability consequences as in #5488.

yanqinghe · 2021-08-04T03:12:20Z

Although iceberg's timestamptz format can only represent one time point physically, iceberg connector stipulates that this time point is UTC time zone, that is, it includes the concept of time zone.

We think that there is no problem in processing time with time zone according to the local time zone. After all, if necessary, the time zone can be converted to UTC again.

In the Hive connector, we found that when processing the int96 format in parquet, the timezone in session is used when processing the column with the field type of timestampwithtimezonetype, but when processing the Int64 format in parquet, the UTC time zone is used by default for the column with time zone.

The above implementation can be seen in io.trino.parquet.reader.primitivecolumnreader.

Therefore, we think that we can use session timezone when processing iceberg's timestamptz format.

findepi · 2021-08-04T15:20:10Z

Although iceberg's timestamptz format can only represent one time point physically, iceberg connector stipulates that this time point is UTC time zone, that is, it includes the concept of time zone.

yes, because `timestamp with time zone is the only Trino type that can be used to represent a point in time.
we don't have an instant type, for example (#2273)

We think that there is no problem in processing time with time zone according to the local time zone.

depending on the local time zone and actual data, their indeed might be no problem, or there may be

In the Hive connector, we found that when processing the int96 format in parquet, the timezone in session is used when processing the column with the field type of timestampwithtimezonetype,

that looks like a bug in Hive connector, because some values can be misrepresented, or may hit the #5781 problem

Poncirus · 2021-08-09T09:14:22Z

Although all problems could be resolved by converting to UTC again, problems related to DST are often ambiguous and inexplicit. That is the reason why you would like to use UTC as default time zone. Is my understanding correct?

Like for #5488, if user convert time zone to 'America/Los_Angeles', they will still get the same result.

SELECT INCREMENT,
       t AT TIME ZONE 'America/Los_Angeles'
FROM
  (SELECT INCREMENT,
          from_unixtime(base + INCREMENT * 3600, 'UTC') AS t
   FROM (
         VALUES(1572766200)) t(base)
   CROSS JOIN UNNEST(sequence(0, 3)) u(INCREMENT))

This conversion must be obvious, and should not be done by connector.

findepi · 2021-09-17T06:48:24Z

Note that:

trino> SELECT base, increment, from_unixtime(base + increment * 3600, 'America/Los_Angeles')
    -> FROM (VALUES to_unixtime(TIMESTAMP '2021-11-07 00:30:00 America/Los_Angeles')) t(base)
    -> CROSS JOIN UNNEST(sequence(0, 3)) u(increment);
    base     | increment |                    _col2
-------------+-----------+---------------------------------------------
 1.6362702E9 |         0 | 2021-11-07 00:30:00.000 America/Los_Angeles
 1.6362702E9 |         1 | 2021-11-07 01:30:00.000 America/Los_Angeles
 1.6362702E9 |         2 | 2021-11-07 01:30:00.000 America/Los_Angeles
 1.6362702E9 |         3 | 2021-11-07 02:30:00.000 America/Los_Angeles

two different points in time converted to timestamp with time zone in PT zone are then getting returned to the user with identical representation.

Also, the fact that engine discerns them (even though they look the same) is a bug tracked in #5781

findepi closed this as completed Sep 17, 2021

zielmicha mentioned this issue Aug 10, 2022

Support for reading TIMESTAMP WITH LOCAL TIME ZONE in Hive connector #13595

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use session TimeZone for timestamp_tz column #8680

Use session TimeZone for timestamp_tz column #8680

yanqinghe commented Jul 27, 2021 •

edited

Loading

findepi commented Jul 28, 2021

yanqinghe commented Aug 4, 2021

findepi commented Aug 4, 2021

Poncirus commented Aug 9, 2021

findepi commented Sep 17, 2021

Use session TimeZone for timestamp_tz column #8680

Use session TimeZone for timestamp_tz column #8680

Comments

yanqinghe commented Jul 27, 2021 • edited Loading

findepi commented Jul 28, 2021

yanqinghe commented Aug 4, 2021

findepi commented Aug 4, 2021

Poncirus commented Aug 9, 2021

findepi commented Sep 17, 2021

yanqinghe commented Jul 27, 2021 •

edited

Loading