Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: FromUnixtimeOperatorTransformer possibly built on false supposisions. #459

Closed
findinpath opened this issue Oct 5, 2023 · 2 comments · Fixed by #467
Closed

Bug: FromUnixtimeOperatorTransformer possibly built on false supposisions. #459

findinpath opened this issue Oct 5, 2023 · 2 comments · Fixed by #467

Comments

@findinpath
Copy link
Contributor

The FromUnixtimeOperatorTransformer from https://github.com/linkedin/coral/pull/426/files#diff-e26e325f65fb1f3341468a92d2e1051ee2b8b94b56c74203054aca3b2c790e4e
seems to be built on false premises.

The function https://trino.io/docs/current/functions/datetime.html#from_unixtime returns a timestamp(3) with time zone

However, the FromUnixtimeOperatorTransformer transforms from_unixtime expressions to
FORMAT_DATETIME(FROM_UNIXTIME(10000), 'yyyy-MM-dd HH:mm:ss') which is a varchar
This is causing the failure of statements like the following:

select 
  CAST(
    CAST(
      "at_timezone" (
        "format_datetime" (
          "from_unixtime" (
            CAST(
              "test_from_utc_timestamp_source"."source_float" AS DOUBLE
            )
          ),
          'yyyy-MM-dd HH:mm:ss'
        ),
        "$canonicalize_hive_timezone_id" ('America/Los_Angeles')
      ) AS TIMESTAMP (3)
    ) AS VARCHAR (65535)
  ) AS "ts_float"
from test_from_utc_timestamp_source 

with the exception:

Caused by: io.trino.spi.TrinoException: Unexpected parameters (varchar, varchar) for function at_timezone. Expected: at_timezone(timestamp(p) with time zone, interval day to second), at_timezone(timestamp(p) with time zone, varchar(x))

Related PR: #426

@findinpath
Copy link
Contributor Author

Steps to reproduce the issue

hive

CREATE TABLE test_from_utc_timestamp_source (source_float float);
CREATE VIEW test_from_utc_timestamp_view AS SELECT CAST(from_utc_timestamp(source_float, 'America/Los_Angeles') AS STRING) ts_float FROM test_from_utc_timestamp_source;

select * from test_from_utc_timestamp_view;

-- 1970-01-30 21:30:00

trino

insert into hive.default.test_from_utc_timestamp_source values (2592000.0);

trino version 420 while using coral version 2.2.9

SELECT * FROM hive.default.test_from_utc_timestamp_view;
Query 20231013_184405_00000_fa5un failed: line 1:15: Failed analyzing stored view 'hive.default.test_from_utc_timestamp_view': line 1:18: Unexpected parameters (varchar, varchar) for function at_timezone. Expected: at_timezone(timestamp(p) with time zone, varchar(x)), at_timezone(timestamp(p) with time zone, interval day to second)

trinoSql in ViewReaderUtil

SELECT CAST(CAST("at_timezone"("format_datetime"("from_unixtime"(CAST("test_from_utc_timestamp_source"."source_float" AS DOUBLE)), 'yyyy-MM-dd HH:mm:ss'), "$canonicalize_hive_timezone_id"('America/Los_Angeles')) AS TIMESTAMP(3)) AS VARCHAR(65535)) AS "ts_float"
FROM "default"."test_from_utc_timestamp_source" AS "test_from_utc_timestamp_source"

trino version 420 while using coral version 2.1.5

trino> SELECT * FROM hive.default.test_from_utc_timestamp_view;
        ts_float         
-------------------------
 1970-01-30 16:00:00.000 

trinoSql in ViewReaderUtil

SELECT CAST(CAST("at_timezone"("from_unixtime"(CAST("test_from_utc_timestamp_source"."source_float" AS DOUBLE)), "$canonicalize_hive_timezone_id"('America/Los_Angeles')) AS TIMESTAMP(3)) AS VARCHAR(65535)) AS "ts_float"
FROM "default"."test_from_utc_timestamp_source" AS "test_from_utc_timestamp_source"

https://github.com/trinodb/trino/blob/fa01166036c540a498af2beeedcf01fdc6d9e7c0/plugin/trino-hive/src/main/java/io/trino/plugin/hive/ViewReaderUtil.java#L246

2.1.5

CAST(CAST("at_timezone"("from_unixtime"...

2.2.9

CAST(CAST("at_timezone"("format_datetime"("from_unixtime"

format_datetime shouldn't be there.

@findinpath
Copy link
Contributor Author

FromUtcTimestampOperatorTransformer is playing with both:

  • from_unixtime_nanos
  • from_unixtime

from_unixtime_nanos is a Trino specific function https://trino.io/docs/current/functions/datetime.html?highlight=from_unixtime_nanos#from_unixtime_nanos

Translation of from_unixtime_nanos works as expected.

from_unixtime is:

from_unixtime(bigint unixtime[, string pattern])

Converts a number of seconds since epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current time zone(using config "hive.local.time.zone") using the specified pattern. If the pattern is missing the default is used ('uuuu-MM-dd HH:mm:ss' or yyyy-MM-dd HH:mm:ss'). Example: from_unixtime(0)=1970-01-01 00:00:00 (hive.local.time.zone=Etc/GMT)

Given the fact that from_unixtime is a Hive function, it gets its own FromUnixtimeOperatorTransformer and that's where our problem actually occurs.
When we transform from_utc_timestamp via FromUtcTimestampOperatorTransformer we're creating (in case of dealing with floats) a nested from_unixtime call which gets unintentionally transformed as well to "at_timezone"("from_unixtime" .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant