Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source Freshness is not implemented #28

Closed
wrb2 opened this issue Dec 20, 2021 · 9 comments
Closed

Source Freshness is not implemented #28

wrb2 opened this issue Dec 20, 2021 · 9 comments
Assignees

Comments

@wrb2
Copy link

wrb2 commented Dec 20, 2021

When I add source freshness configuration and run dbt source freshness with dbt-trino, I get

Runtime Error in source mv_mantis_mfr (models/staging/sources.yml)
  ERROR: current_timestamp macro not implemented for adapter trino
@findinpath
Copy link
Collaborator

This should be solved pretty soon with the migration to v1.0.0

See 8df0f80

@hovaesco
Copy link
Contributor

@wrborigin dbt-trino v1.0.0 has been released

@wrb2
Copy link
Author

wrb2 commented Dec 21, 2021

I tested it with dbt-trino 1.0, and it seems to have some problem.

sources:
  - name: redacted
    database: redacted
    schema: redacted
    loaded_at_field: approval_date
trino> describe "redacted"."redacted"."redacted" ;
         Column          |     Type     |     Extra     |    Comment   
-------------------------+--------------+---------------+---------------------------
 approval_date           | timestamp(3) |               | 
17:02:40  Database Error in source mv_mantis_mfr (models/staging/sources.yml)
17:02:40    Expected a timestamp value when querying field 'approval_date' of table "redacted"."redacted"."redacted" but received value of type 'str' instead

Any idea what might be wrong?

Even adding "cast(approval_date as timestamp)" to the loaded_at_field doesn't help.

@findinpath findinpath reopened this Dec 21, 2021
@findinpath
Copy link
Collaborator

Please visit the trino web admin UI and paste the full stack trace of the problem.

@wrb2
Copy link
Author

wrb2 commented Dec 21, 2021

Trino UI shows a successful query of

select
      max(approval_date) as max_loaded_at,
      CURRENT_TIMESTAMP as snapshotted_at
    from "redacted"."redacted"."redacted"

If I prepare and describe output of that query, I get

Column Name   |Catalog|Schema|Table|Type                       |Type Size|Aliased|
--------------+-------+------+-----+---------------------------+---------+-------+
max_loaded_at |       |      |     |timestamp(3)               |        8|true   |
snapshotted_at|       |      |     |timestamp(3) with time zone|        8|true   |

so Trino definitely return timestamp.

I remembered I was looking into this for dbt-presto sometime in summer, and the problem was that while Trino returns timestamp, the Python driver somehow makes it a string, and that's what dbt is complaning about.

@findinpath
Copy link
Collaborator

findinpath commented Dec 22, 2021

Database Error in source stg_orders (models/staging/stg_orders.yml)
  Expected a timestamp value when querying field 'CAST(order_date AS timestamp)' of table "memory"."default"."stg_orders" but received value of type 'str' instead
Traceback (most recent call last):
  File "/Users/marius/Development/starburst/dbt-trino-test/dbt-trino-jaffle-shop-freshness/venv/lib/python3.8/site-packages/dbt/task/base.py", line 361, in safe_run
    result = self.compile_and_execute(manifest, ctx)
  File "/Users/marius/Development/starburst/dbt-trino-test/dbt-trino-jaffle-shop-freshness/venv/lib/python3.8/site-packages/dbt/task/base.py", line 314, in compile_and_execute
    result = self.run(ctx.node, manifest)
  File "/Users/marius/Development/starburst/dbt-trino-test/dbt-trino-jaffle-shop-freshness/venv/lib/python3.8/site-packages/dbt/task/base.py", line 403, in run
    return self.execute(compiled_node, manifest)
  File "/Users/marius/Development/starburst/dbt-trino-test/dbt-trino-jaffle-shop-freshness/venv/lib/python3.8/site-packages/dbt/task/freshness.py", line 149, in execute
    freshness = self.adapter.calculate_freshness(
  File "/Users/marius/Development/starburst/dbt-trino-test/dbt-trino-jaffle-shop-freshness/venv/lib/python3.8/site-packages/dbt/adapters/base/impl.py", line 1094, in calculate_freshness
    max_loaded_at = _utc(table[0][0], source, loaded_at_field)
  File "/Users/marius/Development/starburst/dbt-trino-test/dbt-trino-jaffle-shop-freshness/venv/lib/python3.8/site-packages/dbt/adapters/base/impl.py", line 93, in _utc
    raise raise_database_error(
  File "/Users/marius/Development/starburst/dbt-trino-test/dbt-trino-jaffle-shop-freshness/venv/lib/python3.8/site-packages/dbt/exceptions.py", line 465, in raise_database_error
    raise DatabaseException(msg, node)
dbt.exceptions.DatabaseException: Database Error in source stg_orders (models/staging/stg_orders.yml)
  Expected a timestamp value when querying field 'CAST(order_date AS timestamp)' of table "memory"."default"."stg_orders" but received value of type 'str' instead

This is the stacktrace of the dbt source freshness command when it gets applied to a model containing freshness checks.

Printing the content of the table passed to _utc table on the console gives us:

| column         | data_type |
| -------------- | --------- |
| max_loaded_at  | Text      |
| snapshotted_at | Text      |

while the _utc function expects a datetime.

This seems to be a limitation of the trino-python-client library which interprets the selected data as string.

The data coming back from Trino for the freshness request over HTTP looks like this:

[['2018-04-09 00:00:00.000', '2021-12-22 09:20:45.573 UTC']]

Unfortunately it is not converted to the corresponding datetime type.

See https://github.com/trinodb/trino-python-client/blob/master/trino/client.py#L401 for details of the missing conversion logic.

@findinpath
Copy link
Collaborator

Related issue:
trinodb/trino-python-client#32

@mdesmet
Copy link
Member

mdesmet commented Apr 15, 2022

Should be fixed by #61 as we use experimental_python_types from trino-python-client.

@hovaesco
Copy link
Contributor

hovaesco commented May 9, 2022

Resolved by #61

@hovaesco hovaesco closed this as completed May 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants