
[CT-1482] [Feature] Add support for data share objects #217

Closed
3 tasks done
Tracked by #742
randy-rc opened this issue Nov 9, 2022 · 12 comments
Assignees
Labels
enhancement New feature or request ra3_node issues relating to ra3 node support

Comments

@randy-rc

randy-rc commented Nov 9, 2022

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-redshift functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Issue 1

Calling dbt_utils.get_column_values on a data share object results in an error:

{% macro test5() -%}

    {% set details = dbt_utils.get_column_values(table=source('[name]', '[object]'), column='[column]') %}

    {% for detail in details %}
        {{ log('Detail: ' ~ detail, True) }}
    {% endfor %}

{%- endmacro %}
09:25:34  Encountered an error while running operation: Compilation Error in macro test5 (macros/test.sql)
  In get_column_values(): relation "[database]"."[schema]"."[object]" does not exist and no default value was provided.
  
  > in macro statement (macros/etc/statement.sql)
  > called by macro default__get_column_values (macros/sql/get_column_values.sql)
  > called by macro get_column_values (macros/sql/get_column_values.sql)
  > called by macro test5 (macros/test.sql)
  > called by macro test5 (macros/test.sql)

The dbt Cloud system logs (Details) capture the statement used to check whether the object of concern exists. Unfortunately, data share objects are stored in neither pg_tables nor pg_views, hence the "does not exist" error.
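For context, inbound data share objects can instead be listed from the svv_datashare_objects system view on the consumer cluster, where object names are stored as 'schema.object'. A minimal sketch (this view typically requires elevated privileges, as later comments in this thread note):

```sql
-- Sketch: list the inbound datashare tables/views that the
-- pg_tables/pg_views query above cannot see.
select
  split_part(object_name, '.', 1) as schema_name,
  split_part(object_name, '.', 2) as object_name,
  object_type
from svv_datashare_objects
where share_type = 'INBOUND';
```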

...
09:25:33  On macro_test5: /* {"app": "dbt", "dbt_version": "1.2.2", "profile_name": "user", "target_name": "default", "connection_name": "macro_test5"} */
select
      '[database]' as database,
      tablename as name,
      schemaname as schema,
      'table' as type
    from pg_tables
    where schemaname ilike '[schema]'
    union all
    select
      '[database]' as database,
      viewname as name,
      schemaname as schema,
      'view' as type
    from pg_views
    where schemaname ilike '[schema]'
09:25:34  SQL status: SELECT in 0.24 seconds
09:25:34  with database=[database], schema=[schema], relations=[]
09:25:34  Postgres adapter: Error running SQL: macro test5
...

Issue 2

Calling adapter.get_columns_in_relation on a data share object does not raise an error, but returns an empty result:

{% macro test6() -%}
    
    {%- set cols = adapter.get_columns_in_relation(source('[name]', '[object]')) -%}
    {{ log('Detail: ' ~ cols, True) }}

{%- endmacro %}
05:01:51  Detail: []

Describe alternatives you've considered

The statement used to check whether the object of concern exists is found in the macro redshift__list_relations_without_caching (which references the macro postgres__list_relations_without_caching), so one solution is to override redshift__list_relations_without_caching to get the required information from svv_datashare_objects.

Similarly, the statement used to check which columns exist is found in the macro redshift__get_columns_in_relation, so a solution is to override redshift__get_columns_in_relation to get the required information from svv_all_columns and svv_datashare_objects.

{% macro redshift__list_relations_without_caching(schema_relation) %}
  {% call statement('list_relations_without_caching', fetch_result=True) -%}
    select
      '{{ schema_relation.database }}' as database,
      tablename as name,
      schemaname as schema,
      'table' as type
    from pg_tables
    where schemaname ilike '{{ schema_relation.schema }}'
    union all
    select
      '{{ schema_relation.database }}' as database,
      viewname as name,
      schemaname as schema,
      'view' as type
    from pg_views
    where schemaname ilike '{{ schema_relation.schema }}'
    union all
    select
      '{{ schema_relation.database }}' as database,
      split_part(object_name, '.', 2) as name,
      split_part(object_name, '.', 1) as schema,
      'datashare' as type
    from svv_datashare_objects
    where share_type = 'INBOUND'
    and   object_type in ('table', 'view', 'materialized view', 'late binding view')
    and   split_part(object_name, '.', 1) ilike '{{ schema_relation.schema }}'
  {% endcall %}
  {{ return(load_result('list_relations_without_caching').table) }}
{% endmacro %}

{% macro redshift__get_columns_in_relation(relation) -%}
  {% call statement('get_columns_in_relation', fetch_result=True) %}
      with bound_views as (
        select
          ordinal_position,
          table_schema,
          column_name,
          data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale

        from information_schema."columns"
        where table_name = '{{ relation.identifier }}'
    ),

    unbound_views as (
      select
        ordinal_position,
        view_schema,
        col_name,
        case
          when col_type ilike 'character varying%' then
            'character varying'
          when col_type ilike 'numeric%' then 'numeric'
          else col_type
        end as col_type,
        case
          when col_type like 'character%'
          then nullif(REGEXP_SUBSTR(col_type, '[0-9]+'), '')::int
          else null
        end as character_maximum_length,
        case
          when col_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 1),
            '')::int
          else null
        end as numeric_precision,
        case
          when col_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 2),
            '')::int
          else null
        end as numeric_scale

      from pg_get_late_binding_view_cols()
      cols(view_schema name, view_name name, col_name name,
           col_type varchar, ordinal_position int)
      where view_name = '{{ relation.identifier }}'
    ),

    external_views as (
      select
        columnnum,
        schemaname,
        columnname,
        case
          when external_type ilike 'character varying%' or external_type ilike 'varchar%'
          then 'character varying'
          when external_type ilike 'numeric%' then 'numeric'
          else external_type
        end as external_type,
        case
          when external_type like 'character%' or external_type like 'varchar%'
          then nullif(
            REGEXP_SUBSTR(external_type, '[0-9]+'),
            '')::int
          else null
        end as character_maximum_length,
        case
          when external_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 1),
            '')::int
          else null
        end as numeric_precision,
        case
          when external_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 2),
            '')::int
          else null
        end as numeric_scale
      from
        pg_catalog.svv_external_columns
      where
        schemaname = '{{ relation.schema }}'
        and tablename = '{{ relation.identifier }}'

    ),

    data_share as (
      select
        ordinal_position as columnnum,
        schema_name as schemaname,
        column_name as columnname,
        case
          when data_type ilike 'character varying%' then
            'character varying'
          when data_type ilike 'numeric%' then 'numeric'
          else data_type
        end as col_type,
        case
          when data_type like 'character%'
          then nullif(REGEXP_SUBSTR(data_type, '[0-9]+'), '')::int
          else null
        end as character_maximum_length,
        case
          when data_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(data_type, '[0-9,]+'), ',', 1),
            '')::int
          else null
        end as numeric_precision,
        case
          when data_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(data_type, '[0-9,]+'), ',', 2),
            '')::int
          else null
        end as numeric_scale
      from svv_all_columns a
      inner join (
        select *
        from svv_datashare_objects
        where share_type = 'INBOUND'
        and   object_type in ('table', 'view', 'materialized view', 'late binding view')
        and   object_name = '{{ relation.schema }}' || '.' || '{{ relation.identifier }}'
      ) b on a.schema_name || '.' || a.table_name  = b.object_name
    ),

    unioned as (
      select * from bound_views
      union all
      select * from unbound_views
      union all
      select * from external_views
      union all
      select * from data_share
    )

    select
      column_name,
      data_type,
      character_maximum_length,
      numeric_precision,
      numeric_scale

    from unioned
    {% if relation.schema %}
    where table_schema = '{{ relation.schema }}'
    {% endif %}
    order by ordinal_position
  {% endcall %}
  {% set table = load_result('get_columns_in_relation').table %}
  {{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %}
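For anyone trying the overrides above: placing both macros in the project's own macros/ directory causes dbt's dispatch to prefer them over the adapter's built-in implementations. A quick smoke test via run-operation (the source name here is hypothetical, not from this issue):

```sql
{% macro check_datashare_columns() %}
  {# Dispatches to the project-level redshift__get_columns_in_relation override #}
  {% set cols = adapter.get_columns_in_relation(source('warehouse', 'fct_table_1')) %}
  {% for col in cols %}
    {{ log(col.name ~ ': ' ~ col.data_type, info=True) }}
  {% endfor %}
{% endmacro %}
```

Invoked with `dbt run-operation check_datashare_columns`; a non-empty column list confirms the override is being picked up.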

Who will this benefit?

Any users who are exploring the use of Redshift data shares.

Are you interested in contributing this feature?

No response

Anything else?

No response

@randy-rc randy-rc added enhancement New feature or request triage labels Nov 9, 2022
@github-actions github-actions bot changed the title [Feature] Add support for data share objects [CT-1482] [Feature] Add support for data share objects Nov 9, 2022
@jtcohen6
Contributor

@randy-rc Thanks for the detailed write-up!

This issue sounds a lot like #179. It would probably make sense to tackle the two together.

Tables in other databases (accessed via RA3 cross-db reads) don't appear in pg_tables or pg_views, either. They have to come from svv_tables.

In this case, it looks like there's an svv_datashare_objects system view, and that we'll also want to pull column-level info from svv_all_columns (in addition to information_schema."columns" and pg_catalog.svv_external_columns).

@jtcohen6 jtcohen6 removed the triage label Nov 14, 2022
@Fleid Fleid self-assigned this Nov 14, 2022
@randy-rc
Author

randy-rc commented Nov 28, 2022

Updated the data share section of the redshift__list_relations_without_caching macro to check for relations under the right database. Prior to this change, generating snapshots in the consumer failed because it detected the resulting tables in the data share.

{% macro redshift__list_relations_without_caching(schema_relation) %}
  {% call statement('list_relations_without_caching', fetch_result=True) -%}
    select
      '{{ schema_relation.database }}' as database,
      tablename as name,
      schemaname as schema,
      'table' as type
    from pg_tables
    where schemaname ilike '{{ schema_relation.schema }}'
    union all
    select
      '{{ schema_relation.database }}' as database,
      viewname as name,
      schemaname as schema,
      'view' as type
    from pg_views
    where schemaname ilike '{{ schema_relation.schema }}'
    union all
    (
      with
      shared_db_schema as (
        select
          database_name,
          schema_name
        from  svv_all_schemas
        where schema_type='shared'
      ),
      inbound_datashare_obj as (
        select
          split_part(object_name, '.', 1) as schema_name,
          split_part(object_name, '.', 2) as table_name
        from  svv_datashare_objects
        where share_type='INBOUND'
        and   object_type in ('table', 'view', 'materialized view', 'late binding view')
      )
      select
        a.database_name as database,
        a.table_name    as name,
        a.schema_name   as schema,
        'datashare'     as type
      from       svv_all_tables a
      inner join shared_db_schema b
      on         a.database_name = b.database_name
      and        a.schema_name = b.schema_name
      inner join inbound_datashare_obj c
      on         a.schema_name = c.schema_name
      and        a.table_name = c.table_name
      where      a.database_name ilike '{{ schema_relation.database }}'
      and        a.schema_name ilike '{{ schema_relation.schema }}'
    )
  {% endcall %}
  {{ return(load_result('list_relations_without_caching').table) }}
{% endmacro %}

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the Stale label May 28, 2023
@Fleid Fleid removed the Stale label May 31, 2023
@rarup1

rarup1 commented Jun 28, 2023

@jiezhen-chen Is this issue going to be resolved soon?
We are using Redshift datasharing with a Redshift serverless as our CICD / Dev environment as it helps separate our resource use. The dbt.utils.union_relations is an example of a utils that doesn't work if you are using a datashare since it is looking in the info_schema before a model is created in the server (if you are deferring to a datashare). The work around we are using at the moment is to have to build all upstream models of a union in the Redshift Serverless environment before we call the macro. It is not the best experience as it stands.

Grateful if this can be resolved in the macro / passing an override to point at the datashare schema...

Thanks!

@rarup1

rarup1 commented Jul 30, 2023

Just seen that this didn't end up being fixed in dbt 1.6 - is there any chance this will be looked at soon?

Cheers

@yuna-tang

@Fleid Just would like some updates: when will this issue be looked at for a future dbt version? We are planning to use Serverless for our CI/CD, and currently our workaround is to use the above override macros.

@jiezhen-chen
Contributor

@rarup1 Unfortunately the PR that was merged didn't end up providing support for datashares. We decided not to use the redshift_connector API call, and instead queried information_schema, because the redshift_connector API call doesn't support materialized views at the moment. Apologies for the confusion!

@yuna-tang

@jiezhen-chen Can you please let us know when dbt will support Redshift datashare?

@rarup1

rarup1 commented Aug 10, 2023

@yuna-tang are you able to give me a run through of how you have got your workaround CICD/Dev process working with a RS datashare?

I am trying to implement the override macros that were shared above by @randy-rc:

{% macro redshift__list_relations_without_caching(schema_relation) %}
{% macro redshift__get_columns_in_relation(relation) -%}

I have set up a test model test_union.sql using the dbt_utils.union_relations macro as follows:

WITH _union as (
{{ dbt_utils.union_relations(relations=[ref('fct_table_1'), ref('fct_table_2')]) }}
)

SELECT * FROM _union

Where fct_table_1 and fct_table_2 are only in a datashare named warehouse and the target db is dev.

And then running a deferral:

dbt run -m test_union --state target_defer --defer

Where the default target db is dev, and deferrals all resolve back to the datashare db warehouse through target_defer/manifest.json if the models do not exist in the developer's schema (e.g. dbt_rupert_core.fct_table_1 and dbt_rupert_core.fct_table_2). In this example they do not, and that is where the problems arise for us.

The issue I am having at the moment, after deploying the macros above, is actually getting them to pick up the datashare columns. The compiled SQL is column-less:

  create view "dev"."dbt_rupert_core"."test_union__dbt_tmp" as (
    WITH _union as (
        (select cast('"warehouse"."models_core"."fct_table_1"' as varchar) as _dbt_source_relation,
            from "warehouse"."models_core"."fct_table_1"
        )

        union all
        (select cast('"warehouse"."models_core"."fct_table_2"' as varchar) as _dbt_source_relation,
            from "warehouse"."models_core"."fct_table_2"
        )
)
SELECT * FROM _union
  ) with no schema binding;

Interested to hear how you got yours working! Cheers

update

It was due to svv_datashare_objects not being accessible for the warehouse datashare to any user other than root/admin. I got it working by testing out the whole process with root as the user. The performance was shockingly slow, however, and I will be looking into saving the svv_ tables in our dwh as a workaround.
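The workaround mentioned above could look something like the following, run periodically as an admin (a sketch only; the "admin" schema and "dbt_developers" group are hypothetical names, and the snapshot goes stale between refreshes):

```sql
-- Sketch: snapshot the privileged system view into a regular table
-- that non-admin users can be granted access to.
create table admin.svv_datashare_objects_copy as
select * from svv_datashare_objects;

grant select on admin.svv_datashare_objects_copy to group dbt_developers;
```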

@yuna-tang

@rarup1 sorry for the late reply. I also tried the test model test_union.sql, which generated empty columns for me.

For us to use serverless with a datashare, we need to configure all our datasource.yml files to hardcode the database & schema to our main cluster; then we are able to create sample versions of downstream models in serverless.

version: 2
sources:
  - name: <your_data_source>
    database: <your_database>
    schema: <your_data_source>

Then we need to do the below in the datashare:

  • add all the schemas from production into the datashare
ALTER DATASHARE <your_datashare_name> ADD SCHEMA <your_data_source>;
ALTER DATASHARE <your_datashare_name> ADD ALL TABLES IN SCHEMA <your_data_source>;
ALTER DATASHARE <your_datashare_name> SET INCLUDENEW = TRUE FOR SCHEMA <your_data_source>;
  • create all the users in the datashare that exist in the main cluster

Then I followed the steps in [this article](https://medium.com/building-inventa/how-we-slimmed-down-slim-ci-for-dbt-cloud-6a944e7574e2) to create a sample production database and run CI/CD against it using serverless. We are still at the testing stage, but we are able to create a full sample database in serverless via the datashare with 1 day of data.



This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Feb 12, 2024
@mikealfare mikealfare removed the Stale label Feb 12, 2024
@Fleid Fleid removed their assignment Feb 14, 2024
@Fleid Fleid self-assigned this Feb 22, 2024
@dataders dataders added the ra3_node issues relating to ra3 node support label Mar 26, 2024
@dataders
Copy link
Contributor

dataders commented Apr 3, 2024

closing in favor of #742

@dataders dataders closed this as completed Apr 3, 2024