Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure to create External Tables across Multiple Databases within Snowflake #89

Closed
2 tasks
radeleye opened this issue May 14, 2021 · 1 comment
Closed
2 tasks
Labels
bug Something isn't working

Comments

@radeleye
Copy link

Describe the bug

We are attempting to create external tables within multiple databases yet when we run the dbt run-operation stage_external_sources command, the package attempts to create all external tables from stages within the database specified in our profiles.yml file and errors out when we attempt to create external tables in other databases because the stages specified in the sources file do not exist in that specific database.

It seems the package assumes all stages are within in the database specified in our profiles.yml file.

It also completely ignores any configurations within our dbt_project.yml file which specifies the database to have the external tables created in.

We have attempted to use arguments and tags to specify which database to have the external tables created but those options do not seem to work.

Steps to reproduce

  • We have a dbt sub-directory "Orchestration" with a subfolder for each database, which houses the sources.yml files for each schema with a corresponding external table:

models/orchestration/folder_for_db_one/schema_1

sources: 
  - name: stage_table
    database: db_one
    schema: raw_schema_1
    loaded_at_field: src_file_name_date
    tables:
      - name: ext_table
        external:
          location: '@s3_location'
          file_format: >
            (type = csv field_delimiter=',' SKIP_HEADER = 1 FIE

models/orchestration/folder_for_db_two/schema_2

sources: 
  - name: stage_table
    database: db_two
    schema: raw_schema_2
    loaded_at_field: src_file_name_date
    tables:
      - name: ext_table
        external:
          location: '@s3_location'
          file_format: >
            (type = csv field_delimiter=',' SKIP_HEADER = 1 FIE
  • Our dbt_project.yml file is configured to specify a database for each source folder:
models:
  folder_for_db_one:
    +database: DB_ONE
  folder_for_db_two:
    +database: DB_TWO
  • Our profiles.yml file is configured to point to a specific db:

database: DB_ONE

Expected results

Success messages for all external tables configured to be created in databases other than the one found in our profiles.yml file.

Actual results

the external tables under the db from our profiles.yml are built successfully:

11:02:38 + 1 of 53 START external source raw_schema_1.ext_table
11:02:39 + 1 of 53 (1) create or replace external table db_one.t...  
11:02:40 + 1 of 53 (1) SUCCESS 1

Once external tables found in the second db's configuration is reached, an error is thrown

Encountered an error while running operation: Database Error
  002003 (02000): SQL compilation error:
  Stage 'DB_TWO.SCHEMA_2.DB_ONE_SCHEMA_STAGE' does not exist or not authorized.

Screenshots and log output

System information

The contents of your packages.yml file:


  - package: fishtown-analytics/dbt_external_tables
    version: 0.6.2
# <see https://hub.getdbt.com/fishtown-analytics/dbt_external_tables/latest/> for more details 

Which database are you using dbt with?

  • redshift
  • [ x] snowflake
  • other (specify: __________)

The output of dbt --version:

Running with dbt=0.19.1

The operating system you're using:

The output of python --version:

Additional context

@radeleye radeleye added the bug Something isn't working label May 14, 2021
@jtcohen6
Copy link
Collaborator

Thanks for the detailed writeup @radeleye!

It seems the package assumes all stages are within in the database specified in our profiles.yml file.

The package doesn't do anything with stages, except include the provided external.location in its create external table and create snowpipe DDL statements. As noted in the README, this package assumes that you have already created any Snowflake external stages you need. If those stages exist in databases different from target.database, then you can fully qualify the name of the stage in the location yaml property:

external:
  location: '@db_two.schema_name.stage_name'

It also completely ignores any configurations within our dbt_project.yml file which specifies the database to have the external tables created in.

This is tricky: the database source property is different from the database config of models, which is what you're configuring in dbt_project.yml. I get that this is confusing—resource property vs. node config—it's a distinction without a difference, really, and it's something we'd like to resolve ahead of releasing dbt v1.0 (dbt-labs/dbt-core#2401). In the meantime, if you want to change the database of the external table the package will create, you need to do it within each source's properties.

I'm going to close this issue for now, as I don't think we need (or can make) a code change in the package to account for the points you've raised. If you think there's something specific that the package could be doing differently, however, I'm all ears.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants