Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overriding the database in the dbt_project file does not work #59

Closed
CMRobbie opened this issue Nov 14, 2020 · 21 comments
Closed

Overriding the database in the dbt_project file does not work #59

CMRobbie opened this issue Nov 14, 2020 · 21 comments

Comments

@CMRobbie
Copy link

When I override the database to something different than the database configured in the profile, the CLI says it's going to built into the override database, but it's executed on the profile database.

Example profiles.xml:

datawarehouse:
  target: development
  outputs:
    development:
      type: sqlserver
      driver: 'ODBC Driver 17 for SQL Server'
      server: bisql
      port: 1433
      windows_login: True
      database: Test
      schema: test
      threads: 4

example dbt_project.yml:

...
models:
  my_datawarehouse:
    materialized: view
    products:
      +database: ProductsModels
      +schema: ProductsModels

When ran, the CLI says "created view model ProductsModels.ProductsModels.Metrics..." but in reality it's created in Test.ProductsModels.Metrics

@dataders
Copy link
Collaborator

interesting. i'm going to poke around with this and see what I can find out.

@CMRobbie
Copy link
Author

Great! Thanks @swanderz. Is there any extra information I can provide to help you?

@dataders
Copy link
Collaborator

dataders commented Nov 19, 2020

Great! Thanks @swanderz. Is there any extra information I can provide to help you?

Can call dbt run --model Metrics then go into logs/dbt.log and share the SQL queries relevant the the Metrics view creation?

My thinking is that it has to do with this dbt-labs/dbt-core#1695, specifically the generate_database_name() macro, but at a glance I'm not sure how this could be not working...

cc: @NandanHegde15 @alieus

@CMRobbie
Copy link
Author

CMRobbie commented Nov 19, 2020

It's omitting the database name in the query. And when connecting, it uses the incorrect one (from profile instead of project). The SQL Query that loads the data:

   drop table if exists ProductsModels.ProductsModels_Metrics__dbt_tmp


   EXEC('create view ProductsModels.ProductsModels_Metrics__dbt_tmp_temp_view as

SELECT 

        [TechnicalLinkTypeID],

        [CRMID],

        [LogicalAccountID],

        [TechnicalAccountID],

        [BusinessLine],

        [Metric],

        [Year],

        [Month],

        [Day],

        [Value]

    FROM DataWarehouse.CDPMetrics.Metrics
    ');

   SELECT * INTO ProductsModels.ProductsModels_Metrics__dbt_tmp FROM
    ProductsModels.ProductsModels_Metrics__dbt_tmp_temp_view

   drop view if exists ProductsModels.ProductsModels_Metrics__dbt_tmp_temp_view

    
   DROP INDEX IF EXISTS ProductsModels_ProductsModels_Metrics__dbt_tmp.ProductsModels_ProductsModels_Metrics__dbt_tmp_cci
  CREATE CLUSTERED COLUMNSTORE INDEX ProductsModels_ProductsModels_Metrics__dbt_tmp_cci
    ON ProductsModels.ProductsModels_Metrics__dbt_tmp

   

@CMRobbie
Copy link
Author

I think the problem is in this part, rather than the generation of the queries:
2020-11-19 19:16:14.545496 (Thread-1): Using connection string: DRIVER={ODBC Driver 17 for SQL Server};SERVER=of-bisql-01,1433;Database=Test;trusted_connection=yes
2020-11-19 19:16:14.545496 (Thread-1): Connected to db: Test

It should have connected to the ProductsModels database instead.

@dataders
Copy link
Collaborator

def helpful. now i'm thinking it has to do with the sqlserver_create_view_as() macro. Basically the default and redshift version of the macro just calls {{relation}} rather than {{ relation.schema }}.{{ relation.identifier }}. My team doesn't have bandwidth to look more into this until next week at the earliest. If you're feeling adventurous, you could try over-writing the sqlserver_create_view_as() macro by making a coping your local project's macro folder, then experimenting to see if you can get it to use the database name in the query. You might also have to include a line line USE {{relation.database}}

sqlserver_create_view_as()

{% macro sqlserver__create_view_as(relation, sql) -%}
  create view {{ relation.schema }}.{{ relation.identifier }} as
    {{ sql }}
{% endmacro %}

default_create_view_as()

{% macro default__create_view_as(relation, sql) -%}
  {%- set sql_header = config.get('sql_header', none) -%}

  {{ sql_header if sql_header is not none }}
  create view {{ relation }} as (
    {{ sql }}
  );
{% endmacro %}

@CMRobbie
Copy link
Author

CMRobbie commented Nov 19, 2020

I'm all for an adventure! :)
And I'm also happy to wait to next week.

I added a file called sqlserver_create_view_as.sql in my macros folder with this as content:

{% macro sqlserver__create_view_as(relation, sql) -%}
  create view {{ relation.schema }}.{{ relation.identifier }} as
    --THIS IS A TEST
    {{ sql }}
{% endmacro %}

But I have the impression it's not using it at all, because it's not logging the "THIS IS A TEST" anywhere.
Or maybe I did not understand you?

@CMRobbie
Copy link
Author

I now also tried it with two underscores in the filename behind sqlserver, and with one underscore in the macro function name, but all permutations those do not seem to work.

@dataders
Copy link
Collaborator

dataders commented Nov 19, 2020

ok. try renaming the macro to just create_view_as(). your instinct is correct that --THIS IS A TEST should show up in the log

@CMRobbie
Copy link
Author

Nope, doesn't work.
Does the filename matter?

@CMRobbie
Copy link
Author

Ah, sorry... I changed the materialization to table so it's obviously using sqlserver__create_table_as now.

@dataders
Copy link
Collaborator

Ok so now you add a new macro called create_table_as() with a version of the code that looks like this. Hopefully you get an interesting workflow. Note if it feels like stumbling around in the dark and fumbling with light switches -- that's also my experience 😸

@CMRobbie
Copy link
Author

Adding USE {{ relation.database }}; seems to do the trick! I'm going to test it a bit more.

@CMRobbie
Copy link
Author

It really helps to know that I can just override any macro from the adapter in the macros folder :) Thank you very much!

@CMRobbie
Copy link
Author

Creating a new table when none exists works. But if the table already exists, it doesn't work. My first hunch was to look at the query that does the rename, but somehow my new definition of the rename_relation doesn't come through. I tried overriding both sqlserver__rename_relation and rename_relation.

The error I get:

2020-11-19 20:19:23.209382 (Thread-1): On model.cm_datawarehouse.ProductsModels_Metrics: EXEC sp_rename 'ProductsModels.ProductsModels_Metrics__dbt_tmp', 'Metrics'
IF EXISTS(
SELECT *
FROM sys.indexes
WHERE name='ProductsModels_ProductsModels_Metrics__dbt_tmp_cci' and object_id = OBJECT_ID('ProductsModels.Metrics'))
EXEC sp_rename N'ProductsModels.Metrics.ProductsModels_ProductsModels_Metrics__dbt_tmp_cci', N'ProductsModels_Metrics_cci', N'INDEX'
2020-11-19 20:19:23.209382 (Thread-1): Database error: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Error: The new name 'Metrics' is already in use as a object name and would cause a duplicate that is not permitted. (15335) (SQLExecDirectW)")
2020-11-19 20:19:23.209382 (Thread-1): On model.cm_datawarehouse.ProductsModels_Metrics: ROLLBACK
2020-11-19 20:19:23.209382 (Thread-1): On model.cm_datawarehouse.ProductsModels_Metrics: Close
2020-11-19 20:19:23.209382 (Thread-1): Error running SQL: macro rename_relation

@dataders
Copy link
Collaborator

welcome to the world of dbt adapter development and thanks for looking into this! At this point you're well on the way to your first PR. Feel free to do the macro over-riding for the short term, but this guide will guide you in how to make changes for a PR.

As for the the rename thing you mention above, looks like it has something to do with the sqlserver__rename_relation macro.

@CMRobbie
Copy link
Author

CMRobbie commented Nov 19, 2020

I don't know.. I got it working but the change seems odd. It's working if I drop the table/view just before the rename:

{% macro sqlserver__rename_relation(from_relation, to_relation) -%}
  --sqlserver__rename_relation override

  {% call statement('rename_relation') -%}
    {{ sqlserver__drop_relation_script(to_relation) }};
    EXEC sp_rename '{{ from_relation.schema }}.{{ from_relation.identifier }}', '{{ to_relation.identifier }}'
    IF EXISTS(
    SELECT *
    FROM sys.indexes
    WHERE name='{{ from_relation.schema }}_{{ from_relation.identifier }}_cci' and object_id = OBJECT_ID('{{ from_relation.schema }}.{{ to_relation.identifier }}'))
    EXEC sp_rename N'{{ from_relation.schema }}.{{ to_relation.identifier }}.{{ from_relation.schema }}_{{ from_relation.identifier }}_cci', N'{{ from_relation.schema }}_{{ to_relation.identifier }}_cci', N'INDEX'
  {%- endcall %}
{% endmacro %}

Also, fun fact, this query does not get logged into dbt.log even though I 'm pretty sure it gets executed.

@mikaelene
Copy link
Collaborator

def helpful. now i'm thinking it has to do with the sqlserver_create_view_as() macro. Basically the default and redshift version of the macro just calls {{relation}} rather than {{ relation.schema }}.{{ relation.identifier }}. My team doesn't have bandwidth to look more into this until next week at the earliest. If you're feeling adventurous, you could try over-writing the sqlserver_create_view_as() macro by making a coping your local project's macro folder, then experimenting to see if you can get it to use the database name in the query. You might also have to include a line line USE {{relation.database}}

sqlserver_create_view_as()

{% macro sqlserver__create_view_as(relation, sql) -%}
  create view {{ relation.schema }}.{{ relation.identifier }} as
    {{ sql }}
{% endmacro %}

default_create_view_as()

{% macro default__create_view_as(relation, sql) -%}
  {%- set sql_header = config.get('sql_header', none) -%}

  {{ sql_header if sql_header is not none }}
  create view {{ relation }} as (
    {{ sql }}
  );
{% endmacro %}

I think this is the issue here. The current macro assumes you are connected to the database you want to build the view in. Thus leaving out the database identifier. Actually SQL Server doesn't allow you to specify the database to build it into. You have to be in the right data base. I think this issue will be quite difficult to solve. dbt is reusing connections so there would be a risk of executing sql in the wrong database for the models you want in the database in the profile.

Maybe we can add a USE [DB] statement before every sql we are executing?

@adrianturcato-initiative

Hi All,

Thought I would drop a line here. I am also experiencing this problem. I followed the suggested fix in PR #126, installing dbt via cloning the branch from PR #126 . But I am still getting the same issue. I pasted the log below.

In this case I am looking for the table stg_unpacked_placements to be materialized in Client_DB instead of the default Default_DB. With the build from #126 I am definitely seeing Client_DB throughout the compiled SQL, but still doesn't appear to have the correct results.

I am working on SQL Server 2017

2021-05-05 13:21:00.134064 (Thread-1): 09:21:00 | 4 of 6 START table model Client_DB.client_name.stg_unpacked_placements.... [RUN]
2021-05-05 13:21:00.135062 (Thread-1): Acquiring new sqlserver connection "model.my_new_project.stg_unpacked_placements".
2021-05-05 13:21:00.135062 (Thread-1): Compiling model.my_new_project.stg_unpacked_placements
2021-05-05 13:21:00.138064 (Thread-1): Writing injected SQL for node "model.my_new_project.stg_unpacked_placements"
2021-05-05 13:21:00.139065 (Thread-1): finished collecting timing info
2021-05-05 13:21:00.142064 (Thread-1): Using sqlserver connection "model.my_new_project.stg_unpacked_placements".
2021-05-05 13:21:00.142064 (Thread-1): On model.my_new_project.stg_unpacked_placements: use [Client_DB];
    if object_id ('"client_name"."stg_unpacked_placements__dbt_tmp"','U') is not null
      begin
      drop table "client_name"."stg_unpacked_placements__dbt_tmp"
      end
2021-05-05 13:21:00.142064 (Thread-1): Opening a new connection, currently in state closed
2021-05-05 13:21:00.142064 (Thread-1): Using connection string: DRIVER={SQL Server Native Client 11.0};SERVER=server.address.rds.amazonaws.com,port;Database=DEFAULT_DB;UID={***};PWD=***;Application Name=dbt-sqlserver/0.19.0.3
2021-05-05 13:21:00.143067 (Thread-1): Connected to db: DEFAULT_DB
2021-05-05 13:21:00.269190 (Thread-1): SQL status: OK in 0.13 seconds
2021-05-05 13:21:00.271193 (Thread-1): Using sqlserver connection "model.my_new_project.stg_unpacked_placements".
2021-05-05 13:21:00.271193 (Thread-1): On model.my_new_project.stg_unpacked_placements: use [Client_DB];
    if object_id ('"client_name"."stg_unpacked_placements__dbt_backup"','U') is not null
      begin
      drop table "client_name"."stg_unpacked_placements__dbt_backup"
      end
2021-05-05 13:21:00.398163 (Thread-1): SQL status: OK in 0.13 seconds
2021-05-05 13:21:00.400163 (Thread-1): Writing runtime SQL for node "model.my_new_project.stg_unpacked_placements"
2021-05-05 13:21:00.401182 (Thread-1): Using sqlserver connection "model.my_new_project.stg_unpacked_placements".
2021-05-05 13:21:00.401182 (Thread-1): On model.my_new_project.stg_unpacked_placements: 
   
  use [Client_DB];
  if object_id ('"client_name"."stg_unpacked_placements__dbt_tmp_temp_view"','V') is not null
      begin
      drop view "client_name"."stg_unpacked_placements__dbt_tmp_temp_view"
      end


   
   
  use [Client_DB];
  if object_id ('"client_name"."stg_unpacked_placements__dbt_tmp"','U') is not null
      begin
      drop table "client_name"."stg_unpacked_placements__dbt_tmp"
      end


   use [Client_DB];
   EXEC('create view "client_name"."stg_unpacked_placements__dbt_tmp_temp_view" as
    

SELECT *
FROM (
    SELECT 
        Placement, 
        PlacementID as placement_id, 
        campaign, 
        CampaignID as campaign_id
    FROM "DEFAULT_DB"."dbo"."a_table"
    WHERE NormalizedClient = ''some client''
) r
    ');

   SELECT * INTO "Client_DB"."client_name"."stg_unpacked_placements__dbt_tmp" FROM
    "Client_DB"."client_name"."stg_unpacked_placements__dbt_tmp_temp_view"

   
   
  use [Client_DB];
  if object_id ('"client_name"."stg_unpacked_placements__dbt_tmp_temp_view"','V') is not null
      begin
      drop view "client_name"."stg_unpacked_placements__dbt_tmp_temp_view"
      end

    
   use [Client_DB];
  if EXISTS (
        SELECT * FROM
        sys.indexes WHERE name = 'client_name_stg_unpacked_placements__dbt_tmp_cci'
        AND object_id=object_id('client_name_stg_unpacked_placements__dbt_tmp')
    )
  DROP index client_name.stg_unpacked_placements__dbt_tmp.client_name_stg_unpacked_placements__dbt_tmp_cci
  CREATE CLUSTERED COLUMNSTORE INDEX client_name_stg_unpacked_placements__dbt_tmp_cci
    ON client_name.stg_unpacked_placements__dbt_tmp

   


2021-05-05 13:21:00.599162 (Thread-1): SQL status: OK in 0.20 seconds
2021-05-05 13:21:00.601163 (Thread-1): Using sqlserver connection "model.my_new_project.stg_unpacked_placements".
2021-05-05 13:21:00.601163 (Thread-1): On model.my_new_project.stg_unpacked_placements: use [Client_DB];
    EXEC sp_rename 'client_name.stg_unpacked_placements__dbt_tmp', 'stg_unpacked_placements'
    IF EXISTS(
    SELECT *
    FROM sys.indexes
    WHERE name='client_name_stg_unpacked_placements__dbt_tmp_cci' and object_id = OBJECT_ID('client_name.stg_unpacked_placements'))
    EXEC sp_rename N'client_name.stg_unpacked_placements.client_name_stg_unpacked_placements__dbt_tmp_cci', N'client_name_stg_unpacked_placements_cci', N'INDEX'
2021-05-05 13:21:00.736199 (Thread-1): SQL status: OK in 0.14 seconds
2021-05-05 13:21:00.737190 (Thread-1): On model.my_new_project.stg_unpacked_placements: COMMIT
2021-05-05 13:21:00.738190 (Thread-1): Using sqlserver connection "model.my_new_project.stg_unpacked_placements".
2021-05-05 13:21:00.738190 (Thread-1): On model.my_new_project.stg_unpacked_placements: use [Client_DB];
    if object_id ('"client_name"."stg_unpacked_placements__dbt_backup"','U') is not null
      begin
      drop table "client_name"."stg_unpacked_placements__dbt_backup"
      end
2021-05-05 13:21:00.864162 (Thread-1): SQL status: OK in 0.13 seconds
2021-05-05 13:21:00.865162 (Thread-1): finished collecting timing info
2021-05-05 13:21:00.865162 (Thread-1): On model.my_new_project.stg_unpacked_placements: Close
2021-05-05 13:21:00.865162 (Thread-1): Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'af2c7e7b-1505-4230-a756-a2308e0ae67f', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x000001F53D6C8A00>]}
2021-05-05 13:21:00.865162 (Thread-1): 09:21:00 | 4 of 6 OK created table model Client_DB.client_name.stg_unpacked_placements [OK in 0.73s]

@adrianturcato-initiative

Hi Again,

I seemed to have resolved the situation. I ran the compiled code directly in SQL Server, it gave me an error message that wasn't being surfaced in DBT. Basically there was a UDF I was referencing in the query (which I actually sanitized from the log posted above), the UDF is defined in the Default_DB, but I wasn't referencing it with the full location (ie I was only referring to schema.udf_name, not db.schema.udf_name), and either the error was causing the query to fail and write in the wrong location, or the ambiguous reference was somehow changing the destination back to the Default_DB.

@dataders
Copy link
Collaborator

fixed by: #126!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants