Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clean up relation usage and default behavior #92

Closed
wants to merge 23 commits into from

Conversation

dataders
Copy link
Collaborator

@dataders dataders commented Jan 26, 2021

changes

  • implements a relation.py which sets the relation:
    • quoting policy to both:
      • use " to quote relation parts, and
    • inclusion policy to never include the database name when referencing a relation, just schema.identifier
  • fixes: clean up relation construction #91 cleaner relation name gen ( {{ relation }} instead of {{ relation.schema }}.{{ relation.identifier }})
  • use adapter.drop_relation which deprecates sqlserver__drop_relation_script
  • re-enables the snapshot_strategy_check_cols test for Azure SQL (should have been re-enabled with Fix for passing the snapshot adapter test. Fixes #61 #74)

@dataders
Copy link
Collaborator Author

@jtcohen6 the import is failing but I have no idea why, but I'm willing to bet it's just a rookie python typo.

dataders and others added 3 commits January 27, 2021 14:32
@dataders
Copy link
Collaborator Author

dataders commented Feb 11, 2021

issue

something weird is happening with drop_relation(). For all 8 steps of the empty sequence and the first 12 steps of the base test sequence, the drop_relation's work as they should in that

this jinja-sql

if object_id ('{{ relation }}','{{ object_id_type }}') is not null
    begin
    drop {{ relation.type }} {{ relation }}
    end

compiles to this

 if object_id ('dbt_test_azure_sql_210211000737938430161027.view_model__dbt_backup','V') is not null
      begin
      drop view dbt_test_azure_sql_210211000737938430161027.view_model__dbt_backup
      end

However, it fails starting with step 13 of the base test sequence (running the swappable.sql as incremental). Now the compiled drop_relation SQL looks like this:

if object_id (''dbt_test_azure_sql_210211000737938430161027'.'swappable'','V') is not null
      begin
      drop view 'dbt_test_azure_sql_210211000737938430161027'.'swappable'
      end

@jtcohen6, does the global_project's incremental materialization (and others) change the QuotePolicy somehow? wild guess -- does the { this } relation object inherit the adapter's quoting policy?

SQLServerQuotePolicy is currently set to False for parts, so I don't get why the swappable view is getting quoted here.

@dataclass
class SQLServerQuotePolicy(Policy):
    database: bool = False
    schema: bool = False
    identifier: bool = False

desired behavior

TL;DR I'd like '{{ relation }}' to be rendered as one of these three things:

  • 'database.schema.identifier' per SQLServerQuotePolicy, and
  • '''database''.''schema''.''identifier''' (inexplicably, ' is the default escape character in TSQL, instead of \)

side note, square brackets are the defacto TSQL way to quote relation parts and column names, but not sure if that's possible to set SQLServerRelation's quote_character to be two characters: [ or ].

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtcohen6, does the global_project's incremental materialization (and others) change the QuotePolicy somehow? wild guess -- does the {{ this }} relation object inherit the adapter's quoting policy?

Hm... the {{ this }} relation object definitely inherits the adapter's quoting policy. It also doesn't seem like {{ this }} is directly involved here. To be clear, dbt-sqlserver reimplements the incremental materialization, so I'd think that this line is performing the bad drop (because swappable indeed exists_not_as_table, it exists as a view). I'm not sure what's causing it to quote old_relation, though. Feels like your best bet is to take this for a spin locally—create a model as a view, switch it to incremental, add a lot of logging between lines 17 and 32 of the incremental materialization, and see when exactly it starts being quoted.

While on this subject, it's also worth checking to see if the default incremental materialization still needs overriding—the only difference I can tell is explicitly writing {{ tmp_relation.include(schema=False, database=False) }}, rather than implicitly with {{ tmp_relation }} as the default does, even though the result (at least on dbt-postgres) is exactly the same.

@@ -145,18 +132,18 @@
type='view')-%}
{%- set temp_view_sql = sql.replace("'", "''") -%}

{{ sqlserver__drop_relation_script(tmp_relation) }}
{{ sqlserver__drop_relation(tmp_relation) }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this, and the other instances below, can just be {{ adapter.drop_relation(tmp_relation) }}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! qq: why adapter.drop_relation() and not just drop_relation()? is it to make sure the macro can't be overwritten by a macro in the project that is also named drop_relation?

@dataders
Copy link
Collaborator Author

it's also worth checking to see if the default incremental materialization still needs overriding

@jtcohen6 I've pulled in changes from #102 which drops the overriding of the incremental materialization. Unfortunately, the same issue is still happening, so I'm going to start locally logging/debugging the global project's incremental materialization.

@dataders dataders force-pushed the relation-construction branch 2 times, most recently from 0427207 to 18903b4 Compare February 14, 2021 06:40
@dataders dataders changed the title auto-render relation names without database clean up relation usage and default behavior Feb 14, 2021
@dataders dataders marked this pull request as ready for review February 14, 2021 07:01
class SQLServerRelation(BaseRelation):
quote_policy: SQLServerQuotePolicy = SQLServerQuotePolicy()
include_policy: SQLServerIncludePolicy = SQLServerIncludePolicy()
quote_character: str = '"'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikaelene this is the one thing I'm not sure about. can you forsee any issues using " instead of ' to quote relation parts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a guru in quoting :-). Did we have ' before?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we did, inexplicably though, the code change when I change it back to single quote. You're right that we should think about this change maybe breaking existing snapshot or incremental tables...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will definitely have to change this back to include database=True by default to make cross-database selects work.

Copy link
Collaborator

@mikaelene mikaelene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code look really good. There is a lot of changes. How well have you tested this? Have you run it on a real dbt-project or just in the CI?

@dataders
Copy link
Collaborator Author

Have you run it on a real dbt-project or just in the CI?

Great question. I actually just tried running a project that was previously run with (I believe) dbt-sqlserver==0.19.0. And we get an error. So we should def table this PR until 0.19.1, instead of cramming it prematurely into 0.19.0.1.

I haven't seen this error message before, maybe @jtcohen6 could explain. It seems it has to do with partial matches due to quoting differences? this might constitute a breaking change for users. What I don't understand is why:

  • it searches for an unquoted version, but
  • finds a quoted version, when
  • the quote policy is none, however
  • in all my debugging the relation was always showing up double-quoted (despite the False quote policy)
Compilation Error in model SAPHRPeopleDim (models/stage/SAPHRPeopleDim.sql)
  When searching for a relation, dbt found an approximate match. Instead of guessing 
  which relation to use, dbt will move on. Please delete "ajs_stg"."SAPHRPeopleDim", or rename it to be less ambiguous.
  Searched for: ajs_stg.saphrpeopledim
  Found: "ajs_stg"."SAPHRPeopleDim"
  
  > in macro materialization_view_default (macros/materializations/view/view.sql)
  > called by model SAPHRPeopleDim (models/stage/SAPHRPeopleDim.sql)

@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 15, 2021

I haven't seen this error message before, maybe @jtcohen6 could explain. It seems it has to do with partial matches due to quoting differences? this might constitute a breaking change for users.

Yes, that's right. If dbt finds an existing relation in the database that is similar to a model's relation-to-e, but not exactly the same (different quoting/casing), dbt will raise a compilation error to avoid accidentally overwriting a view/table it doesn't actually own. Here's the source for that behavior.

It sounds like you ran this same project previously, using dbt-sqlserver==0.19.0, when dbt was quoting relations. So, now that quoting has been turned off, dbt has identified the discrepancy. I agree this is a breaking change and should probably wait until v0.20.0.

in all my debugging the relation was always showing up double-quoted (despite the False quote poli

This is the part I'm not sure about! If the quote_policy is now False across the board, dbt should not be quoting relation components. Which brings me to ask: What's the actual reason to turn off quoting on SQLServer? I think I missed this part of the puzzle. The reason we turn it off for Snowflake is because, if an identifier name is quoted and contains non-uppercase characters, selecting it is a nightmare.

never quote any part by default (but I see zero evidence that changing this makes a difference?)

The reason to keep quoting is that, on several databases (e.g. Postgres), identifier names that start with certain characters must be quoted: dev_jerco."123Table". On other databases (e.g. Snowflake), quoting adds so much hassle that it just isn't worth the support for special characters. Some databases (e.g. BigQuery) don't ever support those kind of special-character identifier names, even when quoted.

@dataders
Copy link
Collaborator Author

What's the actual reason to turn off quoting on SQLServer?

great question @jtcohen6. Looks like there wasn't a good reason and I just did it to try and solve a bug that I stirred up along the way. It isn't necessary. It is however necessary to change our relation part quote character from a single quote to double.

@mikaelene I tested this on a local project and it works without issue. ready for for your further review/approval.

Copy link
Collaborator

@mikaelene mikaelene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems sane. But It is a lot of code to review. I get a little to comfortable with the CI tests sometimes. I think I introduced a bug in snapshots. #108 . So I will review this when I have time.

class SQLServerRelation(BaseRelation):
quote_policy: SQLServerQuotePolicy = SQLServerQuotePolicy()
include_policy: SQLServerIncludePolicy = SQLServerIncludePolicy()
quote_character: str = '"'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a guru in quoting :-). Did we have ' before?

WHERE name='{{ from_relation.schema }}_{{ from_relation.identifier }}_cci' and object_id = OBJECT_ID('{{ from_relation.schema }}.{{ to_relation.identifier }}'))
EXEC sp_rename N'{{ from_relation.schema }}.{{ to_relation.identifier }}.{{ from_relation.schema }}_{{ from_relation.identifier }}_cci', N'{{ from_relation.schema }}_{{ to_relation.identifier }}_cci', N'INDEX'
WHERE name='{{ from_relation.schema }}_{{ from_relation.identifier }}_cci' and object_id = OBJECT_ID('{{ from_relation }}'))
EXEC sp_rename N'{{ from_relation }}.{{ from_relation.schema }}_{{ from_relation.identifier }}_cci', N'{{ from_relation.schema }}_{{ to_relation.identifier }}_cci', N'INDEX'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we prefixing with the entire relation now? That includes double-quoted schema and relation names, which is creating really bizarre output in my run:

EXEC sp_rename N'"dbo"."APPT_COMPLETED_MONTHLY__dbt_tmp".dbo_APPT_COMPLETED_MONTHLY__dbt_tmp_cci', N'dbo_APPT_COMPLETED_MONTHLY_cci', N'INDEX'

@panasenco
Copy link

@swanderz I've spent several hours today trying to reimplement database switching (see PR #111) in this branch. However, I've run into a wall with the function adapter.drop_relation. The macro create_table_as does the following steps:

  1. Drop the temporary view if it exists
  2. Create the temporary view
  3. Drop the temporary table if it exists
  4. Insert temporary view contents into the temporary table
  5. Drop the temporary view again

Because the temporary view is dropped twice, something screwy happens with the relation cache and the second drop view statement doesn't happen, leaving the temporary view in the database. I'm not certain what exactly is happening because honestly the relation cache is difficult to understand.

My thought is to just drop all uses of adapter.drop_relation and to do it with straightforward SQL rather than trying to understand dbt's relation cache. I don't see the point of the cache anyway, it shaves off milliseconds at most... But I want to check in with you before I do that and ask whether that's a direction you're willing to go.

Is it OK to replace all uses of adapter.drop_relation with some straightforward macro?

@dataders
Copy link
Collaborator Author

dataders commented Mar 3, 2021

@panasenco -- I very much appreciate you taking the time to try and make your PR with my refactor. TIL, I had no idea about the relation cache.

Totally open to the "straightforward SQL" approach, but any chance you could open up a different PR with the code you were stuck on so we can look at the error you were seeing? If anything, for curiosity's sake.

@sdebruyn sdebruyn added chore enhancement New feature or request labels May 22, 2022
@dataders dataders closed this Oct 7, 2022
@sdebruyn sdebruyn deleted the relation-construction branch May 17, 2023 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chore enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

clean up relation construction
5 participants