Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for dbt-audit-helper issue #84 #85

Closed
wants to merge 3 commits into from

Conversation

igor-lobanov-maersk
Copy link

Escaping "count" field using double quotes, which I believe is SQL standard.

Description & motivation

I believe this is all that's required to fix #84

Checklist

  • I have verified that these changes work locally
  • I have updated the README.md (not applicable)
  • I have added tests & descriptions to my models (not applicable)

Escaping "count" field using double quotes, which I believe is SQL standard.
@igor-lobanov-maersk
Copy link
Author

Ok, see that this didn't work for Snowflake. Will take another look at syntax options.

@dbeatty10
Copy link
Contributor

@igor-lobanov-maersk I suspect that what you're seeing in the CI failures for dbt-snowflake is a result of the default quoting rules for dbt-snowflake:

image

You're changing it from being an unquoted identifier into a quoted one, but dbt-snowflake is configured to treat identifiers as unquoted.

I don't immediately know how to make all the adapters happy here.

Crux of the issue

The crux seems to be that (by default) dbt-snowflake doesn't want quoted identifiers, but dbt-dremio has count as a reserved keyword for identifiers.

Some resources to help research and troubleshoot:

One idea

Maybe something like this (using quote config):

models:
  - name: compare_queries
    columns:
      - name: count
        quote: true
    tests:
      - dbt_utils.equality:
          compare_model: ref('expected_results__compare_relations_without_exclude')

@igor-lobanov-maersk
Copy link
Author

Thank you for getting back to me @dbeatty10, this is a good pointer. I will explore this further. It's a bit difficult for me to follow as I don't have access to a Snowflake environment, but I can try a few more things against the CI build.

Another way this could be addressed is if count was actually renamed to something less contentious like row_count. This should work everywhere. However, this would change 'public' interface of this module. Not sure to what extent it's a problem. What do you think?

@dbeatty10
Copy link
Contributor

Another way this could be addressed is if count was actually renamed to something less contentious like row_count. This should work everywhere. However, this would change 'public' interface of this module. Not sure to what extent it's a problem. What do you think?

This is an insightful idea and you're spot-on on the affect it would have on the public interface.

Since this has been a stable public interface for 4+ years, we'd be hesitant to change it.

We'd probably want to check if even merely changing this from unquoted to quoted would affect the public interface that folks may be using.

Let's brainstorm some ideas

First, I'm going to just list out a bunch of ideas (and temporarily withhold judgement if they are "good" or not):

  1. Change the public interface of dbt-audit-helper to not use any SQL reserved keywords (like count) as column names
  2. Dremio allow the SQL reserved keyword count as a column name (similarly to how it is apparently allowed by Snowflake, etc).
  3. Put specific hard-coded logic within dbt-audit-helper that adds quotes only when the adapter is dremio
  4. Put specific hard-coded logic within dbt-audit-helper that adds quotes for all adapters except for snowflake
  5. Create an adapter-level interface that specifies which words are empirically not accepted as unquoted table or column name (like this), and use this interface to decide whether to quote or not
  6. Create an adapter-level interface that specifies if the unquoted identifiers are uppercased, lowercased, or samecased within their information_schema.columns-compatible metadata store.
  7. Make no changes and accept that dbt-dremio users are not able to use the audit_helper.compare_queries macro directly as-is

Considering trade-offs

I'd prefer not to change the public interface since it's been stable. Although count is a reserved keyword, it seems many databases allow it to be unquoted in practice (all the databases tested within this package as well as all the databases listed here):

image

I'm not guessing the Dremio maintainers would want to change they way they are handling count, but that would be easiest for everyone else, and it would put them more in line with how other databases are handling it. It would also be a non-breaking change for users of Dremio. But we can't assume or control if that will happen.

I'd also strongly prefer not to add adapter-specific logic within audit_helper related specifically to Dremio. It would create maintenance issues of untested logic, and dbt packages should really be agnostic of details related to adapters.

Creating an adapter-level interface is actually my favorite idea, but it would be a more involved and long-term project than the scope of this PR.

Using dispatch is the most "dbt-onic" approach. It's takes a little bit more explanation for end users to use it, and it is necessary within each dbt project.

Let's make a decision

After reviewing all of the options and their trade-offs, dispatch is the way to go here. There's two options from the perspective of dbt-dremio maintainers:

Long-term, I'd still be quite interested in an adapter-level interface. i.e. an interface that can identify words that empirically determined to be invalid without quoting and/or an interface that surfaces how the database stores the casing of unquoted columns within its metadata.

Copy link

This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Apr 24, 2024
Copy link

github-actions bot commented May 1, 2024

Although we are closing this PR as stale, it can still be reopened to continue development. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

compare_queries does not work on Dremio
2 participants