Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package project yml configs overriding root project configs #3698

Closed
1 of 5 tasks
fivetran-jamie opened this issue Aug 5, 2021 · 6 comments
Closed
1 of 5 tasks

Package project yml configs overriding root project configs #3698

fivetran-jamie opened this issue Aug 5, 2021 · 6 comments
Labels
bug Something isn't working packages Functionality for interacting with installed packages stale Issues that have gone stale vars

Comments

@fivetran-jamie
Copy link

Describe the bug

A clear and concise description of what the bug is. What command did you run? What happened?

I installed this branch of the zendesk_source package in my root project. This branch uses the custom collect_freshness macro we built in order to disable (or rather, automatically pass) freshness tests for source tables that the user has elected to disable (note: the source entity itself is not configured/disabled, but rather the staging models drawing from these source tables are disabled). When variables (such as using_ticket_form_history) are defined in the package's own dbt_project.yml file, the package-provided value overrides any value provided in the root project file. I am unable to disable/pass freshness tests (using dbt source snapshot-freshness --select zendesk.ticket_form_history) by setting the appropriate variable to false in my root project (but if i change the package-defined variable to false, it passes).

Note: I am only seeing this conflict with freshness tests. When i provide using_ticket_form_history in my root project (false in my project yml and true in the package yml) and execute dbt run, the respective staging models that are appropriately disabled by the variable. So perhaps this has to do with the collect_freshness macro being a part of dbt core rather than utils or anything?

Steps To Reproduce

In as much detail as possible, please provide steps to reproduce the issue. Sample data that triggers the issue, example model code, etc is all very helpful here.

  1. Install the bug/disable-freshness branch of the zendesk_source package
  2. Add the below to your root project file
vars:
  zendesk_source:
    using_ticket_form_history: false
  1. Add the following to the dbt_modules/zendesk_source/dbt_project.yml
vars:
  zendesk_source:
    using_ticket_form_history: true
  1. Run dbt source snapshot-freshness --select zendesk.ticket_form_history
    Expected: freshness test should pass, since we have chosen to disable/automatically pass freshness tests for this table
    Actual: freshness test fails (presuming you do not have this table or just have stale data)

  2. Change using_ticket_form_history to false in dbt_modules/zendesk_source/dbt_project.yml

  3. Run dbt source snapshot-freshness --select zendesk.ticket_form_history
    This passes, which makes it seem like dbt is indeed grabbing the variable value from the package yml

Removing the variable from the package project yml allows me to disable freshness tests from my root project yml, but a good deal of our packages have these default-set variables in their dbt_project.yml files, so just trying to figure out if we need to remove those in order for this solution to continue to work.

Expected behavior

A clear and concise description of what you expected to happen.
(see bold text in steps to reproduce ^)

Screenshots and log output

If applicable, add screenshots or log output to help explain your problem.

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.20.0
   latest version: 0.20.0

Up to date!

Plugins:
  - bigquery: 0.20.0
  - snowflake: 0.20.0
  - redshift: 0.20.0
  - postgres: 0.20.0

The operating system you're using:
MacOS
The output of python --version:
Python 3.8.9

Additional context

Add any other context about the problem here.

@fivetran-jamie fivetran-jamie added bug Something isn't working triage labels Aug 5, 2021
@jtcohen6
Copy link
Contributor

jtcohen6 commented Aug 29, 2021

@fivetran-jamie Thanks for the really detailed write-up! Sorry for the delay getting back to you. I've been able to reproduce the bug, it's just a real head-scratcher to figure out what's going on here.

I do think the peculiarities of the source freshness task have something to do with this one. Unlike other tasks, which generate a runtime context, compile SQL, and then execute that compiled SQL within a materialization, the freshness task jumps straight to the finish: find and execute a certain macro using a macro_context.

There are two things working right:

  • dbt picks the right macro, preferring fivetran_utils.collect_freshness over built-in dbt.collect_freshness
  • The macro_context does seem to be doing the right sort of var resolution (sometimes)

However:

  • I see different results when I pass the macro into dbt.clients.jinja.get_rendered vs. dbt.clients.jinja.MacroGenerator

I dropped a debugger in here:

        macro_context.update(context_override)
        if 'collect_freshness' in macro.unique_id:
            import ipdb; ipdb.set_trace()
        macro_function = Macro
        Generator(macro, macro_context)

When I just wrap the macro in get_rendered, it does the right thing (no error):

ipdb> from dbt.clients.jinja import get_rendered
ipdb> get_rendered("{{ fivetran_utils.collect_freshness('zendesk.ticket_form_history') }}", macro_context)
'| column         | data_type |\n| -------------- | --------- |\n| max_loaded_at  | DateTime  |\n| snapshotted_at | DateTime  |\n'

But when I follow the actual codepath, and execute the macro using self.connections.exception_handler, it clearly renders differently (raising an error):

ipdb> macro_function = MacroGenerator(macro, macro_context)
ipdb> with self.connections.exception_handler(f'macro {macro_name}'):
            result = macro_function(**kwargs)

*** dbt.exceptions.DatabaseException: Database Error
  relation "zendesk.ticket_form_history" does not exist
  LINE 42:     from "jerco"."zendesk"."ticket_form_history"
                    ^

When I try to just get the rendered variable value on its own, it doesn't appear to be accessible (??), but at least it shows me the correct value (false), as set in my root dbt_project.yml.

ipdb> get_rendered("{{ var('using_ticket_form_history') }}", macro_context)
*** dbt.exceptions.CompilationException: Compilation Error in macro collect_freshness (macros/collect_freshness.sql)
  Required var 'using_ticket_form_history' not found in config:
  Vars supplied to collect_freshness = {
      "zendesk_source": {
          "using_ticket_form_history": false
      }
  }

I want to confirm whether I'm arriving at the right distinction: get_rendered does one thing, MacroGenerator does a different thing, and that's at the heart of the matter about picking the wrong var value. Even if that distinction holds, I imagine it's a really just symptom of some really low-lying subtleties of the way we're constructing contexts + templates in dbt/clients/jinja.py.

@gshank I think you know the most about rendering contexts today, so I'd be curious to hear if what I've laid out above seems like the right thread to pull on, or if I've totally missed the boat.

@jtcohen6 jtcohen6 added packages Functionality for interacting with installed packages vars and removed triage labels Aug 29, 2021
@jtcohen6
Copy link
Contributor

I took another look at this one, just to see if anything new has clicked.

I think this is not specific to the execution of source freshness, as opposed to what I said above. After all, in manifest.json, dbt renders meta.is_enabled for the source as true! It prefers the value of the variable set in the package dbt_project.yml over the value of the variable set in the root project dbt_project.yml, when it should be the exact reverse. If we simplify the issue to that—yaml properties are rendered with the wrong var precedence—we might have a better shot at getting to the root cause here.

As a separate matter, I do think there's a better way to achieve this functionality, rather than overriding the collect_freshness macro, which can yield some surprising behavior for end users (e.g. slack thread). It's possible to disable freshness for a specific source table by setting freshness: source; I see that Fivetran packages often do this for certain tables. So how about something like:

      - name: ticket_form_history
        description: Ticket forms allow an admin to define a subset of ticket fields for display to both agents and end users.
        freshness: "{{ ({} if var('using_ticket_form_history', true) else none) | as_native }}"

This prevents the freshness check from running at all for this table when using_ticket_form_history: false. (Unfortunately, it still doesn't resolve the bug in this issue: at parse time, dbt is rendering the var with preference for the package value over the root project's.)

Ultimately, I'd like to see us do unto sources as we've done unto other resource types (#3662):

  • add a config property to sources, making it possible to set enabled for a given source, e.g. to a Jinja expression that leverages a var value
  • convert freshness into an honest-to-goodness config that can be set/overridden in the root project's dbt_project.yml (as an always-available last resort)

@bill-warner
Copy link

I am also running into what I think is the same issue when developing packages. Using the jaffle shop example on this branch.

Customers model enabled using the enabled_customers var within the models config

enabled_customers default value false set in package's dbt_project file.

enabled_customers set to true in root project, which imports the package.

Generic tests on customers model are also enabled using enabled_customers var. This is intended to stop the warning dbt would throw if the customers model was disabled in the root project:

[WARNING]: Test 'test.jaffle_shop.unique_customers_customer_id.c5af1ff4b1' (models/schema.yml) depends on a node named 'customers' which is disabled

With this setup, given enabled_customers is set to true in the root project, you would expect to be able to both run and test the customers model. Instead you can run the customer model but no tests are found. This suggests the vars value in the schema.yml seemingly is taken from the packages dbt_project.yml file while the model's config is taking the value from the root projects dbt_project.yml

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Jun 15, 2022
@github-actions
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

@rlh1994
Copy link
Contributor

rlh1994 commented Sep 28, 2022

@jtcohen6 I understand this was closed for inactivity but do you know was there a planned resoltuion for this? I am seeing the same issue with variable priority - a variable defined in a package yaml is being prioritised over the project yaml overwrite when that variable is called from within a schema yaml within the package?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working packages Functionality for interacting with installed packages stale Issues that have gone stale vars
Projects
None yet
Development

No branches or pull requests

4 participants