-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] CT-201 source config WIP #5003
Conversation
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting close! I left a few comments that may provide helpful (or least historical) context
base=False, | ||
patch_config_dict=target.source.config, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need access to both:
target.source.config
= config defined at the source leveltarget.table.config
= config defined at the source table level
I think there are two options here:
- Merge those configs together (preferring table-level over source-level), before passing into
patch_config_dict
- Modify
calculate_node_config
to accept multiple "levels" of patches, so that all calculation happens in that same method: source table > source > project-level config
@@ -465,6 +465,8 @@ def parse_project( | |||
else: | |||
dct = block.file.dict_from_yaml | |||
parser.parse_file(block, dct=dct) | |||
# Came out of here with UnpatchedSourceDefinition containing configs at the source level | |||
# and not configs at the table level (as expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment re: patch_config_dict
passed into calculate_node_config
default=None, | ||
metadata=CompareBehavior.Exclude.meta(), | ||
) | ||
# TODO what type is this? docs say: "<column_name_or_expression>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String is right.
Food for thought: These dataclass attributes are duplicative with the ones already defined:
Given our aim here is backwards compatibility, that duplication may be unavoidable
@@ -335,6 +335,39 @@ def replace(self, **kwargs): | |||
@dataclass | |||
class SourceConfig(BaseConfig): | |||
enabled: bool = True | |||
quoting: Dict[str, Any] = field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move the addition of these new configs into a separate PR? If the goal is just to support setting enabled
as a config on the source/table levels
) | ||
|
||
unrendered_config = self._generate_source_config( | ||
fqn=target.fqn, | ||
target=target, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Background context on rendered: bool
, UnrenderedConfigGenerator
, and unrendered_config
, since these are pretty confusing!
This exists to limit "false positives" during state comparison, i.e. to power state:modified
selection in Slim CI runs ("just build what's changed"): https://docs.getdbt.com/reference/node-selection/state-comparison-caveats#false-positives
The comparison happens in the same_config
method here:
dbt-core/core/dbt/contracts/graph/parsed.py
Lines 317 to 321 in 899b0ef
def same_config(self, old: T) -> bool: | |
return self.config.same_contents( | |
self.unrendered_config, | |
old.unrendered_config, | |
) |
Idea being, it's not uncommon for users to have environment-based configuration like:
sources:
- name: my_source
config:
database: "{{ 'prod_raw' if target.name == 'prod' else 'sampled_dev' }}"
schema: "{{ env_var('DBT_SOURCE_SCHEMA') }}"
# etc
The rendered versions of those configs will be different based on the values of env vars or target
values. The thing we actually want to detect changes in is the raw un-rendered Jinja expression. If that's changed, it's because someone edited the thing.
This works for configs set in dbt_project.yml
, which are stored as unrendered Jinja expressions, and passed twice into calculate_node_config
(once with rendering, once without).
However, we've already rendered all fields in (non-project) yaml files, during initial parsing:
dbt-core/core/dbt/parser/schemas.py
Lines 633 to 635 in 899b0ef
# Render the data (except for tests and descriptions). | |
# See the SchemaYamlRenderer | |
entry = self.render_entry(entry) |
So we're passing already-rendered Jinja expressions into calculate_node_config
, and we never get access to their unrendered forms. We'd need a subtle reordering of parsing/rendering to make it work as intended.
This PR has been marked as Stale because it has been open for 180 days with no activity. If you would like the PR to remain open, please remove the stale label or comment on the PR, or it will be closed in 7 days. |
resolves #3662
Description
Draft PR for feature branch.
Some context over in #4960
Checklist