Contracts: Handle struct column specified both at root and nested levels + arrays of structs #806
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-bigquery contributing guide.
…els + arrays of structs
bbe8814 to bac87f5
@colin-rogers-dbt - I've tagged you for review as this is follow-on work from #738.
@@ -252,8 +273,22 @@ def _format_nested_data_type(unformatted_nested_data_type: Union[str, Dict[str,
    if isinstance(unformatted_nested_data_type, str):
        return unformatted_nested_data_type
    else:
        parent_data_type, *parent_constraints = unformatted_nested_data_type.pop(
            _PARENT_DATA_TYPE_KEY, ""
        ).split() or [None]
Do you need the or [None] here? I thought I tried this in a toy example where .pop() returned "" and it still worked as anticipated. But I might have missed some edge case.
unfortunately not :( it broke a couple unit tests with the following error:
>>> foo, *bar = "".split()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected at least 1, got 0)
…t_empty_schema_sql on flattened columns representation
{%- endfor -%}
{%- if (col_err | length) > 0 -%}
  {{ exceptions.column_type_missing(column_names=col_err) }}
{%- endif -%}
This PR introduces some unsatisfying duplication in the bigquery version of the get_empty_schema_sql macro.
However, the check has to run before the columns get nested in order to provide granular error messages. Deferring it to the handling in default__get_empty_schema_sql would mean the columns are already nested by then, and it would no longer be possible to determine which nested field had a missing data_type value.
Once the nested handling of model contracts moves into core, this duplication should be resolved, and the check should only be present in the default__get_empty_schema_sql implementation.
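The reasoning above can be sketched in Python (the actual check lives in a Jinja macro). The function name and the shape of `flat_columns` are assumptions made for this illustration. The point is only that, while columns are still keyed by their flattened dotted names, each missing data_type can be reported by its full path.

```python
from typing import Dict, List, Optional

# Hypothetical flattened column spec: one entry per dotted column name.
ColumnSpec = Dict[str, Optional[str]]


def columns_missing_data_type(columns: Dict[str, ColumnSpec]) -> List[str]:
    """Return the flattened names of columns with no data_type defined."""
    # Treat both an absent key and an empty/None value as missing.
    return [name for name, col in columns.items() if not col.get("data_type")]


flat_columns = {
    "a": {"name": "a"},                                # data_type omitted
    "b.id": {"name": "b.id", "data_type": "int64"},    # fully specified
    "b.name": {"name": "b.name", "data_type": None},   # data_type left empty
}

missing = columns_missing_data_type(flat_columns)
```

Here `missing` names `a` and `b.name` individually, which matches the granularity of the error message in the run output for this PR. After nesting, `b.id` and `b.name` would be folded into a single `b` entry and that detail would be lost.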
🎩 - Missing data_type:
❯ dbt run --select nested_fields --project-dir ~/basic-dbt
17:16:08 Running with dbt=1.6.0-b8
17:16:08 Registered adapter: bigquery=1.6.0-b4
17:16:08 Found 5 models, 1 test, 0 snapshots, 0 analyses, 498 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
17:16:08
17:16:10 Concurrency: 1 threads (target='dev')
17:16:10
17:16:10 1 of 1 START sql table model dbt_marky.nested_fields ........................... [RUN]
17:16:10 1 of 1 ERROR creating sql table model dbt_marky.nested_fields .................. [ERROR in 0.83s]
17:16:10
17:16:10 Finished running 1 table model in 0 hours 0 minutes and 2.01 seconds (2.01s).
17:16:11
17:16:11 Completed with 1 error and 0 warnings:
17:16:11
17:16:11 Compilation Error in model nested_fields (models/nested_fields.sql)
17:16:11 Contracted models require data_type to be defined for each column. Please ensure that the column name and data_type are defined within the YAML configuration for the ['a', 'b.name'] column(s).
17:16:11
17:16:11 > in macro bigquery__get_empty_schema_sql (macros/utils/get_columns_spec_ddl.sql)
...
resolves #782
resolves #781
Description
Handles scenarios where a nested column type may be specified both at the struct level (e.g. b.id: int, b.name: string) and at the parent level (b: struct). Additionally handles the case where the parent-level data_type is used to specify an array of structs.
This PR resolves two issues. Initially I had aimed to tackle #782 on its own by ignoring a top-level data_type if specified (while preserving its constraints), but the top-level array specification was just a couple of additional lines at that point, and I figured reviewing and testing it all together would be more straightforward, since the issues and affected codepaths were very closely related.
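As a rough illustration of the behaviour described above, the following sketch nests flattened column names and renders a BigQuery-style type string. It is not the PR's implementation: `nest_columns`, `render_type`, and the sentinel value are hypothetical names, and the rendering rules (ignore a parent `struct`, wrap in `array<...>` for a parent `array`, keep trailing constraints) are a simplified reading of the description.

```python
from typing import Any, Dict

# Assumed sentinel key under which a parent-level data_type is stored.
_PARENT_DATA_TYPE_KEY = "__parent_data_type"


def nest_columns(flat: Dict[str, str]) -> Dict[str, Any]:
    """Turn {'b': 'array', 'b.id': 'int64'} into a nested dict keyed by field name."""
    root: Dict[str, Any] = {}
    for name, data_type in flat.items():
        *parents, leaf = name.split(".")
        node = root
        for part in parents:
            existing = node.get(part)
            if not isinstance(existing, dict):
                # Parent was declared earlier as a plain type: keep it under the sentinel.
                existing = {} if existing is None else {_PARENT_DATA_TYPE_KEY: existing}
                node[part] = existing
            node = existing
        if isinstance(node.get(leaf), dict):
            # Children were seen first; record the parent-level type alongside them.
            node[leaf][_PARENT_DATA_TYPE_KEY] = data_type
        else:
            node[leaf] = data_type
    return root


def render_type(spec: Any) -> str:
    """Render a nested spec as a struct/array type string, preserving constraints."""
    if isinstance(spec, str):
        return spec
    spec = dict(spec)  # copy so the caller's dict is not mutated by pop()
    parent_type, *constraints = spec.pop(_PARENT_DATA_TYPE_KEY, "").split() or [None]
    fields = ", ".join(f"{key} {render_type(value)}" for key, value in spec.items())
    rendered = f"struct<{fields}>"
    if parent_type is not None and parent_type.lower() == "array":
        rendered = f"array<{rendered}>"
    if constraints:
        rendered += " " + " ".join(constraints)
    return rendered


array_case = render_type(nest_columns({"b": "array", "b.id": "int64", "b.name": "string"})["b"])
struct_case = render_type(nest_columns({"b.id": "int64", "b": "struct not null"})["b"])
```

The sketch accepts the parent declaration either before or after its children, since the order of columns in a YAML spec should not change the result.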
Checklist
changie new to create a changelog entry
🎩 parent array
nested_array_field.sql
🎩 parent specification with constraints
nested_fields.sql