Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1922] Support 'constraints' as a model-level property #6754

Closed
Tracked by #6747
MichelleArk opened this issue Jan 26, 2023 · 3 comments · Fixed by #7230
Closed
Tracked by #6747

[CT-1922] Support 'constraints' as a model-level property #6754

MichelleArk opened this issue Jan 26, 2023 · 3 comments · Fixed by #7230
Assignees
Labels
multi_project user docs [docs.getdbt.com] Needs better documentation

Comments

@MichelleArk
Copy link
Contributor

MichelleArk commented Jan 26, 2023

In addition to constraints as a column-level property, support constraints as a model-level attribute for row-level boolean expressions that require multiple columns, or combined primary keys (unenforced & informational-only).

Parsing of model-level constraints attribute will be addressed by: #7066

This issue builds on top of #7066 to use the model-level constraints property in the get_columns_spec_ddl macro to inject additional supported constraints. Postgres spec written in the example below.

Testing: The existing test__constraints_ddl test pattern can be user to create a new unit test test__model_constraints_ddl to incorporate model-level constraint configuration: https://github.com/dbt-labs/dbt-core/blob/main/tests/adapter/dbt/tests/adapter/constraints/test_constraints.py#L126. A new unit test probably makes most sense so adapters can skip/pass this unit test as necessary if model-level constraint configuration is not supported (e.g. bigquery)

@github-actions github-actions bot changed the title Support 'constraints' as a model-level config [CT-1922] Support 'constraints' as a model-level config Jan 26, 2023
@MichelleArk MichelleArk changed the title [CT-1922] Support 'constraints' as a model-level config [CT-1922] Support 'contracted' as a model-level config Jan 26, 2023
@MichelleArk MichelleArk changed the title [CT-1922] Support 'contracted' as a model-level config [CT-1922] Support 'ccontract' as a model-level config Feb 14, 2023
@MichelleArk MichelleArk changed the title [CT-1922] Support 'ccontract' as a model-level config [CT-1922] Support 'contract' as a model-level config Feb 14, 2023
@MichelleArk MichelleArk changed the title [CT-1922] Support 'contract' as a model-level config [CT-1922] Support 'constraints' as a model-level config Feb 23, 2023
@MichelleArk
Copy link
Contributor Author

From #7066:

class ConstraintOption(StrEnum):
    check = "check"
    not_null = "not_null"
    unique = "unique"
    primary_key = "primary_key"
    foreign_key = "foreign_key"
    custom = "custom"  # I have no idea -- but let's leave it open for the wacky & wild data platforms out there

ColumnLevelConstraint:
    type: ConstraintOption
    name: Optional[str]
    warn_unenforced: bool = True  # raise a warning if this constraint is not enforced by this data platform (but will still be included in templated DDL)
    warn_unsupported: bool = True  # raise a warning if this constraint is not supported by this data platform (and will be excluded from templated DDL)
    expression: Optional[str] # free for all. required if constraint is 'check' or 'foreign_key' -- can we validate via post-init ?
    # additional properties allowed: yes! there are going to be lots of constraint-specific and platform-specific shenanigans

# the same, with an added 'columns' attribute (which isn't needed when the constraint is defined for one column)
ModelLevelConstraint(ColumnLevelConstraint):
    columns: List[str]
models:
  constraints: Optional[List[ModelLevelConstraint]]

If constraints is set a the model level, it should be a model-level attribute, not a "config." This means it isn't settable in {{ config() }} or in dbt_project.yml, and that's okay.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 27, 2023

Example

(Borrowing heavily from Postgres docs: https://www.postgresql.org/docs/current/ddl-constraints.html)

models:
  - name: my_model
    columns:
      - name: first_id
        data_type: int
      # ... rest of columns ...
    constraints:
      - type: primary_key
        columns: [first_id, second_id]
      - type: unique
        columns: [product_id, order_no]
        name: must_be_different
      - type: foreign_key
        columns: [column_a, column_b]
        expression: references other_table (another_x, another_y)  # hmmm do we need to be able to support 'ref' here?
      - type: check
        columns: [discounted_price]  # not technically needed, but I sort of like it as metadata?
        expression: discounted_price > 0
      - type: check
        columns: [price, discounted_price]  # as above
        name: valid_discount
        expression: discounted_price > 0

This should template out DDL like:

create table dbt_jcohen.my_model (
    # all of the columns first
    first_id int,
    ...
    
    # then the model-level constraints
    primary key (first_id, second_id),
    constraint must_be_different unique (product_id, order_no),
    foreign key (column_a, column_b) references other_table (another_x, another_y),
    check (discounted_price > 0),
    constraint valid_discount check (price > discounted_price)
);

Try this yourself in psql:

jerco=# create schema if not exists dbt_jerco;
jerco=# create table dbt_jerco.other_table (another_x int, another_y int, unique (another_x, another_y));
CREATE TABLE
jerco=# create table dbt_jerco.my_model (
    first_id int,
    second_id int,
    product_id int,
    order_no int,
    column_a int,
    column_b int,
    price float,
    discounted_price float,
    primary key (first_id, second_id),
    constraint must_be_different unique (product_id, order_no),
    foreign key (column_a, column_b) references dbt_jerco.other_table (another_x, another_y),
    check (discounted_price > 0),
    constraint valid_discount check (price > discounted_price)
);
CREATE TABLE

How should users actually choose to define these, as column-level versus model-level constraints? To quote from the Postgres docs:

It's a matter of taste.

Notes

  • name is always optional. On some data platforms, you can provide a custom name or not. On other data platforms (e.g. Databricks), we'll need to auto-generate hashed names if the user doesn't provide one — this is already how the current implementation works.
  • You don't see not_null above because not_null is only supported at the column level, never the model level
  • check-type constraint doesn't actually require the columns field, if defined at the model level — but I'm sort of inclined to keep it there, and encourage people to use it anyway
  • We've defined constraints as a top-level property, not a config. Meaning that it can't be set via either in-file {{config()}} or in dbt_project.yml. Let's keep having the larger conversation around configs vs. properties; out of scope for this specific issue.
  • Ugh!! If we're serious about supporting foreign_key constraints, I think we need a way to let users define ref in the expression field, like we do for the to field in relationships tests. Otherwise it can only be defined on a model in the same/default database/schema. (Maybe that's a reasonable limitation...?)

@jtcohen6
Copy link
Contributor

jtcohen6 commented Mar 1, 2023

Ugh!! If we're serious about supporting foreign_key constraints, I think we need a way to let users define ref in the expression field, like we do for the to field in relationships tests. Otherwise it can only be defined on a model in the same/default database/schema. (Maybe that's a reasonable limitation...?)

Officially kicking this out of scope for now. If it's really important to support foreign_key constraints in a first-class way, we can revisit in the future. To be honest, the appetite for us doing that might only come about if/when analytical data platforms support enforcing FKs for real.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multi_project user docs [docs.getdbt.com] Needs better documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants