Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbt project v2 #2312

Merged
merged 17 commits into from
Apr 22, 2020
Merged

dbt project v2 #2312

merged 17 commits into from
Apr 22, 2020

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented Apr 10, 2020

resolves #2300

Description

This PR adds support for a dbt_project.yml v2, as described in #2300

Details:

  • The single source of truth for values in the config is now the NodeConfig and its descendants. The v1 config still uses a special class that implements the legacy behavior, for backwards compatibility reasons.
  • Cleaned up snapshot config definitions to be mutually exclusive and serialize/deserialize more easily and consistently.
  • If a node's project and the active project are different config versions, the higher version one is "downgraded" into v1
  • Adapter specific configs are now typed JsonSchemaMixins
    • Removed the deprecated bigquery functionality of partition_by supporting strings
  • RuntimeConfig is now explicitly aware of its dependency packages, removing the need to pass the collection of packages around
  • I moved a number of things around
  • Added type annotations for many modules that previously didn't have them
  • Sources now have Configs
    • You can set the enabled flag
  • Added a number of v1/v2 compatibility shims that we can remove when this is done
  • Split configuration file rendering into components

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Apr 10, 2020
@beckjake beckjake force-pushed the feature/dbt-project-v2 branch 3 times, most recently from e5f656e to 7087709 Compare April 16, 2020 19:06
@beckjake beckjake marked this pull request as ready for review April 16, 2020 19:57
@beckjake beckjake changed the title (WIP) dbt project v2 dbt project v2 Apr 16, 2020
@beckjake beckjake force-pushed the feature/dbt-project-v2 branch 2 times, most recently from 1a58574 to 2618ae9 Compare April 16, 2020 20:23
Jacob Beck added 7 commits April 20, 2020 11:18
- Rename project contract to projectv1
- Add load_dependencies and dependencies attrs to RuntimeConfig
  - Moved project loading from parser/manifest.py into methods of RuntimeConfig
- Move methods that check for unused configs from Project to RuntimeConfig
  - they require the adapter type to be correct
- Rework Var in provider vs base context to handle both styles of Var configuration
Renamed config.config to config.build_config_dict()
mypy annotations
rename a module to avoid duplicating names
Configs all get defaults, a base class
Added SourceConfig
Introduced the idea of MergeBehavior and ShowBehavior
Implemented updates/merges based on MergeBehavior for configs
Implemented typed adapter-specific configs and behavior
- Backwards compatiblity shims from v2 to v1 configs
- config merging for v2
- compatibility shim for parsing/contexts
- defer var lookups as late as possible
- fixed ContextConfigType to be a proper Union of the two context config types
- Fix adapter configs to be proper dataclasses
- fix unused config paths error
- make v2 config parsing not alter its source data
@beckjake
Copy link
Contributor Author

I've rebased this again to remove the merge conflict and (I hope) fix the last of those minor test failures in the process.

Jacob Beck added 4 commits April 20, 2020 11:56
Docs now get the schema.yml parser vars
The SchemaParser now renders the entire schema.yml except tests/descriptions
Split up hooks to expose the hook dict
Remove some odd imports
Also add some missing agate methods to stubs
fix tests from splitting the config renderer
Bump included projects to v2
fix tests
@drewbanin
Copy link
Contributor

Things to check:

  • warning for configs that are useless
  • update description to not show bit about quoting configs (this will help me later @ documentation time)
  • replace config: block with + syntax. Make this optional, but always allowed for config declarations

@beckjake
Copy link
Contributor Author

warning for configs that are useless

Locally this seems to work ok for me now, but I did mess with that code in this most recent change!

@beckjake
Copy link
Contributor Author

Ok, I've switched out config for an optional + prefix - we test for it in the column types test, so I feel pretty good about it.

…it on dict values

Make the package-duplicates test use a local dependency
@drewbanin
Copy link
Contributor

Ok - I played around with this some more. Can we:

  • Make sources: participate in useless config checking? It looks like useless source configs are not currently checked / warned
  • Disallow specifying vars: in the dbt_project.yml v2 spec models: section?
    • This is going to be the most likely failure mode IMO -- if a user specifies that they're using config version 2, then we should at least warn about vars: declarations as they will be unused

I'm going to start digging into the code in this diff. Will follow up with other thoughts inline while reviewing / testing. Looking good so far though!

Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments after taking a look through the diff. I feel good about the code changes here, and I'd like to just do a tiny bit more functional testing before we merge this thing.

👍


def _flatten_config(dct: Dict[str, Any]):
result = {}
for key, value in dct.items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

lists of lists of strings, where each inner list of strings represents
a configured path in the resource.
"""
return {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add sources in here too?

validate=validate
)

# TODO: is fqn[0] always the project name?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👁️ 👁️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah... the answer is "yes, but I don't trust that to always be true"

class DbtProjectYamlDeprecation(DBTDeprecation):
_name = 'dbt-project-yaml-v1'
_description = '''\
The existing dbt_project.yml format has been deprecated. dbt_project.yml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a link + update copy here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go for something like:

dbt v0.17.0 introduces a new config format for the dbt_project.yml file. Support
for the existing version 1 format will be removed in a future release of dbt.
The following packages are currently configured with config version 1:
 - debug
 - some_package

For upgrading instructions, consult the documentation:
  https://docs.getdbt.com/docs/guides/migration-guide/upgrading-to-0-17-0

partition_by: Optional[Dict[str, Any]] = None
kms_key_name: Optional[str] = None
labels: Optional[Dict[str, str]] = None
# TODO: should this accept `str` and `int`, too?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for partitions? I think a list of ints would also be valid for some use cases, but users can always pipe a list of ints to | map('string') or similar if needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. That sounds reasonable to me - I think it previously did, but I don't think it necessarily was intended to do so.

@drewbanin
Copy link
Contributor

Some things I've noticed while testing:


Can we specify which packages are configured with v1 dbt_project.yml files? I was confused by the warning I was seeing locally until I realized that deprecation warning was coming from a dependency package.


It looks like the vars dict we're supplying to nodes includes the top-level package name keys. I see this error when misconfigured a model's vars:

Compilation Error in model vars (models/vars/enabled.sql)
  Required var 'enabled' not found in config:
  Vars supplied to enabled = {
      "debug": {
          "key": "ok",
          "mydict": {
              "a": 1
          }
      },
      "key": "ok",
      "mydict": {
          "a": 1
      }
  }

  > in model vars (models/vars/enabled.sql)
  > called by model vars (models/vars/enabled.sql)

My corresponding vars: config looks like:

vars:
    debug:
        key: ok
        mydict: {a: 1}

Should debug (the package name) be a member in the var() dict?


We should add more error handling around var values. The following vars: values make dbt fail hard:

vars:

vars:
    debug: something

vars:
    debug:

If we can figure out a way to warn for vars: found in a v1 dbt_project.yml file, I think it would make it much easier for folks to upgrade to the new spec!

@beckjake
Copy link
Contributor Author

beckjake commented Apr 21, 2020

Should debug (the package name) be a member in the var() dict?

I came around (for compatibility reasons) to the idea of vars having two tiers - global and package-scoped. I don't know if it's the best idea, but it seemed to make it easier to migrate. So yes, it could be! We could change it though - I just thought sometimes people want to set vars for everything, not just a single project at a time.

Handle empty vars dict
Check sources for unused configs
  - add a test
Warn when dbt finds "vars" in "models", "seeds", etc blocks
  - add a test
Clean up enabled/disabled code to share between sources and nodes
  - only log downstream tests at debug level when a source is disabled
  - include the offending project name in the v1 deprecation message
  - Fix tests that care about error messages, add new ones
@drewbanin
Copy link
Contributor

I came around (for compatibility reasons) to the idea of vars having two tiers - global and package-scoped.

Yep, that makes sense. I didn't consider that case when I last looked at this last, but I buy that global and package scoped tiers should be separately addressable. This does get wonky if a var in a package masks a package name, eg:

vars:
  my_package:
    other_package: something
  other_package:
    myvar: true

Here, the my_package.other_package var is going to mask the vars defined for other_package. I'm ok with that, but it would be even better if we added a warning :)

@beckjake
Copy link
Contributor Author

In your example, a model in my_package that looked for var('other_package') would get back 'something'. A model outside my_package that looked for var('other_package') would get back {'myvar': True}.

I'm not sure if that's what I'd expect, but I don't think I'd be surprised?

@@ -331,7 +331,16 @@ def load_all(
macro_hook: Callable[[Manifest], Any],
) -> Manifest:
with PARSING_STATE:
projects = load_all_projects(root_config)
projects = root_config.load_dependencies()
v1_configs = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm testing this out locally, and I'm (strangely) only seeing one project or package name in the deprecation warning. My setup:

  • has a v1 dbt_project.yml file in my root project
  • has a v1 dbt_project.yml file in a dependency package

Any idea what might be going wrong?

…ion warnings on graph runnable tasks are better
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM with one annoying caveat: can we actually remove the warning that renders if you have a vars block in a v2 dbt_project.yml file? I didn't realize this before, but the existing "useless config" warning does a good job of exposing that the vars dict is unused. Apologies for flip-flopping on that one.

Merge it when that very last change is addressed. Nice work here!!

@beckjake beckjake merged commit 616b32e into dev/octavius-catto Apr 22, 2020
@beckjake beckjake deleted the feature/dbt-project-v2 branch April 22, 2020 13:41
# models have two things to avoid
if first in {'seeds', 'models', 'snapshots'}:
if first in {'seeds', 'models', 'snapshots', 'seeds'}:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beckjake it's already in there (and line 98)

QMalcolm added a commit that referenced this pull request Apr 8, 2024
The file `searcher.py` was added, as an empty file, four years ago in
[#2312](https://github.com/dbt-labs/dbt-core/pull/2312/files#diff-c8e9e62cf63356f44fe1a2b0272b239d7a650c57911477ed4549b15ee3de16fa).
Since then it has never been touched or added to. I'm not sure of the
initial purpose of the file as it never contained anything and I have not
found any documentation in reference to it. It's safe to say, it can go
away.
QMalcolm added a commit that referenced this pull request Apr 10, 2024
#9878)

* Delete empty `test_compiler.py` file

Over a year ago in [#7008](https://github.com/dbt-labs/dbt-core/pull/7008/files#diff-c10753a61a1d8f973ec3206206d72b3c5200e41673ab2c6b256f333103eadb8f)
the tests in `test_compiler.py` got moved into other testing files. Since
then this file has been sitting around empty. It is probably a reasonable
time to say goodbye to it.

* Delete empty `searcher.py` file

The file `searcher.py` was added, as an empty file, four years ago in
[#2312](https://github.com/dbt-labs/dbt-core/pull/2312/files#diff-c8e9e62cf63356f44fe1a2b0272b239d7a650c57911477ed4549b15ee3de16fa).
Since then it has never been touched or added to. I'm not sure of the
initial purpose of the file as it never contained anything and I have not
found any documentation in reference to it. It's safe to say, it can go
away.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dbt_project.yml v2
3 participants