[CT-1162] Blueprint optional project ref statement parameter #5826

dpguthrie · 2022-09-13T10:44:25Z

Success Criteria:

Links to existing files in dbt-core that need to be updated
Net new files in dbt-core and why it needs new files
Potential "gotcha" implementations based on the above(think: if we update this file here, we have to update 100 files downstream)
pseudo-code snippets on initial approach for the above to make this functionality work

dpguthrie · 2022-09-13T11:16:08Z

Related issues / PRs:

dpguthrie · 2022-09-13T11:58:37Z

Links to Existing Files (WIP)

Model uniqueness constraint across projects (core/dbt/parser/manifest.py#971)
Some possibilities related to the target_model_package which we currently allow as an argument to ref:
- Here's where we begin to process the ref and the package name comes into play (dbt/parser/manifest.py)
- The manifest is then responsible for resolving this ref with an internal method here. This method accepts the target_model_package as an Optional argument.

dpguthrie · 2022-09-19T20:22:04Z

Progress Update

Question 1: What do we need to do to make `ref` accept a `project` parameter?

Answer: Nothing! Currently, the ref argument accepts two arguments:

target_model_package
target_model_name

The logic exists within _process_refs_for_node

if len(ref) == 1:
    target_model_name = ref[0]
elif len(ref) == 2:
    target_model_package, target_model_name = ref

However, the projects are scoped to what we've defined in our packages.yml file as well as internal packages. The upstream project is not a package though. We don't need to have the project locally but just a representation of it, something like a pared-down manifest.json file.

Question 2: What do we need to do then to resolve a reference to an upstream contract?

This is where it starts to get tricky.

The _process_refs_for_node function mentioned above will eventually call resolve_ref, which is a method on the Manifest class. This eventually calls an internal object called ref_lookup, which is a pointer to a RefableLookup object on the manifest instance. This object has a storage property that is populated from the nodes in the instantiated manifest

def populate(self, manifest):
    for node in manifest.nodes.values():
        self.add_node(node)

My guess is we don't want to include a "node" from an upstream project within the nodes in our manifest, as we don't want these to actually be run in our project's scope. We simply need a way to resolve the appropriate references to the locations of those contract nodes.

So, what could be some possible implementations:

Alter the way we `populate` the `ref_lookup`?

def populate(self, manifest):
    for node in chain(manifest.nodes.values(), manifest.contracts_upstream.values()):
        self.add_node(node)

This assumes:

we create a contracts_upstream property in our manfiest
- The Manifest dataclass would have to be updated to account for a new field (dbt/contracts/graph/manifest.py)
- [WIP] - Still investigating other places where we'd have to update to accommodate for a change like this
it would be constructed very similarly to our nodes, in that each resource would be of the ManifestNode type. Presumably, though, the structure for this would already have been created by the upstream project and we should just be able to leverage that during parse time.

contracts_upstream: MutableMapping[str, ManifestNode] = field(default_factory=dict)

Gotchas

Without making any changes to how this is populated, the names of each node would have to be unique across each project. Otherwise, you'd run into scenarios where you would overwrite an existing node within the storage dictionary if it exists more than one time across each project.

This is how a node is added to the storage dictionary:

def add_node(self, node: ManifestNode):
    if node.resource_type in self._lookup_types:
        if node.name not in self.storage:
            self.storage[node.name] = {}
        self.storage[node.name][node.package_name] = node.unique_id

Create a new `ContractLookup`

When we resolve_ref the node, it could be set to this:

node = (
    self.ref_lookup.find(target_model_name, pkg, self)
    or self.contract_lookup.find(target_model_name, pkg, self)
)

If it's not found in the manifest's nodes then look in the contracts_upstream.

Open Questions:

How does this hinder being able to use a contracts-like selector to build from?
What would we have to do to include these contracts in docs?
Would this even work?

sungchun12 · 2022-09-19T21:35:15Z

After your first answer to your open questions, let me know where I can help round out the answers

dpguthrie · 2022-09-20T16:28:54Z

Internal slack feedback for the first attempt in the related PR

Things that this makes me start thinking about:

Does this work if the upstream model has the same name as a model in the current project? What would need to change (this PR)?

How should the upstream project specify which models are public / ref'able?

What sort of error should the downstream project see when trying to ref a private model?

Should a "bad ref from other project" error be raised at parse time, or as the very first step during runtime? The former sounds better at first blush, but it would mean that you always need the full "export" artifact from the upstream project just to parse your own project (including dbt ls). I'm honestly not sure about this one!

dpguthrie · 2022-09-20T16:43:16Z

Does this work if the upstream model has the same name as a model in the current project? What would need to change (#3053)?

My first inclination is that this will work if an upstream model has the same name as one in the current project. The reason being is how uniqueness is determined:

def _check_resource_uniqueness(
    manifest: Manifest,
    config: RuntimeConfig,
) -> None:
    names_resources: Dict[str, ManifestNode] = {}
    alias_resources: Dict[str, ManifestNode] = {}

    for resource, node in manifest.nodes.items():

This function iterates through the nodes property on the manifest. This proposal creates an entirely new property, consumers on the manifest, which wouldn't be considered, at least in its current form, as dbt checks for uniqueness.

dpguthrie · 2022-09-20T16:48:00Z

How should the upstream project specify which models are public / ref'able?

I really like @christineberger's suggestion in the original discussion where the developer explicitly defines what models to make public. As an aside, I think the default here is false and it's up to the developer to opt-in to anything being made public.

models:
    # Model folder level
    +shared: true
    # Sub-folder level
    staging:
       +shared: false

This configuration would share all models except for one's located in the staging folder. This configuration would then be used to inform what actually makes it into a json file that a downstream project could consume.

dpguthrie · 2022-09-20T17:11:54Z

What sort of error should the downstream project see when trying to ref a private model?

I think the same compilation error that you would get today if refing a model that doesn't exist in your own project. We shouldn't even state that they're trying to ref a private model.

I think you should treat it as web services treat private resources. For instance, I know the urls of private repos at my last company, but because I no longer have permission to view those, I'll receive 404s if I try directly navigating to those pages. They exist! But that doesn't mean I should have that information.

dpguthrie · 2022-09-28T22:29:33Z

Another implementation thought I had is to add some more properties to our NodeConfig (or something in a higher parent class). The idea being that these nodes that we're pulling in from upstream projects, are just that, nodes. Shouldn't they live alongside the nodes that we've compiled from our project and other packages? What could some other properties include that would delineate these from the others though?
- execute - This should have a default to True. The thinking behind this is we need a way to identify nodes from contracts so we don't run them within our project's scope (unless given permission to do so by the upstream project)
- share - This would be used by whatever functions we develop to create the json artifact an upstream producer produces.
- is_contract - Pretty self explanatory

I think the downstream impacts of this approach are much greater than the initial implementation proposed using a ConsumerLookup but it may make more sense.

github-actions · 2023-03-28T01:56:21Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions · 2023-04-05T01:46:11Z

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

dpguthrie self-assigned this Sep 13, 2022

github-actions bot changed the title ~~Blueprint optional project ref statement parameter~~ [CT-1162] Blueprint optional project ref statement parameter Sep 13, 2022

dpguthrie linked a pull request Sep 20, 2022 that will close this issue

CT-1162 ref upstream project #5885

Closed

6 tasks

sungchun12 moved this from In Progress to Done in dbt Multi-Project Dreams: Building Blocks in Motion Sep 27, 2022

sungchun12 mentioned this issue Sep 27, 2022

[CT-1256] Unite the work across dbt contracts and optional ref parameter #5945

Closed

5 tasks

dpguthrie linked a pull request Sep 28, 2022 that will close this issue

Project Ref Param #5966

Closed

6 tasks

dpguthrie removed a link to a pull request Sep 28, 2022

Project Ref Param #5966

Closed

6 tasks

github-actions bot added the stale Issues that have gone stale label Mar 28, 2023

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CT-1162] Blueprint optional project ref statement parameter #5826

[CT-1162] Blueprint optional project ref statement parameter #5826

dpguthrie commented Sep 13, 2022

dpguthrie commented Sep 13, 2022

dpguthrie commented Sep 13, 2022 •

edited

Loading

dpguthrie commented Sep 19, 2022 •

edited

Loading

sungchun12 commented Sep 19, 2022

dpguthrie commented Sep 20, 2022 •

edited

Loading

dpguthrie commented Sep 20, 2022

dpguthrie commented Sep 20, 2022 •

edited

Loading

dpguthrie commented Sep 20, 2022

dpguthrie commented Sep 28, 2022

github-actions bot commented Mar 28, 2023

github-actions bot commented Apr 5, 2023

[CT-1162] Blueprint optional project ref statement parameter #5826

[CT-1162] Blueprint optional project ref statement parameter #5826

Comments

dpguthrie commented Sep 13, 2022

dpguthrie commented Sep 13, 2022

dpguthrie commented Sep 13, 2022 • edited Loading

Links to Existing Files (WIP)

dpguthrie commented Sep 19, 2022 • edited Loading

Progress Update

Question 1: What do we need to do to make ref accept a project parameter?

Question 2: What do we need to do then to resolve a reference to an upstream contract?

So, what could be some possible implementations:

Alter the way we populate the ref_lookup?

Create a new ContractLookup

sungchun12 commented Sep 19, 2022

dpguthrie commented Sep 20, 2022 • edited Loading

dpguthrie commented Sep 20, 2022

dpguthrie commented Sep 20, 2022 • edited Loading

dpguthrie commented Sep 20, 2022

dpguthrie commented Sep 28, 2022

github-actions bot commented Mar 28, 2023

github-actions bot commented Apr 5, 2023

dpguthrie commented Sep 13, 2022 •

edited

Loading

dpguthrie commented Sep 19, 2022 •

edited

Loading

Question 1: What do we need to do to make `ref` accept a `project` parameter?

Alter the way we `populate` the `ref_lookup`?

Create a new `ContractLookup`

dpguthrie commented Sep 20, 2022 •

edited

Loading

dpguthrie commented Sep 20, 2022 •

edited

Loading