[CT-2776] [Feature] Allow get_metadata_vars() to populate from env post-initial invocation #8010

Closed
3 tasks done
gem7318 opened this issue Jun 30, 2023 · 3 comments
Labels: enhancement (New feature or request), help_wanted (Trickier changes, with a clear starting point, good for previous/experienced contributors), logging

Comments

gem7318 commented Jun 30, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

This feature lets get_metadata_vars() pick up custom environment variables when the environment is updated after get_metadata_vars() has been called for the first time.

The impetus is that in our deployments we need to inject the project vars into the data.extra attribute of each log line, so that alerting can operate on them down to the team / project / environment level. Mechanically, this means:

  1. Get the post-parse project vars since they’re templated with pre-existing environment variables
  2. Prepend them with DBT_ENV_CUSTOM_ENV_ and jam them into the runtime environment
  3. Then call our build command so that the custom env vars make it into the data.extra attribute of the log lines it emits

Example

After applying #7998, the following is possible:

# example
import os
from typing import Dict

from dbt.cli.main import dbtRunner, dbtRunnerResult

METADATA_ENV_PREFIX = "DBT_ENV_CUSTOM_ENV_"

def build() -> None:

    dbt: dbtRunner = dbtRunner()
    
    # Compile project to populate project `vars` with
    # template values from initial environment.
    dbt_run: dbtRunnerResult = dbt.invoke(["compile"])

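    # Guard before iterating: `result` can be None if the invocation failed.
    results = dbt_run.result.results if dbt_run.result else []

    custom_vars: Dict[str, str] = {
        # Get the post-parse project vars from any results object.
        # Prefix them to be registered as `DBT_ENV_CUSTOM_ENV_*` vars.
        # The prefix already ends in "_", and `os.environ` only accepts
        # strings, so values are cast with `str()`.
        f"{METADATA_ENV_PREFIX}{var_name}".upper(): str(var_value)
        for result in results
        for var_name, var_value in result.node.config.meta.items()
    }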
    
    # Update environment with project vars as environment vars.
    os.environ.update(custom_vars)

    # Build project with updated environment and custom env vars.
    # Without the change to `get_metadata_vars()`, the new vars
    # won't make it into the logger's context during this `build`.
    dbt.invoke(["build"])

More context

When using programmatic invocations, the dbt.events.functions.get_metadata_vars() function is called once when the dbtRunner is instantiated.

If no DBT_ENV_CUSTOM_ENV_* vars are found the first time it's called, it will keep returning an empty dictionary on subsequent invocations and will miss any environment variables added after that first call.
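
The behaviour described above is what a compute-once cache produces. Here is a minimal sketch of that pattern, not the actual dbt source: `cached_metadata_vars` and `_cache` are hypothetical names, and only the `DBT_ENV_CUSTOM_ENV_` prefix is taken from this issue.

```python
import os
from typing import Dict, Optional

METADATA_ENV_PREFIX = "DBT_ENV_CUSTOM_ENV_"

# Hypothetical module-level cache; None means "not computed yet".
_cache: Optional[Dict[str, str]] = None


def cached_metadata_vars() -> Dict[str, str]:
    """Read prefixed env vars on the first call, then reuse the cached dict."""
    global _cache
    if _cache is None:
        # An empty dict is a valid cached value here, so a first call made
        # before any prefixed variable exists freezes the result at {}.
        _cache = {
            key[len(METADATA_ENV_PREFIX):]: value
            for key, value in os.environ.items()
            if key.startswith(METADATA_ENV_PREFIX)
        }
    return _cache
```

With this pattern, setting `DBT_ENV_CUSTOM_ENV_TEAM` after the first call has no effect on later calls, which matches the symptom reported here.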

The main consideration is whether there are any ramifications from checking the environment each time get_metadata_vars() is called until it (possibly) becomes populated. This should only be a concern if get_metadata_vars() is called so often that re-reading the environment on each call becomes too costly.
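
To picture the trade-off, one option is to re-read the environment only while the cached value is still empty, so the scan stops as soon as a variable is found. This is a sketch under assumptions, not necessarily the change made in #7998: `refreshing_metadata_vars` and `_vars` are hypothetical names.

```python
import os
from typing import Dict

METADATA_ENV_PREFIX = "DBT_ENV_CUSTOM_ENV_"

# Hypothetical module-level cache; an empty dict means "nothing found yet".
_vars: Dict[str, str] = {}


def refreshing_metadata_vars() -> Dict[str, str]:
    """Re-scan the environment on each call until at least one var is found."""
    global _vars
    if not _vars:
        # One linear pass over os.environ per call while the cache is empty.
        _vars = {
            key[len(METADATA_ENV_PREFIX):]: value
            for key, value in os.environ.items()
            if key.startswith(METADATA_ENV_PREFIX)
        }
    return _vars
```

The cost while empty is one pass over the process environment per call. Once the dict is populated it is reused, so in this sketch variables added later are still not picked up.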

Describe alternatives you've considered

Trying to get these project vars into the runtime environment before initial instantiation (untenable, we use project vars too heavily)

Who will this benefit?

Those using programmatic invocations and wanting to update the runtime environment across invocations of dbtRunner.invoke().

Are you interested in contributing this feature?

Here is a link to the required change that has been tested locally.

@gem7318 gem7318 added enhancement New feature or request triage labels Jun 30, 2023
@github-actions github-actions bot changed the title [Feature] Allow get_metadata_vars() to populate from environment post-initial invocation. [CT-2776] [Feature] Allow get_metadata_vars() to populate from environment post-initial invocation. Jun 30, 2023

gem7318 commented Jun 30, 2023

Linking the PR associated with the change as a separate comment for clarity.

@gem7318 gem7318 changed the title [CT-2776] [Feature] Allow get_metadata_vars() to populate from environment post-initial invocation. [CT-2776] [Feature] Allow get_metadata_vars() to populate from env post-initial invocation Jun 30, 2023
@jtcohen6 jtcohen6 self-assigned this Jul 4, 2023

jtcohen6 commented Jul 4, 2023

@gem7318 Thanks for the super-thorough write-up!!

The change in #7998 seems straightforward enough. I'll queue it up for review by the eng team.

@jtcohen6 jtcohen6 added logging and removed triage labels Jul 4, 2023
@jtcohen6 jtcohen6 removed their assignment Jul 4, 2023
@jtcohen6 jtcohen6 added the help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors label Jul 10, 2023

gem7318 commented Jan 19, 2024

@jtcohen6 Closing this issue as #7998 was merged 🙏🏻

@gem7318 gem7318 closed this as completed Jan 19, 2024