Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1584] New top level commands: interactive compile #7008

Merged
merged 18 commits into from
Mar 11, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .changes/unreleased/Features-20230306-170251.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
kind: Features
body: '[CT-1584] New top level commands: interactive compile'
time: 2023-03-06T17:02:51.240582-08:00
custom:
Author: aranke
Issue: "6358"
2 changes: 2 additions & 0 deletions core/dbt/cli/flags.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@
"FULL_REFRESH": False,
"STRICT_MODE": False,
"STORE_FAILURES": False,
"INTROSPECT": True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to figure out a way to remove this section at some point. Just write down what I think, this works before we do that.

}


Expand All @@ -43,6 +44,7 @@
"fail_fast",
"indirect_selection",
"store_failures",
"introspect",
Copy link
Contributor

@ChenyuLInx ChenyuLInx Mar 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we want to add introspect here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it to help a test pass, but it might not be required.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: removing this doesn't help, so I'll leave it as-is for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean removing it would cause things to fail? This doesn't feel right given that introspect is only a parameter for compile task command, and all parameter in this list is defiled both as cli click group level and a task command level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Punted into a new ticket: #7156

]


Expand Down
3 changes: 3 additions & 0 deletions core/dbt/cli/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -239,12 +239,15 @@ def docs_serve(ctx, **kwargs):
@p.exclude
@p.favor_state
@p.full_refresh
@p.indirect_selection
@p.introspect
@p.parse_only
@p.profile
@p.profiles_dir
@p.project_dir
@p.select
@p.selector
@p.inline
@p.state
@p.target
@p.target_path
Expand Down
13 changes: 11 additions & 2 deletions core/dbt/cli/params.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,10 +27,17 @@
help="Pre cache database objects relevant to selected resource only.",
)

introspect = click.option(
"--introspect/--no-introspect",
envvar="DBT_INTROSPECT",
help="Whether to scaffold introspective queries as part of compilation",
default=True,
)

compile_docs = click.option(
"--compile/--no-compile",
envvar=None,
help="Wether or not to run 'dbt compile' as part of docs generation",
help="Whether or not to run 'dbt compile' as part of docs generation",
default=True,
)

Expand Down Expand Up @@ -100,7 +107,7 @@
"--indirect-selection",
envvar="DBT_INDIRECT_SELECTION",
help="Select all tests that are adjacent to selected resources, even if they those resources have been explicitly selected.",
type=click.Choice(["eager", "cautious", "buildable"], case_sensitive=False),
type=click.Choice(["eager", "cautious", "buildable", "empty"], case_sensitive=False),
default="eager",
)

Expand Down Expand Up @@ -310,6 +317,8 @@
"type": tuple,
}

inline = click.option("--inline", envvar=None, help="Pass SQL inline to dbt compile and preview")

# `--select` and `--models` are analogous for most commands except `dbt list` for legacy reasons.
# Most CLI arguments should use the combined `select` option that aliases `--models` to `--select`.
# However, if you need to split out these separators (like `dbt ls`), use the `models` and `raw_select` options instead.
Expand Down
Binary file modified core/dbt/docs/build/doctrees/environment.pickle
Binary file not shown.
Binary file modified core/dbt/docs/build/doctrees/index.doctree
Binary file not shown.
5 changes: 5 additions & 0 deletions core/dbt/docs/build/html/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -211,6 +211,11 @@ <h4>full_refresh<a class="headerlink" href="#compile|full_refresh" title="Permal
<p>Type: boolean</p>
<p>If specified, dbt will drop incremental models and fully-recalculate the incremental table from the model definition.</p>
</section>
<section id="compile|introspect">
<h4>introspect<a class="headerlink" href="#compile|introspect" title="Permalink to this heading">¶</a></h4>
<p>Type: boolean</p>
<p>Whether to scaffold introspective queries as part of compilation</p>
</section>
<section id="compile|parse_only">
<h4>parse_only<a class="headerlink" href="#compile|parse_only" title="Permalink to this heading">¶</a></h4>
<p>Type: boolean</p>
Expand Down
2 changes: 1 addition & 1 deletion core/dbt/docs/build/html/searchindex.js

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions core/dbt/events/proto_types.py

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 10 additions & 1 deletion core/dbt/events/types.proto
Original file line number Diff line number Diff line change
Expand Up @@ -1622,7 +1622,16 @@ message ConcurrencyLineMsg {
ConcurrencyLine data = 2;
}

// Skipped Q028
// Q028
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, what is Q stand for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node execution, it's not the perfect category, but the best one I could find.

message CompiledNode {
string node_name = 1;
string compiled = 2;
}

message CompiledNodeMsg {
EventInfo info = 1;
CompiledNode data = 2;
}

// Q029
message WritingInjectedSQLForNode {
Expand Down
10 changes: 8 additions & 2 deletions core/dbt/events/types.py
Original file line number Diff line number Diff line change
Expand Up @@ -1711,7 +1711,13 @@ def message(self) -> str:
return f"Concurrency: {self.num_threads} threads (target='{self.target_name}')"


# Skipped Q028
@dataclass
class CompiledNode(InfoLevel, pt.CompiledNode):
def code(self):
return "Q028"

def message(self) -> str:
return f"Compiled node '{self.node_name}' is:\n{self.compiled}"
Copy link
Member Author

@aranke aranke Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbeatty10 Does this message make sense, or should I change it to something else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example what this message would look like with realistic values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup:

❯ dbt compile --inline "select * from {{ ref('raw_orders') }}"
...
17:50:59  Compiled node 'inline_query' is:
select * from "jaffle_shop"."main"."raw_orders"
17:50:59  Done.
❯ dbt compile --select stg_payments
...
17:51:27  Compiled node 'stg_payments' is:
with source as (
    select * from "jaffle_shop"."main"."raw_payments"

),

renamed as (

    select
        id as payment_id,
        order_id,
        payment_method,

        -- `amount` is currently stored in cents, so we convert it to dollars
        amount / 100 as amount

    from source

)

select * from renamed
17:51:27  Done.



@dataclass
Expand Down Expand Up @@ -2306,7 +2312,7 @@ def message(self) -> str:


# The Note event provides a way to log messages which aren't likely to be useful as more structured events.
# For conslole formatting text like empty lines and separator bars, use the Formatting event instead.
# For console formatting text like empty lines and separator bars, use the Formatting event instead.
@dataclass
class Note(InfoLevel, pt.Note):
def code(self):
Expand Down
1 change: 1 addition & 0 deletions core/dbt/flags.py
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,7 @@ def get_flag_dict():
"quiet",
"no_print",
"cache_selected_only",
"introspect",
"target_path",
"log_path",
}
Expand Down
19 changes: 13 additions & 6 deletions core/dbt/graph/selector.py
Original file line number Diff line number Diff line change
Expand Up @@ -87,11 +87,14 @@ def get_nodes_from_criteria(
)
return set(), set()

neighbors = self.collect_specified_neighbors(spec, collected)
direct_nodes, indirect_nodes = self.expand_selection(
selected=(collected | neighbors), indirect_selection=spec.indirect_selection
)
return direct_nodes, indirect_nodes
if spec.indirect_selection == IndirectSelection.Empty:
return collected, set()
else:
neighbors = self.collect_specified_neighbors(spec, collected)
direct_nodes, indirect_nodes = self.expand_selection(
selected=(collected | neighbors), indirect_selection=spec.indirect_selection
)
return direct_nodes, indirect_nodes

def collect_specified_neighbors(
self, spec: SelectionCriteria, selected: Set[UniqueId]
Expand Down Expand Up @@ -199,7 +202,7 @@ def expand_selection(
) -> Tuple[Set[UniqueId], Set[UniqueId]]:
# Test selection by default expands to include an implicitly/indirectly selected tests.
# `dbt test -m model_a` also includes tests that directly depend on `model_a`.
# Expansion has three modes, EAGER, CAUTIOUS and BUILDABLE.
# Expansion has four modes, EAGER, CAUTIOUS and BUILDABLE, EMPTY.
#
# EAGER mode: If ANY parent is selected, select the test.
#
Expand All @@ -213,6 +216,8 @@ def expand_selection(
# - If ANY parent is missing, return it separately. We'll keep it around
# for later and see if its other parents show up.
#
# EMPTY mode: Only select the given node and ignore attached nodes (i.e. ignore tests attached to a model)
#
# Users can opt out of inclusive EAGER mode by passing --indirect-selection cautious
# CLI argument or by specifying `indirect_selection: true` in a yaml selector

Expand All @@ -237,6 +242,8 @@ def expand_selection(
node.depends_on_nodes
) <= set(selected_and_parents):
direct_nodes.add(unique_id)
elif indirect_selection == IndirectSelection.Empty:
pass
else:
indirect_nodes.add(unique_id)

Expand Down
1 change: 1 addition & 0 deletions core/dbt/graph/selector_spec.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ class IndirectSelection(StrEnum):
Eager = "eager"
Cautious = "cautious"
Buildable = "buildable"
Empty = "empty"


def _probably_path(value: str):
Expand Down
3 changes: 2 additions & 1 deletion core/dbt/task/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
import time
import traceback
from abc import ABCMeta, abstractmethod
from contextlib import nullcontext
from typing import Type, Union, Dict, Any, Optional
from datetime import datetime

Expand Down Expand Up @@ -309,7 +310,7 @@ def skip_result(self, node, message):

def compile_and_execute(self, manifest, ctx):
result = None
with self.adapter.connection_for(self.node):
with self.adapter.connection_for(self.node) if get_flags().INTROSPECT else nullcontext():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user set DBT_INTROSPECT to False, but we are not running compile/preview command, get_flags().INTROSPECT will be True since it is using default value right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct.

ctx.node.update_event_status(node_status=RunningStatus.Compiling)
fire_event(
NodeCompiling(
Expand Down
45 changes: 38 additions & 7 deletions core/dbt/task/compile.py
Original file line number Diff line number Diff line change
@@ -1,17 +1,17 @@
import threading
from typing import AbstractSet, Optional

from .runnable import GraphRunnableTask
from .base import BaseRunner

from dbt.contracts.graph.manifest import WritableManifest
from dbt.contracts.results import RunStatus, RunResult
from dbt.events.functions import fire_event
from dbt.events.types import CompileComplete, CompiledNode
from dbt.exceptions import DbtInternalError, DbtRuntimeError
from dbt.graph import ResourceTypeSelector
from dbt.events.functions import fire_event
from dbt.events.types import CompileComplete
from dbt.parser.manifest import write_manifest
from dbt.node_types import NodeType
from dbt.parser.manifest import write_manifest, process_node
from dbt.parser.sql import SqlBlockParser
from dbt.task.base import BaseRunner
from dbt.task.runnable import GraphRunnableTask


class CompileRunner(BaseRunner):
Expand Down Expand Up @@ -43,19 +43,40 @@ def raise_on_first_error(self):
return True

def get_node_selector(self) -> ResourceTypeSelector:
if getattr(self.args, "inline", None):
resource_types = [NodeType.SqlOperation]
else:
resource_types = NodeType.executable()

if self.manifest is None or self.graph is None:
raise DbtInternalError("manifest and graph must be set to get perform node selection")
return ResourceTypeSelector(
graph=self.graph,
manifest=self.manifest,
previous_state=self.previous_state,
resource_types=NodeType.executable(),
resource_types=resource_types,
)

def get_runner_type(self, _):
return CompileRunner

def task_end_messages(self, results):
if getattr(self.args, "inline", None):
result = results[0]
fire_event(
CompiledNode(node_name=result.node.name, compiled=result.node.compiled_code)
)

if self.selection_arg:
matched_results = [
result for result in results if result.node.name == self.selection_arg[0]
]
if len(matched_results) == 1:
result = matched_results[0]
fire_event(
CompiledNode(node_name=result.node.name, compiled=result.node.compiled_code)
)

fire_event(CompileComplete())

def _get_deferred_manifest(self) -> Optional[WritableManifest]:
Expand Down Expand Up @@ -88,3 +109,13 @@ def defer_to_manifest(self, adapter, selected_uids: AbstractSet[str]):
)
# TODO: is it wrong to write the manifest here? I think it's right...
write_manifest(self.manifest, self.config.target_path)

def _runtime_initialize(self):
if getattr(self.args, "inline", None):
block_parser = SqlBlockParser(
project=self.config, manifest=self.manifest, root_project=self.config
)
sql_node = block_parser.parse_remote(self.args.inline, "inline_query")
process_node(self.config, self.manifest, sql_node)

super()._runtime_initialize()
7 changes: 6 additions & 1 deletion core/dbt/tests/util.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
)
from dbt.events.test_types import IntegrationTestDebug


# =============================================================================
# Test utilities
# run_dbt
Expand Down Expand Up @@ -160,6 +161,11 @@ def write_file(contents, *paths):
fp.write(contents)


def file_exists(*paths):
"""Check if file exists at path"""
return os.path.exists(os.path.join(*paths))


Comment on lines +164 to +168
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a problem to solve in this PR, but I wonder why os.path was used in this module rather than pathlib?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're already using os in several places instead of pathlib so wanted to keep the status quo to avoid bringing in a new import.
But I don't feel too strongly either way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, going with the flow make sense in this PR.

Something for us to consider in a refactor down the road though.

# Used in test utilities
def read_file(*paths):
contents = ""
Expand Down Expand Up @@ -385,7 +391,6 @@ def check_relation_has_expected_schema(adapter, relation_name, expected_schema:
def check_relations_equal_with_relations(
adapter: Adapter, relations: List, compare_snapshot_cols=False
):

with get_connection(adapter):
basis, compares = relations[0], relations[1:]
# Skip columns starting with "dbt_" because we don't want to
Expand Down
Loading