Ensure MicrobatchModelRunner doesn't double compile batches
We were compiling the node for each batch _twice_. Beyond making microbatch
models more expensive than they needed to be, the double compilation itself
wasn't causing any issue. However, the first compilation happened _before_ we
had added the batch context information to the model node for the batch, so
models that try to access the `batch_context` information on the model would
blow up. We now skip that first compilation, in the same way SavedQuery nodes
skip compilation.
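To make the failure concrete, here is a minimal, illustrative sketch (plain Jinja, not dbt's actual rendering pipeline) of why compiling a template before its batch context exists blows up, while compiling after the context is attached works:

```python
# Illustrative only -- not dbt internals. A template that references batch
# context cannot render until that context has actually been provided.
from jinja2 import Environment, StrictUndefined, UndefinedError

env = Environment(undefined=StrictUndefined)
template = env.from_string(
    "select * from events where event_time >= '{{ batch_context.start }}'"
)

# Compiling before any batch context has been attached -> blows up.
try:
    template.render()
except UndefinedError as exc:
    print(f"compile before batch context is set fails: {exc}")

# Compiling after the batch context is attached succeeds.
print(template.render(batch_context={"start": "2024-11-01"}))
```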
QMalcolm committed Nov 27, 2024
1 parent 585fb04 commit 45daec7
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions core/dbt/task/run.py
@@ -341,6 +341,13 @@ def __init__(self, config, adapter, node, node_index: int, num_nodes: int):
        self.batches: Dict[int, BatchType] = {}
        self.relation_exists: bool = False

    def compile(self, manifest: Manifest):
        # The default compile function is _always_ called. However, we do our
        # compilation _later_ in `_execute_microbatch_materialization`. This
        # meant the node was being compiled _twice_ for each batch. To get around
        # this, we've overridden the default compile method to do nothing.
        return self.node

    def set_batch_idx(self, batch_idx: int) -> None:
        self.batch_idx = batch_idx

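For context, here is a rough sketch of the control flow this override relies on (hypothetical class and method names, not dbt's actual internals): the task framework always calls `compile()` on a runner before `execute()`, so a no-op override is what lets the real compilation wait until each batch's context has been attached.

```python
# Simplified, illustrative sketch of the runner control flow; names are
# hypothetical and do not match dbt's real classes.

class Node:
    def __init__(self):
        self.batch_context = None


class BaseRunner:
    def __init__(self, node):
        self.node = node

    def compile(self, node):
        # Stand-in for real compilation: fails if batch context is missing.
        if node.batch_context is None:
            raise RuntimeError("batch_context not set yet")
        return f"compiled with batch {node.batch_context}"

    def run(self):
        node = self.compile(self.node)  # the framework always compiles first
        return self.execute(node)


class MicrobatchRunner(BaseRunner):
    def __init__(self, node, batches):
        super().__init__(node)
        self.batches = batches

    def compile(self, node):
        # No-op: defer compilation until the batch context exists.
        return node

    def execute(self, node):
        results = []
        for batch in self.batches:
            node.batch_context = batch             # attach the batch context...
            results.append(super().compile(node))  # ...then compile, once per batch
        return results


print(MicrobatchRunner(Node(), ["2024-11-01", "2024-11-02"]).run())
```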
