-
Notifications
You must be signed in to change notification settings - Fork 26
/
triggers.Rmd
174 lines (143 loc) · 8.18 KB
/
triggers.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
# Triggers: decision rules for building targets {#triggers}
```{r, message = FALSE, warning = FALSE, include = FALSE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(
drake_make_menu = FALSE,
drake_clean_menu = FALSE,
warnPartialMatchArgs = FALSE,
crayon.enabled = FALSE,
readr.show_progress = FALSE,
tidyverse.quiet = TRUE
)
```
```{r, message = FALSE, warning = FALSE, include = FALSE}
library(drake)
library(tidyverse)
invisible(drake_example("main", overwrite = TRUE))
invisible(file.copy("main/raw_data.xlsx", ".", overwrite = TRUE))
invisible(file.copy("main/report.Rmd", ".", overwrite = TRUE))
```
When you call `make()`, `drake` tries to skip as many targets as possible. If it thinks a command will return the same value as last time, it does not bother running it. In other words, `drake` is lazy, and laziness saves you time.
## What are triggers?
To figure out whether it can skip a target, `drake` goes through an intricate checklist of **triggers**:
1. The **missing** trigger: Do we lack a return value from a previous `make()`? Maybe you are building the target for the first time or you removed it from the cache with `clean()`.
2. The **command** trigger: did the command in the `drake` plan change nontrivially since the last `make()`? Changes to spacing, formatting, and comments are ignored.
3. The **depend** trigger: did any non-file dependencies change since the last `make()`? These could be:
- Other targets.
- Imported objects.
- Imported functions. To track changes to a function, `drake` removes any code closed in `ignore()`, deparses the literal code so that whitespace is standardized and comments are removed, and then hashes the resulting string. In some cases, `drake` makes special adjustments for strange edge cases like [`Rcpp` functions with pointers](https://github.com/ropensci/drake/issues/806) and functions defined with `Vectorize()`. However, edge cases like [this one](https://github.com/ropensci-books/drake/issues/130#issue-522436582) are inevitable because of the flexibility of R.
- Any dependencies of imported functions.
- Any dependencies of dependencies of imported functions, and so on.
4. The **file** trigger: did any file inputs or file outputs change since the last `make()`? These files are the ones explicitly declared in the command with `file_in()`, `knitr_in()`, and `file_out()`.
5. The **seed** trigger: for statistical reproducibility, `drake` assigns a unique seed to each target based on the target's name and the global `seed` argument to `make()`. If you change the target's pseudo-random number generator seed either with the `seed` argument or the custom `seed` column in the plan, this change will cause a rebuild if the `seed` trigger is turned on.
6. The **format** trigger: did you add or change the target's storage format since last build? Details: <https://books.ropensci.org/drake/plans.html#special-data-formats-for-targets>.
7. The **condition** trigger: an optional user-defined piece of code that evaluates to a `TRUE`/`FALSE` value. The target builds if the value is `TRUE`.
8. The **change** trigger: an optional user-defined piece of code that evaluates to any value (preferably small and quick to compute). The target builds if the value changed since the last `make()`.
If *any* trigger detects something wrong or different with the target or its dependencies, the next `make()` will run the command and (re)build the target.
## Customization
With the `trigger()` function, you can create your own customized checklist of triggers. Let's run a simple workflow with just the **missing** trigger. We deactivate the **command**, **depend**, and **file** triggers by setting the respective `command`, `depend`, and `file` arguments to `FALSE`.
```{r}
plan <- drake_plan(
psi_1 = (sqrt(5) + 1) / 2,
psi_2 = (sqrt(5) - 1) / 2
)
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
```
Now, even if you wreck all the commands, nothing rebuilds.
```{r}
plan <- drake_plan(
psi_1 = (sqrt(5) + 1) / 2 + 9999999999999,
psi_2 = (sqrt(5) - 1) / 2 - 9999999999999
)
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
```
You can also give different targets to different triggers. Triggers in the `drake` plan override the `trigger` argument to `make()`. Below, `psi_2` always builds, but `psi_1` only builds if it has never been built before.
```{r}
plan <- drake_plan(
psi_1 = (sqrt(5) + 1) / 2 + 9999999999999,
psi_2 = target(
command = (sqrt(5) - 1) / 2 - 9999999999999,
trigger = trigger(condition = psi_1 > 0)
)
)
plan
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
make(plan, trigger = trigger(command = FALSE, depend = FALSE, file = FALSE))
```
Interestingly, `psi_2` now depends on `psi_1`. Since `psi_1` is part of the target `psi_2` because of the **condition** trigger, it needs to be up to date before we attempt `psi_2`. However, since `psi_1` is not part of the command, changing it will not trip the other triggers such as **depend**.
```{r}
vis_drake_graph(plan)
```
In the next toy example below, `drake` reads from a file to decide whether to build `x`. Try it out.
```{r}
plan <- drake_plan(
x = target(
1 + 1,
trigger = trigger(condition = file_in(readRDS("file.rds")))
)
)
saveRDS(TRUE, "file.rds")
make(plan)
make(plan)
make(plan)
saveRDS(FALSE, "file.rds")
make(plan)
make(plan)
make(plan)
```
In a real project with remote data sources, you may want to use the **condition** trigger to limit your builds to times when enough bandwidth is available for a large download. For example,
```{r, eval = FALSE}
drake_plan(
x = target(
command = download_large_dataset(),
trigger = trigger(condition = is_enough_bandwidth())
)
)
```
Since the **change** trigger can return any value, it is often easier to use than the **condition** trigger.
```{r}
clean(destroy = TRUE)
plan <- drake_plan(
x = target(
command = 1 + 1,
trigger = trigger(change = sqrt(y))
)
)
y <- 1
make(plan)
make(plan)
y <- 2
make(plan)
```
In practice, you may want to use the **change** trigger to check a large remote before downloading it.
```{r, eval = FALSE}
drake_plan(
x = target(
command = download_large_dataset(),
trigger = trigger(
condition = is_enough_bandwidth(),
change = date_last_modified()
)
)
)
```
A word of caution: every non-`NULL` **change** trigger is always evaluated, and its value is carried around in memory throughout `make()`. So if you are not careful, heavy use of the **change** trigger could slow down your workflow and consume extra resources. The **change** trigger should return small values (and should ideally be quick to evaluate). To reduce memory consumption, you may want to return a fingerprint of your trigger value rather than the value itself. See the [`digest`](https://github.com/eddelbuettel/digest) package for more information on computing hashes/fingerprints.
```{r, eval = FALSE}
library(digest)
drake_plan(
x = target(
command = download_large_dataset(),
trigger = trigger(
change = digest(download_medium_dataset())
)
)
)
```
## Alternative trigger modes
Sometimes, you may want to suppress a target without having to worry about turning off every single trigger. That is why the `trigger()` function has a `mode` argument, which controls the role of the **condition** trigger in the decision to build or skip a target. The available trigger modes are `"whitelist"` (default), `"blacklist"`, and `"condition"`.
- `trigger(mode = "whitelist")`: we *rebuild* the target whenever `condition` evaluates to `TRUE`. Otherwise, we defer to the other triggers. This is the default behavior described above in this chapter.
- `trigger(mode = "blacklist")`: we *skip* the target whenever `condition` evaluates to `FALSE`. Otherwise, we defer to the other triggers.
- `trigger(mode = "condition")`: here, the `condition` trigger is the only decider, and we ignore all the other triggers. We *rebuild* target whenever `condition` evaluates to `TRUE` and *skip* it whenever `condition` evaluates to `FALSE`.
## A more practical example
See the ["packages" example](#packages) for a more practical demonstration of triggers and their usefulness.