Enable history by default
wlandau-lilly committed Jun 26, 2019
1 parent 228c72a commit fe279e6
Showing 12 changed files with 89 additions and 71 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -87,6 +87,7 @@ Imports:
methods,
rlang (>= 0.2.0),
storr (>= 1.1.0),
txtq (>= 0.1.3),
utils
Suggests:
abind,
@@ -115,7 +116,6 @@ Suggests:
testthat (>= 2.1.0),
tibble,
tidyselect (>= 0.2.4),
txtq (>= 0.1.3),
txtplot,
usethis,
visNetwork,
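Moving `txtq` from `Suggests` to `Imports` means the package is always installed alongside `drake`, which is presumably why the run-time `assert_pkg("txtq", ...)` guard is dropped from `R/api-history.R` in this same commit. As a rough illustration of what such a guard does for a suggested package, here is a minimal base-R sketch. The helper name `require_pkg` is hypothetical and is not drake's own `assert_pkg()`.

```r
# Hypothetical helper (an assumption for illustration, not drake's assert_pkg()):
# a run-time check of the kind needed for a dependency listed under Suggests.
require_pkg <- function(pkg, version = NULL) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("package ", pkg, " is not installed.", call. = FALSE)
  }
  if (!is.null(version) && utils::packageVersion(pkg) < version) {
    stop(
      "package ", pkg, " must be version ", version, " or greater.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Base packages are always available, so these calls succeed anywhere.
require_pkg("utils")
require_pkg("stats", version = "0.0.1")
```

With a hard dependency declared in `Imports` (and a minimum version stated in `DESCRIPTION`), R enforces the dependency and its minimum version when the package is installed and loaded, so a check like this becomes redundant.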
1 change: 1 addition & 0 deletions NAMESPACE
@@ -228,6 +228,7 @@ importFrom(rlang,quo_squash)
importFrom(rlang,quos)
importFrom(storr,storr_environment)
importFrom(storr,storr_rds)
importFrom(txtq,txtq)
importFrom(utils,compareVersion)
importFrom(utils,flush.console)
importFrom(utils,head)
3 changes: 2 additions & 1 deletion NEWS.md
@@ -2,9 +2,10 @@

## New features

- Track history and provenance of targets, viewable with `drake_history()`. Powered by [`txtq`](https://github.com/wlandau/txtq) (#918, #920).
- Export `transform_plan()`.
- Add a new `no_deps()` function, similar to `ignore()`. `no_deps()` suppresses dependency detection but still tracks changes to the literal code ([#910](https://github.com/ropensci/drake/issues/910)).
- Add a new "autoclean" memory strategy (#917)
- Add a new "autoclean" memory strategy (#917).

## Enhancements

1 change: 0 additions & 1 deletion R/api-history.R
@@ -174,7 +174,6 @@ history_analyze_value <- function(name, value, ht) {
}

default_history_queue <- function(cache_path) {
assert_pkg("txtq", version = "0.1.2")
cache_dir <- dirname(cache_path)
history_path <- file.path(cache_dir, ".drake_history")
txtq::txtq(history_path)
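The path logic in `default_history_queue()` places the history folder next to the cache rather than inside it. A small base-R sketch of just that step follows. The function name `history_path_for` and the example path are assumptions for illustration, and the real function goes on to open a `txtq` queue at the resulting path.

```r
# Hypothetical helper mirroring the two path lines of default_history_queue():
# the history folder is a sibling of the cache folder, not a subfolder of it.
history_path_for <- function(cache_path) {
  cache_dir <- dirname(cache_path)
  file.path(cache_dir, ".drake_history")
}

# Example with an assumed project layout.
history_path_for(file.path("project", ".drake"))
```

One consequence of this layout is that removing the `.drake/` folder by hand leaves `.drake_history` in place.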
12 changes: 2 additions & 10 deletions R/api-make.R
@@ -156,18 +156,10 @@ make <- function(
template = list(),
sleep = function(i) 0.01,
hasty_build = NULL,
memory_strategy = c(
"speed",
"autoclean",
"preclean",
"lookahead",
"unload",
"none",
"memory" # deprecated on 2019-06-22
),
memory_strategy = "speed",
layout = NULL,
lock_envir = TRUE,
history = FALSE
history = TRUE
) {
log_msg(
"begin make()",
1 change: 1 addition & 0 deletions R/api-package.R
@@ -49,6 +49,7 @@
#' @importFrom methods new setRefClass
#' @importFrom rlang dots_list enquo eval_tidy expr quo_squash quos
#' @importFrom storr storr_environment storr_rds
#' @importFrom txtq txtq
#' @importFrom utils compareVersion flush.console head menu packageVersion
#' read.csv sessionInfo stack type.convert unzip write.table
NULL
12 changes: 12 additions & 0 deletions R/exec-memory.R
@@ -1,3 +1,15 @@
memory_strategies <- function() {
c(
"speed",
"autoclean",
"preclean",
"lookahead",
"unload",
"none",
"memory" # deprecated on 2019-06-22
)
}

assign_to_envir <- function(target, value, config) {
memory_strategy <- config$layout[[target]]$memory_strategy %||NA%
config$memory_strategy
16 changes: 4 additions & 12 deletions R/preprocess-config.R
@@ -399,7 +399,7 @@
#' of your targets. You can also supply a
#' [`txtq`](https://github.com/wlandau/txtq), which is
#' how `drake` records history.
#' Required for [drake_history()].
#' Must be `TRUE` for [drake_history()] to work later.
#'
#' @examples
#' \dontrun{
@@ -464,18 +464,10 @@ drake_config <- function(
template = list(),
sleep = function(i) 0.01,
hasty_build = NULL,
memory_strategy = c(
"speed",
"autoclean",
"preclean",
"lookahead",
"unload",
"none",
"memory" # deprecated on 2019-06-22
),
memory_strategy = "speed",
layout = NULL,
lock_envir = TRUE,
history = FALSE
history = TRUE
) {
log_msg(
"begin drake_config()",
@@ -560,7 +552,7 @@ drake_config <- function(
# 2019-01-03 # nolint
)
}
memory_strategy <- match.arg(memory_strategy)
memory_strategy <- match.arg(memory_strategy, choices = memory_strategies())
if (memory_strategy == "memory") {
memory_strategy <- "preclean"
warning(
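The signature change from a vector default to `memory_strategy = "speed"` works because `match.arg()` accepts an explicit `choices` argument, so the list of valid values no longer has to live in the function signature. The sketch below reproduces the pattern in base R. `resolve_memory_strategy` is a hypothetical stand-in for the relevant part of `drake_config()`, and the warning text is an assumption.

```r
# Same vector of choices as memory_strategies() in this commit.
memory_strategies <- function() {
  c("speed", "autoclean", "preclean", "lookahead", "unload", "none", "memory")
}

# Hypothetical stand-in for the validation step inside drake_config().
resolve_memory_strategy <- function(memory_strategy = "speed") {
  # Validate against the shared list instead of the argument's default vector.
  memory_strategy <- match.arg(memory_strategy, choices = memory_strategies())
  if (memory_strategy == "memory") {
    # Assumed wording; the real deprecation message is not shown in the diff.
    warning("\"memory\" is deprecated; using \"preclean\".", call. = FALSE)
    memory_strategy <- "preclean"
  }
  memory_strategy
}

resolve_memory_strategy()             # falls back to the single default
resolve_memory_strategy("lookahead")  # explicit, valid choice
```

An unknown value such as `"bogus"` makes `match.arg()` stop with an error that lists the valid choices, and abbreviations that uniquely identify a choice are accepted through partial matching.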
29 changes: 13 additions & 16 deletions README.Rmd
@@ -7,6 +7,9 @@ output:
<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r knitrsetup, echo = FALSE}
dir <- tempfile()
dir.create(dir)
knitr::opts_knit$set(root.dir = dir)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
@@ -188,7 +191,7 @@ plan
So far, we have just been setting the stage. Use `make()` to do the real work. Targets are built in the correct order regardless of the row order of `plan`.

```{r make1}
make(plan, history = TRUE) # History is new in drake 7.5.0.
make(plan)
```

Except for files like `report.html`, your output is stored in a hidden `.drake/` folder. Reading it back is easy.
@@ -229,7 +232,7 @@ vis_drake_graph(config) # Interactive graph: zoom, drag, etc.
The next `make()` just builds `hist` and `report.html`. No point in wasting time on the data or model.

```{r justhistetc}
make(plan, history = TRUE)
make(plan)
```

```{r hist2, eval = FALSE}
@@ -267,26 +270,28 @@ make(plan) # Independently re-create the results from the code and input data.

## History and provenance

As of version 7.5.0, `drake` can track the history and provenance of your targets:
As of version 7.5.0, `drake` tracks the history and provenance of your targets:
what you built, when you built it, how you built it, the arguments you
used in your function calls, and how to get the data back.
used in your function calls, and how to get the data back. (Disable with `make(history = FALSE)`.)

```{r history}
history <- drake_history(analyze = TRUE) # Requires make(history = TRUE)
history <- drake_history(analyze = TRUE)
history
```

Remarks:

- The `quiet` column appears above because one of the `drake_plan()` commands has `knit(quiet = TRUE)`.
- The `hash` column identifies all the previous versions of your targets. As long as `exists` is `TRUE`, you can recover old data.
- Advanced: if you use `make(cache_log_file = TRUE)` and put the cache log file under version control, you can match the hashes from `drake_history()` with the `git` commit history of your code.

Let's use the history to recover the old histogram.
Let's use the history to recover the oldest histogram.

```{r, eval = FALSE}
hash <- history %>%
filter(target == "hist" & !latest) %>% # Get the old histogram.
pull(hash)
filter(target == "hist") %>%
pull(hash) %>%
head(n = 1)
cache <- drake_cache()
cache$get_value(hash)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
@@ -476,11 +481,3 @@ Many thanks to [Julia Lowndes](https://github.com/jules32), [Ben Marwick](https:
Credit for images is [attributed here](https://ropensci.github.io/drake/figures/image-credit.md).

[![ropensci_footer](https://ropensci.org/public_images/github_footer.png)](https://ropensci.org)

```{r cleanupfooter, echo = FALSE}
clean(destroy = TRUE)
unlink(
c(".drake_history", "main", "raw_data.xlsx", "report.Rmd"),
recursive = TRUE
)
```
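The README's recovery example switches from `filter(target == "hist" & !latest)` to `filter(target == "hist") %>% pull(hash) %>% head(n = 1)`. With history on by default, a target can have several non-latest builds, so the old filter may return more than one hash. The base-R sketch below shows the difference on a toy data frame. The hashes are made-up placeholders, and it assumes that rows for a target are ordered oldest first, as the printed history suggests.

```r
# Toy stand-in for the tibble returned by drake_history(); the hashes are
# made-up placeholders, not real cache keys.
history <- data.frame(
  target = c("data", "hist", "hist", "hist"),
  hash = c("hash_a", "hash_b", "hash_c", "hash_d"),
  latest = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Old approach: every non-latest build of hist (two hashes here).
not_latest <- history$hash[history$target == "hist" & !history$latest]

# New approach: base-R equivalent of
# filter(target == "hist") %>% pull(hash) %>% head(n = 1).
hist_hashes <- history$hash[history$target == "hist"]
oldest_hash <- head(hist_hashes, n = 1)
```

A single hash is what a call like `cache$get_value(hash)` expects, which is why taking the first element is the safer choice once a target has been rebuilt more than once.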
69 changes: 47 additions & 22 deletions README.md
@@ -196,7 +196,7 @@ work. Targets are built in the correct order regardless of the row order
of `plan`.

``` r
make(plan, history = TRUE) # History is new in drake 7.5.0.
make(plan)
#> target raw_data
#> target data
#> target fit
@@ -260,7 +260,7 @@ The next `make()` just builds `hist` and `report.html`. No point in
wasting time on the data or model.

``` r
make(plan, history = TRUE)
make(plan)
#> target hist
#> target report
```
@@ -330,24 +330,29 @@ make(plan) # Independently re-create the results from the code and input data.

## History and provenance

As of version 7.5.0, `drake` can track the history and provenance of
your targets: what you built, when you built it, how you built it, the
As of version 7.5.0, `drake` tracks the history and provenance of your
targets: what you built, when you built it, how you built it, the
arguments you used in your function calls, and how to get the data back.
(Disable with `make(history = FALSE)`.)

``` r
history <- drake_history(analyze = TRUE) # Requires make(history = TRUE)
history <- drake_history(analyze = TRUE)
history
#> # A tibble: 7 x 9
#> target time hash exists command runtime latest quiet
#> <chr> <dttm> <chr> <lgl> <chr> <dbl> <lgl> <lgl>
#> 1 data 2019-06-25 14:32:05 e580… TRUE raw_da… 0.004 TRUE NA
#> 2 fit 2019-06-25 14:32:05 62a1… TRUE lm(Sep… 0.007 TRUE NA
#> 3 hist 2019-06-25 14:32:05 10bc… TRUE create… 0.008 FALSE NA
#> 4 hist 2019-06-25 14:32:06 5252… TRUE create… 0.00400 TRUE NA
#> 5 raw_d… 2019-06-25 14:32:04 6317… TRUE "readx… 0.012 TRUE NA
#> 6 report 2019-06-25 14:32:06 9946… TRUE "rmark… 1.18 FALSE TRUE
#> 7 report 2019-06-25 14:32:07 9946… TRUE "rmark… 0.489 TRUE TRUE
#> # … with 1 more variable: output_file <chr>
#> # A tibble: 12 x 9
#> target time hash exists command runtime latest quiet output_file
#> <chr> <chr> <chr> <lgl> <chr> <dbl> <lgl> <lgl> <chr>
#> 1 data 2019-0… e580… TRUE raw_data… 0.002 FALSE NA <NA>
#> 2 data 2019-0… e580… TRUE raw_data… 0 TRUE NA <NA>
#> 3 fit 2019-0… 62a1… TRUE lm(Sepal… 0.003 FALSE NA <NA>
#> 4 fit 2019-0… 62a1… TRUE lm(Sepal… 0.001000 TRUE NA <NA>
#> 5 hist 2019-0… 10bc… TRUE create_p… 0.006 FALSE NA <NA>
#> 6 hist 2019-0… 5252… TRUE create_p… 0.004 FALSE NA <NA>
#> 7 hist 2019-0… 00fa… TRUE create_p… 0.00600 TRUE NA <NA>
#> 8 raw_da… 2019-0… 6317… TRUE "readxl:… 0.01 FALSE NA <NA>
#> 9 raw_da… 2019-0… 6317… TRUE "readxl:… 0.007 TRUE NA <NA>
#> 10 report 2019-0… 0064… TRUE "rmarkdo… 0.647 FALSE TRUE report.html
#> 11 report 2019-0… 0064… TRUE "rmarkdo… 0.45 FALSE TRUE report.html
#> 12 report 2019-0… 0064… TRUE "rmarkdo… 0.456 TRUE TRUE report.html
```

Remarks:
@@ -356,13 +361,17 @@ Remarks:
commands has `knit(quiet = TRUE)`.
- The `hash` column identifies all the previous versions of your
targets. As long as `exists` is `TRUE`, you can recover old data.
- Advanced: if you use `make(cache_log_file = TRUE)` and put the cache
log file under version control, you can match the hashes from
`drake_history()` with the `git` commit history of your code.

Let’s use the history to recover the old histogram.
Let’s use the history to recover the oldest histogram.

``` r
hash <- history %>%
filter(target == "hist" & !latest) %>% # Get the old histogram.
pull(hash)
filter(target == "hist") %>%
pull(hash) %>%
head(n = 1)
cache <- drake_cache()
cache$get_value(hash)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
@@ -679,13 +688,29 @@ here](https://github.com/wlandau/drake-examples/tree/master/main).

### Version control

`drake` is not a version control tool. However, it is fully compatible with [`git`](https://git-scm.com/), [`svn`](https://en.wikipedia.org/wiki/Apache_Subversion), and similar software. In fact, it is good practice to use [`git`](https://git-scm.com/) alongside `drake` for reproducible workflows.
`drake` is not a version control tool. However, it is fully compatible
with [`git`](https://git-scm.com/),
[`svn`](https://en.wikipedia.org/wiki/Apache_Subversion), and similar
software. In fact, it is good practice to use
[`git`](https://git-scm.com/) alongside `drake` for reproducible
workflows.

However, data poses a challenge. The datasets created by `make()` can get large and numerous, and it is not recommended to put the `.drake/` cache or the `.drake_history/` logs under version control. Instead, it is recommended to use a data storage solution such as [DropBox](https://www.dropbox.com/) or [OSF](https://osf.io/ka7jv/wiki/home/).
However, data poses a challenge. The datasets created by `make()` can
get large and numerous, and it is not recommended to put the `.drake/`
cache or the `.drake_history/` logs under version control. Instead, it
is recommended to use a data storage solution such as
[DropBox](https://www.dropbox.com/) or
[OSF](https://osf.io/ka7jv/wiki/home/).

### Containerization and R package environments

`drake` does not track R packages or system dependencies for changes. Instead, it defers to tools like [Docker](https://www.docker.com), [Singularity](https://sylabs.io/singularity/), [`renv`](https://github.com/rstudio/renv), and [`packrat`](https://github.com/rstudio/packrat), which create self-contained portable environments to reproducibly isolate and ship data analysis projects. `drake` is fully compatible with these tools.
`drake` does not track R packages or system dependencies for changes.
Instead, it defers to tools like [Docker](https://www.docker.com),
[Singularity](https://sylabs.io/singularity/),
[`renv`](https://github.com/rstudio/renv), and
[`packrat`](https://github.com/rstudio/packrat), which create
self-contained portable environments to reproducibly isolate and ship
data analysis projects. `drake` is fully compatible with these tools.

### workflowr

7 changes: 3 additions & 4 deletions man/drake_config.Rd

7 changes: 3 additions & 4 deletions man/make.Rd