Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate drake_cache_log_file() #741

Closed
12 tasks
wlandau opened this issue Feb 14, 2019 · 3 comments
Closed
12 tasks

Deprecate drake_cache_log_file() #741

wlandau opened this issue Feb 14, 2019 · 3 comments

Comments

@wlandau
Copy link
Member

wlandau commented Feb 14, 2019

Overview

drake's cache log is a data frame with the fingerprints all the targets and dependencies.

library(drake)
load_mtcars_example()
make(my_plan, verbose = 0)
drake_cache_log()
#> # A tibble: 22 x 3
#>    hash             type   name                  
#>    <chr>            <chr>  <chr>                 
#>  1 4e6e4d4c0bcda263 target coef_regression1_large
#>  2 5ad25785a84e0cac target coef_regression1_small
#>  3 b2662785d55b28c1 target coef_regression2_large
#>  4 23c66b36be56905b target coef_regression2_small
#>  5 7b6504d4c0dcceab target large                 
#>  6 f4b89e63bc92af79 import datasets::mtcars      
#>  7 7a456ee58df699be import file report.Rmd       
#>  8 4b64ebf47673af37 target file report.md        
#>  9 db84c9f752635a13 import random_rows           
#> 10 21935c86f12692e2 import reg1                  
#> # … with 12 more rows

Created on 2019-02-15 by the reprex package (v0.2.1)

The purpose is to

  1. Hint at when and why targets fall out of date, and
  2. Provide a compact record of the state of the whole workflow.

Point (2) supports reproducibility. You can run make(cache_log_file = "cache_log.txt") to generate a flat file with the cache log. Then, when you put cache_log.txt under version control (git/GitHub) you can track the changes to your targets over time. This is especially handy because drake's cache is inconvenient and messy to put under version control.

Problem

drake_cache_log_file() is a standalone function to write the cache log to a file. It seems convenient at first glance, but I no longer believe it should be part of drake.

  1. The physical file of the cache log should really be synchronized with make(). If you change an imported function, run outdated(), and then run drake_cache_log_file(), you will get a cache log file in which the inputs to your workflow are not synchronized with the outputs.
  2. Ideally, the API should be as simple as possible. drake_cache_log() and drake_cache_log_file() are nearly redundant.

Solution

As a Chicago R Unconference project, let's deprecate drake_cache_log_file() to warn people not to use it. A year or two later, the function will become defunct and then eventually removed from the package altogether.

Implementation

  • Move drake_cache_log_file() from its current home in R/api-cache.R to the bottom of R/utils-deprecate.R.
  • Call .Deprecated() at the beginning of the function to warn users.
  • Update the roxygen2 documentation.
    • Indicate in the @title that the function is deprecated.
    • In the @description, write the date that you deprecated the function. (We need to know when we can go back and make it defunct.)
    • Right under @export, add a line with #' @keywords internal to hide drake_cache_log_file() from help(package = "drake").
    • Remove the @examples.
  • Add a test in tests/testthat/test-deprecate.R to test that the deprecation message shows up and the function still produces a sensible cache log. Feel free to borrow from this line and make a new call to test_with_dir().
  • Remove calls to drake_cache_log_file() from the other tests in tests/testthat.
  • Remove drake_cache_log_file() from _pkgdown.yml so it no longer shows up in the online function reference.
  • Mention the deprecation in NEWS.md.
  • If you like, add yourself as a contributor in the DESCRIPTION file.
@tjmahr
Copy link
Contributor

tjmahr commented Mar 10, 2019

preprocess-config.R refers to the function and the default filename. We can remove its mention from the documentation but should make sure that the behavior described here is accurate.

#' @param cache_log_file Name of the cache log file to write.
#'   If `TRUE`, the default file name is used (`drake_cache.log`).
#'   If `NULL`, no file is written.
#'   If activated, this option uses
#'   [drake_cache_log_file()] to write a flat text file
#'   to represent the state of the cache
#'   (fingerprints of all the targets and imports).
#'   If you put the log file under version control, your commit history
#'   will give you an easy representation of how your results change
#'   over time as the rest of your project changes. Hopefully,
#'   this is a step in the right direction for data reproducibility.
#' ...
drake_config <- function(
# ...

@wlandau
Copy link
Member Author

wlandau commented Mar 10, 2019

Yes please. The explicit mention of drake_cache_log_file() will no longer be accurate after #778.

@wlandau
Copy link
Member Author

wlandau commented Mar 10, 2019

Thanks @tjmahr for #778.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants