user case of tar_skip / tar_change, deal with empty targets #17

Closed
9 tasks done
ginolhac opened this issue Oct 26, 2020 · 4 comments

@ginolhac

Prework

  • Read and agree to the code of conduct and contributing guidelines.
  • If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
  • Post a minimal reproducible example so the maintainer can troubleshoot the problems you identify. A reproducible example is:
    • Runnable: post enough R code and data so any onlooker can create the error on their own computer.
    • Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
    • Readable: format your code according to the tidyverse style guide.

Description

Hi,
first of all, thanks a lot for the fabulous work you are doing. I used drake a little (and actually posted a relevant issue there that you solved).
I have now moved to targets and I like it a lot. My workflow is to track new lectures/tutorials in dedicated folders and build a website for teaching. The part that is not working is how to deal with empty targets.

Here is a small example that works: I use tar_change to track the appearance of new files, then compute their md5sum and process each file.

library(targets)
dir.create("tmp")
file.create("tmp/a")
#> [1] TRUE
writeLines("aa", con = "tmp/a")
file.create("tmp/b")
#> [1] TRUE
writeLines("bb", con = "tmp/b")

tar_script({
  library(tarchetypes)
  options(crayon.enabled = FALSE)
  tar_option_set(packages = c("tarchetypes", "tools"))
  tar_pipeline(
    tar_change(x, dir("tmp", full.names = TRUE),
               change = md5sum(dir("tmp", full.names = TRUE))),
    tar_file(y,
             setNames(x, md5sum(x)),
             pattern = map(x)),
    tar_target(z,
               readLines(y),
               pattern = map(y))
  )
})
tar_make()
#> ● run target x_change
#> ● run target x
#> ● run branch y_41a0680a
#> ● run branch y_d407878d
#> ● run branch z_cc0cfe46
#> ● run branch z_d699c360
tar_make()
#> ● run target x_change
#> ✓ skip target x
#> ✓ skip branch y_41a0680a
#> ✓ skip branch y_d407878d
#> ✓ skip branch z_cc0cfe46
#> ✓ skip branch z_d699c360
# adding a file
file.create("tmp/c")
#> [1] TRUE
tar_read(z)
#> [1] "aa" "bb"
# modify a file
writeLines("new_a", con = "tmp/a")
tar_make()
#> ● run target x_change
#> ● run target x
#> ● run branch y_41a0680a
#> ✓ skip branch y_d407878d
#> ● run branch y_435a25b4
#> ● run branch z_cc0cfe46
#> ✓ skip branch z_d699c360
#> ● run branch z_7ae6e1a8
tar_read(z)
#> [1] "new_a" "bb"

Created on 2020-10-26 by the reprex package (v0.3.0)
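
The change argument in the example above relies on the md5sum vector differing whenever a file is edited or added. A minimal base R sketch of that idea, independent of targets, is shown here; the directory name md5_demo and the helper hash_dir() are hypothetical and only serve the illustration.

```r
library(tools)

# Work in a throwaway directory so the sketch does not touch real files.
demo_dir <- file.path(tempdir(), "md5_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
writeLines("aa", file.path(demo_dir, "a"))
writeLines("bb", file.path(demo_dir, "b"))

# Hypothetical helper: hash every file currently in the directory.
hash_dir <- function(path) md5sum(dir(path, full.names = TRUE))

before <- hash_dir(demo_dir)

# Editing a file changes its hash while the other hash stays the same.
writeLines("new_a", file.path(demo_dir, "a"))
after_edit <- hash_dir(demo_dir)

# Adding a file makes the hash vector longer.
file.create(file.path(demo_dir, "c"))
after_add <- hash_dir(demo_dir)

changed <- !identical(before, after_edit)
grew <- length(after_add) > length(after_edit)
```

In both cases the value differs from the previous one, which is the kind of difference the change argument is meant to pick up.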

Reproducible example


Now comes the issue: if there are no files in the folder, I get the error callr subprocess failed: cannot branch over empty target.
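
The error follows from what dir() returns for an empty folder: a zero-length character vector, so a target built from it has no elements to map over. A small base R sketch of that situation (the directory name empty_demo is made up for the illustration):

```r
# Create an empty throwaway directory.
empty_dir <- file.path(tempdir(), "empty_demo")
unlink(empty_dir, recursive = TRUE)
dir.create(empty_dir)

# Listing an empty directory gives character(0), not NULL or an error.
files <- dir(empty_dir, full.names = TRUE)
n_files <- length(files)

# A guard like this one is what later replies in the thread build on.
is_empty <- n_files == 0L
```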

To handle this, I tried tar_skip(), which seems relevant for this use case:

library(targets)
dir.create("tmp")
tar_script({
  library(tarchetypes)
  options(crayon.enabled = FALSE)
  tar_option_set(packages = c("tarchetypes", "tools"))
  tar_pipeline(
    tar_skip(files,
             dir("tmp", full.names = TRUE),
             skip = !length(dir("tmp"))),
    tar_change(x,
               files,
               change = md5sum(dir("tmp", full.names = TRUE))),
    tar_file(y,
             setNames(x, md5sum(x)),
             pattern = map(x)),
    tar_target(z,
               readLines(y),
               pattern = map(y))
  )
})
tar_make()
#> ● run target files
#> ● cancel target files
#> ● run target x_change
#> ● run target x
#> Error : cannot branch over empty target (x)
#> Error: callr subprocess failed: cannot branch over empty target (x)

Created on 2020-10-26 by the reprex package (v0.3.0)

Desired result

I can see that the target files was indeed canceled as expected, but that did not prevent the following steps from being processed.
I am a little lost now as to how to prevent the dependent branches from running.
Many thanks in advance for your kind help.

Diagnostic information

Created on 2020-10-26 by the reprex package (v0.3.0)

Session info
sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices datasets  utils     methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.0.3  magrittr_1.5    htmltools_0.5.0 tools_4.0.3    
#>  [5] yaml_2.2.1      stringi_1.5.3   rmarkdown_2.5   highr_0.8      
#>  [9] knitr_1.30      stringr_1.4.0   xfun_0.18       digest_0.6.27  
#> [13] rlang_0.4.8     renv_0.12.0-30  evaluate_0.14
  • A stack trace from traceback() or rlang::trace_back().
  • The SHA-1 hash of the GitHub commit of tarchetypes currently installed. packageDescription("tarchetypes")$GithubSHA1 shows you this.

tarchetypes shasum commit: 585bce24595d672537d2fe19c4540350b81a675e

@wlandau
Member

wlandau commented Oct 26, 2020

My recommendations are to:

  1. Use tar_files() instead of tar_change() and tar_file() individually (but please update tarchetypes to 6d72150 or later).
  2. Write a custom function that handles the case when the directory is empty. In the example below it returns an error via stopifnot(), but you can implement different behavior: for example, supply the path of an existing file containing default placeholder data. I believe these kinds of checks should be handled early in the pipeline.
library(targets)
dir.create("tmp")
writeLines("line1", "tmp/a")
writeLines("line2", "tmp/b")

tar_script({
  library(tarchetypes)
  options(crayon.enabled = FALSE)
  list_files <- function(dir) {
    files <- dir(dir, full.names = TRUE)
    stopifnot(length(files) > 0L)
    files
  }
  tar_pipeline(
    tar_files(files, list_files("tmp")),
    tar_target(lines, readLines(files), pattern = map(files))
  )
})

tar_make()
#> ● run target files_files
#> ● run branch files_98d8cd68
#> ● run branch files_33ec7836
#> ● run branch lines_3ac9849f
#> ● run branch lines_e2330cb9

tar_read(lines)
#> [1] "line1" "line2"

Created on 2020-10-26 by the reprex package (v0.3.0)

There are workarounds that do not require tar_files():

library(targets)
dir.create("tmp")
writeLines("line1", "tmp/a")
writeLines("line2", "tmp/b")

tar_script({
  options(crayon.enabled = FALSE)
  list_files <- function(dir) {
    files <- dir(dir, full.names = TRUE)
    stopifnot(length(files) > 0L)
    files
  }
  tar_pipeline(
    tar_target(paths, list_files("tmp"), cue = tar_cue(mode = "always")),
    tar_target(files, paths, pattern = map(paths), format = "file"),
    tar_target(lines, readLines(files), pattern = map(files))
  )
})

tar_make()
#> ● run target paths
#> ● run branch files_fe460508
#> ● run branch files_dcad1233
#> ● run branch lines_ef9c4ed9
#> ● run branch lines_bc722500

Created on 2020-10-26 by the reprex package (v0.3.0)

@wlandau wlandau closed this as completed Oct 26, 2020
@ginolhac
Author

Thanks a lot, William, for your incredibly quick response and helpful comments. I will try both approaches tomorrow.
Thanks again for your precious time.

@ginolhac
Author

I ended up using tar_files, which removes all the md5sum work I was doing manually. The placeholder was a great idea too: when I have no files, I catch this with a dummy file, and the listing further down is adapted accordingly. Thanks again! Much appreciated!
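
The exact code used here was not posted. A minimal sketch of the placeholder idea, assuming a hypothetical helper list_files_or_placeholder() (not part of targets or tarchetypes) and a placeholder file kept outside the tracked folder, might look like this:

```r
# Hypothetical helper, not part of targets or tarchetypes.
# Returns the files in `path`, or a placeholder file if there are none.
list_files_or_placeholder <- function(path, placeholder) {
  files <- dir(path, full.names = TRUE)
  if (length(files) == 0L) {
    # Create the placeholder on demand so the returned path always exists.
    if (!file.exists(placeholder)) {
      writeLines("placeholder", placeholder)
    }
    return(placeholder)
  }
  files
}

# Demonstration in throwaway locations.
tracked <- file.path(tempdir(), "tracked_demo")
unlink(tracked, recursive = TRUE)
dir.create(tracked)
placeholder <- file.path(tempdir(), "placeholder_demo.txt")

# Empty folder: the placeholder path comes back, so there is one element.
result_empty <- list_files_or_placeholder(tracked, placeholder)

# Non-empty folder: the real files come back instead.
writeLines("line1", file.path(tracked, "a"))
result_full <- list_files_or_placeholder(tracked, placeholder)
```

In a pipeline, such a helper would take the place of list_files("tmp") in the tar_files() call from the earlier reply, and downstream targets would need to recognise and ignore the placeholder content.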

@wlandau
Member

wlandau commented Oct 27, 2020

Glad to hear it's working for you.
