Accommodation of script-based imperative workflows #994

wlandau · 2019-08-22T13:50:54Z

Prework

Read and abide by drake's code of conduct.
Search for duplicates among the existing issues, both open and closed.

Idea

Suggested by @thebioengineer. May improve the migration of old projects to drake and contribute to ropensci-books/drake#41.

script_in("file.R") could be shorthand for source("file.R")$value at runtime. At code analysis time, script_in() could tell drake to analyze the code in file.R for dependencies in the usual way.

drake_plan(
  data = script_in("01-data.R"),
  munge = script_in("02-munge.R"),
  model = script_in("03-model.R"),
  results = script_in("04-results.R")
)

Concerns

This way of doing things goes against drake's function-oriented style, and it makes more room for suboptimal programming practices. Plus, users can already achieve script-based behavior like this:

drake_plan(
  data = source(file_in("01-data.R"))$value,
  munge = {
    data # mentioned as an explicit dependency
    source(file_in("02-munge.R"))$value
  },
  model = {
    munge
    source(file_in("03-model.R"))$value
  },
  results = {
    model
    source(file_in("04-results.R"))$value
  }
)

I am eager to discuss, and my mind could be changed. However, my current opinion is that we should not make script-based imperative workflows easier. I think we should keep nudging users to write functions.

The text was updated successfully, but these errors were encountered:

thebioengineer · 2019-08-22T16:09:11Z

I agree that promoting the usage of scripts rather than functions should try to be avoided. My thought was trying to allow for users to add pre-existing workflows they have into drake. In addition, using the source(file_in("my_script.R"))$value only allows for the last value that is generated to the captured.

If source_in were to be used and promoted the usage of the other "traditional" drake functions (ie file_in, file_out, loadd, readd) to find and document dependencies, I think it would be less friction. In addition, loud warnings and complaining about using external scripts could be added.

wlandau · 2019-08-22T19:08:26Z

I agree that promoting the usage of scripts rather than functions should try to be avoided. My thought was trying to allow for users to add pre-existing workflows they have into drake. In addition, using the source(file_in("my_script.R"))$value only allows for the last value that is generated to the captured.

I totally agree.

If source_in were to be used and promoted the usage of the other "traditional" drake functions (ie file_in, file_out, loadd, readd) to find and document dependencies, I think it would be less friction. In addition, loud warnings and complaining about using external scripts could be added.

It seems odd to make things smoother with script_in()/source_in() only to make them rougher again with warnings, all for something we would rather avoid anyway.

If the issue is converting old imperative workflows to drake, maybe we should focus on the conversion itself.

library(drake)

parse_script <- function(file) {
  lines <- paste(readLines(file), sep = "\n")
  code <- parse(text = lines, keep.source = FALSE)
  out <- call("{")
  for (i in seq_along(code)) {
    out[[i + 1]] <- code[[i]]
  }
  out
}

script1 <- tempfile()
script2 <- tempfile()
writeLines(c("data <- my_data()", "munge(data)"), script1)
writeLines("analyze(munged)", script2)

plan <- drake_plan(
  munged = !!parse_script(script1),
  analysis = !!parse_script(script2)
)

drake_plan_source(plan)
#> drake_plan(
#>   munged = {
#>     data <- my_data()
#>     munge(data)
#>   },
#>   analysis = {
#>     analyze(munged)
#>   }
#> )

config <- drake_config(plan)
vis_drake_graph(config)

^{Created on 2019-08-22 by the reprex package (v0.3.0)}

wlandau · 2019-08-22T19:10:16Z

All we have to do is add this new parse_script() function into drake. (Hopefully with a better name. Thoughts?)

I consider this approach far lighter than script_in().

wlandau · 2019-08-22T19:16:00Z

To emphasize: drake_plan_source() completes the conversion. It gives you code to create the plan in a drake-approved way. Glad we already have that part.

thebioengineer · 2019-08-22T19:41:46Z

I was thinking along the same lines? My working version is:

script_in<-function(path){
  str2expression(c("{",readLines(path),"}"))[[1]]
}

I was looking for ways to add in the file tracking that you have with file_in too?

wlandau · 2019-08-23T10:41:03Z

Speed

This feature will probably not cause a bottleneck, but speed is still worth a look. It seems like the speed gains of str2expression() come from skipping the source ref info, which drake skips anyway in safe_parse(). So maybe str2expression() is not necessary in drake after all?

library(microbenchmark)
lines <- readLines(url("http://bioconductor.org/biocLite.R"))

# Used throughout drake's internals:
drake:::safe_parse("# 123")
#> expression()

microbenchmark(
  parse_src = parse(text = lines, keep.source = TRUE),
  parse = parse(text = lines, keep.source = FALSE),
  str = str2expression(lines),
  drake = drake:::safe_parse(lines)
)
#> Unit: microseconds
#>       expr     min      lq     mean   median       uq      max neval
#>  parse_src 862.956 907.467 969.0071 918.4455 992.3515 1484.220   100
#>      parse 568.086 596.110 622.2085 602.0705 629.5830  844.246   100
#>        str 567.813 596.388 626.6053 601.8305 646.4530  881.024   100
#>      drake 579.209 604.267 636.5746 612.3220 655.9230  845.641   100

^{Created on 2019-08-23 by the reprex package (v0.3.0)}

Implementation

Your one-liner is way better than my parse_script() from #994 (comment). We have to deal with the file text anyway, so we might as well put the curly braces on there. My only suggestion is to use drake's safe_parse() instead of str2expression(). It skips the source ref, and it avoids taking [[1]] on objects of length 0.

script_in <- function(path){
  safe_parse(c("{", readLines(path), "}"))
}

More thoughts on names

This is the hardest part.

script_in()

On its own, this would be an excellent name. I just worry that it is too much like the existing file_in() and knitr_in(), which behave differently.

source_script()

You brought up a great point yesterday that from the user's perspective, the script will ultimately be sourced for all intents and purposes when the user calls make(). However, I still think it misrepresents what is actually happening when the function is called. There is a real separation between parsing and evaluation, and I hope we can reflect this in the name.

inline_script()

Inline functions are already an established concept in C/C++, and the Rcpp and inline have at least planted the seed in at least some parts of the R community. More importantly, I think it better reflects what the function is doing. On the other hand, the term "inline" is more esoteric than "source".

insert_script()

This one is my favorite. It evokes what is really going on, and I think people will know what we mean by insertion.

include_script()

I thought about it, but I don't like it that much. Inclusion does not necessarily mean we literally insert the code (e.g. # include <c++_header.h>).

file_in()-like tracking

The code will already be tracked in the plan, so I believe this will not be necessary.

wlandau · 2019-08-23T11:30:23Z

A big problem with this whole approach is that we will end up with an enormous plan that users will ultimately need to refactor. How about script_to_function()? Details shortly.

wlandau · 2019-08-23T11:30:32Z

library(drake)

script_to_function <- function(path) {
  lines <- readLines(path)
  lines <- c("function(...) {", lines, "}")
  text <- paste(lines, sep = "\n")
  drake:::safe_parse(text)
}

munge_script <- tempfile()
analysis_script <- tempfile()
writeLines(c("data <- my_data()", "munge(data)"), munge_script)
writeLines("analyze(munged)", analysis_script)

do_munging <- script_to_function(munge_script)
do_analysis <- script_to_function(analysis_script)

do_munging
#> function(...) {
#>     data <- my_data()
#>     munge(data)
#> }

do_analysis
#> function(...) {
#>     analyze(munged)
#> }

drake_plan(
  munged_value = do_munging(),
  analysis_value = do_analysis(munged_value)
)
#> # A tibble: 2 x 2
#>   target         command                  
#>   <chr>          <expr>                   
#> 1 munged_value   do_munging()             
#> 2 analysis_value do_analysis(munged_value)

^{Created on 2019-08-23 by the reprex package (v0.3.0)}

wlandau · 2019-08-23T11:48:38Z

Benefits

Shorter plans.
An extra nudge toward functions.
Users can learn how function call arguments control dependency relationships in the plan.
As before, users can leave their original code alone. If they don’t like drake, they have egress.
No need for tidy eval (!!).

Concerns

Arguments to those functions don’t do anything except connect targets together for drake. Not terrible, but a bit awkward. Can we get people closer to something more ideal for drake?

brendanf · 2019-08-23T12:09:14Z

Arguments to those functions don’t do anything except connect targets together for drake.

It looks like your example wouldn't run, because the body of do_analysis calls analysis(munged) but the target name is munged_value. This is exactly the sort of error that I expect many people (myself included) would make if the functions produced by script_to_function accept only dummy arguments that don't actually pass data to the computation.

What about:

script_to_function <- function(path, args) {
  #need to check that args is an unnamed character vector of valid names
  lines <- readLines(path)
  lines <- c("function(", paste(args, sep = ", "), ") {", lines, "}")
  text <- paste(lines, sep = "\n")
  drake:::safe_parse(text)
}

?

brendanf · 2019-08-23T12:14:21Z

Another issue that I expect would bite new users of this method is that file dependencies in the script would be untracked by drake unless the user adds a call to file_in.

wlandau · 2019-08-23T15:01:33Z

In this scenario, the target names are unrelated to the preexisting scripts of a non-drake project. From what I have seen, these scripts usually save intermediate results to files, and those files establish the dependency relationships. The scripts do not use file_in() or file_out(), so drake cannot detect file-induced dependency relationships automatically. I believe ... solves this problem because it lets people set those relationships ad-hoc dummy arguments: not only file_in() and file_out() files, but also target names. It's a quick way to get a project working, and it's a quick way to learn how drake thinks about dependencies. That's the goal, and I do not think we need formal arguments to achieve it.

In other words, script_to_function() just does the simplest thing, and it is just a starting point. It is the hint I think users need to start refactoring their code. A new chapter in the manual will walk through the post-script_to_function() refactoring process, re ropensci-books/drake#41.

wlandau · 2019-08-23T16:16:54Z

Just occurred to me: people use R Markdown for script-oriented workflows: 01_data.Rmd, 02_munge.Rmd, 03_analysis.Rmd, etc. So either

We make script_to_function() understand R Markdown (my vote) or
We need a new rmd_to_function().

code_to_plan() already does (1):

drake/R/drake_plan_helpers.R

Lines 777 to 790 in 6fca3fd

    
           code_to_plan <- function(path) { 
        
             stopifnot(file.exists(path)) 
        
             txt <- readLines(path) 
        
             # From CodeDepends: https://github.com/duncantl/CodeDepends/blob/7c9cf7eceffaea1d26fe25836c7a455f059e13c1/R/frags.R#L74 # nolint 
        
             # Checks if the file is a knitr report. 
        
             if (any(grepl("^(### chunk number|<<[^>]*>>=|```\\{r.*\\})", txt))) { # nolint 
        
               txt <- get_tangled_text(path) 
        
             } 
        
             nodes <- parse(text = txt) 
        
             out <- lapply(nodes, node_plan) 
        
             out <- do.call(rbind, out) 
        
             out <- parse_custom_plan_columns(out) 
        
             sanitize_plan(out) 
        
           }

where

drake/R/analyze_code.R

Lines 301 to 312 in 6fca3fd

    
           # From https://github.com/duncantl/CodeDepends/blob/master/R/sweave.R#L15 
        
           get_tangled_text <- function(doc) { 
        
             assert_pkg("knitr") 
        
             id <- make.names(tempfile(), unique = FALSE, allow_ = TRUE) 
        
             con <- textConnection(id, "w", local = TRUE) 
        
             on.exit(close(con)) 
        
             with_options( 
        
               new = list(knitr.purl.inline = TRUE), 
        
               code = knitr::knit(doc, output = con, tangle = TRUE, quiet = TRUE) 
        
             ) 
        
             textConnectionValue(con) 
        
           }

If we go with (1), should we go with code_to_function() to be more consistent with code_to_plan()?

We could add an argument to get_tangled_frags() to suppress parsing ()

thebioengineer · 2019-08-23T16:25:18Z

We might want to make code more similar to the code to plan. Are you thinking that the rmd would be able to be rendered still at each step too?

wlandau · 2019-08-23T16:33:15Z

I was thinking we could just extract the code from the active chunks and stick it in a function. No rendering required. Sound good?

wlandau · 2019-08-23T18:41:28Z

Another thing: we need to think about the return values of the functions. For example, if all the functions generated by code_to_function() return NULL (which could easily happen) then updates to downstream targets will not trigger. Quick-and-dirty catch-all solution: return a hash of the function's own body. Sketch:

code_to_function <- function(path) {
  lines <- readLines(path)
  knitr_pattern <- "^(### chunk number|<<[^>]*>>=|```\\{r.*\\})"
  if (any(grepl(knitr_pattern, lines))) {
    lines <- get_tangled_text(path) 
  }
  lines <- c(
    "function(...) {",
    lines,
    "standardize_function(sys.function())", # From drake. Calls deparse(), but this shouldn't be a bottleneck...
    "}"
  )
  text <- paste(lines, sep = "\n")
  eval(safe_parse(text))
}

wlandau · 2019-08-23T18:48:21Z

So then user-side refactoring might not be simple. We really need to talk about the sophisticated dependency tracking you get when you start with functions. But at least we can slap drake onto an arbitrary project and get people through the door.

thebioengineer · 2019-08-24T13:14:10Z

what do you think about adding code_to_function to the escaped functions from the tidy_eval? (like target and transform are now, this would allow users to use the function inline in the drake plan. Not sure if this goes against your thoery that we don't want to promote usage of this technique explicitly. But it would reduce friction for users so they don't need to remember !! if they use it directly in the plan.

wlandau · 2019-08-24T16:43:45Z

Would you sketch what you are thinking? Not 100% sure I follow exactly.

I could be convinced otherwise, but my current (and strong) preference is to connect the concept of a script to the concept of a function that gets defined outside the plan, and then all the plan has to do is connect the predefined pieces together (#994 (comment)). This is when drake workflows are cleanest and most manageable.

wlandau · 2019-08-24T16:50:17Z

Also cc @pat-s, re #193. Do you think this would help get script-loving users on board? To be honest, I have trouble relating to this personally because I have always preferred functions.

thebioengineer · 2019-08-24T19:21:49Z

I see. My thought is this:

code_to_function <- function(path,...) {
  stopifnot(file.exists(path))
  path<-gsub("\\","\\\\",path,fixed=TRUE) # needed to allow for using path in a string to eval in windows systems

  dependson<- match.call(expand.dots = FALSE)$...

  expr<-paste0(c("{",
                 paste0("file_in(\"",path, "\")"),
                 dependson,
                 paste0("source_function<-eval(parse_source(\"",path,"\"))"),
                 "source_function()",
                 "}"),collapse = "\n")
  safe_parse(expr)
}

parse_source <- function(path) {
  lines <- readLines(path)
  knitr_pattern <- "^(### chunk number|<<[^>]*>>=|```\\{r.*\\})"
  if (any(grepl(knitr_pattern, lines))) {
    lines <- get_tangled_text(path)
  }
  lines <- c(
    "function(){",
    lines,
    "digest::digest(lines, algo = config$cache$hash_algorithm,serialize = FALSE)",
    "}"
  )
  text <- paste(lines, sep = "\n")
  safe_parse(text)
}

munge_script <- file.path(tempdir(),"01-munge.R")
analysis_script <- file.path(tempdir(),"02-analyze.R")
vis_script <- file.path(tempdir(),"03-plot.R")
writeLines(c("data <- my_data()", "munge(data)"), munge_script)
writeLines("analyze(munged)", analysis_script)
writeLines("plot(analyzed)", vis_script)

# currently we need the !! to allow the evaluation and return of the parsed script. My proposal is to remove that
plan<-drake_plan(
  munged_value = !!code_to_function(munge_script),
  analysis_value = !!code_to_function(analysis_script,munged_value),
  plot_value = !!code_to_function(vis_script,analysis_value)
)

drake_plan_source(plan)

plan
config <- drake_config(plan)
vis_drake_graph(config)
 
#next are theorhetical, but should function
make(plan)
vis_drake_graph(config)

writeLines(c("analyze(munged)", "newthing(data)"), analysis_script)

#analysis and vis scripts are now outdated 
vis_drake_graph(config)

#reruns analysis and vis scripts only
make(plan)
vis_drake_graph(config)

Is this totally against how you want to incorporate external R scripts? This was another shower thought that I wanted to explore the functionality.

In addition, this set up allows for triggering rebuilding itself if the file changes and updates will be captured. This function setup does not need to be done within the plan, but it was a thought

pat-s · 2019-08-24T21:04:47Z

Also cc @pat-s, re #193. Do you think this would help get script-loving users on board? To be honest, I have trouble relating to this personally because I have always preferred functions.

TL;DR - also seems you're still in deep discussion.
Q: What is the difference between script_in() and code_to_plan()? Was not really obvious to me by reading the first few comments.
(HTH if I can.)

wlandau · 2019-08-24T21:41:23Z

@thebioengineer, clever, but I would rather not go that route because it tracks scripts as files instead of totally relying on the parsed code in the R session. I think code_to_function() should be like source("R/functions.R"). The goal is to get people closer to using drake properly, and I also think we should keep the implementation simple.

@pat-s, I am curious if #994 (comment) + #994 (comment) + a walkthrough in the manual would have helped you better understand drake back when you posted #193.

thebioengineer · 2019-08-25T19:54:18Z

@wlandau Gotcha. I was approaching it as if the user would want to just update their script and then run make(plan), without having to think about needing to re-generate the plan object. Thinking more, the file_in really didn't need to exist., it just added tracking for me.

The steps to add drake to a pre-existing workflow would be as follows then:

make a 00-plan.R file.
generate a function for each of the scripts/Rmd that need to be run, using code_to_function()
generate the plan object using drake_plan, where each script dependency is linked to the prior script by using the output object (need to work on the wording here)
to visualize the network of dependencies, generate a config object via drake_config(), and then plot it via vis_drake_graph()
To update the plan to incorporate the update that happen to your script/Rmd, regenerate the scripts object by rerunning code_to_function(), and rerun make(plan) to have drake rerun all dependencies

Does that workflow jive with your mental model?

ooo · 2019-08-25T19:54:20Z

👋 Hey @thebioengineer...

Letting you know, @wlandau is currently OOO until Thursday, September 12th 2019. ❤️

wlandau · 2019-08-26T02:34:27Z

Yeah, that's basically it! Here's how I would explain it. Suppose we have a workflow with traditional scripts.

run_everything.R
R/
├─ 01_data.R
├─ 02_munge.R
└─ 03_analyze.R

where run_everything.R looks like this:

source("01_data.R")
source("02_munge.R")
source("03_analyze.R")

I propose we change run_everything.R to this:

library(drake)

do_data <- code_to_function("01_data.R")
do_munge <- code_to_function("02_munge.R")
do_analyze <- code_to_function("03_analyze.R")

plan <- drake_plan(
  data = do_data(),
  munged = do_munge(data),
  analysis = do_analyze(munged)
)

make(plan)

Take-home messages:

Functions replace scripts.
The return values are important.
The commands in the plan reference the functions and targets.
The symbols in the plan determine the runtime order of the targets.

This is what I am planning to flesh out for ropensci-books/drake#41.

wlandau · 2019-08-26T02:37:28Z

The goal is to make the conceptional leap from imperative (script-oriented) workflows to function-oriented workflows. Does #994 (comment) accomplish this? I have trouble understanding why function-oriented workflows are so confusing to people, so I do not always know how to help.

thebioengineer · 2019-08-26T04:10:03Z

Gotcha. I think most people are trained and typically approach workflows as a series of steps, and use functions as more of a "DRY" principal, rather than seeing each step as an opportunity to write a function. At least that has been my typical approach.

I think working in tandem with the manual, developing the idea of how to translate the workflow steps into functions is the key.

mik3y64 · 2019-08-26T04:34:52Z

I think this is good #994 (comment). It's clean approach for users to try out drake.

Additionally, when users finally decide to use drake approach, code_to_function can be further extended, perhaps called write_code_to_function to convert and write out existing scripts in new folder (keeping the old scripts as backup). Many extra arguments/features can be implemented to help smooth transition, for example, allowing users to specify name of functions, otherwise default to existing name of scripts.

# existing script
analyze(munged)

write_code_to_function(path = 'analyze.R')

# new script
analyze <- function(...) {
  analyze(munged)
}

From there on, users can start using the drake function approach.

wlandau · 2019-08-26T12:14:23Z

Yeah, let's see if there are other parts of the transition we can automate. write_code_to_function() certainly helps get the idea across, but duplicated code in the same workspace can be difficult to maintain, especially if users need to go back and forth between the drake and non-drake stuff. I will mull it over.

pat-s · 2019-08-26T12:55:14Z

@pat-s, I am curious if #994 (comment) + #994 (comment) + a walkthrough in the manual would have helped you better understand drake back when you posted #193

#994 (comment) is certainly a nice start that makes things more clear.

From my side I would recommend to make the function-based approach from drake much more clear right from the start and link to the appropriate examples in the manual.
Maybe get them top-level with something like "are you a script-based person -> read here; are you a function-based person? -> read here".

Leaving beside how drake deals with script-based workflows in the end, users need to be briefed right from the start that their view on generating a workflow might need to change (or drake might even softly push them to change).

In my field (ecological modelling) almost all people come from a script-based workflow, simply because they lack the skill of writing functions (or just the experience).
When telling them about drake, the script-vs-function oriented question gets raised a lot ("what is is", "where is the advantage", etc.).

thebioengineer · 2019-08-29T05:18:05Z

@pat-s Good points. Getting drake to "accept" script based workflows might not be the more difficult part, but more convincing people the value of workflow management. I am going to pour over the current manual to see how we can make the value add to script based workflows more apparent, and how to move to a script based flow!

wlandau · 2019-08-29T12:33:03Z

I recommend chapter 5 ("drake projects") which explains how best to organize code into files for drake.

@thebioengineer, are you saying you want traditional imperative script-based workflows to be a final solution/destination for drake use? Do you think we can do that in a way that stays true to drake's core values?

thebioengineer · 2019-08-29T15:26:56Z

My comment was based on @pat-s 's comment that a number of people in his field have difficulty seeing the value in function-based workflows. Making drake to handle, but have a little friction like we do now, might be an opportunity to then espouse the value of converting to more of a function based workflow.

In the manual we could provide some advice on how to perform the conversion, and what exactly the value add is. Not just from it is easier for drake, but it is easier on the person maintaining the workflow to have discrete functions to perform specific tasks.

wlandau · 2019-08-29T21:12:39Z

Ah, got it, thanks for clarifying. I agree that we should argue for a function-based approach in and of itself, drake or no drake.

wlandau · 2019-09-11T15:03:58Z

@thebioengineer, do you still want to submit a PR with code_to_function() + docs, or should I? I realize you may have been waiting for me to return to the office.

thebioengineer · 2019-09-11T15:26:04Z

hey @wlandau, yes, sorry this has been in my queue to submit. I will do it tonight!

I have been thinking on how to best incorporate drake into my existing script-based workflows to be able to speak more from experience on how to use and purpose of this function in the manual. Hopefully what I write ends up coherent :)

ooo · 2019-09-11T15:26:06Z

👋 Hey @thebioengineer...

Letting you know, @wlandau is currently OOO until Thursday, September 12th 2019. ❤️

wlandau · 2019-09-13T05:00:21Z

Re #1007 (comment), I propose a different code_to_function() that returns a timestamp and tempfile instead of a hash.

code_to_function <- function(path) {
  lines <- readLines(path)
  knitr_pattern <- "^(### chunk number|<<[^>]*>>=|```\\{r.*\\})"
  if (any(grepl(knitr_pattern, lines))) {
    lines <- get_tangled_text(path)
  }
  lines <- c(
    "function(...) {",
    lines,
    "c(format(Sys.time(), \"%Y-%m-%d %H:%M:%OS9 %z GMT\"), tempfile())",
    "}"
  )
  text <- paste(lines, sep = "\n")
  eval(parse(text = text))
}

thebioengineer · 2019-09-14T14:15:24Z

I like that solution, as you said over in #1007, it should then act like a basic make system. I was flying yesterday, so I had come up with this solution: before I saw your response:

code_to_function <- function(path,...) {
  lines <- readLines(path)
  knitr_pattern <- "^(### chunk number|<<[^>]*>>=|```\\{r.*\\})"
  if (any(grepl(knitr_pattern, lines))) {
    lines <- get_tangled_text(path)
  }
  lines <- c(
    "function(...){",
    # This may look a little funky, but this is grabbing the resulting environment that generated the output
    "script<-function(){",lines,"sys.frame(sys.nframe())","}",
    "evalenv<-script()",
    # Then I take a digest of the objects found in the script to identify if things changed
    "digest::digest(as.list(evalenv),algo = \"xxhash64\")",
    "}"
  )
  text <- paste(lines, sep = "\n")

  eval(safe_parse(text))
}

It evaluates the environment that gets created when the script would be run and returns that. If the environment changes, the output would then change and trigger downstream builds.

wlandau · 2019-09-14T19:02:45Z

Interesting. Unfortunately, though, I think #994 (comment) is likely to get us into trouble.

Tracking an environment directly tends to lead us to edge cases like R6 object keeps rebuilding #345. R's internals are strange, fickle, and above all brittle, and it is difficult to exclude them from the hash. This is exactly why drake serializes functions to text before hashing them, and it even takes special precautions to eliminate raw pointers mentioned in Rcpp functions (Bug in combination with Rcpp #344).
Accommodation of script-based imperative workflows #994 (comment) approach still does not track files written to storage, and those files could be ancillary to the environment.

We are bypassing drake's elaborate tracking system here, so it is safer to always invalidate downstream targets whenever something gets run. Timestamps are good at accomplishing this, and tempfile() makes up for the platform-dependent imprecision we often find in timestamps (see #4).

thebioengineer · 2019-09-16T14:53:24Z

I see. That makes sense, I had forgot about the cases where raw pointers can change. Like I had said, I had come up with that solution before I had seen yours and wanted to propose it.

I will incorporate the proposed solution and add additional tests!

thebioengineer · 2019-09-23T05:56:00Z

@wlandau, I have added a test that checks what I think was checking the idea. I have resolved a number of the lintr comments and added tests for using RMD as scripts as well.

lintr is complaining about the line of code that Identifies if the input is an RMD so it can execute get_tangled_text() on the input. - do you have any suggestions for how to handle that?

I resolved this by adding #nolint to the end of the line. Based on the "Project Configuration" section in lintr

wlandau · 2019-09-23T14:44:54Z

Awesome! I will look at your tests. #nolint is totally fine for that long grep pattern.

wlandau · 2019-09-30T00:46:08Z

Thank you @thebioengineer for implementing this in #1007!

wlandau added topic: api topic: reproducibility status: uncertain labels Aug 22, 2019

wlandau closed this as completed Aug 22, 2019

wlandau added the status: may revisit label Aug 22, 2019

wlandau reopened this Aug 22, 2019

wlandau removed status: may revisit status: uncertain labels Aug 23, 2019

wlandau mentioned this issue Aug 23, 2019

More explicit integration with the workflow management paradigm by Jenny Bryan et al. ropensci-books/drake#41

Closed

wlandau mentioned this issue Aug 30, 2019

Plans as functions #999

Closed

3 tasks

thebioengineer mentioned this issue Sep 12, 2019

Source script #1007

Merged

4 tasks

wlandau changed the title ~~script_in()?~~ Accommodation of script-based imperative workflows Sep 13, 2019

wlandau closed this as completed Sep 30, 2019

thebioengineer mentioned this issue Oct 3, 2019

script_based_workflows chapter ropensci-books/drake#116

Merged

12 tasks

Accommodation of script-based imperative workflows #994

Accommodation of script-based imperative workflows #994

Comments

wlandau commented Aug 22, 2019 • edited Loading

Prework

Idea

Concerns

thebioengineer commented Aug 22, 2019

wlandau commented Aug 22, 2019

wlandau commented Aug 22, 2019

wlandau commented Aug 22, 2019

thebioengineer commented Aug 22, 2019 • edited Loading

wlandau commented Aug 23, 2019

Speed

Implementation

More thoughts on names

script_in()

source_script()

inline_script()

insert_script()

include_script()

file_in()-like tracking

wlandau commented Aug 23, 2019

wlandau commented Aug 23, 2019 • edited Loading

wlandau commented Aug 23, 2019 • edited Loading

Benefits

Concerns

brendanf commented Aug 23, 2019 • edited Loading

brendanf commented Aug 23, 2019

wlandau commented Aug 23, 2019

wlandau commented Aug 23, 2019

thebioengineer commented Aug 23, 2019

wlandau commented Aug 23, 2019

wlandau commented Aug 23, 2019 • edited Loading

wlandau commented Aug 23, 2019 • edited Loading

thebioengineer commented Aug 24, 2019

wlandau commented Aug 24, 2019

wlandau commented Aug 24, 2019

thebioengineer commented Aug 24, 2019 • edited Loading

pat-s commented Aug 24, 2019

wlandau commented Aug 24, 2019 • edited Loading

thebioengineer commented Aug 25, 2019

ooo bot commented Aug 25, 2019

wlandau commented Aug 26, 2019 • edited Loading

wlandau commented Aug 26, 2019 • edited Loading

thebioengineer commented Aug 26, 2019

mik3y64 commented Aug 26, 2019 • edited Loading

wlandau commented Aug 26, 2019

pat-s commented Aug 26, 2019

thebioengineer commented Aug 29, 2019

wlandau commented Aug 29, 2019

thebioengineer commented Aug 29, 2019 • edited Loading

wlandau commented Aug 29, 2019

wlandau commented Sep 11, 2019

thebioengineer commented Sep 11, 2019

ooo bot commented Sep 11, 2019

wlandau commented Sep 13, 2019 • edited Loading

thebioengineer commented Sep 14, 2019 • edited Loading

wlandau commented Sep 14, 2019 • edited Loading

thebioengineer commented Sep 16, 2019

thebioengineer commented Sep 23, 2019 • edited Loading

wlandau commented Sep 23, 2019

wlandau commented Sep 30, 2019

wlandau commented Aug 22, 2019 •

edited

Loading

thebioengineer commented Aug 22, 2019 •

edited

Loading

wlandau commented Aug 23, 2019 •

edited

Loading

wlandau commented Aug 23, 2019 •

edited

Loading

brendanf commented Aug 23, 2019 •

edited

Loading

wlandau commented Aug 23, 2019 •

edited

Loading

wlandau commented Aug 23, 2019 •

edited

Loading

thebioengineer commented Aug 24, 2019 •

edited

Loading

wlandau commented Aug 24, 2019 •

edited

Loading

wlandau commented Aug 26, 2019 •

edited

Loading

wlandau commented Aug 26, 2019 •

edited

Loading

mik3y64 commented Aug 26, 2019 •

edited

Loading

thebioengineer commented Aug 29, 2019 •

edited

Loading

wlandau commented Sep 13, 2019 •

edited

Loading

thebioengineer commented Sep 14, 2019 •

edited

Loading

wlandau commented Sep 14, 2019 •

edited

Loading

thebioengineer commented Sep 23, 2019 •

edited

Loading