Use tibbles and language objects internally instead of data frames and text #247

wlandau · 2018-02-08T04:44:26Z

I still prefer users to be able to supply data frames of text, but I think the preferable format of a workflow plan should be a tibble with commands as language objects or something similar. @krlmlr, I think we agree on this. To me, it seems like the more natural, more sophisticated, and safer way of doing things.

When I began drake, I knew almost nothing about computing on the language, and I was much more comfortable with text processing. I am still not an expert in computing on the language, so I think I will need help to get this right. #246 is a first step, and it frees up work on other issues, but there is a lot more to do.

From my perspective, there are some challenges:

Make sure the command gets called inside a call to local({...}). Currently, it is perfectly fine to write plan <- drake_plan(target = {x <- 1; x}), and make(plan) does not add or modify x in the user's environment. We should keep this behavior in order to avoid side effects and race conditions. So far, I have been adding on local({...}) with paste0() in preprocess_command(), and I hope there is a similar way to modify language objects.
Use tidy evaluation properly (see Incorporate rlang tidy evaluation in drake #200). Maybe drake should even use quosures internally instead of ordinary language objects. We should make a solid group decision on this.
Commands may get standardized differently. Standardized commands are dependencies of targets, so changing standardization will break back compatibility with old projects. I really don't want to do this without a good reason, and it certainly should not happen very often. The recent release of 5.0.0 already broke back compatibility in order to solve Huge number of files in .drake/keys .drake/data #154 properly, and writing the functions to migrate old projects was a real pain.

The text was updated successfully, but these errors were encountered:

wlandau · 2018-02-08T20:16:57Z

Thinking about it more, we should move to tibbles before starting on #169 and #237. tibble-based workflow plans are the right way to contain target-level future evaluators. The point about commands as language objects is important, but it only affects #232 and #233.

clarkfitzg · 2018-02-13T18:28:43Z

Language objects can be modified like this:

> code = quote(x <- 5)
> code
x <- 5
> eval(code)
> x
[1] 5
> x = 20
> code2 = call("local", call("{", code))
> code2
local({
    x <- 5
})
> eval(code2)
> x
[1] 20

tibble-based workflow plans are the right way to contain target-level future evaluators.

Why?

wlandau · 2018-02-13T20:46:29Z

Thanks, @clarkfitzg! Knowing this helps a ton.

I heard somewhere that tibbles were better at handling list columns than ordinary data frames, which is important for custom evaluators. Am I wrong about that?

clarkfitzg · 2018-02-14T00:09:42Z

List columns work with ordinary data frames.

> code = list(quote(1 + 2), quote(print(x)))
> code
[[1]]
1 + 2

[[2]]
print(x)

> class(code)
[1] "list"
> df = data.frame(1:2)
> df
  X1.2
1    1
2    2
> df$code = code
> df
  X1.2     code
1    1    1 + 2
2    2 print(x)
> sapply(df, class)
     X1.2      code 
"integer"    "list" 
>

wlandau · 2018-02-24T19:40:14Z

We do have tibbles now, and the way we handle commands (text or expressions or quosures) will depend on the #233. I do not think it is worth getting to far ahead of the DSL right now, and having expressions/quosures could break wildcard templating. Will reopen once we are far along enough with #233.

krlmlr · 2018-02-24T19:53:19Z

I think we can start with accepting a plan data frame that has a list column full of calls in the command column, this is now possible with file_in() etc.. Agree to postpone switching of internals to a later stage.

wlandau · 2018-02-24T22:34:08Z

Actually, depending on whether we use quosures, there may not even be anything to change in the internals.

library(drake)
plan <- data.frame(target = letters[1:2])
plan$command <- list(quote(sqrt(1 + 1234123)), quote(c("x", "y")))
plan
#>   target           command
#> 1      a sqrt(1 + 1234123)
#> 2      b       c("x", "y")
make(plan)
#> cache /tmp/RtmpmonKp5/.drake
#> connect 1 import: plan
#> connect 2 targets: a, b
#> check 2 items: c, sqrt
#> check 2 items: a, b
#> All targets are already up to date.
make(plan)
#> cache /tmp/RtmpmonKp5/.drake
#> connect 1 import: plan
#> connect 2 targets: a, b
#> check 2 items: c, sqrt
#> check 2 items: a, b
#> All targets are already up to date.
plan$command = c("sqrt(1 + 1234123)", "c(\"x\", \"y\")")
make(plan)
#> cache /tmp/RtmpmonKp5/.drake
#> connect 1 import: plan
#> connect 2 targets: a, b
#> check 2 items: c, sqrt
#> check 2 items: a, b
#> All targets are already up to date.

So language objects may be fine. Drake thinks the difference between expression(c("x", "y")) and "c(\"x\", \"y\")" is nontrivial, so we just may want users to define commands as language objects rather than expressions.

krlmlr · 2018-02-28T13:42:29Z

Can we change drake_plan() to return a list in the command column then?

wlandau · 2018-02-28T17:38:53Z

Quite possibly, but printing may not be as nice.

library(drake)
my_plan <- drake_plan(x = simulate(), y = summarize(x))
my_plan
#> # A tibble: 2 x 2
#>   target command     
#>   <chr>  <chr>       
#> 1 x      simulate()  
#> 2 y      summarize(x)
my_plan$command <- as.list(my_plan$command)
my_plan
#> # A tibble: 2 x 2
#>   target command  
#>   <chr>  <list>   
#> 1 x      <chr [1]>
#> 2 y      <chr [1]>

krlmlr · 2018-02-28T21:42:34Z

Fortunately, that's easy to fix ;-)

krlmlr · 2018-02-28T21:43:06Z

Maybe even with an S3 class that prints with syntax highlighting?

wlandau · 2018-03-01T13:34:14Z

That would be super nice. +1 if it also works in HTML vignettes.

krlmlr · 2018-03-02T16:22:11Z

We can do it with some tweaking, see tidyverse/tidyverse.org#121 and https://www.tidyverse.org/articles/2018/03/pillar-1-2-1/.

...now that pillar 1.2.1 is out. Rleated: #247.

wlandau · 2018-08-19T23:05:52Z

Update from R/Pharma 2018: if we do ever make a complete switch from text to language objects, substituteDirect() might help with the wildcard interface if we do not have the DSL yet.

wlandau added depends: help or input difficulty: advanced internals labels Feb 8, 2018

wlandau mentioned this issue Feb 10, 2018

Embrace the tibble! #250

Merged

wlandau mentioned this issue Feb 13, 2018

Make it easier to load workflows structured as packages. #208

Closed

wlandau closed this as completed Feb 24, 2018

wlandau-lilly added a commit that referenced this issue Mar 2, 2018

Use tibbles for build times...

92b2baa

...now that pillar 1.2.1 is out. Rleated: #247.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use tibbles and language objects internally instead of data frames and text #247

Use tibbles and language objects internally instead of data frames and text #247

wlandau commented Feb 8, 2018

wlandau commented Feb 8, 2018 •

edited

Loading

clarkfitzg commented Feb 13, 2018

wlandau commented Feb 13, 2018

clarkfitzg commented Feb 14, 2018

wlandau commented Feb 24, 2018

krlmlr commented Feb 24, 2018

wlandau commented Feb 24, 2018

krlmlr commented Feb 28, 2018

wlandau commented Feb 28, 2018

krlmlr commented Feb 28, 2018

krlmlr commented Feb 28, 2018

wlandau commented Mar 1, 2018 •

edited

Loading

krlmlr commented Mar 2, 2018

wlandau commented Aug 19, 2018

Use tibbles and language objects internally instead of data frames and text #247

Use tibbles and language objects internally instead of data frames and text #247

Comments

wlandau commented Feb 8, 2018

wlandau commented Feb 8, 2018 • edited Loading

clarkfitzg commented Feb 13, 2018

wlandau commented Feb 13, 2018

clarkfitzg commented Feb 14, 2018

wlandau commented Feb 24, 2018

krlmlr commented Feb 24, 2018

wlandau commented Feb 24, 2018

krlmlr commented Feb 28, 2018

wlandau commented Feb 28, 2018

krlmlr commented Feb 28, 2018

krlmlr commented Feb 28, 2018

wlandau commented Mar 1, 2018 • edited Loading

krlmlr commented Mar 2, 2018

wlandau commented Aug 19, 2018

wlandau commented Feb 8, 2018 •

edited

Loading

wlandau commented Mar 1, 2018 •

edited

Loading