Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use tibbles and language objects internally instead of data frames and text #247

Closed
wlandau opened this issue Feb 8, 2018 · 14 comments
Closed

Comments

@wlandau
Copy link
Member

wlandau commented Feb 8, 2018

I still prefer users to be able to supply data frames of text, but I think the preferable format of a workflow plan should be a tibble with commands as language objects or something similar. @krlmlr, I think we agree on this. To me, it seems like the more natural, more sophisticated, and safer way of doing things.

When I began drake, I knew almost nothing about computing on the language, and I was much more comfortable with text processing. I am still not an expert in computing on the language, so I think I will need help to get this right. #246 is a first step, and it frees up work on other issues, but there is a lot more to do.

From my perspective, there are some challenges:

  1. Make sure the command gets called inside a call to local({...}). Currently, it is perfectly fine to write plan <- drake_plan(target = {x <- 1; x}), and make(plan) does not add or modify x in the user's environment. We should keep this behavior in order to avoid side effects and race conditions. So far, I have been adding on local({...}) with paste0() in preprocess_command(), and I hope there is a similar way to modify language objects.
  2. Use tidy evaluation properly (see Incorporate rlang tidy evaluation in drake #200). Maybe drake should even use quosures internally instead of ordinary language objects. We should make a solid group decision on this.
  3. Commands may get standardized differently. Standardized commands are dependencies of targets, so changing standardization will break back compatibility with old projects. I really don't want to do this without a good reason, and it certainly should not happen very often. The recent release of 5.0.0 already broke back compatibility in order to solve Huge number of files in .drake/keys .drake/data #154 properly, and writing the functions to migrate old projects was a real pain.
@wlandau
Copy link
Member Author

wlandau commented Feb 8, 2018

Thinking about it more, we should move to tibbles before starting on #169 and #237. tibble-based workflow plans are the right way to contain target-level future evaluators. The point about commands as language objects is important, but it only affects #232 and #233.

@clarkfitzg
Copy link

Language objects can be modified like this:

> code = quote(x <- 5)
> code
x <- 5
> eval(code)
> x
[1] 5
> x = 20
> code2 = call("local", call("{", code))
> code2
local({
    x <- 5
})
> eval(code2)
> x
[1] 20

tibble-based workflow plans are the right way to contain target-level future evaluators.

Why?

@wlandau
Copy link
Member Author

wlandau commented Feb 13, 2018

Thanks, @clarkfitzg! Knowing this helps a ton.

I heard somewhere that tibbles were better at handling list columns than ordinary data frames, which is important for custom evaluators. Am I wrong about that?

@clarkfitzg
Copy link

List columns work with ordinary data frames.

> code = list(quote(1 + 2), quote(print(x)))
> code
[[1]]
1 + 2

[[2]]
print(x)

> class(code)
[1] "list"
> df = data.frame(1:2)
> df
  X1.2
1    1
2    2
> df$code = code
> df
  X1.2     code
1    1    1 + 2
2    2 print(x)
> sapply(df, class)
     X1.2      code 
"integer"    "list" 
> 

@wlandau
Copy link
Member Author

wlandau commented Feb 24, 2018

We do have tibbles now, and the way we handle commands (text or expressions or quosures) will depend on the #233. I do not think it is worth getting to far ahead of the DSL right now, and having expressions/quosures could break wildcard templating. Will reopen once we are far along enough with #233.

@wlandau wlandau closed this as completed Feb 24, 2018
@krlmlr
Copy link
Collaborator

krlmlr commented Feb 24, 2018

I think we can start with accepting a plan data frame that has a list column full of calls in the command column, this is now possible with file_in() etc.. Agree to postpone switching of internals to a later stage.

@wlandau
Copy link
Member Author

wlandau commented Feb 24, 2018

Actually, depending on whether we use quosures, there may not even be anything to change in the internals.

library(drake)
plan <- data.frame(target = letters[1:2])
plan$command <- list(quote(sqrt(1 + 1234123)), quote(c("x", "y")))
plan
#>   target           command
#> 1      a sqrt(1 + 1234123)
#> 2      b       c("x", "y")
make(plan)
#> cache /tmp/RtmpmonKp5/.drake
#> connect 1 import: plan
#> connect 2 targets: a, b
#> check 2 items: c, sqrt
#> check 2 items: a, b
#> All targets are already up to date.
make(plan)
#> cache /tmp/RtmpmonKp5/.drake
#> connect 1 import: plan
#> connect 2 targets: a, b
#> check 2 items: c, sqrt
#> check 2 items: a, b
#> All targets are already up to date.
plan$command = c("sqrt(1 + 1234123)", "c(\"x\", \"y\")")
make(plan)
#> cache /tmp/RtmpmonKp5/.drake
#> connect 1 import: plan
#> connect 2 targets: a, b
#> check 2 items: c, sqrt
#> check 2 items: a, b
#> All targets are already up to date.

So language objects may be fine. Drake thinks the difference between expression(c("x", "y")) and "c(\"x\", \"y\")" is nontrivial, so we just may want users to define commands as language objects rather than expressions.

@krlmlr
Copy link
Collaborator

krlmlr commented Feb 28, 2018

Can we change drake_plan() to return a list in the command column then?

@wlandau
Copy link
Member Author

wlandau commented Feb 28, 2018

Quite possibly, but printing may not be as nice.

library(drake)
my_plan <- drake_plan(x = simulate(), y = summarize(x))
my_plan
#> # A tibble: 2 x 2
#>   target command     
#>   <chr>  <chr>       
#> 1 x      simulate()  
#> 2 y      summarize(x)
my_plan$command <- as.list(my_plan$command)
my_plan
#> # A tibble: 2 x 2
#>   target command  
#>   <chr>  <list>   
#> 1 x      <chr [1]>
#> 2 y      <chr [1]>

@krlmlr
Copy link
Collaborator

krlmlr commented Feb 28, 2018

Fortunately, that's easy to fix ;-)

@krlmlr
Copy link
Collaborator

krlmlr commented Feb 28, 2018

Maybe even with an S3 class that prints with syntax highlighting?

@wlandau
Copy link
Member Author

wlandau commented Mar 1, 2018

That would be super nice. +1 if it also works in HTML vignettes.

@krlmlr
Copy link
Collaborator

krlmlr commented Mar 2, 2018

We can do it with some tweaking, see tidyverse/tidyverse.org#121 and https://www.tidyverse.org/articles/2018/03/pillar-1-2-1/.

wlandau-lilly added a commit that referenced this issue Mar 2, 2018
...now that pillar 1.2.1 is out. Rleated: #247.
@wlandau
Copy link
Member Author

wlandau commented Aug 19, 2018

Update from R/Pharma 2018: if we do ever make a complete switch from text to language objects, substituteDirect() might help with the wildcard interface if we do not have the DSL yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants