Specify resources within target #942

joelnitta · 2019-07-14T08:24:42Z

Prework

Read and abide by drake's code of conduct.
Search for duplicates among the existing issues, both open and closed.

Description

I am following the drake manual section '9.7.5 The resources column for transient workers' to run my plan on a cluster while specifying memory, run time, and similar resources for each target. The example in the manual adds a resources column to the drake plan as a list after the plan is created. However, this is unwieldy and error-prone for a large plan. I would like to specify resources (as a named list) as a custom column with target(), with reasonable defaults in place of NA values (e.g., requesting only a small amount of memory).

This is what the manual shows (note that plan$resources is a list-column):

library(drake)

plan <- drake_plan(
  data = download_data(),
  model = big_machine_learning_model(data)
)

plan$resources <- list(
  list(cores = 1, gpus = 0),
  list(cores = 4, gpus = 1)
)

plan
#> # A tibble: 2 x 3
#>   target command                          resources       
#>   <chr>  <expr>                           <list>          
#> 1 data   download_data()                  <named list [2]>
#> 2 model  big_machine_learning_model(data) <named list [2]>

Here is an example of what I would like to be able to do:

library(drake)

# Define plan
plan <- drake_plan(
  
  data = read_csv("https://bit.ly/ppgi_taxonomy"),
  
  # As a test, change memory settings for a single target
  data_slice_1 = target(
    slice(data, 1:10),
    resources = list(
      queue = "mThC.q",
      memory = "mres=2G,h_data=2G,h_vmem=2G")
  ),
  
  data_slice_2 = slice(data, 11:20),
  
  data_out_1 = write_csv(data_slice_1, file_out("data1.csv")),
  
  data_out_2 = write_csv(data_slice_2, file_out("data2.csv"))
  
)

plan
#> # A tibble: 5 x 3
#>   target       command                                        resources 
#>   <chr>        <expr>                                         <list>    
#> 1 data         read_csv("https://bit.ly/ppgi_taxonomy")       <lgl [1]> 
#> 2 data_slice_1 slice(data, 1:10)                              <language>
#> 3 data_slice_2 slice(data, 11:20)                             <lgl [1]> 
#> 4 data_out_1   write_csv(data_slice_1, file_out("data1.csv")) <lgl [1]> 
#> 5 data_out_2   write_csv(data_slice_2, file_out("data2.csv")) <lgl [1]>

In this case, plan$resources is again a list-column, but the entry for the target whose resources I tried to specify is a language object, not a named list.

I tried to get this to work anyway by filling the remaining NAs in plan$resources with a loop:

# Set default memory settings to lowest (short run time, 1 GB memory).
# Rows that already hold something other than NA are left untouched.
for (i in seq_len(nrow(plan))) {
  if (is.na(plan$resources[i])) {
    plan$resources[i] <- list(
      list(queue = "sThC.q", memory = "mres=1G")
    )
  }
}

plan
#> # A tibble: 5 x 3
#>   target       command                                      resources      
#>   <chr>        <expr>                                       <list>         
#> 1 data         read_csv("https://bit.ly/ppgi_taxonomy")   … <named list [2…
#> 2 data_slice_1 slice(data, 1:10)                          … <language>     
#> 3 data_slice_2 slice(data, 11:20)                         … <named list [2…
#> 4 data_out_1   write_csv(data_slice_1, file_out("data1.csv… <named list [2…
#> 5 data_out_2   write_csv(data_slice_2, file_out("data2.csv… <named list [2…

When I tried to run this with future.batchtools, I got the following error:

target data_slice_1
Error in batchtools::submitJobs(reg = reg, ids = jobid, resources = resources) : 
  Assertion on 'resources' failed: Must have names, but element 1 is empty.

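Indeed, mapping names() over plan$resources shows that the entry for data_slice_1 has an empty first name. That entry is still an unevaluated call to list(), and the first element of a call is the function itself, which carries no name: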

purrr::map(plan$resources, names)
#> [[1]]
#> [1] "queue"  "memory"
#> 
#> [[2]]
#> [1] ""       "queue"  "memory"
#> 
#> [[3]]
#> [1] "queue"  "memory"
#> 
#> [[4]]
#> [1] "queue"  "memory"
#> 
#> [[5]]
#> [1] "queue"  "memory"
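
The empty name can be reproduced without drake. The following is a minimal base-R sketch, assuming that the entry stored for data_slice_1 behaves like an ordinary quoted call; `res_call` and `res_list` are illustrative names, not objects created by drake.

```r
# Stand-in for the language object stored in plan$resources for data_slice_1
# (assumption: it is an unevaluated call to list()).
res_call <- quote(
  list(queue = "mThC.q", memory = "mres=2G,h_data=2G,h_vmem=2G")
)

# The first element of a call is the function (`list`), which has no name,
# so names() starts with an empty string.
call_names <- names(res_call)
print(call_names)

# Evaluating the call yields the named list that batchtools expects.
res_list <- eval(res_call)
list_names <- names(res_list)
print(list_names)
```

Comparing `call_names` with `list_names` shows that only the evaluated list has a name for every element, which is what the batchtools assertion checks.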

For now I'm probably just going to manually tweak the resources column after creating the entire plan, since there aren't that many targets that need special treatment. But I think being able to specify it on the fly within target() would be nice.
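
One way to do the manual tweak is to build the column from a default plus per-target overrides. This is only a sketch in base R under stated assumptions: `targets`, `default_resources`, and `overrides` are hypothetical names, and the target vector stands in for plan$target.

```r
# Hypothetical stand-in for plan$target.
targets <- c("data", "data_slice_1", "data_slice_2", "data_out_1", "data_out_2")

# Default request for every target (values taken from the loop above).
default_resources <- list(queue = "sThC.q", memory = "mres=1G")

# Only targets that need special treatment are listed here.
overrides <- list(
  data_slice_1 = list(queue = "mThC.q", memory = "mres=2G,h_data=2G,h_vmem=2G")
)

# Start from the default and overwrite any fields named in the override.
resources <- lapply(targets, function(target) {
  override <- overrides[[target]]
  if (is.null(override)) {
    default_resources
  } else {
    modifyList(default_resources, override)
  }
})
names(resources) <- targets

# With a real plan whose rows are in the same order, the result could be
# assigned as a list-column, e.g. plan$resources <- unname(resources).
str(resources[["data_slice_1"]])
```

Because every entry starts from `default_resources`, each element is a fully named list and no NA placeholders remain.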

Created on 2019-07-14 by the reprex package (v0.2.1)


wlandau · 2019-07-14T11:40:03Z

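Actually, this is already possible. You can define any custom column with target(). Unfortunately, drake_plan() currently seems to store list arguments as unevaluated language objects, but once I fix that, it will be totally seamless. Until then, you can evaluate the column yourself, as in the mutate() step below.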

library(drake)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)

plan <- drake_plan(
  data = target(
    download_data(),
    resources = list(cores = 1, gpus = 0)
  ),
  model = target(
    big_machine_learning_model(data),
    resources = list(cores = 4, gpus = 1)
  )
)

plan
#> # A tibble: 2 x 3
#>   target command                          resources                
#>   <chr>  <expr>                           <expr>                   
#> 1 data   download_data()                  list(cores = 1, gpus = 0)
#> 2 model  big_machine_learning_model(data) list(cores = 4, gpus = 1)

plan <- plan %>%
  mutate(resources = map(resources, eval))

plan
#> # A tibble: 2 x 3
#>   target command                          resources       
#>   <chr>  <expr>                           <list>          
#> 1 data   download_data()                  <named list [2]>
#> 2 model  big_machine_learning_model(data) <named list [2]>

plan$resources
#> [[1]]
#> [[1]]$cores
#> [1] 1
#> 
#> [[1]]$gpus
#> [1] 0
#> 
#> 
#> [[2]]
#> [[2]]$cores
#> [1] 4
#> 
#> [[2]]$gpus
#> [1] 1

Created on 2019-07-14 by the reprex package (v0.3.0)

I should probably update the manual too.
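
For a plan that mixes unevaluated calls with NA placeholders, as in the original report, the eval step can be combined with a default fill. A minimal base-R sketch, assuming a hypothetical helper `normalize_resources()` (not part of drake) and a plain list standing in for plan$resources:

```r
# Hypothetical helper: turn each entry into a named list.
# - language objects are evaluated (as map(resources, eval) does)
# - NULL or a single NA is replaced by the default
# - anything else is kept as-is
normalize_resources <- function(resources, default) {
  lapply(resources, function(x) {
    if (is.language(x)) {
      eval(x)
    } else if (is.null(x) || (is.atomic(x) && length(x) == 1L && is.na(x))) {
      default
    } else {
      x
    }
  })
}

default_resources <- list(queue = "sThC.q", memory = "mres=1G")

# Stand-in for the column from the original report: NA, a call, NA.
resources <- list(
  NA,
  quote(list(queue = "mThC.q", memory = "mres=2G,h_data=2G,h_vmem=2G")),
  NA
)

resources <- normalize_resources(resources, default_resources)
str(resources)
```

Note that eval() here runs in the helper's environment, so this sketch only suits calls built from literals, like the list() calls in this thread.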

wlandau · 2019-07-14T13:20:16Z

Thanks for bringing up this use case. #943 should make it easier to define custom resources.

cc @joelnitta

wlandau added the "type: bug" and "topic: api" labels and removed the "type: bug" label Jul 14, 2019

wlandau mentioned this issue Jul 14, 2019

Interpret custom columns as non-language objects #943

Merged

4 tasks

wlandau closed this as completed in b40b534 Jul 14, 2019

wlandau pushed a commit to ropensci-books/drake that referenced this issue Jul 14, 2019

Document ropensci/drake#942

559ccaf

cc @joelnitta
