Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Directly modifying the data slot of S4 Spatial objects within functions confuses drake. #1130

Closed
3 tasks done
famuvie opened this issue Jan 9, 2020 · 6 comments
Closed
3 tasks done

Comments

@famuvie
Copy link

famuvie commented Jan 9, 2020

Prework

Description

When you directly modify the data slot of a S4 Spatial object (from package {sp}) within a function, drake does not update the target. Therefore, the target itself and all those that depend on it downstream remain outdated.

I'm not sure whether this is a problem only with {sp} objects, or it is a more general issue with S4 objects and slots.

I'm aware that directly accessing a slot is bad practice. The solution is simple: use slot() or $ (see repr. example). I just wanted to leave this here for reference for others and in case you want to take a look at the underlying problem. It took me a while to narrow down the issue.

Reproducible example

library(drake)
remotes:::local_sha('drake')
#> [1] "061d8d89ae98cafc8b6d1178d94da9ae2eebba79"
writeLines(
c(
"   library(drake)  ",
"   library(sp)  ",
"    ",
"   compute_area <- function(x) {  ",
"    ",
"      ## Do not access slots like this!    ",
"      x@data$cadmium <- 1 ",
"       ",
"      ## These are the safe ways, and they work fine with drake ",
"      # slot(x, 'data')$cadmium <- 1 ",
"      # x$cadmium <- 1 ",
"       ",
"      ## Yet, the issue do not arise with other slots. ",
"      ## For instance, the following modification works ",
"      # x@bbox <- x@bbox + 1 ",
"       ",
"      return(x)  ",
"   }  ",
"    ",
"   data(meuse)  ",
"    ",
"   plan <- drake_plan(  ",
"    meuse_sp = SpatialPointsDataFrame(  ",
"      coords = meuse[, c('x', 'y')],  ",
"      data = meuse  ",
"    ),  ",
"      ",
"    tgt = compute_area(meuse_sp)  ",
"   )  ",
"    ",
"   drake_config(plan)  "
),
"_drake.R"
)
r_make()
#> �[32mtarget�[39m meuse_sp
#> �[32mtarget�[39m tgt
r_outdated()
#> [1] "tgt"

Created on 2020-01-09 by the reprex package (v0.3.0)

Expected result

Ideally, r_outdated() should yield nothing. drake would execute the target and mark it as done.
But taking into account that correctly accessing the data slot does not trigger the issue, this can be safely closed.

Session info

Session info
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.2 (2019-12-12)
#>  os       Linux Mint 19.3             
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_GB                       
#>  collate  en_GB.UTF-8                 
#>  ctype    en_GB.UTF-8                 
#>  tz       Europe/Paris                
#>  date     2020-01-09                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version    date       lib
#>  assertthat    0.2.1      2019-03-21 [1]
#>  backports     1.1.4      2019-04-10 [1]
#>  base64url     1.4        2018-05-14 [1]
#>  callr         3.3.0      2019-07-04 [1]
#>  cli           1.1.0      2019-03-19 [1]
#>  crayon        1.3.4      2017-09-16 [1]
#>  desc          1.2.0      2018-05-01 [1]
#>  devtools      2.1.0      2019-07-06 [1]
#>  digest        0.6.23.2   2020-01-09 [1]
#>  drake       * 7.9.0.9000 2020-01-09 [1]
#>  evaluate      0.14       2019-05-28 [1]
#>  filelock      1.0.2      2018-10-05 [1]
#>  fs            1.3.1      2019-05-06 [1]
#>  glue          1.3.1      2019-03-12 [1]
#>  highr         0.8        2019-03-20 [1]
#>  htmltools     0.3.6      2017-04-28 [1]
#>  igraph        1.2.4.1    2019-04-22 [1]
#>  knitr         1.23       2019-05-18 [1]
#>  magrittr      1.5        2014-11-22 [1]
#>  memoise       1.1.0      2017-04-21 [1]
#>  pillar        1.4.2      2019-06-29 [1]
#>  pkgbuild      1.0.3      2019-03-20 [1]
#>  pkgconfig     2.0.2      2018-08-16 [1]
#>  pkgload       1.0.2      2018-10-29 [1]
#>  prettyunits   1.0.2      2015-07-13 [1]
#>  processx      3.4.0      2019-07-03 [1]
#>  ps            1.3.0      2018-12-21 [1]
#>  R6            2.4.0      2019-02-14 [1]
#>  Rcpp          1.0.1      2019-03-17 [1]
#>  remotes       2.1.0      2019-06-24 [1]
#>  rlang         0.4.0      2019-06-25 [1]
#>  rmarkdown     1.14       2019-07-12 [1]
#>  rprojroot     1.3-2      2018-01-03 [1]
#>  sessioninfo   1.1.1      2018-11-05 [1]
#>  storr         1.2.1      2018-10-18 [1]
#>  stringi       1.4.3      2019-03-12 [1]
#>  stringr       1.4.0      2019-02-10 [1]
#>  testthat      2.1.1      2019-04-23 [1]
#>  tibble        2.1.3      2019-06-06 [1]
#>  txtq          0.1.3      2019-06-23 [1]
#>  usethis       1.5.1      2019-07-04 [1]
#>  vctrs         0.2.0      2019-07-05 [1]
#>  withr         2.1.2      2018-03-15 [1]
#>  xfun          0.8        2019-06-25 [1]
#>  yaml          2.2.0      2018-07-25 [1]
#>  zeallot       0.1.0      2018-01-28 [1]
#>  source                              
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  Github (eddelbuettel/digest@199be2e)
#>  Github (ropensci/drake@061d8d8)     
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#>  CRAN (R 3.6.2)                      
#> 
#> [1] /home/facu/lib/R/library
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
@wlandau-lilly
Copy link
Collaborator

Odd, I cannot reproduce the issue on either Mac OS or RHEL 7. Maybe you have an old cache somewhere in tempdir()?

library(drake)
remotes:::local_sha("drake")
#> [1] "061d8d89ae98cafc8b6d1178d94da9ae2eebba79"
writeLines(
  c(
    "   library(drake)  ",
    "   library(sp)  ",
    "    ",
    "   compute_area <- function(x) {  ",
    "    ",
    "      ## Do not access slots like this!    ",
    "      x@data$cadmium <- 1 ",
    "       ",
    "      ## These are the safe ways, and they work fine with drake ",
    "      # slot(x, 'data')$cadmium <- 1 ",
    "      # x$cadmium <- 1 ",
    "       ",
    "      ## Yet, the issue do not arise with other slots. ",
    "      ## For instance, the following modification works ",
    "      # x@bbox <- x@bbox + 1 ",
    "       ",
    "      return(x)  ",
    "   }  ",
    "    ",
    "   data(meuse)  ",
    "    ",
    "   plan <- drake_plan(  ",
    "    meuse_sp = SpatialPointsDataFrame(  ",
    "      coords = meuse[, c('x', 'y')],  ",
    "      data = meuse  ",
    "    ),  ",
    "      ",
    "    tgt = compute_area(meuse_sp)  ",
    "   )  ",
    "    ",
    "   drake_config(plan)  "
  ),
  "_drake.R"
)
r_make()
#> �[32mtarget�[39m meuse_sp
#> �[32mtarget�[39m tgt

r_outdated()
#> character(0)

r_make()
#> All targets are already up to date.

Created on 2020-01-09 by the reprex package (v0.3.0)

@famuvie
Copy link
Author

famuvie commented Jan 10, 2020

After all morning and thanks to Docker, I've found the issue.
It turns out that I have set in .Rprofile the CRAN mirror https://cran.univ-paris1.fr/ (since it is typically faster) which has an outdated version of {callr} (v3.3.0, instead of the latest released v3.4.0). This was the cause of the problem.

If you want to check, I pushed a Docker image from the following Dockerfile:

FROM rocker/tidyverse
RUN R -e "install.packages('sp')"
RUN R -e "devtools::install_version('callr', version = '3.3.0')"
RUN R -e "devtools::install_github('ropensci/drake')"

ADD _drake.R /home/rstudio/

You can easily reproduce in your browser if you have Docker installed:

docker pull facundom//drake_callr_data_issue
docker run -e PASSWORD=drake --rm -p 8787:8787 drake_data_issue

Browse to localhost:8787 and login with username rstudio and password drake and run

library(drake)
r_make()
r_outdated()

to reproduce.

Since this is related with an outdated version of another package, I don't think any further action is required. Sorry about that.

@wlandau
Copy link
Member

wlandau commented Jan 10, 2020

I never would have guessed, thanks for looking into it.

The solution is surprising at first glance, but one of the news items of callr 3.3.1 provides a clue:

r_session now avoids creating data and env objects in the global environment of the subprocess.

So if we use callr 3.3.0, a data object is in the global environment, and so drake thinks data is a dependency because the compute_area() function mentions x@data.

library(drake)

packageVersion("callr") 
#> [1] '3.3.0'

writeLines(
  c(
    "   library(drake)  ",
    "   library(sp)  ",
    "    ",
    "   compute_area <- function(x) {  ",
    "    ",
    "      ## Do not access slots like this!    ",
    "      x@data$cadmium <- 1 ",
    "       ",
    "      ## These are the safe ways, and they work fine with drake ",
    "      # slot(x, 'data')$cadmium <- 1 ",
    "      # x$cadmium <- 1 ",
    "       ",
    "      ## Yet, the issue do not arise with other slots. ",
    "      ## For instance, the following modification works ",
    "      # x@bbox <- x@bbox + 1 ",
    "       ",
    "      return(x)  ",
    "   }  ",
    "    ",
    "   data(meuse)  ",
    "    ",
    "   plan <- drake_plan(  ",
    "    meuse_sp = SpatialPointsDataFrame(  ",
    "      coords = meuse[, c('x', 'y')],  ",
    "      data = meuse  ",
    "    ),  ",
    "      ",
    "    tgt = compute_area(meuse_sp)  ",
    "   )  ",
    "    ",
    "   drake_config(plan)  "
  ),
  "_drake.R"
)
r_make()
#> �[32mtarget�[39m meuse_sp
#> �[32mtarget�[39m tgt

r_outdated()
#> [1] "tgt"

r_vis_drake_graph()

r_make()
#> �[32mtarget�[39m tgt

Created on 2020-01-10 by the reprex package (v0.3.0)

So this is indeed a bug in drake. The @ operator should be treated just like $ in the code analysis.

@wlandau
Copy link
Member

wlandau commented Jan 10, 2020

Minimal reprex:

drake::deps_code(quote(x@y))
#> # A tibble: 2 x 2
#>   name  type   
#>   <chr> <chr>  
#> 1 x     globals
#> 2 y     globals

Created on 2020-01-10 by the reprex package (v0.3.0)

Expected result:

#> # A tibble: 1 x 2
#>   name  type   
#>   <chr> <chr>  
#> 1 x     globals

@famuvie
Copy link
Author

famuvie commented Jan 10, 2020

So if we use callr 3.3.0, a data object is in the global environment, and so drake thinks data is a dependency because the compute_area() function mentions x@data.

Incredible. What a creative way of breaking things :)))

So this is indeed a bug in drake. The @ operator should be treated just like $ in the code analysis.

Glad it helped. At least I didn't just spend a day and a half without a purpose :)

@wlandau
Copy link
Member

wlandau commented Jan 10, 2020

You definitely made a contribution. Thanks for reporting!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants