R6 object keeps rebuilding #345

Closed
bart1 opened this issue Mar 28, 2018 · 11 comments

bart1 commented Mar 28, 2018

First of all, thanks for all the good work and quick bug fixes! For some simulations I work with R6 classes; they are a convenient way to keep track of a lot of things in individual-based simulations. I encountered issues where targets are rebuilt even though nothing has changed. Here is a small reproducible example (the target fenv2 is rebuilt on the second call to make, while fenv is not):

require(R6)
FoodEnvironment <- R6Class(
  "FoodEnvironment",
  private = list(foodItemList = list()),
  public = list(
    initialize = function(foodDensity, arenaExtent) {
      private$foodItemList <- replicate(
        round((arenaExtent * 2) ^ 2 * foodDensity),
        runif(2) * arenaExtent * 2 - arenaExtent,
        simplify = FALSE
      )
    }
  )
)
require(drake)
tmpf <- function(foodDensity, arenaExtent) replicate(
  round((arenaExtent * 2) ^ 2 * foodDensity),
  runif(2) * arenaExtent * 2 - arenaExtent,
  simplify = FALSE
)
dd <- drake_plan(
  fenv2 = FoodEnvironment$new(.1, 40),
  fenv = tmpf(.1, 40)
)
cache <- recover_cache('asfiiiiif')
drake::make(dd, cache = cache, verbose = 4)
drake::make(dd, cache = cache, verbose = 4)

This gives the following results:

> source('~/.active-rstudio-document')
Loading required package: R6
Loading required package: drake
connect 4 imports: FoodEnvironment, tmpf, cache, dd
connect 2 targets: fenv2, fenv
check 4 items: FoodEnvironment, replicate, round, runif
import FoodEnvironment
import replicate
import round
import runif
check 1 item: tmpf
import tmpf
check 2 items: fenv, fenv2
target fenv
target fenv2
Unloading targets from environment:
  fenv2
  fenv
connect 4 imports: FoodEnvironment, tmpf, cache, dd
connect 2 targets: fenv2, fenv
check 4 items: FoodEnvironment, replicate, round, runif
import FoodEnvironment
import replicate
import round
import runif
check 1 item: tmpf
import tmpf
check 2 items: fenv, fenv2
target fenv2
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 17.10

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=de_DE.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] drake_5.1.0 R6_2.2.2   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16       compiler_3.4.4     pillar_1.2.1       formatR_1.5        plyr_1.8.4         bindr_0.1.1        R.methodsS3_1.7.1 
 [8] R.utils_2.6.0      tools_3.4.4        testthat_2.0.0     digest_0.6.15      jsonlite_1.5       lubridate_1.7.3    evaluate_0.10.1   
[15] tibble_1.4.2       pkgconfig_2.0.1    rlang_0.2.0        igraph_1.2.1       rstudioapi_0.7     yaml_2.1.18        parallel_3.4.4    
[22] withr_2.1.2        storr_1.1.3        stringr_1.3.0      knitr_1.20         htmlwidgets_1.0    CodeDepends_0.5-3  globals_0.11.0    
[29] rprojroot_1.3-2    tidyselect_0.2.4   glue_1.2.0         listenv_0.7.0      future.apply_0.1.0 XML_3.98-1.10      purrr_0.2.4       
[36] magrittr_1.5       htmltools_0.3.6    backports_1.1.2    codetools_0.2-15   future_1.7.0       stringi_1.1.7      visNetwork_2.0.3  
[43] crayon_1.3.4       R.oo_1.21.0       

wlandau changed the title from "Target keeps rebuilding" to "R6 object keeps rebuilding" Mar 28, 2018
Member

wlandau commented Mar 28, 2018

From the dependency graph, fenv2 only depends on FoodEnvironment.

[Screenshot: dependency graph]

And as it turns out, when you define a new R6 object, something in the R6 class itself actually changes. @wch, is this the intended behavior?

library(digest)  # For fingerprinting objects.
digest(FoodEnvironment, algo = "xxhash64")
#> [1] "67082b8a867db411"
fenv2 <- FoodEnvironment$new(0.1, 40)
digest(FoodEnvironment, algo = "xxhash64")
#> [1] "b313eafc964c8ea9"

The hashes above are different, so drake really is doing what it is supposed to do. In your case, you may unfortunately need to use a different trigger.

> dd$trigger <- c("missing", "any")
> dd
# A tibble: 2 x 3
  target command                              trigger
  <chr>  <chr>                                <chr>  
1 fenv2  ignore(FoodEnvironment$new)(0.1, 40) missing
2 fenv   tmpf(0.1, 40)                        any 

You could even ignore() the call to FoodEnvironment$new in the command:

> dd <- drake_plan(fenv2 = ignore(FoodEnvironment$new)(.1, 40))
> make(dd)
target fenv2 
> make(dd)
Unloading targets from environment:
   fenv2
All targets are already up to date.

But both options undermine reproducibility.

wlandau closed this as completed Mar 28, 2018

wch commented Mar 28, 2018

R does some record-keeping when functions are executed. I think it has something to do with the bytecode compiler. In general this means using digest() on an object that contains functions may not behave as you are expecting.

For example:

library(digest)

f <- function() {
  1+1
}


f
digest(f)


f()
f
digest(f)


f()
f
digest(f)


f()
f
digest(f)

Output:


> library(digest)
> f <- function() {
+   1+1
+ }
> f
function() {
  1+1
}
> digest(f)
[1] "36427c8b493b9efe3ec7733ec53ca10f"
> f()
[1] 2
> f
function() {
  1+1
}
> digest(f)
[1] "ccc484b93c302bfb4379318132ea4dcc"
> f()
[1] 2
> f
function() {
  1+1
}
<bytecode: 0x7fc77b1272a8>
> digest(f)
[1] "2f1266fce29d2157bf9851c77810331b"
> f()
[1] 2
> f
function() {
  1+1
}
<bytecode: 0x7fc77b1272a8>
> digest(f)
[1] "2f1266fce29d2157bf9851c77810331b"

You can use .Internal(inspect(f)) if you want to find out more about the internals.

Member

wlandau commented Mar 28, 2018

Thanks Winston, that helps a lot. Fortunately, when drake detects a function dependency, it deparses the function before hashing it. But for objects containing functions, drake may have to adapt after all. Off the top of my head, I am not sure how.
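
As a rough illustration of the deparse-then-hash idea mentioned above (a minimal sketch, not drake's actual internals; `function_key` is a hypothetical helper name), the deparsed text of a function should stay the same no matter how often the function has been called, because it carries none of the bytecode or JIT bookkeeping:

```r
# Hypothetical helper: a text key for a function that ignores
# bytecode and other internal state that changes when it is called.
function_key <- function(f) {
  stopifnot(is.function(f))
  paste(deparse(f), collapse = "\n")
}

f <- function() {
  1 + 1
}

key_before <- function_key(f)
for (i in 1:5) f()  # several calls, typically enough for the JIT compiler
key_after <- function_key(f)

# Compare the text keys taken before and after the calls.
print(identical(key_before, key_after))

# If the digest package is installed, the text can be hashed as a plain string.
if (requireNamespace("digest", quietly = TRUE)) {
  print(digest::digest(key_before, algo = "xxhash64", serialize = FALSE))
}
```

The trade-off is that deparsing drops comments and ignores the function's enclosing environment, so a change to a variable captured by the closure would go unnoticed by this key.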

Author

bart1 commented Mar 28, 2018

Looking at the hashes and experimenting a bit, one workaround seems to be to call the function three times before building with drake. Adding this seems to make drake work as expected:

FoodEnvironment$new(1,2)
FoodEnvironment$new(1,2)
FoodEnvironment$new(1,2)
drake::make(dd, cache=cache, verbose=4)                                                             
drake::make(dd, cache=cache, verbose=4)  

Author

bart1 commented Mar 28, 2018

Another quick solution is to compile the function beforehand:

# this reruns the function on the second make()
f<-function() 1+1
b<-list('a'=f)
plan<-drake_plan(sim=b[[1]]())
make(plan)
make(plan)
# this works fine
f<-compiler::cmpfun(function() 1+1)
b<-list('a'=f)
plan<-drake_plan(sim=b[[1]]())
make(plan)
make(plan)

Member

wlandau commented Mar 29, 2018

As far as drake is concerned, I think the best we can do is document this edge case in the caution vignette (see 1ae3b84).

Member

wlandau commented Apr 27, 2018

Reopening. I just realized that sometimes there is no way around this one if you absolutely have to use a certain toolkit. I am looking for a way to represent an arbitrary R object that is more stable under hashing.

In the case of functions, .Internal(inspect(f)) is really long for the first two calls to f() and then shortens down a lot when f() is called three times or more.

f <- function(){}
.Internal(inspect(f))
f()
.Internal(inspect(f))
f()
.Internal(inspect(f))
f()
.Internal(inspect(f))
f()
.Internal(inspect(f))
f()

I do not understand what it means yet. Like I said, drake tracks only the deparsed versions of functions, but it does not have a good solution for objects containing functions.

wlandau reopened this Apr 27, 2018

wch commented Apr 27, 2018

A few thoughts: The source of the problem is that serialize() is too exact. You want something that is a little less precise, like identical() (but of course that function doesn't serialize the object).

For example, see below. (Note that identical(,,F,F,F,F) "pickily tests for exact equality" according to the documentation.)

f <- function() { 1 + 1 }
g <- function() { 1 + 1 }

identical(f, g)                             # TRUE
identical(digest(f), digest(g))             # FALSE

f()
identical(f, g)                             # TRUE
identical(digest(f), digest(g))             # FALSE

f()
identical(f, g)                             # TRUE
identical(digest(f), digest(g))             # FALSE

I believe the publicly-accessible bytecode compiler functions don't provide the same control as what happens in R. For example, if f and g refer to the same object, when you run f() a few times, it also mutates g. Compiling the function in place would be nice for this, but I don't think it's possible with cmpfun.

f <- function() { 1 + 1 }
g <- f

identical(f, g)                             # TRUE
identical(digest(f), digest(g))             # TRUE

f()
identical(f, g)                             # TRUE
identical(digest(f), digest(g))             # TRUE

f()
identical(f, g)                             # TRUE
identical(digest(f), digest(g))             # TRUE

To really solve the problem, you may need to write your own serialization function, but that's definitely a non-trivial task.

Member

wlandau commented Apr 29, 2018

I see. With a less exact serialization method, we could just call digest(serialize = FALSE) afterwards and the rest would be easy. I agree on both the need and the difficulty.
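
One possible shape for such a less exact representation, limited to plain lists, is sketched below. This is only an illustration under stated assumptions: `normalize_functions` and `stable_text` are hypothetical names, attributes other than names are dropped, and environments (such as R6 generators) are left untouched because they can contain reference cycles, so this does not solve the R6 case by itself.

```r
# Hypothetical helper: replace every function inside a (nested) list
# with its deparsed source text and leave everything else as is.
normalize_functions <- function(x) {
  if (is.function(x)) {
    return(deparse(x))
  }
  if (is.list(x)) {
    return(lapply(x, normalize_functions))
  }
  x
}

# Hypothetical helper: collapse the normalized object into one string.
stable_text <- function(x) {
  paste(deparse(normalize_functions(x)), collapse = "\n")
}

f <- function() 1 + 1
b <- list(a = f, n = 1:3)

text_before <- stable_text(b)
for (i in 1:5) b$a()  # calling the stored function several times
text_after <- stable_text(b)

# Compare the text representations taken before and after the calls.
print(identical(text_before, text_after))

# If the digest package is installed, hash the string without serializing it.
if (requireNamespace("digest", quietly = TRUE)) {
  print(digest::digest(text_before, algo = "xxhash64", serialize = FALSE))
}
```

Going through text keeps the result independent of whether the stored functions have been byte-compiled, at the cost of ignoring closure environments and most attributes.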

Member

wlandau commented Jun 21, 2018

Update: I just posted a Stack Overflow question here. Maybe someone has a clever workaround we have not considered.

Member

wlandau commented Jun 22, 2018

Convenient workaround: just wrap the R6 class definition inside a function. I do not know why I missed this before.

library(drake)
library(R6)
clean()
new_circle <- function(radius){
  circle_class <- R6Class(
    "circle_class",
    private = list(radius = NULL),
    public = list(
      initialize = function(radius){
        private$radius <- radius
      },
      area = function(){
        pi * private$radius ^ 2
      }
    )
  )
  circle_class$new(radius = radius)
}
plan <- drake_plan(
  circle = new_circle(radius = 10),
  area = circle$area()
)
make(plan)
#> target circle
#> target area
make(plan)
#> All targets are already up to date.

I will reopen this issue if more robust serialization seems within reach.
