Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interoperability with R and renv #55

Closed
4 tasks done
chainsawriot opened this issue Feb 16, 2023 · 30 comments
Closed
4 tasks done

interoperability with R and renv #55

chainsawriot opened this issue Feb 16, 2023 · 30 comments
Assignees
Labels

Comments

@chainsawriot
Copy link
Collaborator

chainsawriot commented Feb 16, 2023

The pkgs argument of resolve() should accept:

The rang S3 object should be exportable as:

  • renv lockfile
@chainsawriot
Copy link
Collaborator Author

self tag sessionInfo() @chainsawriot

@schochastics
Copy link
Member

My suggestion for the guesstimation #53 would be to use it if pkg and/or snapshot is missing. Snapshot also only makes sense if material_dir is not missing? Or of course if pkg is something else than a vector of packages

@chainsawriot
Copy link
Collaborator Author

@schochastics Probably like this:

pkgs snapshot_date materials_dir Proposed v 0.2.0 default Is v0.1.0 default?
NULL NULL NULL sessionInfo(), snapshot_date <- 30 days ago no
NULL NULL Yes Guesstimate pkgs and snapshot_date from materials_dir no
NULL Yes Yes Guesstimate pkgs no
Yes NULL NULL snapshot_date <- 30 days ago yes
Yes NULL Yes snapshot_date <- 30 days ago yes

I am still not so sure whether we should make sessionInfo() or renv::dependecies(".")$packages the triple-NULL default.

@schochastics
Copy link
Member

schochastics commented Feb 17, 2023

I think sessionInfo() is a safer option. This probably will include packages that are not needed, but I for instance have a random Rscript folder with 100s of *.R files. I woulnd't want to use renv::dependecies(".")$packages there.

Otherwise I think the table is good

@hadley
Copy link

hadley commented Feb 17, 2023

I think you'd be better off using renv::dependencies(); no, it won't capture every single package you might have used, but it will capture the vast majority of them, and IMO that's better than including a snapshot with hundreds of files in it.

My expectation is that in the cases where you use rang, you'd going to need to do a little checking afterwards (because you're original state is likely to be a mix of packages installed on different dates), so that IMO the goal of resolve() should be a solid 95% effort that you can then tweak as needed.

@schochastics
Copy link
Member

hmm maybe I am missing something but the default behavior of renv::dependencies() is recursive, right? So if I work in
my home directory (which I obviously shouldn't do anyway) then it would basically grab all R files on my hard drive? I think renv::dependencies() as a standard makes sense if working in a project directory, but we cannot assume that every user does this. Unless, as I said, I am missing something here.

@hadley
Copy link

hadley commented Feb 17, 2023

I think using project directories is table stakes for reproducibility.

@schochastics
Copy link
Member

totally agree!

@schochastics
Copy link
Member

@chainsawriot just realized material_dir is not even a parameter for resolve()...
maybe pkgs="." (or any path) should invoke the guesstimation?

@chainsawriot
Copy link
Collaborator Author

chainsawriot commented Feb 17, 2023

@schochastics You are right that resolve() doesn't have materials_dir. Of course, we can add materials_dir to resolve() for v0.2. But that's kind of weird.

My concern is #57. We haven't implemented the complete specification of pkgrefs. https://r-lib.github.io/pkgdepends/reference/pkg_refs.html

If we really need to implement the complete spec. of pkgrefs, pkgs = "." should be interpreted as a (shorthand of a) local package in the current directory; not scanning for R packages in the current directory. But probably, this is not the common use case of resolve(). I think this should be the better default.

## guesstimate; make `paths_as_local_packages` default to FALSE
resolve(pkgs = ".", paths_as_local_packages = FALSE) #default
resolve(pkgs = c(".", "~/anotherdirectory")) ## scan two directories

## interpret paths as local packages
resolve(pkgs = ".", paths_as_local_packages = TRUE)
resolve(pkgs = c(".", "~/anotherdirectory"), paths_as_local_packages = TRUE)
resolve(pkgs = "local::.", paths_as_local_packages = TRUE)
resolve(pkgs = c("local::.", "~/anotherdirectory"), paths_as_local_packages = FALSE)

@schochastics
Copy link
Member

This looks sensible to me.

chainsawriot added a commit that referenced this issue Feb 19, 2023
@chainsawriot
Copy link
Collaborator Author

chainsawriot commented Feb 19, 2023

The renv scanning can now be done as as_pkgrefs.data.frame(). But we can find another way.

@schochastics
Copy link
Member

I will start experimenting a bit

@schochastics
Copy link
Member

first idea. renv also creates an S3 object called renv_lockfile internally so I thought we can use that too.

as_pkgrefs.renv_lockfile <- function(x, ...){
  sources <- vapply(lockfile[["Packages"]],`[[`,character(1),"Source",USE.NAMES = FALSE)
  pkgs <- c()
  if("Repository"%in%sources){
    pkgs <- c(pkgs, paste0("cran::",vapply(lockfile[["Packages"]][sources=="Repository"],`[[`,character(1),"Package",USE.NAMES = FALSE)))
  }
  if("Bioconductor"%in%sources){
    pkgs <- c(pkgs,paste0("bioc::",vapply(lockfile[["Packages"]][sources=="Bioconductor"],`[[`,character(1),"Package",USE.NAMES = FALSE)))
  }
  if("GitHub"%in%sources){
    pkgs <- c(pkgs,
              paste0("github::",
                     vapply(lockfile[["Packages"]][sources=="GitHub"],`[[`,character(1),"RemoteUsername",USE.NAMES = FALSE),"/",
                     vapply(lockfile[["Packages"]][sources=="GitHub"],`[[`,character(1),"Package",USE.NAMES = FALSE))
    )
  }
  pkgs
}

f <- "renv.lock"
cat(readLines(f),sep = "\n")
#> {
#>   "R": {
#>     "Version": "4.2.2",
#>     "Repositories": [
#>       {
#>         "Name": "CRAN",
#>         "URL": "https://cloud.r-project.org"
#>       }
#>     ]
#>   },
#>   "Packages": {
#>     "BiocGenerics": {
#>       "Package": "BiocGenerics",
#>       "Version": "0.44.0",
#>       "Source": "Bioconductor",
#>       "git_url": "https://git.bioconductor.org/packages/BiocGenerics",
#>       "git_branch": "RELEASE_3_16",
#>       "git_last_commit": "d7cd9c1",
#>       "git_last_commit_date": "2022-11-01",
#>       "Hash": "0de19224c2cd94f48fbc0d0bc663ce3b",
#>       "Requirements": []
#>     },
#>     "levelnet": {
#>       "Package": "levelnet",
#>       "Version": "0.5.0",
#>       "Source": "GitHub",
#>       "RemoteType": "github",
#>       "RemoteHost": "api.github.com",
#>       "RemoteRepo": "levelnet",
#>       "RemoteUsername": "schochastics",
#>       "RemoteRef": "HEAD",
#>       "RemoteSha": "775cf5e91b83cb73fe35e378ed1d7facb1d741eb",
#>       "Hash": "29eed562ec1c9bb7e31ce87e321b9252",
#>       "Requirements": [
#>         "Matrix",
#>         "Rcpp",
#>         "igraph"
#>       ]
#>     },
#>     "rtoot": {
#>       "Package": "rtoot",
#>       "Version": "0.3.0",
#>       "Source": "Repository",
#>       "Repository": "CRAN",
#>       "Hash": "06eb72de42a3f8fcb252badc58f92b2b",
#>       "Requirements": [
#>         "clipr",
#>         "curl",
#>         "dplyr",
#>         "httr",
#>         "jsonlite",
#>         "tibble"
#>       ]
#>     }
#>   }
#> }

lockfile <- jsonlite::fromJSON(f, simplifyVector = FALSE)
class(lockfile) <- "renv_lockfile"
as_pkgrefs.renv_lockfile(lockfile)
#> [1] "cran::rtoot"                   "bioc::BiocGenerics"           
#> [3] "github::schochastics/levelnet"

Created on 2023-02-24 with reprex v2.0.2

@schochastics
Copy link
Member

schochastics commented Feb 24, 2023

Would it make sense to "smartly" attach a class to each input?
SessionInfo is trivial since it is there. But all other inputs will be a character:
"rtoot","path/renv.lock","path/to/dir/for/renv_dependencies"

"path/renv.lock" -> class("renv_lock") implemented in #80
"path/to/dir/for/renv_dependencies" -> class("renv_directory")

then just do as_pkgrefs?

@schochastics
Copy link
Member

schochastics commented Feb 24, 2023

renv::dependencies() only returns package names. We can reconstruct if the package is from Bioc, but not the correct github repository

I guess we need to display a warning when renv::dependencies() is used that results may not be accurate

chainsawriot added a commit that referenced this issue Feb 26, 2023
add support for renv lockfiles (#55)
chainsawriot added a commit that referenced this issue Feb 26, 2023
package guesstimating wit renv::dependencies (#53, #55)
@schochastics
Copy link
Member

I am gonna try to work on the remainder of this issue, specifically the table and #55 (comment)

@chainsawriot
Copy link
Collaborator Author

@schochastics Given the current implementation, this case

resolve(pkgs = c(".", "~/anotherdirectory")) ## scan two directories

Should not be considered as scanning? I was thinking, maybe we don't need that paths_as_local_packages parameter.

Single directory, no local:: -> scan
Single directory, with local:: -> not scan
Multiple directories -> not scan

@schochastics
Copy link
Member

@chainsawriot hmm maybe for directories it actually makes sense to allow multiple scanning? I just went with the same as for lock files. Happy to change this behaviour if you think allowing to scan multiple directories at once is a desirable feature

@chainsawriot
Copy link
Collaborator Author

@schochastics I was thinking a possible use case. For example, I have two material dirs. And therefore, I might want to scan the two dirs. But the issue is, transferring two dirs into the Docker image is not possible with the current dockerize.

If we allow multiple scanning here, we should somehow also allow materials_dir (materials_dirs?) to be a vector of dirs.

@schochastics
Copy link
Member

@chainsawriot Then maybe lets implement both "vectorized". I can do that. Question is: this version or next?

@chainsawriot
Copy link
Collaborator Author

@schochastics Next ver. I think we have enough features on our plate already.

chainsawriot added a commit that referenced this issue Feb 27, 2023
set snapshot date depending on pkgs (#53,#55)
@schochastics
Copy link
Member

@chainsawriot working on the export to renv lockfile. Do you know where the hash value comes from? It is optional, but I'd try to include it

@chainsawriot
Copy link
Collaborator Author

chainsawriot commented Feb 28, 2023

@schochastics I didn't know. But a bit of UTSL indicates it is a hash of the DESCRIPTION file after readLines().

https://github.com/rstudio/renv/blob/d672d72d4d43fb4bca2c4ddfd03468e8f77eb7ee/tests/testthat/test-hash.R#L2-L8

As we don't have access to all DESCRIPTION files, maybe we should keep this optional. Unless you want to implement a strict mode that will download the DESCRIPTION files and hash them accordingly.

Current ver. CRAN packages have it available.

https://cran.r-project.org/web/packages/rtoot/DESCRIPTION

Archive ver. I need to figure out, but it is just downloading the tarball. (d2e34f9)

bioc, I don't know whether we can just hash a section of the big DESCRIPTION file.

GH and local (in the making) is probably straight forward.

@schochastics
Copy link
Member

hmm I'll think about it but I think for now i will leave the hash for the future.

@chainsawriot
Copy link
Collaborator Author

chainsawriot commented Mar 1, 2023

@schochastics
Copy link
Member

schochastics commented Mar 1, 2023

Doing another coding round tonight so will work it in

@schochastics schochastics self-assigned this Mar 2, 2023
chainsawriot added a commit that referenced this issue Mar 2, 2023
@chainsawriot
Copy link
Collaborator Author

@schochastics For this and #53 , the last thing to do is to make "." the default of resolve, which doesn't require tests.

@chainsawriot
Copy link
Collaborator Author

13f909d

@chainsawriot
Copy link
Collaborator Author

It's done. Thank you @schochastics for implementing this; and @hadley for the comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants