diff --git a/DESCRIPTION b/DESCRIPTION index cadfd18..afa3101 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -12,6 +12,8 @@ Encoding: UTF-8 Roxygen: list(markdown = TRUE) RoxygenNote: 7.2.3 Suggests: + knitr, + rmarkdown, testthat (>= 3.0.0) Config/testthat/edition: 3 Imports: @@ -25,3 +27,4 @@ Imports: gh Depends: R (>= 3.5.0) +VignetteBuilder: knitr diff --git a/vignettes/.gitignore b/vignettes/.gitignore new file mode 100644 index 0000000..097b241 --- /dev/null +++ b/vignettes/.gitignore @@ -0,0 +1,2 @@ +*.html +*.R diff --git a/vignettes/faq.Rmd b/vignettes/faq.Rmd new file mode 100644 index 0000000..786485d --- /dev/null +++ b/vignettes/faq.Rmd @@ -0,0 +1,219 @@ +--- +title: "FAQ" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{FAQ} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + +# General questions + +**GQ1: Tell me in 10 lines how this package works** + +**GA1:** Get the dependency graph of several R packages on CRAN or Github at a specific snapshot date(time) + +```r +graph <- resolve(c("crsh/papaja", "rio"), snapshot_date = "2019-07-21") +``` + +Dockerize the dependency graph to a directory + +```r +dockerize(graph, output_dir = "rangtest") +``` + +You can build the Docker image either by the R package `stevedore` or Docker CLI client. We use the CLI client. + +```sh +cd rangtest +docker build -t rangtest . ## might need sudo +``` + +Launch the container with the built image + +```sh +docker run --rm --name "rangcontainer" -ti rangtest +``` + +And the tenth line is not needed. + +**GQ2: For running `resolve()`, how do I know which packages are used in a project?** + +**GA2:** We recommend `renv::dependencies()`. 
+ +Suppose in a directory called "project" there are two R files: + +```r +here::here() +``` + +```r +library(rio) +x <- import("hello.csv") +``` + +Running this reveals + +```{r include = FALSE} +## x <- tempfile() +## dir.create(x) +## writeLines(c("library(rio)", "here::here()"), file.path(x, "fake.R")) +## y <- renv::dependencies(x) +## y$Source <- c("myproject/1.R", "myproject/2.R") +## y +``` + +
renv::dependencies("project")
+
#> Finding R package dependencies ... Done!
+#>          Source Package Require Version   Dev
+#> 1 myproject/1.R    here                 FALSE
+#> 2 myproject/2.R     rio                 FALSE
+ +**GQ3: Why is the R script generated by `dockerize()` and `export_rang()` so strange/unidiomatic/inefficient/did you guys read `fortunes::fortune("answer is parse")`?** + +**GA3:** It is because we optimize the R code in `rang.R` for backward compatibility. We need to make sure that the code runs well in vanilla R environments from R 2.1.0 onward. + +**GQ4: Why doesn't `rang` support R < 2.1.0 yet?** + +**GA4:** It is because installing source packages from within R was introduced in R 2.1.0. Before that, one needed to install with `R CMD INSTALL`. But we are working on supporting R in the 1.x series. + +**GQ5: Does `rang.R` (generated by `export_rang()` or `dockerize()`) run on non-Linux OSes?** + +**GA5:** Theoretically speaking, yes. But it is strongly discouraged. If the system requirements are fulfilled, `rang.R` should probably run fine on OS X if the R packages do not contain compiled code. C and Fortran compilers are needed if they do. See [this entry](https://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#Installation-of-source-packages) in the R Mac OS X FAQ. On Windows, installing Github packages requires a properly set up PATH and `tar`. Similarly, R packages with compiled code require compilers. See [this entry](https://cran.r-project.org/bin/windows/base/rw-FAQ.html#Can-I-install-packages-into-libraries-in-this-version_003f) in the R for Windows FAQ. + +**GQ6: What are the caveats of using rang?** + +**GA6:** Many + +* `rang` does not support reconstructing computational environments with R < 2.1.0 (i.e. `snapshot_date` < "2005-04-19 09:01") yet +* `dockerize()` can only generate Debian/Ubuntu-based Docker images +* `dockerize(cache = TRUE)` does not (yet) cache [R source code](https://cran.r-project.org/src/base/) or System Requirements (in `deb` packages) +* `query_sysreqs()` (as well as `resolve(query_sysreqs = TRUE)`) queries for System Requirements based on the latest version of the packages on CRAN / Github. 
Therefore: + * Archived CRAN packages are assumed to have no System Requirements + * R Packages with changed System Requirements between `snapshot_date` and the date of running `resolve()` might produce incorrect System Requirements +* A result from `resolve()` with R version < 3.1 that contains at least one Github package must be dockerized with caching (i.e. `dockerize(cache = TRUE)`). It is because the outdated version of Debian cannot communicate with the Github API +* R packages on Github (likely) or CRAN (very not likely) might not be available in the near future. But one can cache the packages (`dockerize(cache = TRUE)`). +* The Rocker project and its host Docker Hub might not be available in the near future (not likely) +* Ubuntu / Debian archives (for System Requirements) might not be available in the future (super not likely) + +**GQ7: `rang` depends on R >= 3.5.0. Several of the dependencies depend on many modern R packages. How dare you claim your package supports R >= 2.1.0?** + +**GA7:** To clarify, it is true that `resolve()` and `dockerize()` depend on many factors. But the reconstruction process (if with caching of R packages) depends only on the availability of Docker images from Docker Hub, availability of R source code on CRAN (R < 3.1.0), and `deb` packages from Ubuntu and Debian in the future. If you don't believe in all of these, see also: DQ4. + +**GQ8: What are the data sources of `resolve()`?** + +**GA8:** Several + +* Dependencies / R version / System Requirements: r-hub APIs [pkgsearch](https://r-hub.github.io/pkgsearch/) [r-versions](https://api.r-hub.io/rversions) [sysreqs](https://sysreqs.r-hub.io/) +* Github: [Github API](https://docs.github.com/en/rest) +* Dependencies of Github packages: [Package manager from Posit](https://github.com/rstudio/r-system-requirements#operating-systems) + +**GQ9: I am not convinced by this package. 
What are the alternatives?** + +**GA9:** If you don't consider the Dockerization part of `rang`, the date-based pinning of R packages can be done by: + +* Using [Package Manager](https://packagemanager.rstudio.com/) + +```r +library(pak) +options(repos = c(REPO_NAME = "https://packagemanager.rstudio.com/cran/2019-07-21")) +pkg_install("rio") +pkg_install("crsh/papaja") +``` + +* Using [groundhog](https://groundhogr.com/) + +```r +library(groundhog) +pkgs <- c("rio", "crsh/papaja") +groundhog.library(pkgs, "2019-07-21") +``` + +If you don't consider the date-based pinning of R packages, the Dockerization can be done by: + +* Using [containerit](https://github.com/o2r-project/containerit) [not on CRAN] + +```r +library(containerit) +## combine with Package Manager to pin packages by date +install.packages("rio") +remotes::install_github("crsh/papaja") +library(rio) +library(papaja) +print(containerit::dockerfile(from = utils::sessionInfo())) +``` + +* Using [dockerfiler](https://cran.r-project.org/web/packages/dockerfiler/index.html) + +```r +library(dockerfiler) +my_dock <- Dockerfile$new() +## combine with Package Manager to pin packages by date +my_dock$RUN(r(install.packages(c("remotes", "rio")))) +my_dock$RUN(r(remotes::install_github("crsh/papaja"))) +my_dock +``` + +# Docker questions + +**DQ1: Is Docker overkill for simply ensuring that a few lines of R code are reproducible?** + +**DA1:** It might be the case for recent R code, e.g. R >= 3.0 (or `snapshot_date` > "2013-04-03 09:10"). But we position `rang` as an archaeological tool to run really old R code (`snapshot_date` >= "2005-04-19 09:01", but see GQ4). For this, Docker is essential because R in the 2.x (or even 1.x in the future) series might not be installable anymore in a non-virtualized environment. 
+ +According to [The Turing Way](https://the-turing-way.netlify.app/reproducible-research/compendia.html), a research compendium that aids computational reproducibility should contain a complete description of the computational environment. The directory exported by `dockerize()`, especially when `materials_dir` and `cache` were used, can be directly shared as a research compendium. + +**DQ2: How do I access bash instead of R?** + +**DA2:** By default, containers launched with the images generated by `rang` go to R. One can override this by launching the container with an alternative entry point. + +Suppose an image was built as per GA1. + +```sh +docker run --rm --name "rangcontainer" --entrypoint bash -ti rangtest +``` + +**DQ3: How do I copy files from and to a launched container?** + +**DA3:** Again, suppose an image was built as per GA1 and launched as below + +```sh +docker run --rm --name "rangcontainer" -ti rangtest +``` + +```sh +# probably you need to run this from another terminal +docker cp rangcontainer:/rang.R rang2.R +docker cp rang2.R rangcontainer:/rang2.R +``` + +We want to emphasize here that launching a container with `--name` is useful because the name of the container is randomly generated when `--name` is not used to launch it. It is also important to remind you that a relaunched container goes back to the initial state: any file generated inside the container will be removed. So use `docker cp` to copy out any artifact you want to preserve. + +**DQ4: How do I back up an image?** + +**DA4:** If you don't believe Docker Hub / Debian / Ubuntu will be available forever, you may back up the generated image. + +```sh +docker save rangtest | gzip > rangtest.tar.gz +``` + +You can also share the backup (usually < 1G, depending on the size of `materials_dir`, thus sharable on Zenodo). 
+ +To restore the backup image: + +```sh +docker load < rangtest.tar.gz +``` + +And launch a container the same way + +```sh +docker run --rm --name "rangcontainer" -ti rangtest +```