diff --git a/DESCRIPTION b/DESCRIPTION
index cadfd18..afa3101 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -12,6 +12,8 @@ Encoding: UTF-8
 Roxygen: list(markdown = TRUE)
 RoxygenNote: 7.2.3
 Suggests:
+    knitr,
+    rmarkdown,
     testthat (>= 3.0.0)
 Config/testthat/edition: 3
 Imports:
@@ -25,3 +27,4 @@ Imports:
     gh
 Depends:
     R (>= 3.5.0)
+VignetteBuilder: knitr
diff --git a/vignettes/.gitignore b/vignettes/.gitignore
new file mode 100644
index 0000000..097b241
--- /dev/null
+++ b/vignettes/.gitignore
@@ -0,0 +1,2 @@
+*.html
+*.R
diff --git a/vignettes/faq.Rmd b/vignettes/faq.Rmd
new file mode 100644
index 0000000..786485d
--- /dev/null
+++ b/vignettes/faq.Rmd
@@ -0,0 +1,219 @@
+---
+title: "FAQ"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{FAQ}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+# General questions
+
+**GQ1: Tell me in 10 lines how this package works**
+
+**GA1:** Get the dependency graph of several R packages on CRAN or Github at a specific snapshot date(time)
+
+```r
+graph <- resolve(c("crsh/papaja", "rio"), snapshot_date = "2019-07-21")
+```
+
+Dockerize the dependency graph to a directory
+
+```r
+dockerize(graph, output_dir = "rangtest")
+```
+
+You can build the Docker image with either the R package `stevedore` or the Docker CLI client. We use the CLI client.
+
+```sh
+cd rangtest
+docker build -t rangtest . ## might need sudo
+```
+
+Launch the container with the built image
+
+```sh
+docker run --rm --name "rangcontainer" -ti rangtest
+```
+
+And the tenth line is not needed.
+
+**GQ2: For running `resolve()`, how do I know which packages are used in a project?**
+
+**GA2:** We recommend `renv::dependencies()`.
+
+Suppose in a directory called "project" there are two R files:
+
+```r
+here::here()
+```
+
+```r
+library(rio)
+x <- import("hello.csv")
+```
+
+Running this reveals
+
+```{r include = FALSE}
+## x <- tempfile()
+## dir.create(x)
+## writeLines(c("library(rio)", "here::here()"), file.path(x, "fake.R"))
+## y <- renv::dependencies(x)
+## y$Source <- c("myproject/1.R", "myproject/2.R")
+## y
+```
+
+```r
+renv::dependencies("project")
+#> Finding R package dependencies ... Done!
+#>          Source Package Require Version   Dev
+#> 1 myproject/1.R    here                 FALSE
+#> 2 myproject/2.R     rio                 FALSE
+```
+
+**GQ3: Why is the R script generated by `dockerize()` and `export_rang()` so strange/unidiomatic/inefficient/did you guys read `fortunes::fortune("answer is parse")`?**
+
+**GA3:** It is because we optimize the R code in `rang.R` for backward compatibility. We need to make sure that the code runs well in vanilla R environments from R 2.1.0 onward.
+
+**GQ4: Why doesn't `rang` support R < 2.1.0 yet?**
+
+**GA4:** It is because installing source packages from within R was introduced in R 2.1.0. Before that one needed to install with `R CMD INSTALL`. But we are working on supporting R in the 1.x series.
+
+**GQ5: Does `rang.R` (generated by `export_rang()` or `dockerize()`) run on non-Linux OSes?**
+
+**GA5:** Theoretically speaking, yes. But it is strongly discouraged. If the system requirements are fulfilled, `rang.R` should probably run fine on OS X if the R packages do not contain compiled code. If they do, C and Fortran compilers are needed. See [this entry](https://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#Installation-of-source-packages) in R Mac OS X FAQ. On Windows, installing Github packages requires a properly set up PATH and `tar`. Similarly, R packages with compiled code require compilers. See [this entry](https://cran.r-project.org/bin/windows/base/rw-FAQ.html#Can-I-install-packages-into-libraries-in-this-version_003f) in R for Windows FAQ.
+
+**GQ6: What are the caveats of using rang?**
+
+**GA6:** Many
+
+* `rang` does not support reconstructing computational environments with R < 2.1.0 (i.e. `snapshot_date` < "2005-04-19 09:01") yet
+* `dockerize()` can only generate Debian/Ubuntu-based Docker images
+* `dockerize(cache = TRUE)` does not cache [R source code](https://cran.r-project.org/src/base/) (yet) and System Requirements (in `deb` packages)
+* `query_sysreqs()` (as well as `resolve(query_sysreqs = TRUE)`) queries for System Requirements based on the latest version of the packages on CRAN / Github. Therefore:
+ * Archived CRAN packages are assumed to have no System Requirements
+ * R packages with changed System Requirements between `snapshot_date` and the date of running `resolve()` might produce incorrect System Requirements
+* A result from `resolve()` with R version < 3.1 that has at least one Github package must be dockerized with caching (i.e. `dockerize(cache = TRUE)`). This is because the outdated version of Debian cannot communicate with the Github API
+* R packages on Github (likely) or CRAN (very not likely) might not be available in the near future. But one can cache the packages (`dockerize(cache = TRUE)`).
+* The Rocker project and its host Docker Hub might not be available in the near future (not likely)
+* Ubuntu / Debian archives (for System Requirements) might not be available in the future (super not likely)
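+
+The first caveat can be checked before calling `resolve()`. Below is a minimal sketch in base R. It assumes a hypothetical helper `is_supported_snapshot()`, which is not part of `rang`, and it assumes the cutoff quoted above is in UTC.
+
+```r
+## Hypothetical helper, not part of rang.
+## Returns TRUE if snapshot_date is not earlier than the R 2.1.0 cutoff
+## quoted in the caveats above (time zone assumed to be UTC).
+is_supported_snapshot <- function(snapshot_date) {
+  cutoff <- as.POSIXct("2005-04-19 09:01", tz = "UTC")
+  as.POSIXct(snapshot_date, tz = "UTC") >= cutoff
+}
+
+is_supported_snapshot("2019-07-21") ## TRUE
+is_supported_snapshot("2004-12-31") ## FALSE
+```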
+
+**GQ7: `rang` depends on R >= 3.5.0. Several of the dependencies depend on many modern R packages. How dare you claim that your package supports R >= 2.1.0?**
+
+**GA7:** To clarify, it is true that `resolve()` and `dockerize()` depend on many factors. But the reconstruction process (when R packages are cached) depends only on the future availability of Docker images from Docker Hub, of R source code on CRAN (R < 3.1.0), and of `deb` packages from Ubuntu and Debian. If you don't trust all of these to stay available, see also: DQ4.
+
+**GQ8: What are the data sources of `resolve()`?**
+
+**GA8:** Several
+
+* Dependencies / R version / System Requirements: r-hub APIs [pkgsearch](https://r-hub.github.io/pkgsearch/), [r-versions](https://api.r-hub.io/rversions), and [sysreqs](https://sysreqs.r-hub.io/)
+* Github: [Github API](https://docs.github.com/en/rest)
+* Dependencies of Github packages: [Package manager from Posit](https://github.com/rstudio/r-system-requirements#operating-systems)
+
+**GQ9: I am not convinced by this package. What are the alternatives?**
+
+**GA9:** If you don't consider the Dockerization part of `rang`, the date-based pinning of R packages can be done by:
+
+* Using [Package Manager](https://packagemanager.rstudio.com/)
+
+```r
+library(pak)
+options(repos = c(REPO_NAME = "https://packagemanager.rstudio.com/cran/2019-07-21"))
+pkg_install("rio")
+pkg_install("crsh/papaja")
+```
+
+* Using [groundhog](https://groundhogr.com/)
+
+```r
+library(groundhog)
+pkgs <- c("rio","crsh/papaja")
+groundhog.library(pkgs, "2019-07-21")
+```
+
+If you don't consider the date-based pinning of R packages, the Dockerization can be done by:
+
+* Using [containerit](https://github.com/o2r-project/containerit) [not on CRAN]
+
+```r
+library(containerit)
+## combine with Package Manager to pin packages by date
+install.packages("rio")
+remotes::install_github("crsh/papaja")
+library(rio)
+library(papaja)
+print(containerit::dockerfile(from = utils::sessionInfo()))
+```
+
+* Using [dockerfiler](https://cran.r-project.org/web/packages/dockerfiler/index.html)
+
+```r
+library(dockerfiler)
+my_dock <- Dockerfile$new()
+## combine with Package Manager to pin packages by date
+my_dock$RUN(r(install.packages(c("remotes", "rio"))))
+my_dock$RUN(r(remotes::install_github("crsh/papaja")))
+my_dock
+```
+
+# Docker questions
+
+**DQ1: Is Docker overkill for simply ensuring that a few lines of R code are reproducible?**
+
+**DA1:** It might be the case for recent R code, e.g. R >= 3.0 (or `snapshot_date` > "2013-04-03 09:10"). But we position `rang` as an archaeological tool to run really old R code (`snapshot_date` >= "2005-04-19 09:01", but see GQ4). For this, Docker is essential because R in the 2.x (or even 1.x in the future) series might not be installable anymore in a non-virtualized environment.
+
+According to [The Turing Way](https://the-turing-way.netlify.app/reproducible-research/compendia.html), a research compendium that aids computational reproducibility should contain a complete description of the computational environment. The directory exported by `dockerize()`, especially when `materials_dir` and `cache` were used, can be directly shared as a research compendium.
+
+**DQ2: How do I access bash instead of R?**
+
+**DA2:** By default, containers launched with the images generated by `rang` go to R. One can override this by launching the container with an alternative entry point.
+
+Suppose an image was built as per GA1.
+
+```sh
+docker run --rm --name "rangcontainer" --entrypoint bash -ti rangtest
+```
+
+**DQ3: How do I copy files from and to a launched container?**
+
+**DA3:** Again, suppose an image was built as per GA1 and launched as below
+
+```sh
+docker run --rm --name "rangcontainer" -ti rangtest
+```
+
+```sh
+# probably you need to run this from another terminal
+docker cp rangcontainer:/rang.R rang2.R
+docker cp rang2.R rangcontainer:/rang2.R
+```
+
+We want to emphasize here that launching a container with `--name` is useful because the name of the container is randomly generated when `--name` is not used. Also remember that a relaunched container goes back to the initial state: any file generated inside the container will be removed. So use `docker cp` to copy out any artifact you want to preserve.
+
+**DQ4: How do I back up an image?**
+
+**DA4:** If you don't believe Docker Hub / Debian / Ubuntu would be available forever, you may back up the generated image.
+
+```sh
+docker save rangtest | gzip > rangtest.tar.gz
+```
+
+You can also share the backup (usually < 1 GB, depending on the size of `materials_dir`, thus sharable on Zenodo).
+
+To restore the backup image:
+
+```sh
+docker load < rangtest.tar.gz
+```
+
+And launch a container the same way
+
+```sh
+docker run --rm --name "rangcontainer" -ti rangtest
+```