Add FAQ #26 #51

Merged
merged 2 commits into from
Feb 15, 2023
3 changes: 3 additions & 0 deletions DESCRIPTION
@@ -12,6 +12,8 @@ Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Suggests:
knitr,
rmarkdown,
testthat (>= 3.0.0)
Config/testthat/edition: 3
Imports:
@@ -25,3 +27,4 @@ Imports:
gh
Depends:
R (>= 3.5.0)
VignetteBuilder: knitr
2 changes: 2 additions & 0 deletions vignettes/.gitignore
@@ -0,0 +1,2 @@
*.html
*.R
221 changes: 221 additions & 0 deletions vignettes/faq.Rmd
@@ -0,0 +1,221 @@
---
title: "FAQ"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{FAQ}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# General questions

**GQ1: Tell me in 10 lines how to use this package.**

**GA1:** Get the dependency graph of several R packages on CRAN or GitHub at a specific snapshot date(time):

```r
library(rang)
graph <- resolve(c("crsh/papaja", "rio"), snapshot_date = "2019-07-21")
```

Dockerize the dependency graph to a directory

```r
dockerize(graph, output_dir = "rangtest")
```

You can build the Docker image with either the R package `stevedore` or the Docker CLI client. We use the CLI client here.

```sh
cd rangtest
docker build -t rangtest . ## might need sudo
```

Launch the container with the built image

```sh
docker run --rm --name "rangcontainer" -ti rangtest
```

And the tenth line is not needed.

**GQ2: For running `resolve()`, how do I know which packages are used in a project?**

**GA2:** We recommend `renv::dependencies()`.

> **Review comment (Member):** Maybe mention that this only works with CRAN packages? That is, it doesn't tell you if a package comes from GitHub.


Suppose in a directory called "project" there are two R files:

```r
here::here()
```

```r
library(rio)
x <- import("hello.csv")
```

Running `renv::dependencies()` on this directory reveals:

```{r include = FALSE}
## x <- tempfile()
## dir.create(x)
## writeLines(c("library(rio)", "here::here()"), file.path(x, "fake.R"))
## y <- renv::dependencies(x)
## y$Source <- c("myproject/1.R", "myproject/2.R")
## y
```

<div class="sourceCode" id="cb8"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true"></a>renv<span class="op">::</span><span class="kw">dependencies</span>(<span class="st">&quot;project&quot;</span>)</span></code></pre></div>
<pre><code>#&gt; Finding R package dependencies ... Done!
#&gt;          Source Package Require Version   Dev
#&gt; 1 myproject/1.R    here                 FALSE
#&gt; 2 myproject/2.R     rio                 FALSE</code></pre>

You may still need to manually review which packages come from GitHub, because this output does not show where a package is hosted.

**GQ3: Why is the R script generated by `dockerize()` and `export_rang()` so strange/unidiomatic/inefficient/did you guys read `fortunes::fortune("answer is parse")`?**

**GA3:** Because we optimize the R code in `rang.R` for backward compatibility. The code must run well in vanilla R environments from R 2.1.0 onward.

> **Review comment (Member):** Maybe add something along the lines of "you better not edit it anyway"?


**GQ4: Why doesn't `rang` support reconstructing computational environments with R < 2.1.0 yet?**

**GA4:** Because installing source packages from within R was only introduced in R 2.1.0. Before that, source packages had to be installed with `R CMD INSTALL`. We are, however, working on supporting the R 1.x series.

**GQ5: Does `rang.R` (generated by `export_rang()` or `dockerize()`) run on non-Linux OSes?**

**GA5:** In theory, yes, but we strongly recommend against it. If the system requirements are fulfilled, `rang.R` should probably run fine on macOS (OS X) as long as the R packages do not contain compiled code. If they do, C and Fortran compilers are needed. See [this entry](https://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#Installation-of-source-packages) in the R Mac OS X FAQ. On Windows, installing GitHub packages requires a properly set up `PATH` and `tar`. Similarly, R packages with compiled code require C / Fortran compilers. See [this entry](https://cran.r-project.org/bin/windows/base/rw-FAQ.html#Can-I-install-packages-into-libraries-in-this-version_003f) in the R for Windows FAQ.

**GQ6: What are the caveats of using rang?**

**GA6:** Many

* `rang` does not support reconstructing computational environments with R < 2.1.0 (i.e. `snapshot_date` < "2005-04-19 09:01") yet
* `dockerize()` can only generate Debian/Ubuntu-based Docker images
* `dockerize(cache = TRUE)` does not cache [R source code](https://cran.r-project.org/src/base/) (yet) and System Requirements (in `deb` packages)
* `query_sysreqs()` (as well as `resolve(query_sysreqs = TRUE)`) queries for System Requirements based on the latest version of the packages on CRAN / GitHub. Therefore:
  * Removed CRAN packages are assumed to have no System Requirements
  * R packages whose System Requirements changed between `snapshot_date` and the date of running `resolve()` might produce incorrect System Requirements
* A result from `resolve()` that uses an R version < 3.1 and contains at least one GitHub package must be dockerized with caching (i.e. `dockerize(cache = TRUE)`), because the outdated version of Debian cannot communicate with the GitHub API
* R packages on GitHub or CRAN might not be available in the near future (GitHub: likely; CRAN: very unlikely). But one can cache the packages (`dockerize(cache = TRUE)`).
* The Rocker project and its host Docker Hub might not be available in the near future (unlikely)
* Ubuntu / Debian archives (for System Requirements) might not be available in the future (super unlikely)

**GQ7: `rang` depends on R >= 3.5.0. Several of the dependencies depend on many modern R packages. How dare you claim that your package supports R >= 2.1.0?**

**GA7:** To clarify, it is true that `resolve()` and `dockerize()` depend on many factors, including a modern version of R. But the reconstruction process (if R packages are cached) depends only on the future availability of Docker images from Docker Hub, R source code on CRAN (for R < 3.1.0), and `deb` packages from Ubuntu and Debian. If you do not trust all of these to stay available, see DQ4.

**GQ8: What are the data sources of `resolve()`?**

**GA8:** Several

* Dependencies / R version / System Requirements: r-hub APIs ([pkgsearch](https://r-hub.github.io/pkgsearch/), [r-versions](https://api.r-hub.io/rversions), [sysreqs](https://sysreqs.r-hub.io/))
* GitHub: [GitHub API](https://docs.github.com/en/rest)
* Dependencies of GitHub packages: [Package manager from Posit](https://github.com/rstudio/r-system-requirements#operating-systems)

**GQ9: I am not convinced by this package. What are the alternatives?**

**GA9:** If you don't consider the Dockerization part of `rang`, the date-based pinning of R packages can be done by:

* Using [Package Manager](https://packagemanager.rstudio.com/)

```r
library(pak)
options(repos = c(REPO_NAME = "https://packagemanager.rstudio.com/cran/2019-07-21"))
pkg_install("rio")
pkg_install("crsh/papaja")
```

* Using [groundhog](https://groundhogr.com/)

```r
library(groundhog)
pkgs <- c("rio","crsh/papaja")
groundhog.library(pkgs, "2019-07-21")
```

If you don't consider the date-based pinning of R packages, the Dockerization can be done by:

* Using [containerit](https://github.com/o2r-project/containerit) [not on CRAN]

```r
library(containerit)
## combine with Package Manager to pin packages by date
install.packages("rio")
remotes::install_github("crsh/papaja")
library(rio)
library(papaja)
print(containerit::dockerfile(from = utils::sessionInfo()))
```

* Using [dockerfiler](https://cran.r-project.org/web/packages/dockerfiler/index.html)

```r
library(dockerfiler)
my_dock <- Dockerfile$new()
## combine with Package Manager to pin packages by date
my_dock$RUN(r(install.packages(c("remotes", "rio"))))
my_dock$RUN(r(remotes::install_github("crsh/papaja")))
my_dock
```

# Docker questions

**DQ1: Is Docker an overkill to simply ensure that a few lines of R code are reproducible?**

> **Review comment (Collaborator, Author):** Wording it like this might irritate someone.

> **Review comment (Member):** Maybe something like "Is Docker really necessary to simply ensure..."


**DA1:** It might be the case for recent R code, e.g. R >= 3.0 (or `snapshot_date` > "2013-04-03 09:10"). But we position `rang` as an archaeological tool for running really old R code (`snapshot_date` >= "2005-04-19 09:01", but see GQ4). For this, Docker is essential because R in the 2.x series (or even the 1.x series in the future) might not be installable anymore in a non-virtualized environment.

According to [The Turing Way](https://the-turing-way.netlify.app/reproducible-research/compendia.html), a research compendium that aids computational reproducibility should contain a complete description of the computational environment. The directory exported by `dockerize()`, especially when `materials_dir` and `cache` were used, can be directly shared as a research compendium.

**DQ2: How do I access bash instead of R?**

**DA2:** By default, containers launched from images generated by `rang` start R. You can override this by launching the container with an alternative entry point.

Suppose an image was built as per GA1.

```sh
docker run --rm --name "rangcontainer" --entrypoint bash -ti rangtest
```
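
The same entry-point override can also run a single command without opening an interactive session. The following is a minimal sketch, not part of `rang`: the function name `run_in_image` is hypothetical, and it assumes the image tag `rangtest` from GA1.

```sh
# Hypothetical helper (not part of rang): run one shell command inside a
# fresh container by overriding the entry point with bash.
# Usage: run_in_image '<command>' [image-name]
run_in_image() {
  cmd="$1"
  image="${2:-rangtest}"   # image tag assumed from GA1
  # Arguments after the image name are passed to the overridden entry point,
  # so the container executes: bash -c "<command>"
  docker run --rm --entrypoint bash "$image" -c "$cmd"
}
```

For example, `run_in_image 'ls /'` lists the root directory of the image and then removes the container again because of `--rm`.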

**DQ3: How do I copy files from and to a launched container?**

**DA3:** Again, suppose an image was built as per GA1 and a container was launched as below:

```sh
docker run --rm --name "rangcontainer" -ti rangtest
```

```sh
# probably you need to run this from another terminal
docker cp rangcontainer:/rang.R rang2.R
docker cp rang2.R rangcontainer:/rang2.R
```

Launching a container with `--name` is useful here: without it, Docker assigns a randomly generated name, which makes the container harder to address with `docker cp`. Also remember that a relaunched container goes back to its initial state, so any file generated inside the previous container is gone. Use `docker cp` to copy out any artifact you want to preserve.
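
Because files inside a container do not survive a relaunch, it can help to wrap the copy step in a small function. The following is a minimal sketch, not part of `rang`: the function name `save_artifact` and the local `artifacts` directory are hypothetical, and the default container name is the one passed to `--name` above.

```sh
# Hypothetical helper (not part of rang): copy one file out of a running,
# named container into a local directory before the container is stopped.
# Usage: save_artifact <path-in-container> [container-name] [output-dir]
save_artifact() {
  src="$1"
  container="${2:-rangcontainer}"  # the name given via --name
  outdir="${3:-artifacts}"         # assumed local directory, created if missing

  if [ -z "$src" ]; then
    echo "usage: save_artifact <path-in-container> [container] [outdir]" >&2
    return 2
  fi

  mkdir -p "$outdir" || return 1
  # docker cp CONTAINER:SRC_PATH DEST_PATH copies from the container to the host
  docker cp "${container}:${src}" "$outdir/" || return 1
  echo "copied ${container}:${src} to $outdir/"
}
```

Run from another terminal while the container is still up, `save_artifact /rang.R` would place a copy of `rang.R` in `./artifacts/`.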

**DQ4: How do I back up an image?**

**DA4:** If you don't believe Docker Hub / Debian archives / Ubuntu archives would be available forever, you may back up the generated image.

```sh
docker save rangtest | gzip > rangtest.tar.gz
```

You can also share the gzipped backup tarball (usually < 1 GB, depending on the size of `materials_dir`, and thus sharable on Zenodo).

To restore the backup image:

```sh
docker load < rangtest.tar.gz
```

Then launch a container the same way:

```sh
docker run --rm --name "rangcontainer" -ti rangtest
```
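
If the backup is meant for long-term archiving, it may be worth recording a checksum next to the tarball. The following is a minimal sketch, not part of `rang`: the function name `backup_image` is hypothetical, and it assumes a plain image tag such as `rangtest` (no `/` or `:`) and that `sha256sum` from GNU coreutils is available (on macOS, `shasum -a 256` is the usual substitute). It also swaps the `docker save | gzip` pipeline for `docker save -o` followed by `gzip`, so that a failed `docker save` is not hidden by the exit status of `gzip`.

```sh
# Hypothetical helper (not part of rang): back up an image as a gzipped
# tarball and record a SHA-256 checksum next to it.
# Usage: backup_image [image-name]
backup_image() {
  image="${1:-rangtest}"   # image tag assumed from GA1
  tarball="${image}.tar"

  # Write the archive to a file first so a failing `docker save` is detected.
  docker save -o "$tarball" "$image" || return 1
  gzip -f "$tarball" || return 1            # produces ${image}.tar.gz

  # Check the compressed file and store its checksum for later verification.
  gzip -t "${tarball}.gz" || return 1
  sha256sum "${tarball}.gz" > "${tarball}.gz.sha256" || return 1
  echo "wrote ${tarball}.gz and ${tarball}.gz.sha256"
}
```

Before restoring, `sha256sum -c rangtest.tar.gz.sha256` can confirm that the file is unchanged. The tarball can then be loaded with `docker load` as shown above.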