Consider using the original project tschaffter/rstudio, which comes with additional features and is more frequently updated.
Docker image for analyses using RStudio and Python-Conda
The motivation for this project is to encourage the use of portable development environments in research and engineering. The environment should be intuitive to use so that anyone can deploy it and reproduce your results - even you six months from now!
This project provides a portable development environment that enables you to use R and Python together. The Docker image sagebionetworks/rstudio offered by this project is based on the image rocker/rstudio.
Features:
- Use sensitive information like credentials without specifying it in your notebooks, which reduces the risk of publishing this information.
- Create and manage Conda environments (Miniconda) using the R library reticulate to run and/or develop Python programs that require different versions of Python or packages.
- Render Rmd notebooks to HTML using the Docker image provided in this project, e.g. to generate HTML notebooks in GitHub workflows before publishing them to GitHub Pages.
- Benefit from regular updates of the image sagebionetworks/rstudio, which bring the latest versions of R/RStudio and other dependencies (Miniconda, R and Python packages).
- You only need the Docker Engine on your system to develop code in R and Python (see Requirements).
This image includes the following common Sage Bionetworks software:
- R libraries
  - sagethemes: Sage-branded plot themes.
- Python packages
  - challengeutils: Synapse challenge utility functions.
  - synapseclient: Programmatic interface to Synapse services for Python.
All packages:
- R (see renv.lock).
- Python (see conda/sage-bionetworks/environment.yml).

Requirements:
- Docker Engine >=19.03.0

The example notebooks below are rendered to HTML and published to GitHub Pages by the CI/CD workflow of this repository.
Rmd Notebook | Description | HTML Notebook |
---|---|---|
notebook.Rmd | Default RStudio notebook. | |
r-and-python.Rmd | Shows how to use R and Python together. | |
sagethemes.Rmd | Example notebook provided by the R library sagethemes. | |
synapse.Rmd | Shows how to interact with the Synapse API. | |
Important: Please make sure when you write your own notebooks that no sensitive information ends up being publicly available. Please check with the information security officer of your organization to confirm that the approach described here can be applied to your use case.
- Create and edit the configuration file. You can initially start RStudio using this configuration as-is.

      cp .env.example .env

- Start RStudio. Add the option `-d` or `--detach` to run in the background.

      docker compose up

RStudio is now available at http://localhost. On the login page, enter the default username (`rstudio`) and the password specified in `.env`.

To stop RStudio, enter Ctrl+C followed by `docker compose down`. If running in detached mode, you only need to enter `docker compose down`.
You can use the image sagebionetworks/rstudio as-is to start an instance of RStudio and develop tools that interact with Sage Bionetworks services, e.g. Synapse.
If you want to create a portable development environment, start by creating a new GitHub repository from this template. You can then customize your environment by specifying the R and Python packages to include with your image. Finally, edit the GitHub workflow .github/workflows/ci.yml to indicate the Docker repository where the image should be pushed (see Section Versioning).
Example projects that use this repository / image:
- Sage-Bionetworks-Challenges/challenge-analysis (work in progress)
The configuration of the development environment is defined in the file `.env`. This file contains environment variables that are set when the environment starts.

For security reasons, a user session in RStudio does not see all the environment variables of the system. However, the variables defined in `.env` with a name that starts with `APP_` are made visible to the user session via the creation of the file `.Renviron`.
> variables <- names(s <- Sys.getenv())
> variables[startsWith(variables, "APP_")]
[1] "APP_BAR" "APP_FOO"
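
As a rough illustration of the prefix-based filtering described above, the sketch below selects the `APP_`-prefixed entries of an env file. It is not the image's actual startup code, which this README does not show; the helper name `filter_app_vars` and the sample values are hypothetical.

```shell
# Sketch only: print the APP_-prefixed assignments found in an env file.
# filter_app_vars is a hypothetical helper, not part of the image.
filter_app_vars() {
  # Keep lines such as APP_FOO=value; ignore comments and other variables.
  grep -E '^APP_[A-Za-z0-9_]*=' "$1" || true
}

# Example with made-up values written to a temporary file.
env_file="$(mktemp)"
printf 'PASSWORD=secret\nAPP_FOO=foo\nAPP_BAR=bar\n' > "$env_file"
filter_app_vars "$env_file"
rm -f "$env_file"
```

Only the two `APP_` lines are printed; `PASSWORD` stays out of the result, which mirrors why a user session does not see it.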
In RStudio, use the following options to add and update libraries:
- Tools > Install Packages...
- Tools > Check for Package Updates...

Run the command `renv::snapshot(type="all")` to update the file `renv.lock`, which is used in `Dockerfile` to install the required R libraries.
See the content of the folder `conda` for an example of how to define a Conda environment. The packages to add to this environment must be added to the file `requirements.txt`. The creation of one or more Conda environments can be specified in `Dockerfile`.
Set the environment variable `SYNAPSE_TOKEN` to the value of one of your Synapse Personal Access Tokens. If this variable is set, it will be used to create the configuration file `~/.synapseConfig` when the container starts.
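
For orientation, the sketch below shows roughly how a token could be turned into such a configuration file. It is an assumption, not the image's actual startup script: `write_synapse_config` is a hypothetical helper, and the `[authentication]` section with an `authtoken` key is assumed here to follow the synapseclient configuration format.

```shell
# Hypothetical helper: write a Synapse config file from a token.
# $1: personal access token, $2: destination path (e.g. ~/.synapseConfig)
write_synapse_config() {
  # Do nothing when no token is provided.
  [ -n "$1" ] || return 0
  printf '[authentication]\nauthtoken = %s\n' "$1" > "$2"
  # The file holds a secret, so restrict it to the owner.
  chmod 600 "$2"
}

# Example with a placeholder token written to a temporary file.
config_file="$(mktemp)"
write_synapse_config "placeholder-token" "$config_file"
cat "$config_file"
rm -f "$config_file"
```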
This Docker image comes with Miniconda installed (see below) and an example Conda environment named `sage-bionetworks`. This environment includes packages used to interact with the collaborative platform Synapse developed by Sage Bionetworks.
Attach to the RStudio container (here assuming that `rstudio` is the name of the container). For better safety, it is recommended to work as a non-root user. You can then list the environments available, activate an existing environment or create a new one.
$ docker exec -it rstudio bash
container # su yourusername
container $ conda env list
container $ conda activate sage-bionetworks
The R code below lists the environments available before activating the existing environment named `sage-bionetworks`.
> library(reticulate)
> conda_list()
              name                                          python
1        miniconda                       /opt/miniconda/bin/python
2 sage-bionetworks /opt/miniconda/envs/sage-bionetworks/bin/python
> use_condaenv("sage-bionetworks", required = TRUE)
When using Docker volumes, permission issues can arise between the host OS and the container. You can avoid these issues by letting RStudio know the User ID (UID) and Group ID (GID) it should use when creating and editing files so that these IDs match yours, which you can get using the command `id`:
$ id
uid=1000(kelsey) gid=1000(kelsey) groups=1000(kelsey)
In this example, we would set `USERID=1000` and `GROUPID=1000`.
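
The two values can also be derived directly from `id` rather than read off its output. A minimal sketch, assuming `USERID` and `GROUPID` are set in `.env` like the other settings:

```shell
# id -u and id -g print the numeric user and group IDs of the current user.
USERID="$(id -u)"
GROUPID="$(id -g)"

# Print the two assignments so they can be copied into .env.
printf 'USERID=%s\nGROUPID=%s\n' "$USERID" "$GROUPID"
```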
To give the RStudio user root (sudo) permissions, set the environment variable `ROOT=TRUE` (default is `FALSE`).
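To follow the logs of the container (here assuming that `rstudio` is the name of the container), run:

    docker logs --follow rstudio
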
This Docker image provides the command `render` that generates an HTML or PDF notebook from an R notebook (.Rmd). Run the command below from the host to mount the directory `$(pwd)/notebooks`, where the R notebook is, and generate the HTML notebook, which is saved to the same directory with the extension `.nb.html`.
docker run --rm \
    --env-file .env \
    -v "$(pwd)/notebooks:/data" \
    sagebionetworks/rstudio:4.1.0 \
    render /data/examples/*.Rmd
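
The naming rule described above (same directory, `.nb.html` extension) can be written as a small helper, for instance to check in a CI job that every notebook was rendered. This is only a sketch: `rendered_path` is a hypothetical name and not a command provided by the image.

```shell
# Hypothetical helper: map an R notebook path to its expected HTML output.
rendered_path() {
  # Strip the .Rmd suffix and append .nb.html.
  printf '%s\n' "${1%.Rmd}.nb.html"
}

rendered_path notebooks/examples/notebook.Rmd
# prints notebooks/examples/notebook.nb.html
```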
This repository uses semantic versioning to track the releases of this project. This repository uses "non-moving" GitHub tags, that is, a tag will always point to the same git commit once it has been created.
The artifact published by this repository is the Docker image sagebionetworks/rstudio. The versions of the image are aligned with the versions of R/RStudio, not the GitHub tags of this repository.
The table below describes the image tags available.
Tag name | Moving | Description |
---|---|---|
`latest` | Yes | Latest stable release. |
`edge` | Yes | Latest commit made to the default branch. |
`weekly` | Yes | Weekly release from the default branch. |
`<major>` | Yes | Latest stable major release of R/RStudio. |
`<major>.<minor>` | Yes | Latest stable minor release of R/RStudio. |
`<major>.<minor>.<patch>` | Yes | Latest stable patch release of R/RStudio. |
`<major>.<minor>.<patch>-<sha>` | No | Same as above but with the reference to the git commit. |
You should avoid using a moving tag like `latest` when deploying containers in production, because this makes it hard to track which version of the image is running and hard to roll back.
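
One way to enforce this in a deployment script is to reject any tag that does not match the non-moving `<major>.<minor>.<patch>-<sha>` form. The sketch below assumes that the `<sha>` part is a lowercase hexadecimal string; `is_pinned_tag` and the sample tag `4.1.0-0a1b2c3` are hypothetical.

```shell
# Hypothetical helper: succeed only for tags of the form X.Y.Z-<sha>.
is_pinned_tag() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+-[0-9a-f]+$'
}

# Classify a few sample tags.
for tag in latest 4.1.0 4.1.0-0a1b2c3; do
  if is_pinned_tag "$tag"; then
    echo "$tag: pinned"
  else
    echo "$tag: moving"
  fi
done
```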