This is a data analysis repository for the study at https://greenelab.github.io/iscb-diversity-manuscript/.
Datasets are stored in the data directory.
This repository uses Git LFS to store large / binary datasets.
Make sure to have Git LFS installed locally before cloning the repository,
if you'd like to download the datasets.
You can also download datasets directly from the GitHub website by clicking "Raw".
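If the repository is cloned without Git LFS, LFS-tracked files are checked out as small text pointers rather than the datasets themselves. The following is a minimal sketch for spotting that case. It assumes pointer files begin with the version line defined by the Git LFS pointer specification. The helper name is_lfs_pointer and the demo file name are hypothetical and not part of this repository.

```python
import pathlib
import tempfile

# First line of a pointer file, per the Git LFS pointer specification.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` starts like a Git LFS pointer."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


if __name__ == "__main__":
    # Demo on a throwaway file so the example runs without the repository.
    with tempfile.TemporaryDirectory() as tmp:
        demo = pathlib.Path(tmp) / "example.tsv.xz"  # hypothetical file name
        demo.write_bytes(
            LFS_POINTER_PREFIX + b"\noid sha256:" + b"0" * 64 + b"\nsize 12345\n"
        )
        print(is_lfs_pointer(demo))
```

If the check returns True for files under the data directory, installing Git LFS and running git lfs pull in the repository should replace the pointers with the real files.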
The source code saves large files using XZ compression (denoted by an .xz extension).
Since not all users are familiar with XZ compression,
we have also created gzip exports of all XZ-compressed files
(with the convert-xz-to-gzip.bash script).
These files are placed alongside their XZ source in the data directory.
The source code pipelines use XZ compression because gzip embeds a timestamp in each file, so otherwise identical outputs would differ between runs.
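The timestamp point can be illustrated with the Python standard library. This is a minimal sketch, not code from this repository: the payload is a made-up stand-in for a dataset, and the two mtime values simulate compressing the same bytes at different times.

```python
import gzip
import lzma

# Hypothetical stand-in for a dataset's contents.
payload = b"name\tcount\nexample\t1\n"

# The gzip header carries a 4-byte modification time, so the same payload
# compressed at two different times produces different bytes.
gz_first = gzip.compress(payload, mtime=1_000_000_000)
gz_second = gzip.compress(payload, mtime=1_000_000_001)
gzip_outputs_differ = gz_first != gz_second

# The .xz container has no timestamp field, so repeated runs with the same
# settings match byte for byte.
xz_first = lzma.compress(payload)
xz_second = lzma.compress(payload)
xz_outputs_match = xz_first == xz_second

print("gzip outputs differ:", gzip_outputs_differ)
print("xz outputs match:", xz_outputs_match)
```

Both formats decompress to the same payload. Only the compressed bytes differ, which matters when output files are tracked in version control.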
This repository has a corresponding Docker image with the required dependencies.
See environment for the Docker image specification.
Note that the following Docker commands have a --mount argument to give the Docker container access to files in this repository.
Therefore, any changes to the repository content created while running the Docker container will persist in this directory after the container is stopped.
The Docker image is automatically built and published by a GitHub Action. Even though this repository is public, GitHub requires authentication to download from its package registry. Therefore, you will need a GitHub account to pull the image.
Use the following steps to authenticate your local Docker client with GitHub.
Go to https://github.com/settings/tokens and create a new personal access token, selecting only the read:packages scope.
You can name the token anything, for example "docker login read-only token".
Then run the following command, substituting your username and token from above:
docker login --username USERNAME --password TOKEN docker.pkg.github.com
For interactive development in Python notebooks, run the following command:
# This command must be run with the repository root as your working directory.
# Requires docker version >= 17.06.
docker run \
--name iscb-diversity \
--detach --rm \
--env JUPYTER_TOKEN=ksbegpqzrurktbkikyo \
--publish 8899:8888 \
--mount type=bind,source="$(pwd)",target=/user/jupyter \
docker.pkg.github.com/greenelab/iscb-diversity/iscb-diversity
Then navigate to the following URL in your browser: http://localhost:8899?token=ksbegpqzrurktbkikyo
You should see a Jupyter Notebook landing page where you can open, edit, and run any of the notebooks.
When you are done, shut down the Jupyter notebook server and remove the Docker container by running docker stop iscb-diversity in a new terminal.
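The JUPYTER_TOKEN value in the docker run command above is published in this README, so it should not be treated as a secret. The following is a minimal sketch, assuming you would rather use a token of your own. It generates a random value and prints the matching --env line and browser URL. The helper name jupyter_settings is hypothetical and not part of this repository.

```python
import secrets


def jupyter_settings(port=8899):
    """Return a fresh token plus the matching --env flag and browser URL."""
    token = secrets.token_hex(24)  # 48 hexadecimal characters, safe in a URL
    env_flag = f"--env JUPYTER_TOKEN={token}"
    url = f"http://localhost:{port}?token={token}"
    return token, env_flag, url


if __name__ == "__main__":
    _, env_flag, url = jupyter_settings()
    print("Use this line in place of the --env line in docker run:")
    print(f"  {env_flag} \\")
    print("Then open:")
    print(f"  {url}")
```

The default port of 8899 matches the --publish 8899:8888 setting in the command above. Pass a different port if you change that mapping.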
Similarly, for the R notebooks:
# This command must be run with the repository root as your working directory.
# Requires docker version >= 17.06.
docker run \
--name iscb-diversity-r \
--detach --rm \
--publish 8787:8787 \
--env DISABLE_AUTH=true \
--mount type=bind,source="$(pwd)",target=/home/rstudio/repo \
docker.pkg.github.com/greenelab/iscb-diversity/iscb-diversity-r
Navigate to http://localhost:8787 and you should be logged into RStudio as the rstudio user.
When you are done, shut down the RStudio server and remove the Docker container by running docker stop iscb-diversity-r.
The docs directory is used as the GitHub Pages source for https://github.com/greenelab/iscb-diversity.
To regenerate outputs in the docs directory, run the following command:
python utils/prepare_docs.py --nbviewer --readme
Edit utils/prepare_docs.py to change the template for docs/readme.md.
The entire repository is released under the CC BY 4.0 License available in license.md.
All code files and snippets are additionally released under the BSD 3-Clause License available in license-code.md.