You need a beefy Linux x86_64 machine, with docker and docker-compose available (and ready to use by the current UNIX user). You'll also need to be able to access this host on port 443 (HTTPS). For example, on Ubuntu 16.04 (LTS):
sudo apt-get install -y docker.io docker-compose
sudo adduser $USER docker
sudo su $USER # so that the group is taken into account
docker ps # should not complain :)
Secotrec-local is a tool that can deploy the infrastructure needed to run Epidisco. We need to download the application:
mkdir -p ~/bin/
wget -O ~/bin/secotrec-local https://www.dropbox.com/s/qwf9272txvmue5n/secotrec-local-amd64-bin?raw=1
chmod +x ~/bin/secotrec-local
export PATH=~/bin:$PATH
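If you want the PATH change to persist across shells, you can also append it to your shell profile, e.g.:
echo 'export PATH=$HOME/bin:$PATH' >> ~/.bashrc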
And do a quick test:
secotrec-local --help
Secotrec-local is configured with environment variables; you can put them in a file and source it, or just paste them into your shell:
# HTTPS port to access the Ketrew UI:
export tls_port=443
# Directory where Epidisco will write files
export biokepi_work=$HOME/biokepi-work/
# Long, random, secret string of your choice:
export ketrew_auth_token=dsleaijdej308098ddecja9c8jra8cjrf98r
# The maximal number of jobs that are allowed to run at once:
export coclobas_max_jobs=8
# We also want a configured Epidisco development environment:
export epidisco_dev=$HOME/epidisco-shared
# We want to get recent versions of most tools:
export coclobas_docker_image=hammerlab/keredofi:coclobas-aws-biokepi-dev
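For instance, a minimal sketch of the “file and source it” option (the file name is arbitrary):
cat > ~/epidisco.env <<'EOF'
export tls_port=443
export biokepi_work=$HOME/biokepi-work/
# ... plus the other export lines above ...
EOF
# Load it in any new shell:
. ~/epidisco.env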
Docker can be pretty annoying with access rights, so let's be on the safe side:
umask 000
export secotrec_tmp=/tmp/fresh-tmp-dir
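If you prefer to prepare that scratch directory up-front (an optional, belt-and-braces step; we assume secotrec would otherwise create it itself):
mkdir -p "$secotrec_tmp" && chmod 777 "$secotrec_tmp"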
Check that the configuration is taken into account:
secotrec-local print-conf
Then bring all the services up (workflow manager, scheduler, database, etc.) in one command:
secotrec-local up
The first time, this takes a while because it pulls docker images.
Check the status of the deployment with:
secotrec-local status
Using the hostname or IP address of your host, and the auth-token you defined above, you can access the WebUI of Ketrew (the workflow manager) at:
https:// ... /gui?token=dsleaijdej308098ddecja9c8jra8cjrf98r
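For a quick reachability check from a terminal (a sketch; replace your-host with your hostname or IP, and note curl's -k flag since the certificate is likely self-signed):
curl -k "https://your-host/gui?token=dsleaijdej308098ddecja9c8jra8cjrf98r"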
Let's generate the “infrastructure” configuration for Epidisco, biokepi_machine.ml (named after the Biokepi project, which provides the basic reusable building blocks that make up Epidisco):
secotrec-local biokepi-machine $epidisco_dev/biokepi_machine.ml
and then enter the docker-container which has all the necessary dependencies (incl. Epidisco itself):
secotrec-local docker-compose -- exec epidisco-dev opam config exec bash
Let's configure the client-side for Ketrew (kserver is the Ketrew server's name inside the docker-compose network; use the auth-token you defined above):
ketrew init --just-client http://kserver:8080/gui?token=dsleaijdej308098ddecja9c8jra8cjrf98r
Let's just run a quick test of the configuration:
ketrew submit --dae /tmp/,"du -sh /biokepi" --wet-run
The job should be visible on the Ketrew WebUI (you can also use the “TextUI” by simply typing coclobas-ketrew interact and following the 80's-styled menus).
We just need to get the “epidisco runner”:
curl -O https://raw.githubusercontent.com/hammerlab/epidisco/master/tools/epirunner.sh
And check that it works:
bash epirunner.sh /epidisco-shared/biokepi_machine.ml --help=plain
We need to tell Biokepi how to download non-openly-accessible software; fill these in with your own URLs:
export GATK_JAR_URL="https://..."
export MUTECT_JAR_URL="https://..."
export NETMHC_TARBALL_URL="https://..."
export NETMHCPAN_TARBALL_URL="https://..."
export PICKPOCKET_TARBALL_URL="https://..."
export NETMHCCONS_TARBALL_URL="https://..."
export CIBERSORT_URL="https://..."
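A tiny sanity check (a sketch using bash's indirect variable expansion) to catch anything left unset:
for v in GATK_JAR_URL MUTECT_JAR_URL NETMHC_TARBALL_URL NETMHCPAN_TARBALL_URL \
         PICKPOCKET_TARBALL_URL NETMHCCONS_TARBALL_URL CIBERSORT_URL; do
  [ -n "${!v}" ] || echo "warning: $v is not set"
done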
Let's now run it with some data:
export normal=https://storage.googleapis.com/hammerlab-biokepi-data/samples/training-dream/normal.chr20.bam
export tumor=https://storage.googleapis.com/hammerlab-biokepi-data/samples/training-dream/tumor.chr20.bam
export experiment=training-without-rna
export mhc_alleles="H-2-Kb,H-2-Db"
bash epirunner.sh /epidisco-shared/biokepi_machine.ml \
--normal $normal \
--tumor $tumor \
--reference-build b37decoy \
--results-path /biokepi/results \
--mhc-alleles $mhc_alleles \
$experiment
You can now go babysit your Epidisco run with the Ketrew UI.
If all goes as planned, you should have an HTML report at
$biokepi_work/results/training-without-rna-1normals-1tumors--b37decoy/index.html
Then you can also try with bigger data, for instance:
export normal=https://storage.googleapis.com/dream-challenge/synthetic.challenge.set2.normal.bam
export tumor=https://storage.googleapis.com/dream-challenge/synthetic.challenge.set2.tumor.bam
export experiment=dream2
or just your own samples.
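The invocation is then exactly as before, reusing the re-exported variables (adjust --mhc-alleles as appropriate for your samples):
bash epirunner.sh /epidisco-shared/biokepi_machine.ml \
--normal $normal \
--tumor $tumor \
--reference-build b37decoy \
--results-path /biokepi/results \
--mhc-alleles $mhc_alleles \
$experiment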
When you're done with the deployment, just call:
secotrec-local down --yes
A job failed? Yes, it happens; see issue hammerlab/biokepi#193.
Ketrew's UI allows you to inspect the failed jobs.
You can also resubmit the pipeline while parts of the previous submission are still running; Ketrew will merge the new pipeline and piggy-back on the previous workflow.
We tested this on a Google Cloud n1-highmem-32 node (32 vCPUs, 208 GB memory) with a 3 TB hard disk, together with a quite conservative setting: coclobas_max_jobs=8.
If your host is smaller (memory or CPU), allow fewer jobs to run in parallel (again, with coclobas_max_jobs).
Still, some individual jobs may be very memory-greedy (e.g. variant callers working on high-coverage areas).
You can put your input files (FASTQ or BAM) somewhere in the $biokepi_work directory, but from the docker-containers' point of view the paths have a different prefix: /biokepi. For instance, if you put your RNA input file at $biokepi_work/input/rna.bam, you need to call Epidisco with --rna /biokepi/input/rna.bam.
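For example (the file names here are arbitrary):
# On the host:
mkdir -p $biokepi_work/input
cp /path/to/my-rna.bam $biokepi_work/input/rna.bam
and then, inside the container, add --rna /biokepi/input/rna.bam to the epirunner.sh invocation.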
Testing your own changes is possible too :) If your change is “client-side” (meaning that it impacts the definition of the workflow submitted to Ketrew), you need to bring the change into the epidisco-dev container.
Once you enter (with secotrec-local docker-compose exec ...), you're in a fully functioning opam environment; so, if you have a branch of Biokepi you want to test, you can simply:
opam pin add biokepi https://github.com/some-user/biokepi.git#my_branch
Secotrec can also call opam pin on any given branch of Ketrew or Coclobas, e.g.:
export pin_ketrew=my_branch
...
secotrec-local up
Note that, since the opam pin triggers a bunch of re-builds, the Ketrew and Coclobas servers take longer to become available when there is a pin; you can check progress by looking at the docker-compose logs:
secotrec-local docker-compose logs {kserver,coclo}
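If you want to follow them live, docker-compose's usual -f flag should work too (depending on how secotrec-local passes arguments through, you may need the -- separator as in the exec example above):
secotrec-local docker-compose logs -f kserver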
If secotrec was built without the postgresql dependency, the secotrec-local binary is quite portable; if you built it with the postgres library, the binary needs the shared libpq library (sudo apt install -y libpq-dev on Debian/Ubuntu).