This repository contains the Nextflow executor used inside the OpenEBench VRE instance. The execution is done following a secure docker-in-docker approach.
The executor is a Python wrapper able to understand the way inputs and outputs are defined by OpenVRE. It expects to receive from VRE both the `nextflow_repo_uri` and `nextflow_repo_tag` input parameters in order to learn where the workflow to be run can be fetched. These parameters are used to issue a `git clone -b nextflow_repo_tag nextflow_repo_uri` command in order to materialize the workflow in a given directory.
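For illustration, with hypothetical values for those two parameters (the repository URL, tag and target directory below are made up), the issued command would be equivalent to:

```bash
# Hypothetical example: clone the tagged workflow into a working directory
git clone -b 1.0.2 https://github.com/example-org/example-benchmark-nf.git workflow_dir
```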
If the executor also receives a `nextflow_repo_reldir`, this relative directory within the folder containing the materialized workflow will be used to find the `nextflow.config` file, which should be available in any minimally documented Nextflow workflow.
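In other words, the layout the executor expects is roughly the following (directory names are only illustrative):

```
workflow_dir/                   # directory where the workflow was cloned
└── <nextflow_repo_reldir>/     # optional relative directory, if declared
    └── nextflow.config         # configuration file the executor looks for
```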
If the executor also receives a `nextflow_repo_profile`, this profile should exist in the `nextflow.config` file, and it will be used when the workflow is executed. By default, the `docker` profile is expected. You can tell the executor to use no explicit profile by declaring `nextflow_repo_profile` either empty or null.
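As a rough sketch, and not the literal command line built by the wrapper, the resulting Nextflow invocation would include the chosen profile along these lines:

```bash
# Illustrative only: run the materialized workflow with the default 'docker' profile
nextflow run workflow_dir -profile docker
```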
Through this `nextflow.config` file the executor can learn several details:

- The workflow entry point, through the `mainScript` directive.
- The minimal required version of Nextflow, through `nextflowVersion`.

Right now, only the latter is taken into account.
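For reference, the relevant part of such a `nextflow.config` could look like the minimal sketch below (the script name, version constraint and profile contents are just examples):

```groovy
// Minimal illustrative nextflow.config
manifest {
    mainScript      = 'main.nf'      // workflow entry point (read, but currently not used)
    nextflowVersion = '>=19.04.0'    // minimal Nextflow version (this is what the executor honours)
}

profiles {
    docker {
        docker.enabled = true        // the profile expected by default
    }
}
```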
The `project` input parameter tells the wrapper where the results and other final files can be stored. If that directory contains an archived version of the workflow, which usually happens when a workflow is re-run, then that archive will be used instead of fetching the workflow either from the workflow cache or from the internet.
As this executor was created for OpenEBench, there are several hardcoded expected input parameters which should be understood by the workflow to be run: `challenges_ids`, `participant_id`, `input`, `goldstandard_dir`, `public_ref_dir`, `assess_dir`.
There are also several hardcoded expected parameters which determine where the outputs and metadata are stored: `statsdir`, `outdir`, `otherdir`.
The executor will pass to Nextflow any other additional parameters which OpenVRE has declared in its metadata. When one of these parameters represents a path, it has to fulfil some restrictions: the path should be a descendant of one of the other required paths (for instance `public_ref_dir`), and the path should be managed as such inside the workflow, so docker instance volume mappings are properly done before each process step is run.
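Putting the parameters together, the wrapper ends up composing a Nextflow invocation conceptually similar to the sketch below (values are made up, `--extra_ref` stands for a hypothetical additional parameter, and this is not the literal command line it builds):

```bash
# Conceptual sketch only: hardcoded OpenEBench parameters plus output locations
nextflow run workflow_dir -profile docker \
    --challenges_ids "challenge_1 challenge_2" \
    --participant_id "my_tool" \
    --input /project/inputs/predictions.txt \
    --goldstandard_dir /project/inputs/goldstandard \
    --public_ref_dir /project/inputs/public_ref \
    --assess_dir /project/inputs/assessment_datasets \
    --statsdir /project/stats \
    --outdir /project/results \
    --otherdir /project/other \
    --extra_ref /project/inputs/public_ref/extra_file.txt  # additional path, descendant of public_ref_dir
```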
Before the first run of the wrapper, follow the installation instructions in INSTALL.md. Once the dependencies are installed, a `VRE_NF_RUNNER_py.ini` deployment configuration file is needed in order to know the default Nextflow version to use, the directory where the workflows should be cached, etc. You can find the template configuration file at `VRE_NF_RUNNER_py.ini.template`, with all the understood deployment setup parameters described, and either declared or commented out.
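As a purely hypothetical sketch of the kind of settings that deployment file covers (the section and key names below are made up; the template is the authoritative, documented reference):

```ini
; Hypothetical VRE_NF_RUNNER_py.ini sketch; the real keys are documented in
; VRE_NF_RUNNER_py.ini.template
[nextflow]
default_version = 19.04.0
workflows_cache_dir = /path/to/workflows/cache
```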
If you want to test your own benchmarking workflow before deploying it to the OpenEBench VRE, you can use the `test_VRE_NF_RUNNER.sh` script.
The currently available tests are the different subdirectories of the `tests` directory. Each subdirectory has a `README.md` file describing how to materialize the datasets and containers needed for that specific test. Usually, they come in the form of `materialize-data.sh` and `materialize-containers.sh` scripts, which should be run from within the test directory.
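For the TCGA test described below, for instance, the layout is roughly the following:

```
tests/
└── TCGA/
    ├── README.md                  # how to materialize the needed datasets and containers
    ├── materialize-data.sh
    ├── materialize-containers.sh
    ├── config.json                # parameters, inputs and outputs (OpenVRE format)
    └── in_metadata.json           # input metadata (OpenVRE format)
```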
The next commands (which only need to be run once) show how to materialize both the containers and datasets needed by the TCGA test:

```bash
cd tests/TCGA
bash ./materialize-data.sh
bash ./materialize-containers.sh
```
The next command shows how to run the test from the TCGA example:

```bash
./test_VRE_NF_RUNNER.sh TCGA
```
This will create a fake project directory at `TCGA-test`, and it will use the `config.json` and `in_metadata.json` files from the `tests/TCGA` directory to read the parameters, the inputs and expected outputs, and the relative or absolute paths of the input files and directories. Relative paths are interpreted from the very same directory where the script is run. Both JSON files follow the very same format OpenVRE uses.

If you want to test your own workflows with your own metadata, you should mimic one of the existing templates inside `tests`.