Anna Price edited this page Sep 3, 2020 · 3 revisions

Autodatabase is a Nextflow DSL2 pipeline that automates building a Kraken2 database from FASTA files. It requires Nextflow version 20.01.0 or later and either Docker or Singularity to run.

Overview of Nextflow

Nextflow is a workflow manager that enables the development of portable and reproducible bioinformatics workflows. It can be partnered with Docker or Singularity containers to create fully self-contained and reproducible workflows. Nextflow greatly simplifies writing bioinformatics pipelines that run in parallel, because the parallelisation is defined implicitly by each process's inputs.

Nextflow DSL2 provides new syntax that allows workflows to be constructed in a modular way. It introduces workflow blocks and lets processes be split across several module files, allowing a library-based approach to building bioinformatics pipelines.

Overview of Docker and Singularity

The requirements for the autodatabase pipeline (Kraken2, Mash and Python 3) are managed using containers. Autodatabase defines profiles so that the pipeline can be run using either Docker or Singularity containers.

Docker and Singularity are container engines which can be used to deploy software packages in lightweight, standalone, and reproducible environments. Using containers, software is packaged with its dependencies and is isolated from the host machine. This ensures software runs uniformly regardless of infrastructure.

What the autodatabase pipeline does

The input to the autodatabase pipeline is a set of FASTA files organised into one directory per taxon. Each directory is named after its taxon, with spaces replaced by underscores (e.g. Mycobacterium_tuberculosis_complex, Escherichia_coli). The pipeline uses the directory name to look up the taxonomic ID and map it to the FASTA files in that directory. It then uses Mash for quality control, selecting the high-quality assemblies that are used to build the Kraken2 database.
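
The directory naming convention described above can be illustrated with a short Python sketch. This is not the pipeline's own code: the helper names (taxon_to_dirname, dirname_to_taxon, collect_fastas) are hypothetical, and the set of accepted file extensions is an assumption, since this page does not say which extensions autodatabase recognises. The sketch only shows how a taxon name relates to its directory name and how the FASTA files might be grouped per taxon before any taxonomic ID lookup.

```python
from pathlib import Path

# Assumption: these extensions are illustrative; the extensions that
# autodatabase actually accepts are not stated on this page.
FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}


def taxon_to_dirname(taxon_name):
    """Turn a taxon name into a directory name (spaces become underscores)."""
    return "_".join(taxon_name.split())


def dirname_to_taxon(dirname):
    """Recover the taxon name from a directory name (underscores become spaces)."""
    return dirname.replace("_", " ")


def collect_fastas(input_root):
    """Group FASTA files by taxon, using one sub-directory per taxon.

    Returns a dict mapping each taxon name to a sorted list of FASTA paths.
    Files that sit directly in input_root, and files without a FASTA
    extension, are ignored.
    """
    fastas_by_taxon = {}
    for taxon_dir in sorted(Path(input_root).iterdir()):
        if not taxon_dir.is_dir():
            continue
        fasta_files = sorted(
            path
            for path in taxon_dir.iterdir()
            if path.is_file() and path.suffix.lower() in FASTA_SUFFIXES
        )
        fastas_by_taxon[dirname_to_taxon(taxon_dir.name)] = fasta_files
    return fastas_by_taxon
```

For example, taxon_to_dirname("Mycobacterium tuberculosis complex") gives Mycobacterium_tuberculosis_complex, which is the directory name the pipeline would expect for that taxon.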
