Bioinformatics Template in Nextflow

BiTeN is a short pipeline written in Nextflow, intended as a template for Nextflow pipeline development.

Nextflow is a free, open-source software project that facilitates the execution of computational workflows made up of a series of interconnected steps (tasks). Nextflow can be used in many ways; this repository offers one specific example of how a bioinformatician can organize their code to run with Nextflow.

Foreword

The pipeline and the whole repository (README, CONTRIBUTING, etc.) can be used as a template for Nextflow pipeline projects. Comments in the pipeline's code help the user better understand the different usages.

This pipeline template follows these steps:

  • handling parameters, file input and help (gzipped and uncompressed files, paired and unpaired input reads, etc.)
  • QC (fastqc)
  • Alignment (bowtie2)
  • file conversion (sam2bam)
  • file sorting (samtools_sort)
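
As a rough orientation, the steps above correspond to plain command-line calls like the ones printed by the sketch below. It is a dry run that only prints commands and is not the pipeline's own code: the `plan_steps` helper, the output names (`qc`, `index/genome`, `sample.*`) and the explicit `bowtie2-build` indexing step are assumptions made for illustration.

```shell
#!/bin/sh
# Print (do not run) plain commands roughly equivalent to the pipeline steps.
# Output names and the index-building step are hypothetical.
plan_steps() {
    genome=$1      # reference genome in FASTA format
    mode=$2        # "single" or "paired"
    shift 2        # remaining arguments: one read file (single) or two (paired)

    echo "fastqc -o qc $*"                                  # QC
    echo "bowtie2-build $genome index/genome"               # index the genome
    if [ "$mode" = "paired" ]; then
        echo "bowtie2 -x index/genome -1 $1 -2 $2 -S sample.sam"
    else
        echo "bowtie2 -x index/genome -U $1 -S sample.sam"
    fi
    echo "samtools view -b -o sample.bam sample.sam"        # SAM to BAM
    echo "samtools sort -o sample.sorted.bam sample.bam"    # sort the BAM
}

plan_steps test/genome.fa single test/reads.fastq.gz
```

In the pipeline itself each of these calls lives in its own Nextflow process, so Nextflow handles the file passing and the containers instead of a hand-written script.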

Project layout

BiTeN/
├── README.md               # Documentation giving users a detailed description of the project and guidelines on how to use it.
├── LICENSE                 # License of your project. Licenses matter for open-source projects because they set the legal terms and conditions for using, distributing, and modifying the software.
├── CONTRIBUTING.md         # A short guide telling potential contributors how they can help with your project.
├── img                     # Folder containing the images used by the README.
│
│         // FROM HERE ON, EVERYTHING RELATES TO THE NEXTFLOW PIPELINE
│
├── main.nf                # The main Nextflow executable file, used to run your pipeline. It contains the logic of your pipeline.
├── modules/               # Contains components that can be included in workflows; think of them as functions in a programming language. Modules were introduced in DSL2. See https://www.nextflow.io/docs/latest/module.html. One module file per tool is encouraged.
│   ├── bowtie2.nf         # A module file containing the processes (the basic processing primitive used to execute a user script, see https://www.nextflow.io/docs/latest/process.html#processes) related to the bowtie2 tool.
│   ├── fastqc.nf          # A module file containing the processes related to the fastqc tool.
│   ├── samtools.nf        # A module file containing the processes related to the samtools tool.
│   └── template.nf        # A template module file.
├── subworkflows/          # Contains workflow components that can be included in other workflows, typically by the main workflow in main.nf.
├── nextflow.config        # Configuration file. Nextflow has multiple ways to handle configuration (see https://www.nextflow.io/docs/latest/config.html#configuration-file). This file can define parameters, profiles, etc.
├── ressources/            # Contains configuration files that define the different resources, i.e. computing and tools.
│   ├── computing/         # Contains configuration files that define the computing resources, loaded via profiles.
│   │   ├── hpc.config     # An HPC configuration that defines the computing resources on an HPC (CPU, timeout, RAM per process/label, and other information such as parallelisation and scheduler).
│   │   └── local.config   # A local configuration that defines the computing resources on a local machine (CPU, timeout, RAM per process/label).
│   └── softwares.config   # A software configuration that defines where Nextflow fetches the container for each tool.
└── test                   # Folder containing a test data set.
    ├── reads.fastq.gz
    └── genome.fa

Helping to develop

Documentation

Community

Installation

The prerequisites to run the pipeline are:

BiTeN

# clone the workflow repository
git clone https://github.com/Juke34/BiTeN.git

# Move into it
cd BiTeN

Nextflow

  • Via conda

    See here
    conda create -n nextflow
    conda activate nextflow
    conda install nextflow
    
  • Manually

    See here. Nextflow runs on most POSIX systems (Linux, macOS, etc.) and can typically be installed by running these commands:
    # Make sure Java 11 or later is installed on your computer by using the command:
    java -version
    
    # Install Nextflow by entering this command in your terminal (it creates a file named nextflow in the current directory):
    curl -s https://get.nextflow.io | bash 
    
    # Add Nextflow binary to your user's PATH:
    mv nextflow ~/bin/
    # OR system-wide installation:
    # sudo mv nextflow /usr/local/bin
    

Container platform

To run the workflow you will need a container platform: Docker or Singularity.

Docker

Please follow the instructions at the Docker website

Singularity

Please follow the instructions at the Singularity website
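
Before a first run it can help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of BiTeN: the helper names `java_major`, `have` and `check_prerequisites` are hypothetical, and the parsing assumes that `java -version` prints a line containing `version "..."`, as common JDK builds do.

```shell
#!/bin/sh
# Extract the major Java version from the banner printed by `java -version`.
# Handles both the legacy "1.8.0_292" scheme and the modern "17.0.2" scheme.
java_major() {
    version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case $version in
        1.*) version=${version#1.} ;;    # "1.8.0_292" -> "8.0_292"
    esac
    printf '%s\n' "${version%%[!0-9]*}"  # keep the leading digits only
}

# Succeed if a command is available on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Report on Java (11 or later), Nextflow and a container platform.
check_prerequisites() {
    status=0
    if have java; then
        major=$(java_major "$(java -version 2>&1)")
        if [ "${major:-0}" -ge 11 ]; then
            echo "java: OK (major version $major)"
        else
            echo "java: version 11 or later is required"; status=1
        fi
    else
        echo "java: not found"; status=1
    fi
    have nextflow || { echo "nextflow: not found"; status=1; }
    have docker || have singularity || {
        echo "container platform: neither docker nor singularity found"; status=1
    }
    return $status
}

check_prerequisites || echo "Some prerequisites are missing."
```

The check only looks for the commands on the PATH; it does not verify that the Docker daemon is running or that Singularity is correctly configured.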

Usage

You can first check the available options and parameters by running: nextflow run main.nf --help

To run the workflow you must select a profile according to the container platform you want to use:

  • singularity, a profile using Singularity to run the containers
  • docker, a profile using Docker to run the containers

The command will look like this:

nextflow run main.nf -profile docker <rest of parameters>

Another profile is available (/!\ work in progress):

  • slurm, to add if your system uses the Slurm executor (the executor is local by default)

Using the slurm profile gives a command like this one:

nextflow run main.nf -profile docker,slurm <rest of parameters>

Test the workflow

Test data are included in the BiTeN repository in the test folder.

A typical command to run a test on single-end data will look like this:

nextflow run -profile local,docker main.nf --genome test/genome.fa --reads test --single_end true

On success you should get a message looking like this:

  BiTeN Pipeline execution summary
    --------------------------------------
    Completed at : 2024-03-07T21:40:23.180547+01:00
    UUID         : e2a131e3-3652-4c90-b3ad-78f758c06070
    Duration     : 8.4s
    Success      : true
    Exit Status  : 0
    Error report : -

Parameters

| Parameter | Comment |
| --- | --- |
| --help | Prints the help section. |
| --reads | Path to the directory containing the reads. |
| --pattern_reads | Pattern to match the read files. For single-end data it would look like "*.fastq.gz". For paired-end data it would look like "*_{R1,R2}_001.fastq.gz" or "*{1,2}.fastq.gz". |
| --single_end | Boolean indicating whether the data are single-end or paired-end. |
| --stranded | Boolean indicating whether the data are stranded. |
| --genome | Path to the genome file in FASTA format. |
| --bowtie2_options | Options passed to tune the behaviour of the bowtie2 aligner. |
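
To illustrate what a paired-end pattern with an {R1,R2} alternative is meant to capture, the sketch below pairs files by sample name in plain shell. It is an illustration only, not how the pipeline resolves --pattern_reads internally; the `list_pairs` helper and the `<sample>_R1_001.fastq.gz` / `<sample>_R2_001.fastq.gz` naming are assumptions.

```shell
#!/bin/sh
# List read pairs in a directory, assuming the hypothetical naming
# <sample>_R1_001.fastq.gz / <sample>_R2_001.fastq.gz.
list_pairs() {
    dir=$1
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue                  # the glob matched nothing
        sample=$(basename "$r1" _R1_001.fastq.gz) # strip the R1 suffix
        r2="$dir/${sample}_R2_001.fastq.gz"
        if [ -e "$r2" ]; then
            echo "$sample $r1 $r2"
        else
            echo "warning: no R2 mate for $sample" >&2
        fi
    done
}

# Demo on empty placeholder files in a temporary directory.
demo=$(mktemp -d)
touch "$demo/A_R1_001.fastq.gz" "$demo/A_R2_001.fastq.gz" "$demo/B_R1_001.fastq.gz"
list_pairs "$demo"
rm -rf "$demo"
```

In the demo, sample A is reported as a pair, while B only triggers a warning on standard error because its R2 file is missing. Checking file names this way before a run can catch a pattern that does not fit the data.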

Contributing

We welcome contributions from the community! See our Contributing guidelines.

Report bugs and issues

Found a bug or have a question? Please open an issue.

How to cite?

If you use this template for your development, please cite or acknowledge it, e.g.:

Development based on the BiTeN template (https://github.com/Juke34/BiTeN), Dainat J.

Acknowledgement

Jacques Dainat (@Juke34), Juliette Hayer (@jhayer), Mahesh Binzer-Panchal (@mahesh-panchal)