nf-core/nanoseq: Output

This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

  • Guppy - demultiplexing of Nanopore data
  • PycoQC - read quality control
  • NanoPlot - read quality control
  • GraphMap2 - mapping for long reads
  • MiniMap2 - mapping for long reads
  • SortBam - coordinate sort BAM files using SAMtools
  • bedtools - create bigWig and bigBed files
  • MultiQC - aggregate report, describing results of the alignment

Demultiplexing

Documentation:
Guppy

Description:
Guppy demultiplexes and barcodes the data produced by an ONT device. The flowcell, kit and barcode kit must be given on the command line if demultiplexing is needed. This step can be bypassed using the --skip_demultiplexing parameter when initiating the pipeline. The output folders are separated by barcode from the kit used, plus one for unclassified reads. The output in each barcode folder is then merged into one fastq file for easier downstream processing.

Output directories:

  • guppy/basecalling/barcode*/
    FastQ files output for each barcode
  • guppy/basecalling/unclassified/
    FastQ files output that are unclassified
  • guppy/fastq/
    Merged output of fastq files into one fastq for each barcode
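
The merge step described above can be pictured with a minimal Python sketch. This is not the pipeline's own implementation: the function name `merge_barcode_fastqs`, the `<barcode>.fastq` naming of the merged files and the assumption of uncompressed `*.fastq` inputs are all hypothetical and chosen only for illustration.

```python
import shutil
from pathlib import Path
from typing import Dict


def merge_barcode_fastqs(basecalling_dir: Path, merged_dir: Path) -> Dict[str, Path]:
    """Concatenate the FastQ files of every barcode* folder into one file each.

    Returns a mapping of barcode folder name -> merged FastQ path.
    The <barcode>.fastq naming is an assumption made for this sketch.
    """
    merged_dir.mkdir(parents=True, exist_ok=True)
    merged: Dict[str, Path] = {}
    # Only barcode folders are merged; unclassified/ is left untouched.
    for folder in sorted(basecalling_dir.glob("barcode*")):
        if not folder.is_dir():
            continue
        fastqs = sorted(folder.glob("*.fastq"))
        if not fastqs:
            continue  # nothing to merge for this barcode
        target = merged_dir / f"{folder.name}.fastq"
        with target.open("wb") as out:
            for fastq in fastqs:
                with fastq.open("rb") as src:
                    shutil.copyfileobj(src, out)  # plain byte-wise concatenation
        merged[folder.name] = target
    return merged
```

Called as `merge_barcode_fastqs(Path("guppy/basecalling"), Path("guppy/fastq"))`, the sketch would write one merged file per barcode folder and skip the unclassified reads.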

Sequencing Quality Control

Documentation:
PycoQC, NanoPlot

Description:
PycoQC and NanoPlot give general quality metrics about the sequencing run. They provide information about the distribution of read length, read length over time, number of reads per barcode and other general stats.

PycoQC - Number of Reads per Barcode plot

Output directories:

  • pycoQC/
    An .html file is produced that includes a run summary and graphical representations of the distribution of read length, distribution of read quality scores, mean read quality per sequence length, output per channel over experiment time, output over experiment time, read quality over experiment time, read length over experiment time, and percentage of reads per barcode.
  • nanoplot/summary/
    Metric plots as .png files and an html summary file of the overall run.

FastQ Quality Control

Documentation:
NanoPlot

Description:
NanoPlot gives general quality metrics about the fastq output per barcode from Guppy. It provides information about the quality score distribution across your reads, read lengths and other general stats.

NanoPlot - Read quality vs read length

Output directories:

  • nanoplot/fastq/
    QC metric plots as individual .png files and one html file summarizing the output.

Alignment

Documentation:
GraphMap2, MiniMap2, SortBam

Description:
The FastQ reads are mapped to the provided reference assembly using either GraphMap2 or MiniMap2, and then sorted and indexed using SAMtools. These processes can be bypassed using the --skip_alignment parameter.

The files resulting from the alignment of individual libraries with graphmap2 or minimap2 are not saved by default, so this directory will not be present in your results. You can override this behaviour with the --save_align_intermeds flag, in which case it will contain the coordinate-sorted alignment files in *.bam format.

ALIGNER - Alignment per barcode

Output directories:

  • graphmap2/
    If the --aligner graphmap2 parameter is used, the sorted and indexed bam files will be output here.
  • minimap2/
    If the --aligner minimap2 parameter is used, the sorted and indexed bam files will be output here.
  • <ALIGNER>/samtools_stats/
    *.flagstat, *.idxstats and *.stats files generated from the alignment files using SAMtools.
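
As a hedged illustration of how the *.flagstat files could be inspected outside MultiQC, the Python sketch below extracts the total and mapped read counts. It assumes the usual plain-text `samtools flagstat` layout (`<passed> + <failed> <label> (...)`), which may vary between SAMtools versions; the function name `summarise_flagstat` is hypothetical.

```python
import re
from typing import Dict, Optional

# Matches the "in total" and "mapped" lines only; lines such as
# "primary mapped" or "with itself and mate mapped" are deliberately ignored.
_COUNT_LINE = re.compile(r"^(\d+) \+ (\d+) (in total|mapped) \(")


def summarise_flagstat(text: str) -> Dict[str, Optional[float]]:
    """Return QC-passed total/mapped counts and the mapped fraction."""
    counts = {"in total": 0, "mapped": 0}
    for line in text.splitlines():
        match = _COUNT_LINE.match(line.strip())
        if match:
            counts[match.group(3)] = int(match.group(1))  # QC-passed count
    total = counts["in total"]
    # Avoid dividing by zero for empty alignments.
    fraction = counts["mapped"] / total if total else None
    return {"total": total, "mapped": counts["mapped"], "mapped_fraction": fraction}
```

The text of a flagstat file can be passed in directly, for example via `Path(...).read_text()`; only QC-passed counts are reported in this sketch.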

bigWig and bigBed

Documentation: BEDTools, bedGraphToBigWig, bedToBigBed

Description: Creation of bigWig and bigBed coverage tracks for visualisation. This can be bypassed by setting the parameters --skip_bigwig and/or --skip_bigbed.

Output directories:

  • <ALIGNER>/bigwig/
    The bigWig files will be output here.
  • <ALIGNER>/bigbed/
    The bigBed files will be output here.
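
The way the skip parameters affect which track directories appear can be sketched as follows. This is a simplified model written for illustration, not pipeline code: the function `coverage_track_dirs` is hypothetical, and treating --skip_alignment as also removing both track directories is an assumption rather than something stated above.

```python
from typing import List

ALIGNERS = ("graphmap2", "minimap2")  # the two --aligner values named above


def coverage_track_dirs(aligner: str, skip_alignment: bool = False,
                        skip_bigwig: bool = False,
                        skip_bigbed: bool = False) -> List[str]:
    """List the coverage-track directories expected under the results folder."""
    if aligner not in ALIGNERS:
        raise ValueError(f"unknown aligner: {aligner!r}")
    if skip_alignment:
        return []  # assumption: no alignments means no coverage tracks
    dirs: List[str] = []
    if not skip_bigwig:
        dirs.append(f"{aligner}/bigwig/")
    if not skip_bigbed:
        dirs.append(f"{aligner}/bigbed/")
    return dirs
```

For instance, `coverage_track_dirs("minimap2", skip_bigwig=True)` yields only the `minimap2/bigbed/` directory.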

MultiQC

MultiQC is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available within the report data directory.

The pipeline has special steps which allow the software versions used to be reported in the MultiQC output for future traceability.

Output directories:

  • multiqc/Project_multiqc_report.html
    MultiQC report - a standalone HTML file that can be viewed in your web browser
  • multiqc/multiqc_data/
    Directory containing parsed statistics from the different tools used in the pipeline
  • multiqc/multiqc_plots/
    Directory containing the image files of the graphs included in MultiQC

For more information about how to use MultiQC reports, see http://multiqc.info

Pipeline information

Documentation:
Nextflow

Description:
Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This allows you to troubleshoot errors when running the pipeline, and also provides you with other information such as launch commands, run times and resource usage.

Output directories:

  • pipeline_info/
    • Reports generated by the pipeline - pipeline_report.html, pipeline_report.txt and software_versions.csv.
    • Reports generated by Nextflow - execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.svg.
    • Reformatted samplesheet files used as input to the pipeline - samplesheet_reformat.csv.
  • Documentation/
    Documentation for interpretation of results in HTML format - results_description.html.
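
When troubleshooting, the execution_trace.txt file can be summarised with a few lines of Python. The sketch below assumes the trace is tab-separated with a header row containing `name` and `status` columns, and that failed tasks carry the status `FAILED`; check these assumptions against your own trace file. The function name `summarise_trace` is hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import List, Tuple


def summarise_trace(trace_path: Path) -> Tuple[Counter, List[str]]:
    """Count tasks per status and collect the names of failed tasks."""
    status_counts: Counter = Counter()
    failed: List[str] = []
    with trace_path.open(newline="") as handle:
        # Assumption: tab-separated file with a header line.
        for row in csv.DictReader(handle, delimiter="\t"):
            status = row.get("status", "UNKNOWN")
            status_counts[status] += 1
            if status == "FAILED":
                failed.append(row.get("name", ""))
    return status_counts, failed
```

A call such as `summarise_trace(Path("results/pipeline_info/execution_trace.txt"))`, where `results` stands in for your own output directory, returns the per-status counts together with the names of any failed tasks.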