Variant calling pipeline

A Snakemake workflow for calling and annotating short variants.
The workflow takes paired-end (PE) Illumina short-read data (FASTQ files) as input and outputs annotated variant calls in a VCF file. The input directory contains PE Illumina reads from a publicly available SARS-CoV-2 dataset (SRA accession SRR15660643), downsampled to 16000 paired reads (sample.R1.paired.fq.gz and sample.R2.paired.fq.gz).
A FASTA file with the Wuhan-Hu-1 reference genome (GenBank accession MN908947.3) is included in the
reference directory (MN908947.3.fasta), along with the VEP cache needed to annotate genomic features.
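
Before running the workflow, it can help to confirm that the files described above are in place. The following is a minimal sketch, not part of the repository: the function name check_inputs is hypothetical, and the directory names "input" and "reference" are assumptions based on the description above, so adjust them to the actual layout.

```shell
# Sketch: confirm the expected input files exist and are non-empty.
# ASSUMPTION: the directories are named "input" and "reference";
# only the file names themselves come from this README.
check_inputs() {
    local root="${1:-.}"
    local missing=0
    local f
    for f in \
        "input/sample.R1.paired.fq.gz" \
        "input/sample.R2.paired.fq.gz" \
        "reference/MN908947.3.fasta"; do
        if [ ! -s "$root/$f" ]; then
            echo "missing or empty: $root/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 when every file was found, 1 otherwise.
    return "$missing"
}
```

From the repository root, check_inputs . returns a non-zero status and lists any file that is missing or empty.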

Usage

git clone https://github.com/LorenaDerezanin/pipeline_test

Step 1: Install Miniconda

Miniconda is a minimal conda installer. Running the pipeline in an isolated conda environment avoids dependency conflicts and helps ensure reproducibility.

Step 2 (Recommended): Install mamba - faster package manager

conda install mamba -n base -c conda-forge

Installing mamba is recommended to speed up environment setup. Mamba is a faster package manager than conda (it downloads packages in parallel) and generally resolves dependencies more quickly. If you continue with conda instead, replace the mamba command with conda in Step 3.
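
The choice between mamba and conda can also be made automatically. This is a small sketch under the assumption that either tool may be used for Step 3; the function name pick_pkg_manager is hypothetical and not part of the repository.

```shell
# Sketch: prefer mamba when it is on PATH, otherwise fall back to conda.
# The fallback name is printed even if conda itself is not installed,
# so a missing conda surfaces as an error in the command that uses it.
pick_pkg_manager() {
    if command -v mamba >/dev/null 2>&1; then
        echo mamba
    else
        echo conda
    fi
}
```

The environment from Step 3 could then be created with "$(pick_pkg_manager)" env create -n snek -f envs/snek.yml.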

Step 3: Recreate conda environment

cd pipeline_test/

mamba env create -n snek -f envs/snek.yml

Step 4: Activate environment

conda activate snek

Step 5: Run pipeline

snakemake --use-conda --cores 4 --verbose

The value 4 is the suggested number of --cores for a local run; increase it when running on a cluster.
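
Instead of hard-coding the core count, it can be derived from the machine. The sketch below is an illustration only: suggest_cores is a hypothetical helper, and the fallback value of 4 simply mirrors the suggestion above.

```shell
# Sketch: print a core count for --cores.
# Uses nproc, then getconf, and falls back to 4 if neither is available.
# An optional first argument caps the result (0 or no argument = no cap).
suggest_cores() {
    local cap="${1:-0}" n
    n="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)"
    if [ "$cap" -gt 0 ] && [ "$n" -gt "$cap" ]; then
        n="$cap"
    fi
    echo "$n"
}
```

For example, snakemake --use-conda --cores "$(suggest_cores 4)" uses at most 4 cores locally. Adding -n (dry run) to the snakemake command lists the jobs that would run without executing them, which is a cheap check before a long run.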

Troubleshooting

If conda fails to install Snakemake v6.15, install it with mamba instead: mamba install snakemake.
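
To see whether the install succeeded, the reported version can be checked. This is a rough sketch: has_snakemake_6_15 is a hypothetical helper, and treating any 6.15.x release as acceptable is an assumption based on the version named above.

```shell
# Sketch: succeed only if snakemake is on PATH and reports version 6.15 or 6.15.x.
# ASSUMPTION: any 6.15 patch release is acceptable for this workflow.
has_snakemake_6_15() {
    local v
    v="$(snakemake --version 2>/dev/null)" || return 1
    case "$v" in
        6.15|6.15.*) return 0 ;;
        *) echo "found snakemake $v, expected 6.15.x" >&2; return 1 ;;
    esac
}
```

With the snek environment active, has_snakemake_6_15 || mamba install snakemake runs the fallback install only when the check fails.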

Pipeline content

The workflow uses the following bioinformatics tools, run as Snakemake wrappers obtained from The Snakemake Wrappers Repository:

  • fastQC
  • multiQC
  • trim_galore
  • bwa
  • samtools
  • picard
  • freebayes
  • bcftools
  • vep

To do:

  • Docker container + conda/mamba
  • AWS/Google Cloud deployment
  • unit tests