snakemake_freyja_covidwastewater

This Snakemake pipeline automates the periodic analysis of SARS-CoV-2 wastewater sequencing data. It takes paired-end (PE) or single-end (SE) Illumina reads, trims them with fastp, maps them with minimap2, classifies them with Kraken2, and then processes the resulting BAM files with Freyja to create an aggregated summary file and figure.

This pipeline was developed for Drs. Subhash Verma and Krishna Pagilla at the University of Nevada, Reno.

Configuration

Clone repo
Copy config.template.yaml to config.yaml
Edit config.yaml for all parameters

directory Read data are organized by site: the main directory (INPUT_DIR in the directory tree below) represents one site, and each subdirectory is named for the wastewater sample collection date in YYYY-MM-DD format. Each subdirectory may contain only one collected sample, with the R1 and R2 reads labeled in the filenames in *.fastq.gz format. If only an R1 file is found, the pipeline assumes SE Illumina sequencing.

Example of directory tree:

INPUT_DIR
├── 2022-10-03
│   ├── WW106_S13_L001_R1_001.fastq.gz -> /data/gpfs/assoc/raw_seq_10202022/WW106_L1_ds.06365230dcfa473a8fb6b8098ab760c4/WW106_S13_L001_R1_001.fastq.gz
│   └── WW106_S13_L001_R2_001.fastq.gz -> /data/gpfs/assoc/raw_seq_10202022/WW106_L1_ds.06365230dcfa473a8fb6b8098ab760c4/WW106_S13_L001_R2_001.fastq.gz
├── 2022-10-10
│   ├── WW113_S17_L001_R1_001.fastq.gz -> /data/gpfs/assoc/raw_seq_11072022/WW113_L1_ds.f0035b32a09a4ba5b32d9f51df134902/WW113_S17_L001_R1_001.fastq.gz
│   └── WW113_S17_L001_R2_001.fastq.gz -> /data/gpfs/assoc/raw_seq_11072022/WW113_L1_ds.f0035b32a09a4ba5b32d9f51df134902/WW113_S17_L001_R2_001.fastq.gz
└── 2022-10-24
    ├── WW117_S21_L001_R1_001.fastq.gz -> /data/gpfs/assoc/raw_seq_11072022/WW117_L1_ds.e1dbdd81b7244f7a9ab5db98e5ddcfdb/WW117_S21_L001_R1_001.fastq.gz
    └── WW117_S21_L001_R2_001.fastq.gz -> /data/gpfs/assoc/raw_seq_11072022/WW117_L1_ds.e1dbdd81b7244f7a9ab5db98e5ddcfdb/WW117_S21_L001_R2_001.fastq.gz
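
The layout rules above can be sketched in Python as follows. This is only an illustration of the convention, not the pipeline's actual Snakefile logic: the function name discover_samples and the glob patterns *_R1_*.fastq.gz and *_R2_*.fastq.gz are assumptions based on the example filenames in the tree.

```python
from pathlib import Path
import re

# Subdirectories must be named with the collection date in YYYY-MM-DD format.
DATE_DIR = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def discover_samples(input_dir):
    """Map each dated subdirectory to its read files and layout (PE or SE).

    Hypothetical helper: the real pipeline may detect reads differently.
    """
    samples = {}
    for sub in sorted(Path(input_dir).iterdir()):
        # Skip anything that is not a YYYY-MM-DD directory.
        if not (sub.is_dir() and DATE_DIR.match(sub.name)):
            continue
        r1 = sorted(sub.glob("*_R1_*.fastq.gz"))
        r2 = sorted(sub.glob("*_R2_*.fastq.gz"))
        # Only one collected sample is allowed per date directory.
        if len(r1) != 1 or len(r2) > 1:
            raise ValueError(f"{sub}: expected one R1 file and at most one R2 file")
        samples[sub.name] = {
            "r1": r1[0],
            "r2": r2[0] if r2 else None,
            # An R1 file without a matching R2 file is treated as single-end.
            "layout": "PE" if r2 else "SE",
        }
    return samples
```

Calling discover_samples("INPUT_DIR") on the tree above would return one entry per collection date, each marked as PE because both R1 and R2 files are present.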

krakendb Location of the uncompressed Kraken2 "standard" database, which can be obtained here: https://benlangmead.github.io/aws-indexes/k2

ref Location of the SARS-CoV-2 Wuhan-Hu-1 reference genome (NC_045512), which can be downloaded from Freyja's project page here: https://raw.githubusercontent.com/andersen-lab/Freyja/main/freyja/data/NC_045512_Hu-1.fasta

projectname The project name is added to the main output filenames (PDFs, aggregate summary TSV, and HTML QC reports). This lets users run the pipeline on different input directories, each with its own projectname or site location.
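
Before a run, it can help to confirm that all four parameters described above are set. The check below is a minimal sketch, assuming config.yaml has already been parsed into a dictionary and that its keys are spelled exactly directory, krakendb, ref, and projectname; config.template.yaml remains the authoritative list. The function name and the example paths are hypothetical.

```python
# Keys described in this README; config.template.yaml is the authoritative source.
REQUIRED_KEYS = ("directory", "krakendb", "ref", "projectname")


def missing_config_keys(config):
    """Return the required keys that are absent or empty in a parsed config."""
    return [key for key in REQUIRED_KEYS if not str(config.get(key) or "").strip()]


# Example with made-up paths: ref is empty and projectname is missing.
example = {
    "directory": "/path/to/INPUT_DIR",
    "krakendb": "/path/to/k2_standard",
    "ref": "",
}
print(missing_config_keys(example))
```

An empty list means every parameter has a value; it does not verify that the paths exist.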

Create a "freyja" conda environment

##Create environment name
conda create -n freyja
conda activate freyja

##Add the necessary channels
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

##Install freyja
conda install freyja

After installation, run freyja update to pull the latest SARS-CoV-2 lineage classifications.

Periodic Updates

You may want to run freyja update within the freyja conda environment before running the pipeline to update the covid lineage classifications. The pipeline is designed to create a new aggregate summary TSV, PDF figures, and QC HTML reports every run with the current date in the filenames.
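
Because the run date is part of each output filename, repeated runs do not overwrite earlier results. The sketch below only illustrates that idea: the naming pattern projectname_kind_date.extension and the function dated_output_name are assumptions, and the pipeline's real filenames may be built differently.

```python
from datetime import date


def dated_output_name(projectname, kind, extension, run_date=None):
    """Build an output filename that embeds the project name and run date.

    Hypothetical pattern for illustration; not taken from the Snakefile.
    """
    run_date = run_date or date.today()
    # isoformat() yields YYYY-MM-DD, matching the date style used for input directories.
    return f"{projectname}_{kind}_{run_date.isoformat()}.{extension}"


# Two runs on different days produce two distinct files.
print(dated_output_name("siteA", "aggregate", "tsv", date(2022, 11, 7)))
print(dated_output_name("siteA", "aggregate", "tsv", date(2022, 11, 14)))
```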

Run the pipeline

Run the pipeline using snakemake. Installation instructions here: https://snakemake.readthedocs.io/en/stable/getting_started/installation.html

The following commands assume the Snakemake pipeline is run on a local computer, so set --cores to the number of cores available on the local system. Snakemake automatically parallelizes jobs that can run at the same time.

conda activate snakemake
snakemake --use-conda -prn --cores 16  ## This command tests and does a dry-run of the pipeline
snakemake --use-conda -pr --cores 16   ## This command runs the pipeline
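
If the pipeline is launched from a script, the core count can be taken from the machine instead of being hard-coded. The helper below is a sketch under that assumption: snakemake_command is a hypothetical name, and it only assembles the same flags shown in the commands above.

```python
import os


def snakemake_command(cores=None, dry_run=True):
    """Assemble the snakemake invocation used above as an argument list."""
    if cores is None:
        # Fall back to 1 if the core count cannot be determined.
        cores = os.cpu_count() or 1
    # -prn is the dry-run form from this README; -pr performs the real run.
    flags = "-prn" if dry_run else "-pr"
    return ["snakemake", "--use-conda", flags, "--cores", str(cores)]


print(snakemake_command(cores=16))
print(snakemake_command(cores=16, dry_run=False))
```

The returned list can be passed to subprocess.run(..., check=True) from within the activated snakemake environment.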

Acknowledgement

This work was supported by funds from the US Treasury through the Coronavirus Aid, Relief, and Economic Security (CARES) Act and grants from the National Institute of General Medical Sciences (GM103440 and GM104944).
