Peter edited this page Nov 28, 2019 · 8 revisions

Overview

This repository contains all code for the Guttman Lab SPRITE pipeline.

The main steps of this pipeline are:

  1. Identify barcodes in your sequenced reads.
  2. Align these reads to your genome of interest.
  3. Discard alignments that don't meet certain criteria.
  4. Group alignments into clusters.
  5. Create heatmaps from clusters.

Each step is explained in more detail on its corresponding documentation page.
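As a rough illustration of steps 3 and 4, the sketch below filters a handful of toy alignments and then groups the survivors by barcode, so that alignments sharing a barcode end up in the same cluster. It is not the pipeline's own code: the `Alignment` record, the `filter_alignments` and `group_into_clusters` helpers, the barcode string format, and the MAPQ cutoff of 30 are all assumptions made for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical, simplified alignment record (not the pipeline's own type)."""
    barcode: str   # barcode string identified in step 1 (format assumed)
    chrom: str
    position: int
    mapq: int      # mapping quality reported by the aligner


def filter_alignments(alignments, min_mapq=30):
    """Step 3 (illustrative): keep alignments at or above an assumed MAPQ cutoff."""
    return [a for a in alignments if a.mapq >= min_mapq]


def group_into_clusters(alignments):
    """Step 4 (illustrative): alignments that share a barcode form one cluster."""
    clusters = defaultdict(set)
    for a in alignments:
        clusters[a.barcode].add((a.chrom, a.position))
    return dict(clusters)


# Toy input: three reads share one barcode, one read has another.
reads = [
    Alignment("A.B.C", "chr1", 100, 42),
    Alignment("A.B.C", "chr2", 500, 40),
    Alignment("A.B.C", "chr3", 900, 3),   # low MAPQ, dropped by the filter
    Alignment("D.E.F", "chr1", 250, 60),
]

clusters = group_into_clusters(filter_alignments(reads))
for barcode, positions in sorted(clusters.items()):
    print(barcode, sorted(positions))
```

Keying a dictionary on the barcode keeps the grouping to a single pass over the alignments. The actual filtering criteria and cluster file format are described on the documentation pages for those steps.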

All of these steps have been automated using the Snakemake workflow management system. For detailed instructions on how to set up and use the pipeline, please see the Snakemake-Pipeline page.

Installation and general requirements

To install, simply clone this repository. The Java code has been packaged as a JAR file, which appears to run without problems on Mac and Linux systems.

The SPRITE pipeline has been tested on a high-performance computing cluster running CentOS 7 and on a local machine with 30 GB of RAM and an i7-8750H CPU running Ubuntu 18.04.3 LTS. Local runtime for FASTQ files with around 45 million reads was approximately 7 hours.

Java requirements

If you need to create a new JAR file rather than use the provided one, you'll need

The bio library also uses the htsjdk library for parsing BAM files. This workflow doesn't parse BAM files with Java/bio, but your IDE may display a lot of ugly warnings if htsjdk isn't on your build path.
