BreakPointSurveyor-Core

Core utilities implementing BreakPointSurveyor workflow.

Overview

BreakPointSurveyor (BPS) is a set of core libraries (this project) and workflows which, with optional external tools, evaluate genomic sequence data to discover, analyze, and provide a visual summary of interchromosomal breakpoint events.

The BreakPointSurveyor project provides three reference workflows, each implemented as a separate git branch. These workflows (and the links to view them) are:

  • TCGA_Virus (master branch): Comprehensive workflow and data for one TCGA virus-positive sample (TCGA-BA-4077-01B-01D-2268-08) which has been aligned to a custom reference
  • 1000SV (1000SV branch): Analysis of discordant reads on a publicly available human sample
  • Synthetic (Synthetic branch): Creation and analysis of a dataset containing an inter-chromosomal breakpoint

Citation

Matthew A. Wyczalkowski, Kristine M. Wylie, Song Cao, Michael D. McLellan, Jennifer Flynn, Mo Huang, Kai Ye, Xian Fan, Ken Chen, Michael C. Wendl, Li Ding; BreakPoint Surveyor: A Pipeline for Structural Variant Visualization. Bioinformatics 2017. doi: 10.1093/bioinformatics/btx362

Online preprint with supplemental information.

Getting Started

See the BreakPointSurveyor documentation and the installation instructions.

Documentation

Architecture

The BreakPointSurveyor (BPS) project has three layers:

  • BPS Core: core analysis and plotting, typically in R or Python
  • BPS Workflow: project- and locale-specific workflows, mostly implemented as Bash scripts
  • BPS Data: BPS-generated secondary data, graphical objects, and plots

The BreakPointSurveyor project provides three example workflows and their data. This project (BreakPointSurveyor-Core) provides the Core layer. It is typically distributed as a submodule of the BreakPointSurveyor project and does not need to be installed separately.

Visualization

Multi-panel figures are generated in three steps:

  1. Data processing normalizes data into standard formats. For instance, breakpoint predictions from different SV callers are normalized into the Breakpoint Coordinate (BPC) file format, while read depth and gene annotation are converted to Depth and BED formats, respectively.
  2. Each dataset is rendered as an image panel saved as a binary "GGP" object. Additional layers, for instance predictions from different SV callers, may be added to an existing GGP object in subsequent processing steps (see details).
  3. Finally, multiple GGP objects are assembled, aligned to common axes, and saved to a PDF format to form a composite figure.

BPS Utilities

BPS Core consists of a number of utilities which are used by Workflow scripts to process and visualize data. They are described below, ordered by directory structure.

src/analysis

Utilities for RPKM expression analysis, read depth, and Pindel output processing.

  • ExonExpressionAnalyzer.R Evaluates relative gene expression based on RPKM data from case and control. Calculates the p-value associated with gene expression in the vicinity of an integration event and writes it to stdout. Algorithm details.

  • ExonPicker.R Select exons from genes upstream and downstream of an integration event and write a BED file describing them.

  • Pindel_RP.Reader.R Create a Breakpoint Region (BPR) file based on the output of the Pindel RP module.

  • RPKM_Joiner.R Process multiple RPKM files and combine them column-wise into one data file.

  • TigraCTXMaker.R Create a breakdancer-style CTX file from either Pindel's RP or BPR data, to be used as Tigra-SV input.

  • depthFilter.py Read a BAM file and evaluate read depth in a segment. Output is subsampled to a given data size and optimized for performance.

  • vafFilter.py Parse VAF as output by Pindel.
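
The subsampling idea mentioned for depthFilter.py can be illustrated with a minimal sketch. This is not the code in depthFilter.py: the function name subsample_depth, its parameters, and the choice of evenly spaced sampling are all assumptions made for illustration, and the per-base depths are supplied as a plain list rather than read from a BAM file.

```python
def subsample_depth(depths, start, max_points):
    """Return at most max_points (position, depth) pairs, evenly spaced.

    depths     -- per-base read depth for a segment (hypothetical input)
    start      -- genomic coordinate of depths[0]
    max_points -- upper bound on the number of rows written out
    """
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    n = len(depths)
    # Ceiling division gives the smallest stride that keeps the output
    # at or below max_points rows; short segments are kept whole.
    step = 1 if n <= max_points else -(-n // max_points)
    return [(start + i, depths[i]) for i in range(0, n, step)]


if __name__ == "__main__":
    segment = [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
    for pos, depth in subsample_depth(segment, start=1000, max_points=4):
        print(pos, depth, sep="\t")
```

Keeping the row count bounded this way holds plotting cost roughly constant regardless of segment length, at the price of missing narrow depth spikes that fall between sampled positions.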

src/annotation

Ad hoc scripts for processing Ensembl gene/exon names and regions.

  • ChromRenamer.py Translate chromosome names in a BED file between two standards using a database. Used for normalizing feature names, as discussed here.

  • GTFFilter.py Simple script to read a GTF file line by line, test if criteria are met, and either print or discard the line. Used to extract gene and exon domains.

  • TLAExamine.R Tool for examining GTF and VCF files. Expands a column of key/value pairs into multiple columns, for examining in e.g. a spreadsheet.
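
Two of the operations described above, line-by-line GTF filtering and expansion of the key/value attribute column, can be sketched as follows. This is an illustration only and not the code in GTFFilter.py or TLAExamine.R: the function names parse_attributes and filter_gtf are hypothetical, the filter criterion (feature type) is an assumed example, and the sample records are invented. It relies only on the standard nine-column, tab-separated GTF layout.

```python
def parse_attributes(field):
    """Split a GTF attribute column (key "value"; key "value"; ...) into a dict.

    Assumes values do not themselves contain semicolons.
    """
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def filter_gtf(lines, feature):
    """Yield (chrom, start, end, attributes) for records of one feature type."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue  # malformed line, or criterion not met: discard
        yield cols[0], int(cols[3]), int(cols[4]), parse_attributes(cols[8])


if __name__ == "__main__":
    # Invented records, for illustration only.
    sample = [
        "#!example header\n",
        '14\tsource\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_name "ABC";\n',
        '14\tsource\texon\t100\t200\t.\t+\t.\tgene_id "G1"; exon_number "1";\n',
    ]
    for chrom, start, end, attrs in filter_gtf(sample, "exon"):
        print(chrom, start, end, attrs["gene_id"], sep="\t")
```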

src/contig

Utilities related to contig creation with Tigra-SV

Contig alignment improves breakpoint predictions by assembling a consensus sequence (contig) from reads spanning a breakpoint, then re-aligning the contig to the human+virus reference. Contigs are created using Tigra-SV.

Read more about contig workflow here.

src/plot

BPS figure rendering and assembly

Each dataset is rendered as an image panel using the ggplot() function and saved as a binary "GGP" object with saveRDS(). GGP objects can be visualized using the ggp2pdf utility. Additional layers, for instance predictions from different SV callers, may be added to an existing GGP object in a subsequent processing step with data from a different BPC (or BPR) file.

  • AnnotationDrawer.R Create gene annotation GGP files with optional exon definitions for each gene.

  • BreakpointDrawer.R Common BreakpointSurveyor plotting utilities.

  • BreakpointSurveyAssembler.R Create or append various features to breakpoint coordinate GGP file. Chrom A coordinates are plotted on X axis, B on Y.

  • DepthDrawer.R Plot read depth (or related quantities) over a genomic region and add annotation to this plot.

  • DepthUtil.R Common read depth utilities.

  • HistogramDrawer.R Create a histogram of read depth (or estimated copy number) for chrom A and B

  • PvalBubblePlotter.R Visualize gene dysregulation in vicinity of integration event

  • ZoomGGP.R Utility to change plot limits of a GGP file and save as PDF

  • ggp2pdf Convert a GGP (ggplot binary) file into PDF

src/util

Common and ad hoc scripts

  • BPS_Util.R Common BreakpointSurveyor utilities.

  • PlotListMaker.py Create a Breakpoint Surveyor PlotList file from Breakpoint Coordinate (BPC) or Breakpoint Region (BPR) data

  • PlotListParser.R Given barcode, chrom, and chrom position, return PlotList name which contains this position.

  • makeBreakpointRegions.py Cluster Breakpoints into Breakpoint regions.

  • processVCF.py Read VCF file and write coordinates of features in various formats
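
Clustering breakpoints into regions, as makeBreakpointRegions.py is described as doing, can be sketched with a simple greedy pass. This is not the algorithm used by that script. It assumes, for illustration, that each breakpoint is a (chromA, posA, chromB, posB) tuple and each region is a (chromA, startA, endA, chromB, startB, endB) tuple; the function name cluster_breakpoints and the max_gap parameter are hypothetical.

```python
from collections import defaultdict


def cluster_breakpoints(breakpoints, max_gap):
    """Greedily merge breakpoints lying within max_gap of a region on both axes.

    The pass is order-dependent and does not re-merge regions that grow
    toward each other later; it is a sketch, not a full clustering method.
    """
    by_pair = defaultdict(list)
    for chrom_a, pos_a, chrom_b, pos_b in breakpoints:
        by_pair[(chrom_a, chrom_b)].append((pos_a, pos_b))

    regions = []
    for (chrom_a, chrom_b), points in sorted(by_pair.items()):
        clusters = []  # each cluster is [startA, endA, startB, endB]
        for pos_a, pos_b in sorted(points):
            for c in clusters:
                near_a = c[0] - max_gap <= pos_a <= c[1] + max_gap
                near_b = c[2] - max_gap <= pos_b <= c[3] + max_gap
                if near_a and near_b:
                    # Extend the region to cover the new breakpoint.
                    c[0], c[1] = min(c[0], pos_a), max(c[1], pos_a)
                    c[2], c[3] = min(c[2], pos_b), max(c[3], pos_b)
                    break
            else:
                clusters.append([pos_a, pos_a, pos_b, pos_b])
        regions.extend(
            (chrom_a, c[0], c[1], chrom_b, c[2], c[3]) for c in clusters
        )
    return regions


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    bps = [
        ("14", 1000, "virus", 50),
        ("14", 1040, "virus", 70),
        ("14", 5000, "virus", 60),
    ]
    for region in cluster_breakpoints(bps, max_gap=100):
        print(*region, sep="\t")
```

Requiring proximity on both chromosomes keeps events that share a position on chrom A but map to distant positions on chrom B in separate regions.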

Authors

Matthew A. Wyczalkowski, m.wyczalkowski@wustl.edu

License

This software is licensed under the GNU General Public License v3.0

Acknowledgements

This work was supported by the National Cancer Institute [R01CA178383 and R01CA180006 to Li Ding, R01CA172652 to Ken Chen]; and National Human Genome Research Institute [U01HG006517 to Li Ding].
