MARATHON

Copy number variation is an important and abundant source of variation in the human genome and has been associated with a number of diseases, especially cancer. Massively parallel next-generation sequencing allows copy number profiling with fine resolution. Such efforts, however, have met with mixed success, with setbacks stemming partly from the lack of reliable analytical methods to meet the diverse and unique challenges posed by the myriad experimental designs and study goals in genetic studies. In cancer genomics, detection of somatic copy number changes and profiling of allele-specific copy number (ASCN) are complicated by experimental biases and artifacts as well as by normal cell contamination and cancer subclone admixture. Furthermore, careful statistical modeling is warranted to reconstruct tumor phylogeny from both somatic ASCN changes and single nucleotide variants. Here we describe a flexible computational pipeline, MARATHON (copy nuMber vARiAtion and Tumor pHylOgeNy), which integrates several related statistical software packages for copy number profiling and downstream analyses in disease genetic studies.

Manuscript

Urrutia E, Chen H, Zhou Z, Zhang NR, Jiang Y. Integrative pipeline for profiling DNA copy number and inferring tumor phylogeny. Bioinformatics, 34 (12), 2126-2128, 2018. (link)

Questions & Problems

If you have any questions or problems when using MARATHON, you can: (i) open a new issue here; (ii) post in our Google user group https://groups.google.com/d/forum/marathon_genomics or email us at marathon_genomics@googlegroups.com; (iii) email the maintainers of the corresponding packages -- the contact information is shown under Developers & Maintainers. The first two contact options are preferred and we will try our best to reply as soon as possible.

Installation

Installation Option 1: Docker Image - Good for ease of installation

A Docker image is available here. The image provides an RStudio GUI built on rocker/tidyverse, with MARATHON and all of its dependent packages and datasets pre-installed. Note that the image can take a while to download because it includes the human reference genome and the toy sequencing dataset. Instructions for using Docker can be found here.

docker pull lzeppelini/marathon

Installation Option 2: Install to R/RStudio - Good for performance

Install all packages in the latest version of R.

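# CRAN packages: FALCON, FALCON-X, and devtools (needed for the GitHub installs below)
install.packages(c("falcon", "falconx", "devtools"))
# Bioconductor dependencies, installed through BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("WES.1KG.WUGSC", "GenomeInfoDbData", "GenomeInfoDb", "VariantAnnotation"))
# GitHub packages: CODEX, CODEX2, Canopy, iCNV, and MARATHON itself
devtools::install_github(c("yuchaojiang/CODEX/package", "yuchaojiang/CODEX2/package", "yuchaojiang/Canopy/package", "zhouzilu/iCNV", "yuchaojiang/MARATHON/package"))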
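
After running the commands above, it can be useful to confirm that every component actually installed before starting an analysis. The following is a minimal sketch, not part of MARATHON itself: the helper name check_installed is hypothetical, and the package names in marathon_pkgs are assumed to match the names used in the installation commands.

```r
# Packages that the installation commands are expected to provide.
# Assumption: installed package names match the repository/package names.
marathon_pkgs <- c("falcon", "falconx", "CODEX", "CODEX2",
                   "Canopy", "iCNV", "MARATHON")

# Hypothetical helper: returns a named logical vector that is TRUE
# for each package whose namespace can be loaded.
check_installed <- function(pkgs) {
  vapply(pkgs, function(p) requireNamespace(p, quietly = TRUE), logical(1))
}

status <- check_installed(marathon_pkgs)
missing_pkgs <- names(status)[!status]

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All MARATHON components are installed.")
}
```

If any package is reported as missing, re-running the corresponding installation line for that package alone usually makes the underlying error message easier to read.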

Pipeline overview

The possible analysis scenarios are listed in Table 1. Figure 1 outlines the relationships among the software packages: CODEX and CODEX2 perform read depth normalization for total copy number profiling; iCNV takes the read depth normalized by CODEX/CODEX2 and combines it with allele-specific read counts and microarray data to detect CNVs; FALCON and FALCON-X perform ASCN analysis; and Canopy takes the output of FALCON/FALCON-X to reconstruct tumor phylogeny.

Figure 1. A flowchart outlining the procedures for profiling CNV, ASCN, and reconstructing tumor phylogeny. CNVs with common and rare population frequencies can be profiled by CODEX and CODEX2, with and without negative control samples. iCNV integrates sequencing and microarray data for CNV detection. ASCNs can be profiled by FALCON and FALCON-X using allelic read counts at germline heterozygous loci. Canopy infers tumor phylogeny using somatic SNVs and ASCNs.

Table 1. Analysis scenarios and pipeline design. The last column shows the sequence of software that should be used for each analysis scenario. * By “normal” we mean samples that are not derived from tumor tissue, which are not expected to carry chromosome-level copy number changes.

Running MARATHON

An R notebook with a step-by-step demonstration and rich display is available here. The corresponding Rmd script is available here.

Citation

Please cite MARATHON as well as all the dependent packages that you use.

MARATHON: Urrutia et al. 2018 Bioinformatics
Integrative pipeline for profiling DNA copy number and inferring tumor phylogeny (GitHub)
CODEX: Jiang et al. 2015 Nucleic Acids Research
A Normalization and Copy Number Variation Detection Method for Whole Exome Sequencing (Bioconductor, GitHub)
CODEX2: Jiang et al. 2018 Genome Biology
Full-spectrum copy number variation detection by high-throughput DNA sequencing (GitHub)
iCNV: Zhou et al. 2017 Bioinformatics
Integrated copy number variation detection toolset (GitHub)
FALCON: Chen et al. 2015 Nucleic Acids Research
Finding Allele-Specific Copy Number in Next-Generation Sequencing Data (CRAN)
FALCON-X: Chen et al. 2017 Annals of Applied Statistics
Finding Allele-Specific Copy Number in Whole-Exome Sequencing Data (CRAN)
Canopy: Jiang et al. 2016 PNAS
Accessing Intra-Tumor Heterogeneity and Tracking Longitudinal and Spatial Clonal Evolutionary History by Next-Generation Sequencing (CRAN, GitHub)

Developers & Maintainers

Gene Urrutia (gene dot urrutia at gmail dot com)
Innovation, Hill-Rom Corp.
Yuchao Jiang (yuchaoj at email dot unc dot edu)
Department of Biostatistics & Department of Genetics, UNC-Chapel Hill
Hao Chen (hxchen at ucdavis dot edu)
Department of Statistics, UC Davis
Zilu Zhou (zhouzilu at pennmedicine dot upenn dot edu)
Genomics and Computational Biology Graduate Group, UPenn
Nancy R. Zhang (nzh at wharton dot upenn dot edu)
Department of Statistics, UPenn
