The R code presented here contains functions to time whole-genome duplications (WGD) in molecular time and in real time (scaled by patient age), using either all mutations or only (C>T)pG mutations. It has the following dependencies and required inputs:
- copy number calls
- SNV driver calls
- SNV and indel calls
- libraries: GenomicRanges, DPClust and dpclust3p (https://github.com/Wedge-Oxford/dpclust_smchet_docker), Rsamtools, Biostrings, BSgenome
- genome reference
- clinical annotations
The R code contains functions for plotting and timing, as well as a pipeline which does the following:
- load libraries, the reference genome and driver genes
- read in copy number profiles and SNV calls
- infer whole genome duplication status (mode of the major allele)
- infer multiplicities of mutations (indels and SNVs) using dpclust3p functions
- subset mutations by context (keeping only (C>T)pG)
- plot counts of all mutations vs. (C>T)pG vs. age of the patients for real time calibration
- time driver mutations relative to WGD
- plot ploidy vs. fraction LOH
- time whole genome duplications
- plot timing of WGD and of drivers relative to WGD (using all mutations and only (C>T)pG)
- scale molecular time to real time using patient age
- plot real time timing
- save session and summary data frame
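
The core steps above can be illustrated with a minimal base-R sketch. It is not the code from this repository: all function names, column names and example values are hypothetical, the WGD call (modal major allele copy number of at least 2) is an assumed threshold, and the timing formula is a common simplification that assumes a constant mutation rate and mutations in 2+2 regions. In the actual pipeline, multiplicities come from dpclust3p rather than from fixed counts.

```r
# Hypothetical copy number segments: length in bp and major allele copy number.
segments <- data.frame(
  length = c(50e6, 30e6, 20e6, 10e6),
  major  = c(2, 2, 1, 3)
)

# Length-weighted mode of the major allele copy number.
modal_major <- function(seg) {
  covered <- tapply(seg$length, seg$major, sum)
  as.integer(names(covered)[which.max(covered)])
}

# Assumed rule: call WGD when the modal major allele copy number is >= 2.
has_wgd <- modal_major(segments) >= 2

# Flag (C>T)pG mutations. `context` is the reference trinucleotide centred
# on the mutated base; G>A preceded by C is the same change on the other strand.
is_ctpg <- function(ref, alt, context) {
  fwd <- ref == "C" & alt == "T" & substr(context, 3, 3) == "G"
  rev <- ref == "G" & alt == "A" & substr(context, 1, 1) == "C"
  fwd | rev
}

# Molecular time of WGD in 2+2 regions under a constant mutation rate:
# pre-WGD mutations have multiplicity 2 (accumulated on 2 copies),
# post-WGD mutations have multiplicity 1 (accumulated on 4 copies).
wgd_molecular_time <- function(n_mult2, n_mult1) {
  2 * n_mult2 / (2 * n_mult2 + n_mult1)
}

# Scale molecular time to real time, assuming the rate is constant from birth.
wgd_real_time <- function(molecular_time, age) {
  molecular_time * age
}

ctpg_flags <- is_ctpg(
  ref     = c("C", "G", "C", "G"),
  alt     = c("T", "A", "T", "A"),
  context = c("ACG", "CGT", "ACA", "AGT")
)
t_mol  <- wgd_molecular_time(n_mult2 = 300, n_mult1 = 200)
t_real <- wgd_real_time(t_mol, age = 60)
```

Restricting the counts passed to `wgd_molecular_time` to the mutations flagged by `is_ctpg` gives the (C>T)pG-only estimate, which can then be compared with the estimate from all mutations.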