Run novoBreak to detect interchromosomal breakpoints
novoBreak is a structural variant detection tool that outputs VCF files reporting the locations of interchromosomal translocations, among other events. Here we use it to detect virus integration in the TCGA-BA-4077-01 dataset.
novoBreak requires both a tumor and a normal data file. TCGA-BA-4077-01 is a tumor sample, but the corresponding normal is not readily available. Instead, we create a "synthetic normal" consisting of a small number of reads simulated from the human+virus reference with wgsim (distributed with samtools). This BAM file is generated in steps 1_make_synthetic_normal.sh and 2_align_synthetic_reads.sh.
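Steps 1 and 2 might look roughly like the hypothetical sketch below; it is not the content of the actual scripts. It assumes wgsim, bwa and samtools are on the PATH and that the reference FASTA has already been indexed with bwa index. The choice of bwa mem as the aligner, the 100 bp read length, the fixed seed, and the function name synthetic_normal_cmds are all illustrative assumptions. The function prints the commands rather than running them, so they can be reviewed before execution.

```shell
# Hypothetical sketch of steps 1-2 (not the original scripts).
# Assumes wgsim, bwa and samtools are on PATH and REF is bwa-indexed.
# synthetic_normal_cmds REF PREFIX [NPAIRS]: print the commands to stdout.
synthetic_normal_cmds() {
  local ref=$1 prefix=$2 npairs=${3:-1000}
  # Step 1: simulate a small number of 100 bp read pairs from the
  # human+virus reference; -S fixes the seed, and wgsim's stdout
  # (the list of simulated mutations) is discarded
  printf 'wgsim -N %s -1 100 -2 100 -S 11 %q %q %q > /dev/null\n' \
    "$npairs" "$ref" "$prefix.R1.fq" "$prefix.R2.fq"
  # Step 2: align the simulated pairs back to the same reference,
  # coordinate-sort the alignments, and index the resulting BAM
  printf 'bwa mem %q %q %q | samtools sort -o %q -\n' \
    "$ref" "$prefix.R1.fq" "$prefix.R2.fq" "$prefix.bam"
  printf 'samtools index %q\n' "$prefix.bam"
}
```

For example, synthetic_normal_cmds human_virus.fa synthetic_normal | bash would write synthetic_normal.bam and its index (both file names are placeholders).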
For performance reasons (both speed and memory), we restrict the novoBreak analysis to regions of interest identified by prior analyses, namely the integration of the HPV16 virus into chromosome 14. Steps 3_make_ROI_BAM.sh and 4_merge_ROI_BAM.sh create a "reduced" BAM file containing only reads from the following regions of interest:
14:68633616-68791484
gi|310698439|ref|NC_001526.2|:1-7905 (HPV16)
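Steps 3 and 4 can be sketched as below, as a hypothetical illustration only; the actual scripts may differ. It assumes samtools is installed and that the input BAM is coordinate-sorted and indexed. The ROI_REGIONS array, the function name roi_bam_cmds, and the extract-per-region-then-merge approach are illustrative. One detail matters regardless of implementation: the HPV16 contig name contains "|" characters, which the shell would otherwise read as pipes, so each region must be quoted.

```shell
# Hypothetical sketch of steps 3-4 (not the original scripts).
# The HPV16 contig name contains '|', so every region is wrapped in
# single quotes in the printed commands.
ROI_REGIONS=(
  "14:68633616-68791484"                   # chromosome 14 integration locus
  "gi|310698439|ref|NC_001526.2|:1-7905"   # HPV16 genome
)

# roi_bam_cmds IN_BAM PREFIX: print commands that extract each region from
# an indexed BAM (step 3) and merge the pieces into PREFIX.bam (step 4)
roi_bam_cmds() {
  local in=$1 prefix=$2 i parts=()
  for i in "${!ROI_REGIONS[@]}"; do
    printf "samtools view -b -o '%s' '%s' '%s'\n" \
      "$prefix.roi$i.bam" "$in" "${ROI_REGIONS[$i]}"
    parts+=("'$prefix.roi$i.bam'")
  done
  # merging coordinate-sorted inputs yields a coordinate-sorted output
  printf "samtools merge -f '%s' %s\n" "$prefix.bam" "${parts[*]}"
  printf "samtools index '%s'\n" "$prefix.bam"
}
```

For example, roi_bam_cmds tumor.bam reduced | bash would produce reduced.bam and its index (file names are placeholders).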
The result of the novoBreak analysis is the VCF file dat/novobreak/TCGA-BA-4077-01B-01D-2268-08/novoBreak.pass.flt.vcf
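A quick way to inspect this VCF is to list the calls whose partner breakpoint lies on a different chromosome, such as chromosome 14 joined to HPV16. The sketch below assumes that each record stores the partner chromosome and position in INFO keys named CHR2 and END; this is an assumption, so check the header of novoBreak.pass.flt.vcf for the actual key names before relying on it. The function name interchrom_calls is hypothetical.

```shell
# Hypothetical helper: print CHROM, POS, partner chromosome and partner
# position for records whose partner chromosome differs from CHROM.
# ASSUMPTION: the partner is stored in INFO keys named CHR2 and END.
interchrom_calls() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                          # skip VCF header lines
    {
      chr2 = ""; pos2 = ""
      n = split($8, info, ";")             # INFO is column 8
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^CHR2=/) chr2 = substr(info[i], 6)
        else if (info[i] ~ /^END=/) pos2 = substr(info[i], 5)
      }
      if (chr2 != "" && chr2 != $1) print $1, $2, chr2, pos2
    }' "$1"
}
```

Usage: interchrom_calls dat/novobreak/TCGA-BA-4077-01B-01D-2268-08/novoBreak.pass.flt.vcf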
Binary: https://sourceforge.net/projects/novobreak/
Source: git clone https://git.code.sf.net/p/novobreak/git novobreak-git
Reference:
Define the location of the installation directory, NOVOBREAK_DIR, in ../bps.config or ../bps.config.local as follows:
NOVOBREAK_DIR="/gscuser/mwyczalk/src/novoBreak_distribution_v1.1.3rc"