Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Addition of cnv data to oncoprint #182

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
89 changes: 77 additions & 12 deletions analyses/oncoprint-landscape/01-plot-oncoprint.R
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,19 @@ option_list <- list(
type = "character",
default = NULL,
help = "oncoprint output png file name"
),
optparse::make_option(
c("-l", "--low_segmean_cutoff"),
type = "double",
default = 0.5, # Determined using CNVkit documentation (https://cnvkit.readthedocs.io/en/stable/calling.html)
help = "low segmean cutoff to determine amplification and deletion"
),
optparse::make_option(
c("-s", "--high_segmean_cutoff"),
type = "double",
default = 1, # Determined using CNVkit documentation (https://cnvkit.readthedocs.io/en/stable/calling.html)
help = "high segmean cutoff to distinguish hemizygous deletion from
homozygous deletion"
)
)

Expand Down Expand Up @@ -116,11 +129,57 @@ metadata <- metadata %>%
maf_df <- data.table::fread(maf, stringsAsFactors = FALSE)

# Read in cnv file
if (!is.null(opt$cnv_file)) {
cnv_file <- data.table::fread(opt$cnv_file, stringsAsFactors = FALSE)
# TODO: Filter and set up `cnv_file` to be in the column format -
if (!is.null(opt$cnv_file)) {
cnv_file <-
data.table::fread(opt$cnv_file,
data.table = FALSE,
stringsAsFactors = FALSE)
# TODO: Adapt `cnv_file` once the results of the consensus calls are obtained,
# to be in the column format -
# "Hugo_Symbol, Tumor_Sample_Barcode, Variant_Classification" as required by
# the `read.maf function`

# Create a dataframe with `Tumor_Sample_Barcode` and gene information from maf
# dataframe
select_maf_df <- maf_df %>%
dplyr::select(Tumor_Sample_Barcode, Hugo_Symbol)

# Add gene information to the cnv dataframe
cnv_df <- cnv_file %>%
dplyr::inner_join(select_maf_df, by = c("ID" = "Tumor_Sample_Barcode")) %>%
dplyr::distinct()
Comment on lines +144 to +150
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this section correctly, you are joining the gene symbols from the MAF file to the CNV file using the sample identifier. I think what you want to do is add gene symbols to the SEG file using the genome coordinates that are already in the SEG file (a very brief google suggests bedtools might have functionality to accomplish this and GenomicFeatures also comes to mind). This step should probably be upstream of the plotting, say 01-prepare-cn-for-oncoplot.R, where this script (02-plot-oncoprint.R) then accepts the file that already contains gene symbols. You could potentially do the low_segmean_cutoff and high_segmean_cutoff steps in the preparation step as well. Note hg38 was the build used: https://github.com/AlexsLemonade/OpenPBTA-manuscript/blob/master/content/03.methods.md#somatic-copy-number-variant-calling

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbethell - I have some code in which I created the focal copy number for oncoprints from seg file here if you want to take a look. Sorry it is a bit messy (and still has old GISTIC code - did not have time to clean that up yet, but just posted so you can see it). Note, as @jaclyn-taroni said, that these old data were hg19 and the PBTA data are hg38. The seg file for input to this is here and the output is focal copy number lesions used for oncoprints here. Note, this was also done using array data, not NGS, so we may have to figure out thresholds for cutoffs for accurate assessment of focal CN based on LRR (may advise starting with ATRT's and SMARCB1 deletions and we may not be able to do homozygous/hemizygous, but rather just deletions/amplifications - not sure - haven't dug into the data too much). I ended up doing a lot of manual inspection for driver genes, hence the re-coding in the script. Maybe this should be its own issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for sharing that @jharenza


# Define cutoffs for `Variant_Classification` column
low_segmean_cutoff <- opt$low_segmean_cutoff
high_segmean_cutoff <- opt$high_segmean_cutoff

# Create `Variant_Classification` column
cnv_df <- cnv_df %>%
dplyr::mutate(
Variant_Classification = dplyr::case_when(
seg.mean < -low_segmean_cutoff &
seg.mean > -high_segmean_cutoff ~ "Hem_Deletion",
seg.mean < -high_segmean_cutoff ~ "Hom_Deletion",
seg.mean > low_segmean_cutoff ~ "Amplification"
)
)

# Make the `Variant_Classification` column a factor vector
cnv_df$Variant_Classification <- as.factor(cnv_df$Variant_Classification)

# Select the columns specified as input format for `read.maf` function
cnv_df <- cnv_df %>%
dplyr::select(Hugo_Symbol, ID, Variant_Classification)

# Rename columns
colnames(cnv_df) <-
c("Hugo_Symbol",
"Tumor_Sample_Barcode",
"Variant_Classification")

# Remove NA
cnv_file <- cnv_df %>%
dplyr::filter(!is.na(Variant_Classification))
}

# Read in fusion file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're looking for another way to split things up, you could also break the fusion preparation into its own script (that's probably a separate, downstream PR). Imagine a situation where the MAF file changes but the fusion prep and the CN prep do not, then you could just rerun the plotting part without having to manipulate the fusion and CN files again.

Expand All @@ -133,22 +192,22 @@ if (!is.null(opt$fusion_file)) {
#### Incorporate Fusion Data -------------------------------------------------
# TODO: Once the consensus calls of the fusion data are obtained, this section
# will need to be adapted to the format of the fusion input file. For example,
# the way we separate the genes out of `FusionName` may need to be adapted.
# the way we separate the genes out of `FusionName` may need to be adapted.

# Separate fusion gene partners and add variant classification and center
fus_sep <- fusion_file %>%
# Separate the 5' and 3' genes
tidyr::separate(FusionName, c("Gene1", "Gene2"), sep = "--") %>%
tidyr::separate(FusionName, c("Gene1", "Gene2"), sep = "--") %>%
dplyr::select(Sample, Gene1, Gene2)
reformat_fusion <- fus_sep %>%
# Here we want to tally how many times the 5' gene shows up as a fusion hit

reformat_fusion <- fus_sep %>%
# Here we want to tally how many times the 5' gene shows up as a fusion hit
# in a sample
dplyr::group_by(Sample, Gene1) %>%
dplyr::tally() %>%
# If the sample-5' gene pair shows up more than once, call it a multi hit
# If the sample-5' gene pair shows up more than once, call it a multi hit
# fusion
dplyr::mutate(Variant_Classification =
dplyr::mutate(Variant_Classification =
dplyr::if_else(n == 1, "Fusion", "Multi_Hit_Fusion"),
# Required column for joining with MAF
Variant_Type = "OTHER") %>%
Expand All @@ -172,13 +231,19 @@ maf_object <-
"Frame_Shift_Del",
"Frame_Shift_Ins",
"Splice_Site",
"Translation_Start_Site",
"Nonsense_Mutation",
"Nonstop_Mutation",
"In_Frame_Del",
"In_Frame_Ins",
"Missense_Mutation",
"Stop_Codon_Ins",
"Start_Codon_Del",
"Fusion",
"Multi_Hit",
"Hem_Deletion",
"Hom_Deletion",
"Amplification",
"Multi_Hit_Fusion"
)
)
Expand Down Expand Up @@ -240,7 +305,7 @@ oncoplot(
logColBar = TRUE,
sortByAnnotation = TRUE,
showTumorSampleBarcodes = TRUE,
removeNonMutated = FALSE,
removeNonMutated = TRUE,
annotationFontSize = 0.7,
SampleNamefontSize = 0.5,
fontSize = 0.7,
Expand Down
Binary file modified analyses/oncoprint-landscape/plots/maf_oncoprint.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
6 changes: 4 additions & 2 deletions analyses/oncoprint-landscape/run-oncoprint.sh
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,9 @@ wget --no-clobber --directory-prefix=driver-lists -O driver-lists/brain-goi-list
wget --no-clobber --directory-prefix=driver-lists -O driver-lists/brain-goi-list-short.txt https://github.com/marislab/create-pptc-pdx-oncoprints/raw/2c9ed2a2331bcef3003d6aa25a130485d76a3535/data/brain-goi-list.txt

Rscript --vanilla 01-plot-oncoprint.R \
--maf_file ../../data/pbta-snv-mutect2.vep.maf.gz \
--maf_file ../../data/pbta-snv-lancet.vep.maf.gz \
--fusion_file ../../scratch/arriba.tsv \
--cnv_file ../../data/pbta-cnv-cnvkit.seg.gz \
--goi_list driver-lists/brain-goi-list-long.txt \
--png_name maf_oncoprint.png
--png_name maf_oncoprint.png \
--low_segmean_cutoff 0.2
2 changes: 1 addition & 1 deletion analyses/oncoprint-landscape/util/oncoplot-palette.R
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@

# Define a color vector for plots
color_palette <- c("Missense_Mutation" = "#35978f",
"Nonsense_Mutation" = "#000000",
"Nonsense_Mutation" = "#002F6C",
"Frame_Shift_Del" = "#56B4E9",
"Frame_Shift_Ins" = "#FFBBFF",
"Splice_Site" = "#F0E442",
Expand Down