This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
Addition of cnv data to oncoprint #182
Closed
@@ -89,6 +89,19 @@ option_list <- list(

```r
    type = "character",
    default = NULL,
    help = "oncoprint output png file name"
  ),
  optparse::make_option(
    c("-l", "--low_segmean_cutoff"),
    type = "double",
    # Default determined using CNVkit documentation
    # (https://cnvkit.readthedocs.io/en/stable/calling.html)
    default = 0.5,
    help = "low segmean cutoff used to determine amplification and deletion"
  ),
  optparse::make_option(
    c("-s", "--high_segmean_cutoff"),
    type = "double",
    # Default determined using CNVkit documentation
    # (https://cnvkit.readthedocs.io/en/stable/calling.html)
    default = 1,
    help = "high segmean cutoff used to distinguish hemizygous deletion from homozygous deletion"
  )
)
```
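A quick sketch of how the two new cutoff options would behave when parsed. Only the new options are reproduced here, and the command-line arguments are hypothetical:

```r
library(optparse)

# Reproduce just the two cutoff options added in this hunk
option_list <- list(
  make_option(c("-l", "--low_segmean_cutoff"), type = "double", default = 0.5),
  make_option(c("-s", "--high_segmean_cutoff"), type = "double", default = 1)
)

# Simulate passing only --low_segmean_cutoff on the command line;
# the high cutoff falls back to its default of 1
opt <- parse_args(OptionParser(option_list = option_list),
                  args = c("--low_segmean_cutoff", "0.3"))
```
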
@@ -116,11 +129,57 @@ metadata <- metadata %>% (shown as the resulting code)

```r
maf_df <- data.table::fread(maf, stringsAsFactors = FALSE)

# Read in cnv file
if (!is.null(opt$cnv_file)) {
  cnv_file <-
    data.table::fread(opt$cnv_file,
                      data.table = FALSE,
                      stringsAsFactors = FALSE)
  # TODO: Adapt `cnv_file` once the results of the consensus calls are
  # obtained, to be in the column format
  # "Hugo_Symbol, Tumor_Sample_Barcode, Variant_Classification" as required
  # by the `read.maf` function

  # Create a data frame with `Tumor_Sample_Barcode` and gene information
  # from the MAF data frame
  select_maf_df <- maf_df %>%
    dplyr::select(Tumor_Sample_Barcode, Hugo_Symbol)

  # Add gene information to the CNV data frame
  cnv_df <- cnv_file %>%
    dplyr::inner_join(select_maf_df, by = c("ID" = "Tumor_Sample_Barcode")) %>%
    dplyr::distinct()

  # Define cutoffs for the `Variant_Classification` column
  low_segmean_cutoff <- opt$low_segmean_cutoff
  high_segmean_cutoff <- opt$high_segmean_cutoff

  # Create the `Variant_Classification` column
  cnv_df <- cnv_df %>%
    dplyr::mutate(
      Variant_Classification = dplyr::case_when(
        seg.mean < -low_segmean_cutoff &
          seg.mean > -high_segmean_cutoff ~ "Hem_Deletion",
        seg.mean < -high_segmean_cutoff ~ "Hom_Deletion",
        seg.mean > low_segmean_cutoff ~ "Amplification"
      )
    )

  # Make the `Variant_Classification` column a factor vector
  cnv_df$Variant_Classification <- as.factor(cnv_df$Variant_Classification)

  # Select the columns specified as input format for the `read.maf` function
  cnv_df <- cnv_df %>%
    dplyr::select(Hugo_Symbol, ID, Variant_Classification)

  # Rename columns
  colnames(cnv_df) <-
    c("Hugo_Symbol", "Tumor_Sample_Barcode", "Variant_Classification")

  # Remove rows where no `Variant_Classification` was assigned
  cnv_file <- cnv_df %>%
    dplyr::filter(!is.na(Variant_Classification))
}

# Read in fusion file
```
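As a sanity check on the thresholding logic above, here is a minimal sketch with toy `seg.mean` values and the default cutoffs (0.5 and 1). Segments between the cutoffs in either direction fall through `case_when` to `NA` and are later filtered out:

```r
library(dplyr)

# Toy segment means spanning the default cutoffs
toy <- data.frame(seg.mean = c(-1.2, -0.7, -0.3, 0.2, 0.8, 1.5))

low_segmean_cutoff <- 0.5
high_segmean_cutoff <- 1

toy <- toy %>%
  mutate(
    Variant_Classification = case_when(
      # Between -1 and -0.5: hemizygous deletion
      seg.mean < -low_segmean_cutoff &
        seg.mean > -high_segmean_cutoff ~ "Hem_Deletion",
      # Below -1: homozygous deletion
      seg.mean < -high_segmean_cutoff ~ "Hom_Deletion",
      # Above 0.5: amplification
      seg.mean > low_segmean_cutoff ~ "Amplification"
    )
  )
# -1.2 -> Hom_Deletion, -0.7 -> Hem_Deletion, -0.3 and 0.2 -> NA,
# 0.8 and 1.5 -> Amplification
```
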
Review comment: If you're looking for another way to split things up, you could also break the fusion preparation into its own script (that's probably a separate, downstream PR). Imagine a situation where the MAF file changes but the fusion prep and the CN prep do not; then you could rerun just the plotting part without having to manipulate the fusion and CN files again.
@@ -133,22 +192,22 @@ if (!is.null(opt$fusion_file)) {

```r
#### Incorporate Fusion Data -------------------------------------------------
# TODO: Once the consensus calls of the fusion data are obtained, this section
# will need to be adapted to the format of the fusion input file. For example,
# the way we separate the genes out of `FusionName` may need to be adapted.

# Separate fusion gene partners and add variant classification and center
fus_sep <- fusion_file %>%
  # Separate the 5' and 3' genes
  tidyr::separate(FusionName, c("Gene1", "Gene2"), sep = "--") %>%
  dplyr::select(Sample, Gene1, Gene2)

reformat_fusion <- fus_sep %>%
  # Here we want to tally how many times the 5' gene shows up as a fusion hit
  # in a sample
  dplyr::group_by(Sample, Gene1) %>%
  dplyr::tally() %>%
  # If the sample-5' gene pair shows up more than once, call it a multi-hit
  # fusion
  dplyr::mutate(Variant_Classification =
                  dplyr::if_else(n == 1, "Fusion", "Multi_Hit_Fusion"),
                # Required column for joining with MAF
                Variant_Type = "OTHER") %>%
```
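The tally-and-classify step above can be illustrated on a toy fusion table. The sample IDs and fusion names below are hypothetical; `FusionName` follows the `GENE1--GENE2` convention assumed by the `sep = "--"` split:

```r
library(dplyr)
library(tidyr)

# Hypothetical fusion calls: S1 has two fusions sharing the same 5' gene
fusion_file <- data.frame(
  Sample = c("S1", "S1", "S2"),
  FusionName = c("KIAA1549--BRAF", "KIAA1549--NTRK2", "EWSR1--FLI1")
)

fus_sep <- fusion_file %>%
  separate(FusionName, c("Gene1", "Gene2"), sep = "--") %>%
  select(Sample, Gene1, Gene2)

reformat_fusion <- fus_sep %>%
  # Count occurrences of each sample / 5' gene pair
  group_by(Sample, Gene1) %>%
  tally() %>%
  # More than one hit for the pair -> Multi_Hit_Fusion
  mutate(Variant_Classification = if_else(n == 1, "Fusion", "Multi_Hit_Fusion"),
         Variant_Type = "OTHER")
# S1/KIAA1549 appears twice -> Multi_Hit_Fusion; S2/EWSR1 -> Fusion
```
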
@@ -172,13 +231,19 @@ maf_object <-

```r
      "Frame_Shift_Del",
      "Frame_Shift_Ins",
      "Splice_Site",
      "Translation_Start_Site",
      "Nonsense_Mutation",
      "Nonstop_Mutation",
      "In_Frame_Del",
      "In_Frame_Ins",
      "Missense_Mutation",
      "Stop_Codon_Ins",
      "Start_Codon_Del",
      "Fusion",
      "Multi_Hit",
      "Hem_Deletion",
      "Hom_Deletion",
      "Amplification",
      "Multi_Hit_Fusion"
    )
  )
```
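The vector in this hunk is the set of variant classes treated as non-synonymous, extended with the CNV- and fusion-derived classes. A minimal sketch of how it would be passed to maftools (the `read.maf` call is shown but commented out, since it needs a real combined MAF data frame):

```r
# Variant classes treated as non-synonymous, including the CNV and fusion
# classes added in this PR
vc_nonSyn <- c(
  "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site",
  "Translation_Start_Site", "Nonsense_Mutation", "Nonstop_Mutation",
  "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation",
  "Stop_Codon_Ins", "Start_Codon_Del", "Fusion", "Multi_Hit",
  "Hem_Deletion", "Hom_Deletion", "Amplification", "Multi_Hit_Fusion"
)

# Sketch only -- `combined_df` (MAF rows plus the reformatted CNV and fusion
# rows) is hypothetical here:
# maf_object <- maftools::read.maf(maf = combined_df, vc_nonSyn = vc_nonSyn)
```
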
@@ -240,7 +305,7 @@ oncoplot(

```r
  logColBar = TRUE,
  sortByAnnotation = TRUE,
  showTumorSampleBarcodes = TRUE,
  removeNonMutated = TRUE,  # changed from FALSE
  annotationFontSize = 0.7,
  SampleNamefontSize = 0.5,
  fontSize = 0.7,
```
Review comment: If I understand this section correctly, you are joining the gene symbols from the MAF file to the CNV file using the sample identifier. I think what you want to do is add gene symbols to the SEG file using the genome coordinates that are already in the SEG file (a very brief Google suggests bedtools might have functionality to accomplish this, and GenomicFeatures also comes to mind). This step should probably be upstream of the plotting, say `01-prepare-cn-for-oncoplot.R`, where this script (`02-plot-oncoprint.R`) then accepts a file that already contains gene symbols. You could potentially do the `low_segmean_cutoff` and `high_segmean_cutoff` steps in the preparation step as well. Note hg38 was the build used: https://github.com/AlexsLemonade/OpenPBTA-manuscript/blob/master/content/03.methods.md#somatic-copy-number-variant-calling
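The coordinate-based annotation the reviewer describes reduces to an interval overlap between SEG segments and gene coordinates. A minimal base-R sketch follows; the gene coordinates are approximate and illustrative only, and in practice a dedicated tool (bedtools, or GenomicFeatures/GenomicRanges in R) would be used:

```r
# Illustrative hg38-style gene coordinates (approximate, for the sketch only)
genes <- data.frame(
  Hugo_Symbol = c("SMARCB1", "BRAF"),
  chrom = c("chr22", "chr7"),
  start = c(23786966, 140719327),
  end   = c(23834518, 140924928)
)

# One toy SEG segment overlapping SMARCB1
seg <- data.frame(chrom = "chr22",
                  loc.start = 23700000,
                  loc.end = 23900000,
                  seg.mean = -1.3)

# Genes on the same chromosome whose interval overlaps the segment
hits <- genes[genes$chrom == seg$chrom &
                genes$start <= seg$loc.end &
                genes$end >= seg$loc.start, "Hugo_Symbol"]
# hits is "SMARCB1"; BRAF is excluded by the chromosome check
```
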
Review comment: @cbethell, I have some code in which I created the focal copy number for oncoprints from a SEG file here, if you want to take a look. Sorry it is a bit messy (it still has old GISTIC code; I did not have time to clean that up yet, but posted it so you can see it). Note, as @jaclyn-taroni said, these old data were hg19 and the PBTA data are hg38. The SEG file used as input to this is here, and the output of focal copy number lesions used for oncoprints is here. Note this was also done using array data, not NGS, so we may have to figure out thresholds for accurate assessment of focal CN based on LRR (I may advise starting with ATRTs and SMARCB1 deletions; we may not be able to distinguish homozygous from hemizygous and may have to call just deletions/amplifications, but I am not sure, as I haven't dug into the data too much). I ended up doing a lot of manual inspection for driver genes, hence the re-coding in the script. Maybe this should be its own issue?
Review comment: Thank you for sharing that, @jharenza.