#790 Part1: adding new SNV subtypes for LGAT #842
@kgaonkar6 thanks for this! To answer your question:
Good question and point! Yes, let's filter out synonymous and silent mutations based on the paper. I think that you should keep the notebook for selecting LGAT samples separate from the SNV alteration notebook. I will still comment inline for review.
Looks good in general! See my comments inline for a few changes.
Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>
I used Variant_Classification terms for synonymous SNVs as per interaction-plots/scripts/02-process_mutations.R. Does this look OK? Should I also remove Variant_Classification %in% c("Intron", "5'Flank")? Some Intron/5'Flank SNVs are getting through in the output.
# Variant Classification with Low/Modifier variant consequences
# from maftools http://asia.ensembl.org/Help/Glossary?id=535
synonymous <- c(
"Silent",
"Start_Codon_Ins",
"Start_Codon_SNP",
"Stop_Codon_Del",
"De_novo_Start_InFrame",
"De_novo_Start_OutOfFrame"
)
Hmm. I tried to look at how they defined the SNVs in the paper, but could not find anything about it in the methods. I just emailed the corresponding author, so hopefully we can hear back from her in a reasonable time frame. To answer your question, I think getting rid of Intron makes sense, but 5'Flank could be a promoter variant and have an effect. Can you check the predicted effects?
Oh yeah, I did check the 1000 pLGG paper but didn't find anything specific either. Here is an example where both mutations in MAP2K1 in this biospecimen are 5'Flank or Intron; the impact is MODIFIER and there is no predicted impact from SIFT or PolyPhen:
# A tibble: 2 x 6
  Tumor_Sample_Barcode IMPACT   SIFT  PolyPhen Hugo_Symbol Variant_Classification
  <chr>                <chr>    <chr> <chr>    <chr>       <chr>
1 BS_4QFSH7C4          MODIFIER .     .        MAP2K1      5'Flank
2 BS_4QFSH7C4          MODIFIER .     .        MAP2K1      Intron
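The filtering question raised above can be sketched in a few lines of base R. This is a minimal illustration, not the module's code: the toy `maf` table and its rows are made up, and the rule "drop Intron/5'Flank only when IMPACT is MODIFIER" is an assumption about how the check on predicted effects could be encoded.

```r
# Toy MAF-like table; the rows are invented for illustration only.
maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3", "S4"),
  Hugo_Symbol = c("MAP2K1", "MAP2K1", "BRAF", "KRAS", "NF1"),
  Variant_Classification = c("5'Flank", "Intron", "Missense_Mutation",
                             "Silent", "Nonsense_Mutation"),
  IMPACT = c("MODIFIER", "MODIFIER", "MODERATE", "LOW", "HIGH"),
  stringsAsFactors = FALSE
)

# Classes treated as synonymous, as listed in this thread.
synonymous <- c("Silent", "Start_Codon_Ins", "Start_Codon_SNP",
                "Stop_Codon_Del", "De_novo_Start_InFrame",
                "De_novo_Start_OutOfFrame")

# Assumption: non-coding classes are dropped only when IMPACT is MODIFIER.
noncoding <- c("Intron", "5'Flank")

drop_row <- maf$Variant_Classification %in% synonymous |
  (maf$Variant_Classification %in% noncoding & maf$IMPACT == "MODIFIER")

# Keep everything that was not flagged for removal.
maf_filtered <- maf[!drop_row, ]
```

Keeping the IMPACT condition explicit means a 5'Flank variant with a non-MODIFIER annotation would still pass through, which matches the concern that such variants could sit in a promoter.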
I got this response, but sent a followup asking for more clarification:
- add HIST2H3C to JSON file
- update/shorten/space out some comments for clarity
- update paste0() to paste()
@kgaonkar6 thanks for this!
Re:
I used Variant_Classification terms for synonymous SNVs as per interaction-plots/scripts/02-process_mutations.R. Does this look OK? Should I also remove Variant_Classification %in% c("Intron", "5'Flank")? Some Intron/5'Flank SNVs are getting through in the output.
and
Oh yeah, I did check the 1000 pLGG paper but didn't find anything specific either. Here is an example where both mutations in MAP2K1 in this biospecimen are 5'Flank or Intron; the impact is MODIFIER and there is no predicted impact from SIFT or PolyPhen
Since they are MODIFIER and have no predicted impact, we should remove them, so let's add that filter and update the output files.
I reviewed and made some minor updates here.
- We missed HIST2H3C, which is also in the HGAT subtyping, so I added it here. H3F3B, however, is missed in HGAT, so we may have to add that when you work on that module - please make a note!
- Updated/shortened/spaced out some comments for readability
- Updated paste0() to paste() to shorten
- Added an arrange step at the end so we can easily see changes in the future (we have been adding this to subtyping modules as they come through).
Otherwise, it looks good - I wanted to document that we are not adding NF1 germline here, so that still needs to be added at some point.
Thanks!
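The arrange step mentioned in the review can be sketched as follows. This is a base R stand-in for `dplyr::arrange()`; the sort columns and the toy rows are assumptions, since the thread does not say which columns the module sorts on.

```r
# Toy output table in arbitrary order; values are invented for illustration.
out <- data.frame(
  Tumor_Sample_Barcode = c("BS_B", "BS_A", "BS_A"),
  Hugo_Symbol = c("KRAS", "NF1", "BRAF"),
  stringsAsFactors = FALSE
)

# Sort by sample, then gene, so that re-running the module writes rows in a
# stable order and file diffs show only real changes.
out_sorted <- out[order(out$Tumor_Sample_Barcode, out$Hugo_Symbol), ]
rownames(out_sorted) <- NULL
```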
Thanks for the review! Code update from the last time you reviewed satisfies the following comments:
And I re-ran the module to create the new output files from your commits:
NF1 germline will come in the 04 script that compiles the LGAT subtyping from SNV/CNV and fusion. Thanks!
One more small change, which I think I saw you catch as well (and my bad), and we are good to go. I think the mutation selection here makes sense and looks good.
I also want to make a note that there are some samples which have mutations in multiple groups, so we will have to take this into account with subtypes later.
Nice job!
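One way to surface the samples with mutations in multiple groups is sketched below. The `hits` table, the `group` column, and the group labels are hypothetical; the module's real output may be shaped differently.

```r
# Toy table of sample/group hits; invented for illustration.
hits <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S3", "S3", "S3"),
  group = c("BRAF", "MAPK", "NF1", "RTK", "RTK", "FGFR"),
  stringsAsFactors = FALSE
)

# Collapse repeated hits in the same group so each sample/group pair
# is counted once.
pairs <- unique(hits[, c("Tumor_Sample_Barcode", "group")])

# Count distinct groups per sample and keep samples with more than one.
n_groups <- table(pairs$Tumor_Sample_Barcode)
multi_group_samples <- names(n_groups)[n_groups > 1]
```

A list like `multi_group_samples` could then be reviewed by hand before deciding how the later subtyping step should resolve overlaps.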
@jharenza @jaclyn-taroni the analysis is now ready for re-review. I re-ran the script with the typo fix for HIST2H3C on 193, but there are no changes in the results since the canonical mutation is K28M for both HIST2H3C and HIST1H3C.
Looks good!
```{r}
# Filter consensus mutation files for LGAT subset
consensusMutationSubset <- consensusMutation %>%
  # find lgat samples
  dplyr::filter(Tumor_Sample_Barcode %in% lgat_dna_df$Kids_First_Biospecimen_ID) %>%
  # select tumor sample barcode, gene, short protein annotation, domains, and variant classification
  dplyr::select(Tumor_Sample_Barcode,
                Hugo_Symbol,
                HGVSp_Short,
                DOMAINS,
                Variant_Classification,
                IMPACT,
                SIFT,
                PolyPhen) %>%
  dplyr::filter(
    # get BRAF mutation status
    # canonical mutations V600E
    HGVSp_Short %in% snvOI$BRAF_V600E$canonical[!is.na(snvOI$BRAF_V600E$canonical)] &
      Hugo_Symbol == "BRAF" | # OR
      # hotspot mutations in p.600 and p.599
      grepl(BRAF_hotspot, HGVSp_Short) &
      Hugo_Symbol == "BRAF" | # OR
      # and kinase domain mutation for non-canonical mutation
      # Family: PK_Tyr_Ser-Thr https://pfam.xfam.org/family/PF07714
      grepl("PF07714", DOMAINS) &
      Hugo_Symbol == "BRAF" | # OR

      # get NF1 mutation status
      Hugo_Symbol %in% snvOI$NF1$gene &
      Variant_Classification %in% c("Missense_Mutation", "Nonsense_Mutation") | # OR

      # get other MAPK mutation status
      # all mutations in MAPK genes
      Hugo_Symbol %in% snvOI$MAPK$gene | # OR

      # get RTK mutation status
      # all mutations in RTK genes
      Hugo_Symbol %in% snvOI$RTK$gene | # OR

      # get FGFR mutation status
      # canonical mutations
      HGVSp_Short %in% snvOI$FGFR$canonical[!is.na(snvOI$FGFR$canonical)] &
      Hugo_Symbol == "FGFR1" | # OR
      # hotspot mutations
      grepl(FGFR_hotspot, HGVSp_Short) &
      Hugo_Symbol == "FGFR1" | # OR

      # get IDH mutation status
      # hotspot mutations
      grepl(IDH_hotspot, HGVSp_Short) &
      Hugo_Symbol %in% snvOI$IDH$gene | # OR

      # get histone mutation status
      # H3F3A canonical mutations
      HGVSp_Short %in% snvOI$H3F3A$canonical & Hugo_Symbol %in% "H3F3A" | # OR
      # H3F3B canonical mutations
      HGVSp_Short %in% snvOI$H3F3B$canonical & Hugo_Symbol %in% "H3F3B" | # OR
      # HIST1H3B canonical mutations
      HGVSp_Short %in% snvOI$HIST1H3B$canonical & Hugo_Symbol %in% "HIST1H3B" | # OR
      # HIST1H3C canonical mutations
      HGVSp_Short %in% snvOI$HIST1H3C$canonical & Hugo_Symbol %in% "HIST1H3C" | # OR
      # HIST2H3C canonical mutations
      HGVSp_Short %in% snvOI$HIST2H3C$canonical & Hugo_Symbol %in% "HIST2H3C"
  )

consensusMutationSubset
```
It seems like this is implemented correctly. My personal preference would have been to create a data frame for each of these steps that you then bind all the rows together for potential ease of debugging but that is a personal preference!
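The long filter is correct because R evaluates `&` before `|`, so each "condition & gene" pair is grouped before the ORs are applied. A small sketch with made-up values shows the difference between that grouping and the one a reader might mistakenly assume:

```r
# Toy vectors; invented for illustration.
gene  <- c("BRAF", "KRAS", "TP53")
hgvsp <- c("p.V600E", "p.G12D", "p.V600E")

# Written without parentheses, as in the notebook's filter.
keep_implicit <- hgvsp == "p.V600E" & gene == "BRAF" | gene == "KRAS"

# The grouping R actually applies: (A & B) | C.
keep_explicit <- (hgvsp == "p.V600E" & gene == "BRAF") | gene == "KRAS"

# The grouping that would be a bug here: A & (B | C).
keep_wrong <- hgvsp == "p.V600E" & (gene == "BRAF" | gene == "KRAS")
```

Adding the explicit parentheses would not change the result, but it could make the intent easier to verify at a glance.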
Yes, that would definitely have been good for debugging. I guess I liked the way this could be read like a plan (thanks to tidyverse magic :D ).
I'm also tempted to use MultiAssayExperiment next time we need to do something similar with multiple genes. Thoughts?
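The per-step alternative raised in this thread (one data frame per criterion, then bind the rows) could look roughly like the sketch below. It uses base R and a toy table; the gene lists, the example variants, and every object name are illustrative assumptions, not the module's code.

```r
# Toy MAF-like table; invented for illustration.
maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S2", "S3", "S4"),
  Hugo_Symbol = c("BRAF", "NF1", "KRAS", "TP53"),
  HGVSp_Short = c("p.V600E", "p.R1276*", "p.G12D", "p.R175H"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Missense_Mutation", "Missense_Mutation"),
  stringsAsFactors = FALSE
)

# One data frame per criterion, so each can be inspected on its own.
braf_df <- maf[maf$Hugo_Symbol == "BRAF" & maf$HGVSp_Short == "p.V600E", ]
nf1_df  <- maf[maf$Hugo_Symbol == "NF1" &
                 maf$Variant_Classification %in%
                   c("Missense_Mutation", "Nonsense_Mutation"), ]
mapk_df <- maf[maf$Hugo_Symbol %in% c("KRAS", "NRAS", "HRAS"), ]

steps <- list(BRAF = braf_df, NF1 = nf1_df, MAPK = mapk_df)

# Row counts per step give a quick debugging summary.
step_counts <- vapply(steps, nrow, integer(1))

# Bind the steps and drop rows that matched more than one criterion.
subset_df <- unique(do.call(rbind, steps))
rownames(subset_df) <- NULL
```

The trade-off is the one discussed in the thread: the single filter reads like a plan, while the per-step version makes it easy to see which criterion pulled in a given row.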
Purpose/implementation Section
LGAT subtyping is being revamped as per issue. I'm dividing this into staggered PRs, one per alteration type:
SNV #790 Part1: adding new SNV subtypes for LGAT #842
Fusion #790 Part2: adding Fusion based subtyping for LGAT #847
CNV #790 Part3: adding CNV based subtyping for LGAT #848
What scientific question is your analysis addressing?
As per issue we will be subtyping LGAT based on SNV in the following genes:
LGG, NF1
somatic loss of NF1 via either a missense or a nonsense mutation
LGG, BRAF V600E
contains BRAF V600E or V599 SNV or non-canonical BRAF alterations such as p.V600ins or p.D594N
LGG, other MAPK
contains KRAS, NRAS, HRAS, MAP2K1, MAP2K2, ARAF SNV or indel
LGG, RTK
harbors a MET SNV
harbors a KIT SNV or
harbors a PDGFRA SNV
LGG, FGFR
harbors FGFR1 p.N546K, p.K656E, p.N577, or p.K687 hotspot mutations
LGG, IDH
harbors an IDH R132 mutation
LGG, H3.3
harbors an H3F3A K28M or G35R/V mutation
LGG, H3.1
harbors an HIST1H3B K28M
harbors an HIST1H3C K28M
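Hotspot criteria such as "BRAF V600 or V599" and "IDH R132" can be matched against `HGVSp_Short` with regular expressions. The patterns below are hypothetical; the definitions of `BRAF_hotspot`, `FGFR_hotspot`, and `IDH_hotspot` used in the notebook are not shown in this thread. The sketch anchors each pattern and requires a non-digit after the position, so that a residue like R1320 is not mistaken for R132.

```r
# Hypothetical patterns: position must be followed by a non-digit or the end.
braf_pattern <- "^p\\.V(599|600)([^0-9]|$)"
idh_pattern  <- "^p\\.R132([^0-9]|$)"

# Toy protein changes; invented for illustration.
hgvsp <- c("p.V600E", "p.V600_K601delinsE", "p.D594N", "p.R132H", "p.R1320C")

# TRUE where the change falls on the hotspot residue.
braf_hit <- grepl(braf_pattern, hgvsp)
idh_hit  <- grepl(idh_pattern, hgvsp)
```

A change such as p.D594N does not match the position-based pattern, which is why the filter in this PR also checks the kinase domain annotation for non-canonical BRAF alterations.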
What was your approach?
I used a per-subtype list of genes to look for in the consensus SNV calls (with additional information about hotspots and canonical mutations).
The mutation status per subtype is saved in lgat-subset/LGAT_snv_subset.tsv:
What GitHub issue does your pull request address?
#790
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Is there anything that you want to discuss further?
Do the SNVs need some basic filtering to keep only non-synonymous mutations?
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes
Results
What types of results are included (e.g., table, figure)?
table
What is your summary of the results?
We have a total of 293 WGS biospecimens: 45 have a BRAF mutation, 10 have an FGFR mutation, 16 have a MAPK mutation, 19 have an RTK mutation, 5 have an NF1 mutation, and 1 has a HIST1H3B mutation.
We didn't find any IDH or H3F3A mutations.