#790 Part1: adding new SNV subtypes for LGAT #842

kgaonkar6 · 2020-11-16T22:15:30Z

Purpose/implementation Section

LGAT subtyping is being revamped as per issue #790. I'm splitting the work into staggered PRs, one per alteration type; this PR (Part 1) covers SNVs.

What scientific question is your analysis addressing?

As per issue #790, we will subtype LGAT based on SNVs in the following genes:

- LGG, NF1
  - somatic loss of NF1 via either a missense or nonsense mutation
- LGG, BRAF V600E
  - contains a BRAF V600E or V599 SNV, or non-canonical BRAF alterations such as p.V600ins or p.D594N
- LGG, other MAPK
  - contains a KRAS, NRAS, HRAS, MAP2K1, MAP2K2, or ARAF SNV or indel
- LGG, RTK
  - harbors a MET SNV, or
  - harbors a KIT SNV, or
  - harbors a PDGFRA SNV
- LGG, FGFR
  - harbors FGFR1 p.N546K, p.K656E, p.N577, or p.K687 hotspot mutations
- LGG, IDH
  - harbors an IDH R132 mutation
- LGG, H3.3
  - harbors an H3F3A K28M or G35R/V mutation
- LGG, H3.1
  - harbors an HIST1H3B K28M mutation, or
  - harbors an HIST1H3C K28M mutation
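
The criteria above mix exact canonical changes (for example BRAF p.V600E) with position-level hotspots (for example any change at FGFR1 p.K687). The following is a minimal sketch of how the two kinds of match could be expressed against an HGVSp_Short-style column. The toy data frame, the pattern string, and the variable names are illustrative assumptions, not the contents of the module's JSON list.

```r
# Toy MAF-like rows; values are invented for illustration only.
# The last row is deliberately artificial to show why the pattern is anchored.
maf <- data.frame(
  Hugo_Symbol = c("BRAF", "BRAF", "FGFR1", "FGFR1", "FGFR1"),
  HGVSp_Short = c("p.V600E", "p.D594N", "p.N546K", "p.K687E", "p.K6870E"),
  stringsAsFactors = FALSE
)

# Canonical match: the whole protein change must be identical.
is_braf_canonical <- maf$Hugo_Symbol == "BRAF" & maf$HGVSp_Short %in% "p.V600E"

# Hotspot match: any change at a given residue. The trailing group requires a
# non-digit (or end of string) so that p.K687 does not also match p.K6870.
fgfr_hotspot <- "^p\\.(N577|K687)([^0-9]|$)"
is_fgfr_hotspot <- maf$Hugo_Symbol == "FGFR1" & grepl(fgfr_hotspot, maf$HGVSp_Short)
```

Pairing every pattern with a gene check, as above, keeps a hotspot pattern for one gene from matching the same residue label in another gene.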

What was your approach?

For each subtype, I used a list of genes (with additional information about hotspots and canonical mutations) to search the consensus SNV calls.

The mutation status per subtype is saved in lgat-subset/LGAT_snv_subset.tsv with the following columns:

	Tumor_Sample_Barcode	BRAF_V600E_mut	FGFR_mut	IDH_mut	H3F3A_mut	HIST1H3B_mut	HIST1H3C_mut	MAPK_mut	RTK_mut	NF1_mut
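
As a rough illustration of the table layout above, this sketch collapses per-variant hits into one row per sample with one status column per group. The sample IDs are invented, and the "Yes"/"No" encoding is an assumption; the actual file may encode status differently.

```r
# Toy long-format hits: one row per (sample, group) match. IDs are invented.
hits <- data.frame(
  Tumor_Sample_Barcode = c("BS_A", "BS_A", "BS_B"),
  group = c("BRAF_V600E_mut", "RTK_mut", "NF1_mut"),
  stringsAsFactors = FALSE
)
all_samples <- c("BS_A", "BS_B", "BS_C")
groups <- c("BRAF_V600E_mut", "RTK_mut", "NF1_mut")

# Start with one row per sample, then add one "Yes"/"No" column per group.
status <- data.frame(Tumor_Sample_Barcode = all_samples, stringsAsFactors = FALSE)
for (g in groups) {
  mutated <- hits$Tumor_Sample_Barcode[hits$group == g]
  status[[g]] <- ifelse(status$Tumor_Sample_Barcode %in% mutated, "Yes", "No")
}
```

Starting from the full sample list means that samples with no qualifying mutation (here the invented BS_C) still appear in the output with "No" in every column.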

What GitHub issue does your pull request address?

#790

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

I wanted to save the alterations as a list since the conditions differ per gene and subtype; let me know if you prefer some other way to organize the gene/mutation lists.
Each gene has its own selection criteria; please refer to the description in "Updated analysis: LGAT - add additional subtypes" (#790) by @jharenza. In addition, non-canonical mutations in the kinase domain will also be added to the BRAF V600E subtype.

Is there anything that you want to discuss further?

Does the SNV need some basic filtering to keep only non-synonymous mutations?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

table

What is your summary of the results?

Of 293 WGS biospecimens in total, 45 have a BRAF mutation, 10 an FGFR mutation, 16 a MAPK mutation, 19 an RTK mutation, 5 an NF1 mutation, and 1 a HIST1H3B mutation.

We didn't find any IDH or H3F3A mutations.

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

jharenza · 2020-11-19T03:08:45Z

@kgaonkar6 thanks for this! To answer your question:

Does the SNV need some basic filtering to keep only non-synonymous mutations?

Good question and point! Yes, let's filter out synonymous and silent mutations based on the paper.

I think that you should keep the notebook for selecting LGAT samples separate from the SNV alteration notebook. I will still comment inline for review.

jharenza

Looks good in general! See my comments inline for a few changes.

analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd

analyses/molecular-subtyping-LGAT/input/snvOI_list.json

kgaonkar6 · 2020-11-19T20:55:56Z

I used the Variant_Classification terms for synonymous SNVs as per interaction-plots/scripts/02-process_mutations.R; does this look OK? Should I also remove Variant_Classification %in% c("Intron", "5'Flank")? Some Intron/5'Flank SNVs are getting through in the output.

# Variant Classification with Low/Modifier variant consequences 
#  from maftools http://asia.ensembl.org/Help/Glossary?id=535
synonymous <- c(
  "Silent",
  "Start_Codon_Ins",
  "Start_Codon_SNP",
  "Stop_Codon_Del",
  "De_novo_Start_InFrame",
  "De_novo_Start_OutOfFrame"
)
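
A minimal sketch of how this vector could be applied, together with the extra exclusions raised in the question above. The toy rows are invented, `noncoding` is a hypothetical name, and the sketch assumes the MAF spelling "5'Flank" with a straight apostrophe.

```r
# Toy MAF-like rows; values are invented for illustration.
maf <- data.frame(
  Hugo_Symbol = c("MAP2K1", "MAP2K1", "KRAS", "NF1"),
  Variant_Classification = c("5'Flank", "Intron", "Missense_Mutation", "Silent"),
  stringsAsFactors = FALSE
)

synonymous <- c(
  "Silent", "Start_Codon_Ins", "Start_Codon_SNP",
  "Stop_Codon_Del", "De_novo_Start_InFrame", "De_novo_Start_OutOfFrame"
)
# Hypothetical extra exclusions under discussion.
noncoding <- c("Intron", "5'Flank")

# Keep only rows whose classification is in neither exclusion list.
kept <- maf[!maf$Variant_Classification %in% c(synonymous, noncoding), ]
```

Because `%in%` does exact string matching, a misspelled classification (for example a backtick instead of an apostrophe) would silently exclude nothing, so it is worth tabulating `Variant_Classification` before and after the filter.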

jharenza · 2020-11-19T22:05:55Z

Hmm. I tried to look at how they defined the SNVs in the paper, but could not find anything about it in the methods. I just emailed the corresponding author, so hopefully we can hear back from her in a reasonable time frame.

To answer your question, I think getting rid of Intron makes sense, but 5' flank variants could fall in the promoter and have an effect. Can you check the predicted effects?

kgaonkar6 · 2020-11-19T22:18:26Z

Oh yeah, I did check the 1000 pLGG paper but didn't find specifics either. Here is an example where both mutations in MAP2K1 in this biospecimen are 5'Flank or Intron: the impact is MODIFIER and there is no predicted impact from SIFT or PolyPhen.

# A tibble: 2 x 6
  Tumor_Sample_Barcode IMPACT   SIFT  PolyPhen Hugo_Symbol Variant_Classification
  <chr>                <chr>    <chr> <chr>    <chr>       <chr>                 
1 BS_4QFSH7C4          MODIFIER .     .        MAP2K1      5'Flank               
2 BS_4QFSH7C4          MODIFIER .     .        MAP2K1      Intron

jharenza · 2020-12-01T19:34:24Z

I got this response, but sent a followup asking for more clarification:

Hello Jo Lynne,
I was forwarded the email that you sent Cynthia Hawkins regarding the pLGG Cancer Cell paper.
In terms of SNV pipelines, there wasn't really one used for this project. Our approach was primarily a targeted tier-based analysis where we prioritized known drivers of pLGG via specific assays (eg, ddPCR/IHC for BRAF p.V600E or NanoString for KIAA1549-BRAF fusions) (details are in supplemental figure S3 of the paper). We had a few samples that remained uncharacterized after we'd tested for the most likely alterations that we ran RNAseq on. For those, we ran a combination of fusion callers (FusionMap, Defuse, TopHat, and Ericscript) to identify novel fusions. We also ran Mutect on the RNAseq samples to identify any SNVs which were cross-referenced to COSMIC for functional relevance.
If you'd like any further clarification don't hesitate to ask.
All the best,
--
Scott

jharenza

@kgaonkar6 thanks for this!

Re:

I used the Variant_Classification terms for synonymous SNVs as per interaction-plots/scripts/02-process_mutations.R; does this look OK? Should I also remove Variant_Classification %in% c("Intron", "5'Flank")? Some Intron/5'Flank SNVs are getting through in the output.

and

Oh yeah, I did check the 1000 pLGG paper but didn't find specifics either. Here is an example where both mutations in MAP2K1 in this biospecimen are 5'Flank or Intron: the impact is MODIFIER and there is no predicted impact from SIFT or PolyPhen.

Since they are MODIFIER and have no predicted impact, we should remove them, so let's add that and update the output files.

I reviewed and made some minor updates here.

We missed HIST2H3C, which is also in the HGAT subtyping, so I added it here. H3F3B, however, is missed in HGAT, so we may have to add that when you work on that module - please make a note!
Updated/shortened/spaced out some comments for readability
Updated paste0() to paste() to shorten
Added an arrange step at the end so we can easily see changes in the future (we have been adding this to subtyping modules as they come through).
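
To illustrate the last point, here is a sketch of sorting the output before writing it so that re-runs produce stable, diff-friendly files. It uses base R `order()` as a stand-in for the arrange step mentioned above; the toy table and the temporary output path are assumptions.

```r
# Toy output table in arbitrary row order; IDs are invented.
status <- data.frame(
  Tumor_Sample_Barcode = c("BS_C", "BS_A", "BS_B"),
  NF1_mut = c("No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Sort by sample ID so repeated runs write rows in the same order.
status_sorted <- status[order(status$Tumor_Sample_Barcode), ]
rownames(status_sorted) <- NULL

# Write tab-separated text without quotes or row names.
out_file <- tempfile(fileext = ".tsv")
write.table(status_sorted, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```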

Otherwise, it looks good - I wanted to document that we are not adding NF1 germline here, so that still needs to be added at some point.

Thanks!

analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd

kgaonkar6 · 2020-12-04T15:34:51Z

Thanks for the review!

The code updates since your last review address the following comment:

Since they are modifier and have no predicted impact, we should remove, so let's add that and update the output files.

And I re-ran the module to create the new output files from your commits:

We missed HIST2H3C, which is also in the HGAT subtyping, so I added it here. H3F3B, however, is missed in HGAT, so we may have to add that when you work on that module - please make a note!

Updated/shortened/spaced out some comments for readability

Updated paste0() to paste() to shorten

Added an arrange step at the end so we can easily see changes in the future (we have been adding this to subtyping modules as they come through).

NF1 germline will come in the 04 script, which compiles the LGAT subtyping from SNV, CNV, and fusion. Thanks!

jharenza

One more small change, which I think I saw you catch as well (and my bad), and we are good to go. I think the mutation selection here makes sense and looks good.

I also want to make a note that there are some samples which have mutations in multiple groups, so we will have to take this into account with subtypes later.
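
One way to surface those samples ahead of the later subtyping step is to count positive groups per sample, as in this sketch. The toy table and its "Yes"/"No" encoding are assumptions made for illustration.

```r
# Toy per-sample status table; IDs and calls are invented.
status <- data.frame(
  Tumor_Sample_Barcode = c("BS_A", "BS_B", "BS_C"),
  BRAF_V600E_mut = c("Yes", "No", "No"),
  RTK_mut = c("Yes", "No", "No"),
  NF1_mut = c("No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Count how many groups are "Yes" for each sample.
group_cols <- setdiff(names(status), "Tumor_Sample_Barcode")
n_groups <- rowSums(status[, group_cols] == "Yes")

# Samples in more than one group will need a tie-breaking rule later.
multi_group <- status$Tumor_Sample_Barcode[n_groups > 1]
```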

Nice job!

analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd

kgaonkar6 · 2020-12-09T15:00:05Z

@jharenza @jaclyn-taroni the analysis is now ready for re-review.

I re-ran the script after fixing the HIST2H3C typo on line 193; the results did not change, since the canonical mutation is K28M for both HIST2H3C and HIST1H3C.

jaclyn-taroni

Looks good!

jaclyn-taroni · 2020-12-21T20:10:52Z

analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd

+```{r}
+# Filter consensus mutation files for LGAT subset
+consensusMutationSubset <- consensusMutation %>%
+  # find lgat samples
+  dplyr::filter(Tumor_Sample_Barcode %in% lgat_dna_df$Kids_First_Biospecimen_ID) %>%
+  # select tumor sample barcode, gene, short protein annotation, domains, and variant classification
+  dplyr::select(Tumor_Sample_Barcode,
+                Hugo_Symbol,
+                HGVSp_Short,
+                DOMAINS,
+                Variant_Classification,
+                IMPACT,
+                SIFT,
+                PolyPhen) %>%
+  dplyr::filter(
+    # get BRAF mutation status
+    # canonical mutations V600E
+    HGVSp_Short %in% snvOI$BRAF_V600E$canonical[!is.na(snvOI$BRAF_V600E$canonical)] &
+      Hugo_Symbol=="BRAF" | # OR
+      # hotspot mutations in p.600 and p.599
+      grepl(BRAF_hotspot,HGVSp_Short) &
+      Hugo_Symbol=="BRAF" | # OR
+      # and kinase domain mutation for non-canonical mutation 
+      # Family: PK_Tyr_Ser-Thr https://pfam.xfam.org/family/PF07714
+      grepl("PF07714",DOMAINS) & 
+      Hugo_Symbol=="BRAF" | # OR
+
+      # get NF1 mutation status
+      Hugo_Symbol %in% snvOI$NF1$gene & 
+      Variant_Classification %in% c("Missense_Mutation","Nonsense_Mutation") |
+
+      # get other MAPK mutation status
+      # all mutations in MAPK genes
+      Hugo_Symbol %in% snvOI$MAPK$gene | # OR
+
+      # get RTK mutation status
+      # all mutations in RTK genes
+      Hugo_Symbol %in% snvOI$RTK$gene | # OR
+
+      # get FGFR mutation status
+      # canonical mutations
+      HGVSp_Short %in% snvOI$FGFR$canonical[!is.na(snvOI$FGFR$canonical)] &
+      Hugo_Symbol=="FGFR1" | # OR
+      # hotspot mutations 
+      grepl(FGFR_hotspot,HGVSp_Short) &
+      Hugo_Symbol=="FGFR1" | # OR
+
+      # get IDH mutation status
+      # hotspot mutations
+      grepl(IDH_hotspot,HGVSp_Short) & 
+      Hugo_Symbol %in% snvOI$IDH$gene | # OR
+
+      # get histone mutation status
+      # H3F3A canonical mutations
+      HGVSp_Short %in% snvOI$H3F3A$canonical & Hugo_Symbol %in% "H3F3A" | # OR
+      # H3F3B canonical mutations
+      HGVSp_Short %in% snvOI$H3F3B$canonical & Hugo_Symbol %in% "H3F3B" | # OR
+      # HIST1H3B canonical mutations
+      HGVSp_Short %in% snvOI$HIST1H3B$canonical & Hugo_Symbol %in% "HIST1H3B" | # OR
+      # HIST1H3C canonical mutations
+      HGVSp_Short %in% snvOI$HIST1H3C$canonical & Hugo_Symbol %in% "HIST1H3C" | # OR
+      # HIST2H3C canonical mutations     
+      HGVSp_Short %in% snvOI$HIST2H3C$canonical & Hugo_Symbol %in% "HIST2H3C" 
+  ) 
+
+consensusMutationSubset


It seems like this is implemented correctly. My personal preference would have been to create a data frame for each of these steps and then bind all the rows together, for potential ease of debugging, but that is a personal preference!

kgaonkar6 · 2020-12-22

Yes, that would definitely have been good for debugging. I guess I liked the way this could be read like a plan (thanks to tidyverse magic :D ).

I'm also tempted to use MultiAssayExperiment next time we need to do something similar with multiple genes. Thoughts?
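
For reference, the per-step alternative suggested above could look roughly like this sketch: one small data frame per criterion, each tagged with its group, then combined. The toy rows and the gene and mutation choices are illustrative assumptions, and base R `rbind()` stands in for `dplyr::bind_rows()`.

```r
# Toy MAF-like rows; values are invented for illustration.
maf <- data.frame(
  Tumor_Sample_Barcode = c("BS_A", "BS_B", "BS_C", "BS_D"),
  Hugo_Symbol = c("BRAF", "NF1", "KRAS", "TP53"),
  HGVSp_Short = c("p.V600E", "p.R1000*", "p.G12D", "p.R175H"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Missense_Mutation", "Missense_Mutation"),
  stringsAsFactors = FALSE
)

# One data frame per criterion; rep() keeps the tagging safe for zero rows.
braf <- maf[maf$Hugo_Symbol == "BRAF" & maf$HGVSp_Short == "p.V600E", ]
braf$group <- rep("BRAF_V600E_mut", nrow(braf))

nf1 <- maf[maf$Hugo_Symbol == "NF1" &
             maf$Variant_Classification %in% c("Missense_Mutation", "Nonsense_Mutation"), ]
nf1$group <- rep("NF1_mut", nrow(nf1))

mapk <- maf[maf$Hugo_Symbol %in% c("KRAS", "NRAS", "HRAS"), ]
mapk$group <- rep("MAPK_mut", nrow(mapk))

# Each piece can be inspected on its own before combining.
subset_all <- rbind(braf, nf1, mapk)
```

The trade-off is more lines of code in exchange for intermediate objects that can be checked individually, and a `group` column that records which criterion selected each row.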

kgaonkar6 added 5 commits November 16, 2020 16:18

adding new SNV subtypes for LGAT

d37abea

adding additional subtypes

dc6e743

re-doing subtyping so removing the last step for now

eb65034

updating run_subtyping.sh

271958a

removing subtyping list

2281e95

kgaonkar6 mentioned this pull request Nov 16, 2020

#790 Part2: adding Fusion based subtyping for LGAT #843

Closed

8 tasks

kgaonkar6 and others added 2 commits November 17, 2020 12:01

update README

cd1e4d2

Merge branch 'master' into lgat_add_subtyping_SNV

2f30d89

jharenza assigned jharenza and unassigned jharenza Nov 17, 2020

jharenza self-requested a review November 17, 2020 18:08

kgaonkar6 changed the title ~~adding new SNV subtypes for LGAT~~ #790 Part1: adding new SNV subtypes for LGAT Nov 18, 2020

kgaonkar6 mentioned this pull request Nov 18, 2020

#790 Part3: adding CNV based subtyping for LGAT #845

Closed

8 tasks

kgaonkar6 and others added 4 commits November 18, 2020 19:19

changed to nb

32e05f7

Merge branch 'lgat_add_subtyping_SNV' of https://github.com/kgaonkar6…

93a2e44

…/OpenPBTA-analysis into lgat_add_subtyping_SNV

remove Rscript

05bf366

Update README.md

408166d

jharenza suggested changes Nov 19, 2020

kgaonkar6 and others added 6 commits November 19, 2020 09:21

Update analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd

4b972e4

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>

Update analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd

906bfb8

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>

Update analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd

5fca7ad

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>

adding comments remove silent snv

38bab2f

adding sample_id

e12a603

adding H3F3B

dee0179

updates to files

27f270c

Merge branch 'master' into lgat_add_subtyping_SNV

dfad4ab

This was referenced Nov 20, 2020

#790 Part2: adding Fusion based subtyping for LGAT #847

Merged

#790 Part3: adding CNV based subtyping for LGAT #848

Merged

add HIST2H3C; update comments

694e8aa

- add HIST2H3C to JSON file - update/shorten/space out some comments for clarity - update paste0() to paste()

jharenza self-requested a review December 2, 2020 22:26

jharenza suggested changes Dec 2, 2020

kgaonkar6 commented Dec 3, 2020

analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd

edits to Rmd in bash, H2.3

2c19d91

jharenza reviewed Dec 4, 2020

analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd

jharenza and others added 3 commits December 3, 2020 21:30

Update analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd

1b660ff

fix typo

Merge branch 'lgat_add_subtyping_SNV' of https://github.com/kgaonkar6…

5838dae

…/OpenPBTA-analysis into lgat_add_subtyping_SNV

remove modifider/low

58373db

kgaonkar6 requested a review from jharenza December 4, 2020 15:31

jharenza approved these changes Dec 4, 2020

analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd

Update analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd

da46d4e

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>

jaclyn-taroni self-requested a review December 8, 2020 01:50

re-run with HIST2H3C

a4b57a0

jaclyn-taroni added the review after release label Dec 13, 2020

jaclyn-taroni removed their request for review December 13, 2020 20:45

jaclyn-taroni approved these changes Dec 21, 2020

jharenza mentioned this pull request Dec 22, 2020

Find breaking changes from v18 (part 2) #876

Merged

6 tasks

jaclyn-taroni added 2 commits January 9, 2021 16:50

Merge remote-tracking branch 'upstream/master' into lgat_add_subtypin…

86fd675

…g_SNV

Merge branch 'master' into lgat_add_subtyping_SNV

4898454

jaclyn-taroni merged commit 202bb59 into AlexsLemonade:master Jan 10, 2021

jaclyn-taroni mentioned this pull request Jan 12, 2021

Planned release: V19 #867

Closed

21 tasks

kgaonkar6 deleted the lgat_add_subtyping_SNV branch January 22, 2021 21:18
