#790 Part2: adding Fusion based subtyping for LGAT #847

kgaonkar6 · 2020-11-20T17:15:35Z

⚠️ #842 needs to be merged before this PR, although 02-subset-fusion-files-LGAT.Rmd and its output can be reviewed independently.

Purpose/implementation Section

LGAT subtyping is being revamped as per issue #790. I'm dividing this into staggered PRs, one per alteration type.

What scientific question is your analysis addressing?

As per issue we will be subtyping LGAT based on fusion in the following genes:

- LGG, KIAA1549-BRAF
  - contains KIAA1549-BRAF fusion
- LGG, other MAPK
  - contains non-canonical BRAF fusion other than KIAA1549-BRAF
  - contains RAF1 fusion
- LGG, RTK
  - harbors a fusion in ALK, ROS1, NTRK1, NTRK2, or NTRK3 or
  - harbors a PDGFRA fusion
- LGG, FGFR
  - harbors FGFR1-TACC1 fusion
  - harbors FGFR1 or FGFR2 fusions
- LGG, MYB/MYBL1
  - harbors either a MYB-QKI fusion or other MYB or MYBL1 fusion

What was your approach?

For each subtype, I used a list of genes to check for the presence of a specific fusion or of a fused gene.
The input is the LGAT file from fusion-summary; "add fusion summary for LGAT" (#830) added the checks for kinase domain overlap and reciprocal fusions for LGAT biospecimen fusions.

The fusion status per subtype is saved as lgat-subset/LGAT_fusion_subset.tsv, with these columns:

Kids_First_Biospecimen_ID	KIAA_BRAF_fus	MAPK_fus	RTK_fus	FGFR_fus	MYB_fus

What GitHub issue does your pull request address?

#790

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

#842 will have to be reviewed and merged first, since the SNV subtyping was done in that PR.

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

lgat-subset/LGAT_fusion_subset.tsv

What is your summary of the results?

122 biospecimens have the canonical KIAA1549--BRAF fusion, 12 biospecimens have fusions in other MAPK genes, 14 biospecimens have fusions in RTK genes, and 3 biospecimens have a MYB fusion.
There are no FGFR1--TACC1 or other FGFR fusions in the LGAT biospecimen list.
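
For anyone who wants to re-derive a tally like the one above from the output table, here is a minimal sketch. It assumes the first column holds the biospecimen ID, every other column holds a status, and positives are coded as "Yes" (as in the notebook's case_when() calls). The "No" values and the BS_TOY IDs in the demo file are made up for illustration; `count_yes` is a hypothetical helper, not part of the module.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count "Yes" calls per status column of a tab-separated table.
# Column 1 is assumed to be the biospecimen ID; all other columns are statuses.
count_yes() {
  awk -F'\t' '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
    { for (i = 2; i <= ncol; i++) if ($i == "Yes") n[i]++ }
    END { for (i = 2; i <= ncol; i++) printf "%s\t%d\n", name[i], n[i] }
  ' "$1"
}

# Toy input with made-up biospecimen IDs (not real data).
demo_tsv="$(mktemp)"
trap 'rm -f "$demo_tsv"' EXIT
printf 'Kids_First_Biospecimen_ID\tKIAA_BRAF_fus\tMAPK_fus\tRTK_fus\tFGFR_fus\tMYB_fus\n' > "$demo_tsv"
printf 'BS_TOY00001\tYes\tNo\tNo\tNo\tNo\n' >> "$demo_tsv"
printf 'BS_TOY00002\tNo\tYes\tNo\tNo\tNo\n' >> "$demo_tsv"
printf 'BS_TOY00003\tYes\tNo\tNo\tNo\tNo\n' >> "$demo_tsv"

# Prints one "column<TAB>count" line per status column.
count_yes "$demo_tsv"
```

Run against the real file, the same function would be called as `count_yes lgat-subset/LGAT_fusion_subset.tsv` from the module directory.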

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

jharenza

Looks good, please add the ticket mentioned!

jharenza · 2020-12-03T18:36:01Z

analyses/molecular-subtyping-LGAT/02-subset-fusion-files-LGAT.Rmd

+# get putative oncogene fusion list
+putativeFusion <- readr::read_tsv(file.path(root_dir,
+                                            "analyses",
+                                            "fusion-summary",
+                                            "results",
+                                            "fusion_summary_lgat_foi.tsv")) %>%


This will have to change when #849 goes in, to use the file from the data release. Will you create a ticket for this later change?

Wouldn't we want to use the relative path to fusion_summary_lgat_foi.tsv? Those modules will be run with the base histology file, so their output will be the most up-to-date files for use in molecular-subtyping-LGAT, right?

Did you come to an agreement?

We agreed to use relative paths in subtyping modules to get the most updated files from fusion-summary, right @jharenza?

yes, relative

jaclyn-taroni · 2021-01-10T15:04:19Z

analyses/molecular-subtyping-LGAT/run_subtyping.sh

@@ -11,5 +11,6 @@ SUBSET=${OPENPBTA_SUBSET:-1}

 if [ "$SUBSET" -gt "0" ]; then
  Rscript -e "rmarkdown::render('01-subset-files-for-LGAT.Rmd')"
+  Rscript -e "rmarkdown::render('02-subset-fusion-files-LGAT.Rmd')"


Note to myself when I review next week: This effectively means this module is not tested in CI at all - is there an alternative?

Let me know whether, and how, I should move the notebooks from the Part2-3 LGAT PRs outside the if condition. Going by the subset condition, I'd put all the subsetting notebooks inside it and would eventually add a final notebook, outside this condition, to gather the annotation.

> Going by the subset condition, I'd put all the subsetting notebooks inside it and would eventually add a final notebook, outside this condition, to gather the annotation.

Put a different way: what would get tested in CI would be the part that puts it all together using what's committed to the repo.

Makes sense to me!
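
The layout agreed on in this thread could look roughly like the sketch below. It is a dry run that only prints the render commands, so it can be tried without R installed. `gather-annotation.Rmd` is a hypothetical name for the planned final notebook, `plan_steps` is a helper invented for illustration, and the sketch assumes CI sets OPENPBTA_SUBSET=0 (which is what the "not tested in CI" remark above implies).

```shell
#!/usr/bin/env bash
set -euo pipefail

# List the notebooks that would be rendered for a given OPENPBTA_SUBSET value.
# gather-annotation.Rmd is a hypothetical placeholder for the final notebook.
plan_steps() {
  local subset="${1:-1}"
  if [ "$subset" -gt "0" ]; then
    # Subsetting steps: skipped when the subset flag is 0.
    echo "01-subset-files-for-LGAT.Rmd"
    echo "02-subset-fusion-files-LGAT.Rmd"
  fi
  # Always listed, so this step is exercised even when subsetting is skipped.
  echo "gather-annotation.Rmd"
}

# Dry run: print the render commands instead of executing them.
for nb in $(plan_steps "${OPENPBTA_SUBSET:-1}"); do
  echo "Rscript -e \"rmarkdown::render('${nb}')\""
done
```

With this shape, the step outside the condition works from the subset files committed to the repo, which is the part that would get tested in CI.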

kgaonkar6 · 2021-01-11T15:01:14Z

@jaclyn-taroni thanks for the review! Just wanted to confirm if we will be adding LGAT subtyping updates from these #790 Part1-3 LGAT PRs for v18 release?

jaclyn-taroni · 2021-01-11T15:09:51Z

> Just wanted to confirm if we will be adding LGAT subtyping updates from these #790 Part1-3 LGAT PRs for v18 release?

I don't think we want to hold the release up for these necessarily. To comment more generally: I think we should focus on fixing #889 (note that I merged #860 per your comment on #889) and then on addressing #891, to make sure that getting all the PBTA histologies PRs in introduced no inadvertent changes, before moving on to the release. We can review these in parallel for inclusion in v19 (#867).

kgaonkar6 · 2021-01-11T15:51:46Z

Sounds good. I'll create a new branch from 608e905efb3a7dcdfa7b82c5497f270b8d7f8d2a and re-run all the steps mentioned in #891 with the new base histology file, which will fix all of #889 (the Ependymoma and Embryonal broad/short histology inconsistency).

I'm using that specific commit because it doesn't include the latest merge of the LGAT Part1 PR to master. Let me know if I got that right. Thanks!

jaclyn-taroni · 2021-01-11T18:17:01Z

That seems correct to me, thanks!

jaclyn-taroni

This looks good to me! I checked locally to make sure it runs in the project Docker container because this is not run in CI.

My sessionInfo() comment and the outstanding question about what fusion summary files to use should be addressed prior to merging. I had another question about how we handle the canonical fusion. I'm not sure that's worth getting into on this pull request but did want to make a note.

Everything else is a matter of what we write in the notebook to help out future us when we revisit this! 😄

jaclyn-taroni · 2021-01-12T15:27:40Z

analyses/molecular-subtyping-LGAT/02-subset-fusion-files-LGAT.Rmd

+For kinase fusions, the following conditions needed to be satisfied for LGAT as per [PR](https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/830):
+- Added all 3' kinase fusions which are in-frame and retain the kinase domain
+- For 5' kinase fusions, added those which are in-frame and retain the kinase domain
+- For 5' kinase fusions that don't meet 3., check whether it has a reciprocal fusion and whether that reciprocal is in-frame and the kinase domain is retained. Then, add those 5' kinase fusions.


I don't follow what you mean by "don't meet 3." here without following to the linked pull request. I believe the same point is raised in the bullet point above this one, so I'd say that instead.

jaclyn-taroni · 2021-01-12T15:53:06Z

analyses/molecular-subtyping-LGAT/02-subset-fusion-files-LGAT.Rmd

+      rowSums(dplyr::select(putativeFusion,dplyr::matches(MAPK_fused_gene))) > 0 &
+        # remove biospecimens with canonical fusion
+        # they are a separate subtype as shown above
+        !grepl("1",.$`KIAA1549--BRAF`) ~ "Yes",


You have this information in your fusionOI list.

> fusionOI[[1]] canonical 1 KIAA1549--BRAF

If new literature came out and we had a new canonical fusion (I am not commenting on how likely this is), you would have to update the JSON file and this notebook. If you have collapse steps and use matches() like you do elsewhere in this notebook, I think you would only need to change the JSON file.

I see that this list element is named KIAA1549--BRAF also, though, so the JSON structure itself may not be accommodating for new canonical fusions based on the literature (again, I can't comment on how likely that scenario is).

jaclyn-taroni · 2021-01-12T15:54:03Z

analyses/molecular-subtyping-LGAT/02-subset-fusion-files-LGAT.Rmd

+    # get KIAA1549--BRAF status
+    KIAA_BRAF_fus = case_when(
+      # canonical BRAF fusion 
+      grepl("1",.$`KIAA1549--BRAF`) ~ "Yes",


Same comment about using the canonical part of your list here as below.

Thanks for this suggestion, it makes total sense!
I adjusted the JSON file to add an element, fusionOI$canonical$fusion, that is collapsed like the other lists. If we find literature supporting another canonical fusion for LGAT in the future, it can now be added directly to the JSON file.

jaclyn-taroni · 2021-01-12T15:57:30Z

analyses/molecular-subtyping-LGAT/02-subset-fusion-files-LGAT.Rmd

+
+# save to subset folder
+write_tsv(subsetFusion,file.path(subset_dir, "LGAT_fusion_subset.tsv"))
+```


Add sessionInfo() here please. That is one of the main ways we can keep an eye on if something was run in the project Docker container. I may have missed that in my review of #842. If that's the case, can you add to the 01 notebook as well please?

Done and done; there is now a sessionInfo() call in both notebooks.
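
Since a missing sessionInfo() is easy to overlook in review, a quick check along these lines could flag notebooks that lack it. This is only a sketch: `missing_session_info` is a hypothetical helper, the demo notebooks are toy files, and a plain text search cannot tell whether the call sits in the final chunk or is commented out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print every .Rmd file in a directory that never mentions sessionInfo().
missing_session_info() {
  local dir="$1" nb
  for nb in "$dir"/*.Rmd; do
    [ -e "$nb" ] || continue   # the directory contains no notebooks
    grep -qF 'sessionInfo()' "$nb" || echo "$nb"
  done
}

# Toy notebooks (hypothetical names and contents) for demonstration.
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
printf '# step one\nsessionInfo()\n' > "$demo_dir/with-session-info.Rmd"
printf '# step two\n' > "$demo_dir/without-session-info.Rmd"

# Lists only the notebook that lacks the call.
missing_session_info "$demo_dir"
```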

jaclyn-taroni · 2021-01-12T16:10:00Z

analyses/molecular-subtyping-LGAT/02-subset-fusion-files-LGAT.Rmd

+For kinase fusions, the following conditions needed to be satisfied for LGAT as per [PR](https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/830):
+- Added all 3' kinase fusions which are in-frame and retain the kinase domain
+- For 5' kinase fusions, added those which are in-frame and retain the kinase domain
+- For 5' kinase fusions that don't meet 3., check whether it has a reciprocal fusion and whether that reciprocal is in-frame and the kinase domain is retained. Then, add those 5' kinase fusions.
+


Checking my understanding because I did not review #830: the checks on the kinase domain described here currently take place in fusion-summary, is that correct? If so, can we add that to the text here please?

Right. I reorganized the text to make it clear that these rules were applied in fusion-summary and that its output file is used in this notebook.

jaclyn-taroni · 2021-01-12T23:24:23Z

analyses/molecular-subtyping-LGAT/lgat-subset/lgat_metadata.tsv

@@ -1,3 +1,601 @@
+<<<<<<< HEAD


I'm not sure when this merge conflict got introduced, but it does prevent the 02 notebook from running. I was checking again locally because this doesn't get run in CI. We need to fix this before this goes in, but I'm not sure what parts of the conflicts we want to retain.

Hmm, thinking about this along with the Part1 and Part3 PRs: I believe these PRs were created before the v18 subtyping was overhauled.

I was using v17 in Part1, which was then carried over into this staggered PR, so lgat_metadata.tsv in this PR was the v17 version. This might be the reason for the conflict (I checked the changes, and it seems the histology column order differs between v18 and v17). Also, since these PRs are part of subtyping, I should be using pbta-histologies-base.tsv.
Can I re-run 01 with the above changes in this PR to update to v18?

> Can I re-run 01 with the above changes in this PR to update to v18?

Yep, sounds good!
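
Because the committed subset files are not exercised in CI, a leftover conflict marker like the one flagged in this thread can slip through. A minimal sketch of a pre-commit style check follows. `has_conflict_markers` is a hypothetical helper and the demo files are made up; it only looks for the start and end markers, which is an assumption about what is worth flagging.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeeds (exit status 0) if the file contains Git conflict start or end markers.
# The "=======" separator is deliberately not matched, since a line of equals
# signs can also be a legitimate Markdown heading underline.
has_conflict_markers() {
  grep -qE '^(<<<<<<<|>>>>>>>) ' "$1"
}

# Toy files standing in for committed subset tables (contents are made up).
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
printf '<<<<<<< HEAD\ncol_a\tcol_b\n=======\ncol_b\tcol_a\n>>>>>>> master\n' > "$demo_dir/conflicted.tsv"
printf 'col_a\tcol_b\n' > "$demo_dir/clean.tsv"

# Report each file that still carries conflict markers.
for f in "$demo_dir"/*.tsv; do
  if has_conflict_markers "$f"; then
    echo "conflict markers found: $(basename "$f")"
  fi
done
```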

jaclyn-taroni · 2021-01-13T19:36:51Z

Most recent updates look good locally 👍 - will merge this next!

kgaonkar6 and others added 22 commits November 16, 2020 16:18

adding new SNV subtypes for LGAT

d37abea

adding additional subtypes

dc6e743

re-doing subtyping so removing the last step for now

eb65034

updating run_subtyping.sh

271958a

removing subtyping list

2281e95

update README

cd1e4d2

Merge branch 'master' into lgat_add_subtyping_SNV

2f30d89

changed to nb

32e05f7

Merge branch 'lgat_add_subtyping_SNV' of https://github.com/kgaonkar6…

93a2e44

…/OpenPBTA-analysis into lgat_add_subtyping_SNV

remove Rscript

05bf366

Update README.md

408166d

Update analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd

4b972e4

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>

Update analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd

906bfb8

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>

Update analyses/molecular-subtyping-LGAT/01-subset-files-for-LGAT.Rmd

5fca7ad

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>

adding comments remove silent snv

38bab2f

adding sample_id

e12a603

adding H3F3B

dee0179

updates to files

27f270c

Merge branch 'master' into lgat_add_subtyping_SNV

dfad4ab

adding fusion subtyping

9889e79

fusion list of interest

6c92ee9

adding fusion nb and output

6ae3a1c

kgaonkar6 requested a review from jharenza November 20, 2020 17:17

kgaonkar6 mentioned this pull request Nov 20, 2020

#790 Part1: adding new SNV subtypes for LGAT #842

Merged

8 tasks

nd update in run and README

5112fae

kgaonkar6 mentioned this pull request Nov 20, 2020

#790 Part3: adding CNV based subtyping for LGAT #848

Merged

8 tasks

jharenza approved these changes Dec 3, 2020

Update analyses/molecular-subtyping-LGAT/02-subset-fusion-files-LGAT.Rmd

f07c940

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>

jaclyn-taroni self-requested a review December 8, 2020 01:50

jaclyn-taroni added the review after release label Dec 13, 2020

jaclyn-taroni removed their request for review December 13, 2020 20:45

jharenza mentioned this pull request Dec 22, 2020

Find breaking changes from v18 (part 2) #876

Merged

6 tasks

jaclyn-taroni self-requested a review January 9, 2021 21:51

Merge remote-tracking branch 'upstream/master' into lgat_update_subty…

d803410

…ping_Fus

jaclyn-taroni reviewed Jan 10, 2021

Some reversions to master

d760dcc

jaclyn-taroni approved these changes Jan 12, 2021

jaclyn-taroni mentioned this pull request Jan 12, 2021

Planned release: V19 #867

Closed

21 tasks

kgaonkar6 added 2 commits January 12, 2021 14:25

adding sessionInfo updated json

bbe6153

adding sessionInfo updated json after conflict res

35f8eb6

jaclyn-taroni reviewed Jan 12, 2021

jaclyn-taroni added don't merge and removed review after release labels Jan 12, 2021

rerun v18 with base histology

37290b3

jaclyn-taroni removed the don't merge label Jan 13, 2021

Merge branch 'master' into lgat_update_subtyping_Fus

98dd941

jaclyn-taroni added the merge next label Jan 13, 2021

jaclyn-taroni merged commit 7792a96 into AlexsLemonade:master Jan 13, 2021

jaclyn-taroni removed the merge next label Jan 13, 2021

kgaonkar6 deleted the lgat_update_subtyping_Fus branch January 22, 2021 21:17
