Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

#790 Part2: adding Fusion based subtyping for LGAT #847

Merged

Conversation

kgaonkar6
Copy link
Collaborator

@kgaonkar6 kgaonkar6 commented Nov 20, 2020

⚠️ #842 needs to be merged before this PR, 02-subset-fusion-files-LGAT.R and output can be reviewed independently though

Purpose/implementation Section

LGAT subtyping is being revamped as per issue. I'm diving into this with staggered PRs per alteration as:

What scientific question is your analysis addressing?

As per issue we will be subtyping LGAT based on fusion in the following genes:

  • LGG, KIAA1549-BRAF
    contains KIAA1549-BRAF fusion

  • LGG, other MAPK
    contains non-canonical BRAF fusion other than KIAA1549-BRAF
    contains RAF1 fusion

  • LGG, RTK
    harbors a fusion in ALK, ROS1, NTRK1, NTRK2, or NTRK3 or
    harbors a PDGFRA fusion

  • LGG, FGFR
    harbors FGFR1-TACC1 fusion
    harbors FGFR1 or FGFR2 fusions

  • LGG, MYB/MYBL1
    harbors either a MYB-QKI fusion or other MYB or MYBL1 fusion

What was your approach?

  • I used a list of genes to look for in presence of fusion or fused genes per subtype.
  • I'm using file from lgat fusion-summary , add fusion summary for LGAT #830 checks for kinase overlap and reciprocal fusion for LGAT biospecimen fusions.

The fusion status per subtype is saved as lgat-subset/LGAT_fusion_subset.tsv

  Kids_First_Biospecimen_ID KIAA_BRAF_fus MAPK_fus RTK_fus FGFR_fus MYB_fus

What GitHub issue does your pull request address?

#790

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

We will have to review/push through #842 since SNV subtyping was done in the previous PR.

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

lgat-subset/LGAT_fusion_subset.tsv

What is your summary of the results?

122 biospecimens have canonical KIAA1549--BRAF, 12 biospecimens have fusion in other MAPK genes, 14 biospecimen have fusions in RTK genes, 3 biospecimens have MYB fusion.
There are no FGFR1--TACC1 or FGFR fusion in the lgat biospecimen list.

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

Copy link
Collaborator

@jharenza jharenza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, please add the ticket mentioned!

Comment on lines +70 to +75
# get putative oncogene fusion list
putativeFusion <- readr::read_tsv(file.path(root_dir,
"analyses",
"fusion-summary",
"results",
"fusion_summary_lgat_foi.tsv")) %>%
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will have to change when #849 goes in, to use the file from the data release. Will you create a ticket for this later change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would want to use the relative path to fusion_summary_lgat_foi.tsv because those modules will be run with base histology which will then be most updated files to be used in molecular-subtyping-LGAT, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you come to an agreement?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We agreed to use relative paths in subtyping modules to get the most updated files from fusion-summary, right @jharenza?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, relative

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>
@jaclyn-taroni jaclyn-taroni removed their request for review December 13, 2020 20:45
@jaclyn-taroni jaclyn-taroni self-requested a review January 9, 2021 21:51
@@ -11,5 +11,6 @@ SUBSET=${OPENPBTA_SUBSET:-1}

if [ "$SUBSET" -gt "0" ]; then
Rscript -e "rmarkdown::render('01-subset-files-for-LGAT.Rmd')"
Rscript -e "rmarkdown::render('02-subset-fusion-files-LGAT.Rmd')"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself when I review next week: This effectively means this module is not tested in CI at all - is there an alternative?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know how/if I should add the Part2-3 LGAT PRs outside the if condition. Just going with the condition for subset I'd put all the subsetting files there and would eventually add a final notebook to gather annotation outside this condition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just going with the condition for subset I'd put all the subsetting files there and would eventually add a final notebook to gather annotation outside this condition.

Put a different way: what would get tested in CI would be the part that puts it all together using what's committed to the repo.

Makes sense to me!

@kgaonkar6
Copy link
Collaborator Author

@jaclyn-taroni thanks for the review! Just wanted to confirm if we will be adding LGAT subtyping updates from these #790 Part1-3 LGAT PRs for v18 release?

@jaclyn-taroni
Copy link
Member

Just wanted to confirm if we will be adding LGAT subtyping updates from these #790 Part1-3 LGAT PRs for v18 release?

I don't think we want to hold the release up for these necessarily. To comment more generally – I think we should focus on fixing #889 (note that I merged #860 per your comment #889 (comment)) and then addressing #891 to make sure there were no inadvertent changes getting all the PBTA histologies PRs in -> release. We can review these in parallel for inclusion in v19 (#867).

@kgaonkar6
Copy link
Collaborator Author

Sounds good, then I'll create a new branch from 608e905efb3a7dcdfa7b82c5497f270b8d7f8d2a to get through re-running all steps mentioned in #891 with the new base histology that will fix all of #889 ( Ependymoma and Embryonal broad/short histology inconsistency) .

I'm using that specific commit since that doesn't include the latest merge of LGAT Part1 PR to master. Let me know if I got that right. Thanks!

@jaclyn-taroni
Copy link
Member

That seems correct to me, thanks!

Copy link
Member

@jaclyn-taroni jaclyn-taroni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! I checked locally to make sure it runs in the project Docker container because this is not run in CI.

My sessionInfo() comment and the outstanding question about what fusion summary files to use should be addressed prior to merging. I had another question about how we handle the canonical fusion. I'm not sure that's worth getting into on this pull request but did want to make a note.

Everything else is a matter of what we write in the notebook to help out future us when we revisit this! 😄

For kinase fusions, the following conditions needed to be satisfied for LGAT as per [PR](https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/830):
- Added all 3' kinase fusions which are in-frame and retain the kinase domain
- For 5' kinase fusions, added those which are in-frame and retain the kinase domain
- For 5' kinase fusions that don't meet 3., check whether it has a reciprocal fusion and whether that reciprocal is in-frame and the kinase domain is retained. Then, add those 5' kinase fusions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow what you mean by "don't meet 3." here without following to the linked pull request. I believe the same point is raised in the bulletpoint above this one, so I'd say that instead.

@@ -11,5 +11,6 @@ SUBSET=${OPENPBTA_SUBSET:-1}

if [ "$SUBSET" -gt "0" ]; then
Rscript -e "rmarkdown::render('01-subset-files-for-LGAT.Rmd')"
Rscript -e "rmarkdown::render('02-subset-fusion-files-LGAT.Rmd')"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just going with the condition for subset I'd put all the subsetting files there and would eventually add a final notebook to gather annotation outside this condition.

Put a different way: what would get tested in CI would be the part that puts it all together using what's committed to the repo.

Makes sense to me!

rowSums(dplyr::select(putativeFusion,dplyr::matches(MAPK_fused_gene))) > 0 &
# remove biospecimens with canonical fusion
# they are a separate subtype as shown above
!grepl("1",.$`KIAA1549--BRAF`) ~ "Yes",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have this information in your fusionOI list.

> fusionOI[[1]]
       canonical
1 KIAA1549--BRAF

If new literature came out and we had a new canonical fusion (I am not commenting on how likely this is), you would have to update the JSON file and this notebook. If you have collapse steps and use matches() like you do elsewhere in this notebook, I think you would only need to change the JSON file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this list element is named KIAA1549--BRAF also, though, so the JSON structure itself may not be accommodating for new canonical fusions based on the literature (again, I can't comment on how likely that scenario is).

# get KIAA1549--BRAF status
KIAA_BRAF_fus = case_when(
# canonical BRAF fusion
grepl("1",.$`KIAA1549--BRAF`) ~ "Yes",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about using the canonical part of your list here as below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this suggestion , makes total sense!
I made adjustment in the json file to have an element fusionOI$canonical$fusion that is being collapsed like the other lists and can be more easily updated by adding another canonical fusion for LGAT if we find lit supporting those in the future directly to the json file.


# save to subset folder
write_tsv(subsetFusion,file.path(subset_dir, "LGAT_fusion_subset.tsv"))
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add sessionInfo() here please. That is one of the main ways we can keep an eye on if something was run in the project Docker container. I may have missed that in my review of #842. If that's the case, can you add to the 01 notebook as well please?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and done, there is now a sessionInfo in both the scripts

Comment on lines +70 to +75
# get putative oncogene fusion list
putativeFusion <- readr::read_tsv(file.path(root_dir,
"analyses",
"fusion-summary",
"results",
"fusion_summary_lgat_foi.tsv")) %>%
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you come to an agreement?

Comment on lines 56 to 60
For kinase fusions, the following conditions needed to be satisfied for LGAT as per [PR](https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/830):
- Added all 3' kinase fusions which are in-frame and retain the kinase domain
- For 5' kinase fusions, added those which are in-frame and retain the kinase domain
- For 5' kinase fusions that don't meet 3., check whether it has a reciprocal fusion and whether that reciprocal is in-frame and the kinase domain is retained. Then, add those 5' kinase fusions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking my understanding because I did not review #830 - the checks on kinase domain described here currently take place in fusion-summary is that correct? If so, can we add that to the text here please?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I reorganized the text to make that clear that these rules were applied in fusion-summary and output file is being used in this notebook

@jaclyn-taroni jaclyn-taroni mentioned this pull request Jan 12, 2021
21 tasks
@@ -1,3 +1,601 @@
<<<<<<< HEAD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure when this merge conflict got introduced, but it does prevent the 02 notebook from running. I was checking again locally because this doesn't get run in CI. We need to fix this before this goes in, but I'm not sure what parts of the conflicts we want to retain.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm thinking about this, Part1 PR and Part3 PR , I believe these PRs were created before v18 subtyping was overhauled.

I was using v17 in Part1 which was then staggered into this PR so lgat_metadata.tsv in this PR was v17 version . This might be the reason for the conflict( I checked the changes and it seems the histology column order differs between v18 and v17). Also as these PRs are part of subtyping so I should be using pbta-histologies-base.tsv.
Can I re-run 01 with the above changes in this PR to update to v18?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I re-run 01 with the above changes in this PR to update to v18?

Yep, sounds good!

@jaclyn-taroni
Copy link
Member

Most recent updates look good locally 👍 - will merge this next!

@jaclyn-taroni jaclyn-taroni merged commit 7792a96 into AlexsLemonade:master Jan 13, 2021
@kgaonkar6 kgaonkar6 deleted the lgat_update_subtyping_Fus branch January 22, 2021 21:17
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants