This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Reciprocal and kinase #821

Merged 10 commits on Nov 5, 2020

@kgaonkar6 (Collaborator) commented on Oct 21, 2020

Purpose/implementation Section

What scientific question is your analysis addressing?

Add kinase domain retention status for fused genes, since this information is needed to filter BRAF and other kinase gene fusions that we use for LGAT subtyping. We also want to check whether a fusion has a reciprocal, that is, whether the fusion callers called both GeneX--GeneY and GeneY--GeneX.

What was your approach?

First, I added the LeftBreakpoint and RightBreakpoint columns, since we need this information to annotate domain retention. (Earlier, we had removed these columns so that one unique fusion row per Sample could be retained.)

Then, we use the fusion_driver function from annoFuse to add the kinase domain status for Gene1A (5' gene) and Gene1B (3' gene) in the columns DomainRetainedGene1A and DomainRetainedGene1B. For each kinase gene, the domain retention annotation is as follows:

| Domain retention annotation | Description |
| --- | --- |
| DomainRetainedGene1A == Yes | LeftBreakpoint downstream of domain end in any fusion |
| DomainRetainedGene1A == Partial | LeftBreakpoint within domain boundaries in any fusion |
| DomainRetainedGene1A == No | LeftBreakpoint upstream of domain start in any fusion |
| DomainRetainedGene1B == Yes | RightBreakpoint upstream of domain start in in-frame fusion |
| DomainRetainedGene1B == Partial | RightBreakpoint within domain boundaries in in-frame fusion |
| DomainRetainedGene1B == No | RightBreakpoint downstream of domain end in any fusion |
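
To make the rules in the table concrete, here is a minimal sketch in base R. It is not the annoFuse implementation: the helper name `classify_domain_retention` is hypothetical, coordinates are assumed to be in transcript orientation (so "upstream" means a smaller coordinate), the domain boundaries are treated as inclusive, and the in-frame condition for Gene1B is not modelled.

```r
# Hypothetical helper (not part of annoFuse): classify kinase domain retention
# from a breakpoint and domain boundaries given in transcript orientation.
# Assumption: domain_start <= domain_end and both boundaries are inclusive.
classify_domain_retention <- function(breakpoint, domain_start, domain_end,
                                      side = c("Gene1A", "Gene1B")) {
  side <- match.arg(side)
  stopifnot(domain_start <= domain_end)
  # A breakpoint inside the domain splits it, for either partner
  if (breakpoint >= domain_start && breakpoint <= domain_end) {
    return("Partial")
  }
  if (side == "Gene1A") {
    # The 5' partner keeps the sequence upstream of its breakpoint
    if (breakpoint > domain_end) "Yes" else "No"
  } else {
    # The 3' partner keeps the sequence downstream of its breakpoint
    if (breakpoint < domain_start) "Yes" else "No"
  }
}

classify_domain_retention(500, 100, 300, side = "Gene1A")  # "Yes"
classify_domain_retention(200, 100, 300, side = "Gene1B")  # "Partial"
classify_domain_retention(500, 100, 300, side = "Gene1B")  # "No"
```

Real genomic coordinates would also need strand handling before applying this comparison; that step is left out here.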

Within the function, the base Pfam domain annotation function annotates the retention status of each domain per breakpoint, using domain ID and location information from:

| Annotation | File | Source |
| --- | --- | --- |
| pfamID | http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/pfamDesc.txt.gz | UCSC pfamID Description database |
| Domain Location | http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ucscGenePfam.txt.gz | UCSC pfamID Description database |

For reciprocal status, I've added a function that records this information as logical values in a separate column, reciprocal_exists. For example, in Sample BS_044XZ8ST we have the reciprocal fusions ANTXR1--BRAF and BRAF--ANTXR1, so these fusions will have reciprocal_exists == TRUE.
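
The reciprocal check can be sketched roughly as below. This is an illustration, not the function added in this pull request: the helper name `add_reciprocal_exists` is hypothetical, the input is assumed to be a data frame with `Sample` and `FusionName` columns, and the third row of the toy table is made up to show a non-reciprocal call.

```r
# Hypothetical sketch: flag fusions whose reverse orientation was also called
# in the same Sample. Only handles simple GeneX--GeneY names.
add_reciprocal_exists <- function(fusions) {
  parts <- strsplit(as.character(fusions$FusionName), "--", fixed = TRUE)
  gene_a <- vapply(parts, `[`, character(1), 1)
  gene_b <- vapply(parts, `[`, character(1), 2)
  # Key for each call and for its reverse orientation within the same Sample
  forward <- paste(fusions$Sample, gene_a, gene_b, sep = "|")
  reverse <- paste(fusions$Sample, gene_b, gene_a, sep = "|")
  # A GeneX--GeneX call is not counted as its own reciprocal
  fusions$reciprocal_exists <- (reverse %in% forward) & (gene_a != gene_b)
  fusions
}

toy <- data.frame(
  Sample = c("BS_044XZ8ST", "BS_044XZ8ST", "toy_sample"),
  FusionName = c("ANTXR1--BRAF", "BRAF--ANTXR1", "GeneX--GeneY"),
  stringsAsFactors = FALSE
)
result <- add_reciprocal_exists(toy)
result$reciprocal_exists  # TRUE TRUE FALSE
```

Building a per-sample key and testing membership avoids a nested loop over all pairs of rows.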

What GitHub issue does your pull request address?

#812

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

I've tried to organize the chunks in 04-project-specific-filtering.Rmd so that there are minimal code changes. Please let me know if it is easy enough to follow.

Is there anything that you want to discuss further?

Since I've now added the LeftBreakpoint and RightBreakpoint columns to pbta-fusion-putative-oncogenic.tsv, there can be multiple rows per FusionName and Sample if a fusion has multiple breakpoints. This doesn't affect *recurrent-fusion-byhistology.tsv, *recurrent-fused-genes-byhistology.tsv, *recurrent-fusion-bysample.tsv and *recurrent-fused-genes-bysample.tsv, but it might affect other modules that don't deduplicate on FusionName and Sample.
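
For downstream modules that count fusions per sample, one way to guard against the extra rows is to deduplicate on Sample and FusionName before counting. The sketch below uses a made-up toy table; the sample names, gene names, and breakpoint strings are placeholders, not values from pbta-fusion-putative-oncogenic.tsv.

```r
# Toy table: the same fusion is reported with two breakpoint pairs in one sample.
fusions <- data.frame(
  Sample = c("toy_sample_1", "toy_sample_1", "toy_sample_2"),
  FusionName = c("GeneX--GeneY", "GeneX--GeneY", "GeneX--GeneY"),
  LeftBreakpoint = c("1:100", "1:250", "1:100"),
  RightBreakpoint = c("2:500", "2:500", "2:500"),
  stringsAsFactors = FALSE
)

# Keep one row per Sample and FusionName so each sample is counted once
per_sample <- unique(fusions[, c("Sample", "FusionName")])
n_samples_with_fusion <- table(per_sample$FusionName)
n_samples_with_fusion  # GeneX--GeneY: 2 (not 3)
```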

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

table

What is your summary of the results?

Kinase domain retention information is added per kinase gene fusion

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@kgaonkar6 added the "ready for review" label on Oct 26, 2020
@jaclyn-taroni (Member)

Are the changes in analyses/fusion_filtering/results/pbta-fusion-recurrent-fusion-bysample.tsv and analyses/fusion_filtering/results/pbta-fusion-recurrently-fused-genes-bysample.tsv expected or unexpected?

@jaclyn-taroni (Member) left a comment

This looks good! I had a few questions before I approve.

analyses/fusion_filtering/README.md (outdated, resolved)
# check for fusions have reciprocal fusions in the same Sample
# works only for GeneY -- GeneX ; GeneX -- GeneY matches
recirpocal_fusion <- function(FusionName,Sample,standardFusioncalls ){
Gene1A <- strsplit(FusionName,"--")[[1]][1]
Member

Can you remind me why we're not looking at the Gene2A and Gene2B here?

Collaborator Author

For intergenic fusions, which have the form Gene1A/Gene2A--Gene1B or Gene1A--Gene1B/Gene2B, and similar fusions, it gets a little complicated, since we would then need to check that the distance between Gene1A/Gene2A is the same in the reciprocal. So I've just stuck to fusions between genes.
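
A minimal sketch of how such names could be screened out before the reciprocal check, assuming the "/" notation described above is the only marker of an intergenic partner. The helper name `is_gene_gene_fusion` is hypothetical and is not part of this pull request.

```r
# Hypothetical helper: TRUE when both partners are single genes, i.e. the name
# splits into exactly two parts on "--" and neither part contains "/".
is_gene_gene_fusion <- function(fusion_name) {
  parts <- strsplit(fusion_name, "--", fixed = TRUE)
  vapply(
    parts,
    function(p) length(p) == 2 && !any(grepl("/", p, fixed = TRUE)),
    logical(1)
  )
}

is_gene_gene_fusion(c("GeneX--GeneY", "Gene1A/Gene2A--Gene1B", "Gene1A--Gene1B/Gene2B"))
# TRUE FALSE FALSE
```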

kgaonkar6 and others added 4 commits October 29, 2020 13:27
Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
@kgaonkar6 (Collaborator, Author)

> Are the changes in analyses/fusion_filtering/results/pbta-fusion-recurrent-fusion-bysample.tsv and analyses/fusion_filtering/results/pbta-fusion-recurrently-fused-genes-bysample.tsv expected or unexpected?

This is expected: running sample() can select a different sample when we have multiple samples per Kids_First_Participant_ID.

@jaclyn-taroni (Member)

> This is expected: running sample() can select a different sample when we have multiple samples per Kids_First_Participant_ID.

I would have expected sorting + setting a seed to have prevented that, but there may be some subtlety I'm missing. Either way, beyond the scope of this PR.
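
For reference, the "sorting + setting a seed" idea can be sketched as follows. This is an illustration under assumptions, not the module's code: the helper name `pick_one_sample`, the seed value, and the toy identifiers are all hypothetical, and the input is assumed to have `Kids_First_Participant_ID` and `Sample` columns.

```r
# Hypothetical helper: choose one sample per participant reproducibly.
# Note that set.seed() changes the global RNG state for the session.
pick_one_sample <- function(df, seed = 2020) {
  # Sort first so the draw does not depend on the input row order
  df <- df[order(df$Kids_First_Participant_ID, df$Sample), ]
  set.seed(seed)
  picked <- tapply(df$Sample, df$Kids_First_Participant_ID, function(ids) {
    ids[sample.int(length(ids), 1)]
  })
  as.vector(picked)
}

toy <- data.frame(
  Kids_First_Participant_ID = c("PT_A", "PT_A", "PT_B", "PT_B", "PT_B"),
  Sample = c("BS_2", "BS_1", "BS_5", "BS_3", "BS_4"),
  stringsAsFactors = FALSE
)
shuffled <- toy[c(5, 3, 1, 4, 2), ]

# The same samples are picked regardless of the input row order
identical(pick_one_sample(toy), pick_one_sample(shuffled))  # TRUE
```

This only guarantees stability when the set of samples per participant is unchanged. If rows are added or removed upstream, the same seed can still yield a different pick.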

@jaclyn-taroni (Member) left a comment

👍

@kgaonkar6 (Collaborator, Author)

Thanks for the review @jaclyn-taroni !
