This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
add fusion summary for LGAT #830
Basic question: if we have an in-frame fusion but the domain is not retained, we aren't counting it as in-frame? So are there fusions that are in-frame but do not retain the domain, and that will not be in either of the dataframes being set up here?
I just added some checks for these and commented on the code. You are correct: there are two fusions that are in-frame but do not retain the kinase domain. In case 1, there is another in-frame fusion that retains the kinase domain, so it is captured in the `three_prime_kinase_inframe` list. In case 2, the in-frame fusion does not retain the kinase domain, so we are not keeping it.
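A minimal sketch of how the two cases could be told apart, using toy data rather than the real `allFuseLGAT` table. The column names (`sample`, `fusion`, `frame`, `domain_retained`), their values, and the gene names are all assumptions made for illustration; only the name `three_prime_kinase_inframe` comes from the discussion above, and here it is rebuilt from the toy data.

```r
library(dplyr)

# Toy 3' kinase fusion calls. All column names and values are hypothetical.
three_prime_calls <- data.frame(
  sample          = c("S1", "S1", "S2"),
  fusion          = c("GENEA--KINASE1", "GENEA--KINASE1", "GENEB--KINASE2"),
  frame           = c("in-frame", "in-frame", "in-frame"),
  domain_retained = c("No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# In-frame calls that retain the kinase domain (the ones being kept).
three_prime_kinase_inframe <- three_prime_calls %>%
  filter(frame == "in-frame", domain_retained == "Yes") %>%
  select(sample, fusion) %>%
  distinct()

# In-frame calls that do not retain the kinase domain.
inframe_domain_lost <- three_prime_calls %>%
  filter(frame == "in-frame", domain_retained == "No") %>%
  select(sample, fusion) %>%
  distinct()

# Case 1: another call for the same sample/fusion retains the domain,
# so the fusion is still represented in three_prime_kinase_inframe.
rescued <- semi_join(inframe_domain_lost, three_prime_kinase_inframe,
                     by = c("sample", "fusion"))

# Case 2: no call retains the domain, so the fusion is dropped entirely.
dropped <- anti_join(inframe_domain_lost, three_prime_kinase_inframe,
                     by = c("sample", "fusion"))
```

With this toy input, `rescued` holds the sample S1 fusion and `dropped` holds the sample S2 fusion, which mirrors the "case 1" and "case 2" distinction described above.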
I'm thinking this chunk can still be simplified. In particular, I'm noticing that `five_prime_domain_lost` is from `allFuseLGAT` and `five_prime_reciprocals` is from `five_prime_domain_lost`, but then we are adding back info from `allFuseLGAT`.

So basically both `five_prime_domain_lost` and `five_prime_reciprocals` are based on `allFuseLGAT`, but we have extra manipulation steps. I think we can do this more directly.
As I'm trying to make this shorter, I'm a bit confused about these steps.
I tried to boil it down to what look like the essential steps to get `five_prime_kinase_keep`, but as of now it does not give the same result. I think part of this is that I still don't really understand everything that is happening in this chunk. Can you take a look at my attempt, as well as evaluating your original code, to see how we can make these steps more efficient and clear?

In particular, I know the differences have to do with the paired "select() then distinct()" steps that are done a few times, but it's unclear to me why each of these "select() then distinct()" steps is necessary.
Ahh
The filtering you have here filters on the frame-ness/domain retention of the 5' kinase fusions and not their reciprocals (3' kinase fusions), which is why I was making that variable, then re-joining `five_prime_reciprocals` with `allFuseLGAT`.

What I am trying to do is find any 5' kinases without the domain retained, then go back to the original dataframe `allFuseLGAT` and collect the annotations of their reciprocal fusions (i.e., the 3' kinase fusions' frame-ness and domain retention), assess whether those are in-frame and retain the kinase domain, and then, if there is a reciprocal that satisfies that, join both the 5' kinase fusion dataframe (domain lost) and the 3' kinase fusion dataframe (in-frame/domain retained).

Actually, I can remove:
and use:
`distinct()` is needed because in some cases there are multiple breakpoints for each fusion, and sometimes several have the same frame-ness, so this reduces them to one fusion call per sample.
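The reciprocal lookup described above can be sketched roughly as follows. This is an illustration on toy data, not the code from this pull request: the table `all_fusions` stands in for `allFuseLGAT`, and every column name (`sample`, `gene1`, `gene2`, `kinase_gene`, `frame`, `domain_retained`), value, and gene name is an assumption.

```r
library(dplyr)

# Toy stand-in for allFuseLGAT. gene1 is the 5' partner, gene2 the 3' partner.
# All column names and values are hypothetical.
all_fusions <- data.frame(
  sample          = c("S1", "S1", "S1", "S2"),
  gene1           = c("KINASE1", "GENEA", "GENEA", "KINASE2"),
  gene2           = c("GENEA", "KINASE1", "KINASE1", "GENEB"),
  kinase_gene     = c("KINASE1", "KINASE1", "KINASE1", "KINASE2"),
  frame           = c("in-frame", "in-frame", "in-frame", "frameshift"),
  domain_retained = c("No", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

# 5' kinase fusions that lost the kinase domain.
five_prime_domain_lost <- all_fusions %>%
  filter(gene1 == kinase_gene, domain_retained == "No") %>%
  select(sample, gene1, gene2) %>%
  distinct()

# 3' kinase fusions that are in-frame and retain the domain.
# Rows 2 and 3 of the toy data are two breakpoints with identical annotation;
# select() followed by distinct() collapses them to one call per sample.
three_prime_ok <- all_fusions %>%
  filter(gene2 == kinase_gene,
         frame == "in-frame",
         domain_retained == "Yes") %>%
  select(sample, gene1, gene2) %>%
  distinct()

# Keep a domain-lost 5' kinase fusion only if its reciprocal (partners
# swapped, same sample) is among the acceptable 3' kinase fusions.
five_prime_keep <- semi_join(
  five_prime_domain_lost, three_prime_ok,
  by = c("sample" = "sample", "gene1" = "gene2", "gene2" = "gene1")
)
```

In this toy input, the S1 fusion is kept because its reciprocal is in-frame with the domain retained, while the S2 fusion has no reciprocal and is dropped. A filtering join is used here so that no columns need to be re-attached afterwards; whether that fits the real table depends on which annotations the summary needs downstream.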
updated with commit - @cansavvy let me know if this clarifies!
Okay! Thanks for those updates! I think this looks good!