This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
add fusion summary for LGAT #830
It's unclear to me what pattern determines which columns are selected here. Can you tell me the strategy behind this selection?
Maybe we can come up with a more succinct way of selecting these if there's a pattern.
For this, we are specifically looking for in-frame fusions (`Fusion_Type`), kinases (`Gene1A_anno`, `Gene1B_anno`), whether a reciprocal exists (`reciprocal_exists`), and fusions with kinase domains retained (`DomainRetainedGene1A`, `DomainRetainedGene1B`), so these columns are required for LGAT fusion compilation. I added the left and right breakpoint columns to make sure that I was seeing unique fusions, although the rows may be unique enough with the combination of columns above. Let me check on this.
Looks like the number of unique fusions without breakpoints == 265 and with breakpoints == 300, but the results are identical. I will remove those two columns!
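A check along these lines could be sketched as follows. The toy data frame, the `FusionName` column, and the breakpoint column names (`LeftBreakpoint`, `RightBreakpoint`) are assumptions for illustration, not the actual input file; the sketch only shows how dropping the breakpoint columns can collapse rows that differ in nothing else.

```r
library(dplyr)

# Toy stand-in for the fusion table; values and the FusionName/breakpoint
# column names are hypothetical.
toy_fusions <- data.frame(
  FusionName = c("A--B", "A--B", "C--D"),
  Fusion_Type = c("in-frame", "in-frame", "frameshift"),
  LeftBreakpoint = c("1:100", "1:150", "2:200"),
  RightBreakpoint = c("1:900", "1:900", "2:800"),
  stringsAsFactors = FALSE
)

# Number of unique rows when the breakpoint columns are kept
n_with_breakpoints <- toy_fusions %>%
  distinct() %>%
  nrow()

# Number of unique rows once the breakpoint columns are dropped
n_without_breakpoints <- toy_fusions %>%
  select(-LeftBreakpoint, -RightBreakpoint) %>%
  distinct() %>%
  nrow()
```

If the two counts differ while the downstream summary does not, the breakpoint columns are only adding duplicate rows for the purposes of that summary.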
If we don't change to readr::read_tsv(), we will need to add one more argument here:
If I'm interpreting the end result of this correctly, it looks like we want a vector of nonkinase fuses. If so, we can get to that slightly more directly:
Tidyverse doesn't need the `lgatFuses_df$` bits.
also good tip!
If the goal is a vector of nonkinase fuses, then we can use `pull()` and do this in one shot.
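As a rough sketch of what that might look like: the toy data, the `FusionName` column, and the idea that a missing annotation marks a nonkinase gene are all assumptions for illustration, not the actual code under review.

```r
library(dplyr)

# Toy data; the "Kinase" annotation value and FusionName column are
# assumptions made for this example.
lgatFuses_df <- data.frame(
  FusionName = c("A--B", "C--D", "E--F"),
  Gene1A_anno = c("Kinase", NA, NA),
  Gene1B_anno = c(NA, NA, "Kinase"),
  stringsAsFactors = FALSE
)

nonkinase_fuses <- lgatFuses_df %>%
  # Bare column names work inside dplyr verbs; no lgatFuses_df$ prefix needed
  filter(is.na(Gene1A_anno), is.na(Gene1B_anno)) %>%
  # pull() returns the column as a plain vector rather than a data frame
  pull(FusionName)
```

Ending the pipeline with `pull()` avoids a separate step to extract the column from the filtered data frame.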
If you use the suggestions above, then this will have to change:
Basic question: If we have an in-frame fusion but the domain is not retained, we aren't counting it as in-frame? So are there fusions that are in-frame but do not retain the domain, and that will not be in either of the data frames being set up here?
I just added some checks for these and commented on the code. You are correct, there are two fusions which are in-frame but do not retain the kinase domain. In case 1, there is another in-frame fusion with the kinase domain, so it is being captured in the `three_prime_kinase_inframe` list, but in case 2, the in-frame fusion does not retain the kinase domain, so we are not keeping it.
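One way such a check might be written is sketched below. The toy data, the `FusionName` column, and the `"in-frame"`, `"Yes"`, and `"No"` codings are assumptions for illustration and may not match the real file.

```r
library(dplyr)

# Toy data; FusionName and the "in-frame"/"Yes"/"No" codings are hypothetical.
toy_fusions <- data.frame(
  FusionName = c("A--K1", "A--K1", "B--K2"),
  Fusion_Type = c("in-frame", "in-frame", "in-frame"),
  DomainRetainedGene1B = c("Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# In-frame fusions with at least one call that retains the kinase domain
retained <- toy_fusions %>%
  filter(Fusion_Type == "in-frame", DomainRetainedGene1B == "Yes") %>%
  pull(FusionName)

# In-frame fusions with no domain-retaining call at all; these end up in
# neither list
dropped <- toy_fusions %>%
  filter(Fusion_Type == "in-frame", !FusionName %in% retained) %>%
  distinct(FusionName) %>%
  pull(FusionName)
```

In this toy example, `A--K1` mirrors case 1 (a second in-frame call retains the domain) and `B--K2` mirrors case 2 (no call retains it).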
Can add this on to the set of tidyverse manipulations you have above.
I'll put a suggestion up there.