Fusion frequency tables #49
yes
Either yes, or create a put-onco-fusion file with annotations; probably the former, to keep redundancy to a minimum.
Thank you for preparing this module @kgaonkar6 !
The code and documentation look good to me.
I wonder if you could add missing Ensembl (ENSG) IDs to the result tables by joining `ensg-hugo-rmtl-v1-mapping.tsv`, perhaps by revising the following procedure for adding RMTL. For example, the first line of the TSV file has gene symbol `LINC01019` and the ENSG ID missing, but the `ENSG00000248118 LINC01019` mapping exists in `ensg-hugo-rmtl-v1-mapping.tsv`.
Adding ENSG IDs will also add more `Gene_full_name`s and `Protein_RefSeq_ID`s, because they are joined by ENSG IDs.
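The join suggested above could look roughly like the following. This is only a minimal Python sketch of the idea, not the module's code: the mapping column names (`ensg_id`, `gene_symbol`) and the result column `Gene_Ensembl_ID` are assumptions made for illustration, and the inline TSV stands in for `ensg-hugo-rmtl-v1-mapping.tsv`.

```python
import csv
import io

# Stand-in for ensg-hugo-rmtl-v1-mapping.tsv; the header names are assumed.
MAPPING_TSV = "ensg_id\tgene_symbol\nENSG00000248118\tLINC01019\n"


def load_symbol_to_ensg(handle):
    """Build {gene_symbol: [ensg_id, ...]} from a tab-separated mapping."""
    symbol_to_ensg = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        symbol_to_ensg.setdefault(row["gene_symbol"], []).append(row["ensg_id"])
    return symbol_to_ensg


def fill_missing_ensg(rows, symbol_to_ensg):
    """Return copies of rows with an empty Gene_Ensembl_ID filled by symbol.

    Symbols that map to several ENSG IDs get a comma-separated list;
    symbols absent from the mapping keep an empty ID.
    """
    filled = []
    for row in rows:
        row = dict(row)  # do not mutate the caller's rows
        if not row.get("Gene_Ensembl_ID"):
            ids = symbol_to_ensg.get(row["Gene_Symbol"], [])
            row["Gene_Ensembl_ID"] = ",".join(ids)
        filled.append(row)
    return filled


symbol_to_ensg = load_symbol_to_ensg(io.StringIO(MAPPING_TSV))
result_rows = [
    {"Gene_Symbol": "LINC01019", "Gene_Ensembl_ID": ""},
    {"Gene_Symbol": "UNMAPPED_GENE", "Gene_Ensembl_ID": ""},  # hypothetical symbol
]
filled_rows = fill_missing_ensg(result_rows, symbol_to_ensg)
```

Rows whose symbol is not in the mapping are left with an empty ID rather than dropped, so the row count of the result table does not change.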
Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>
…penTargets/OpenPBTA-analysis into kgaonkar6/fusion_freq
Thanks for the reviews! @jharenza, I've updated Reciprocal_exists to be TRUE if either gene (Gene1A or Gene1B) is a kinase, and it will thus have "Yes" or "No" in @logstar, thanks for the updates on the comments and the missing ENSG IDs; I've updated the gene ID match code now.
@jharenza the annots column is slightly different between the arriba annotation (which uniquely has duplication/translocation/deletion values) and the StarFusion annotation if a fusion is called by both. This will add another row uniqued by the values in Example: Or should I look into aggregating it by creating a vector and taking its unique values?
I think aggregating all of those per fusion breakpoint/type would be better.
Just wanted to add that the PR is updated with the aggregated
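For readers following the thread, the aggregation discussed here can be sketched as below. This is a hedged illustration in Python, not the PR's implementation: the key columns (`FusionName`, `Fusion_Type`, `BreakpointLocation`), the comma separator, and the example values are assumptions.

```python
from collections import defaultdict


def aggregate_annots(rows):
    """Collapse rows that share a fusion key into one sorted, unique annots string."""
    grouped = defaultdict(set)
    for row in rows:
        # Assumed grouping key: one entry per fusion, type, and breakpoint location.
        key = (row["FusionName"], row["Fusion_Type"], row["BreakpointLocation"])
        for value in row["annots"].split(","):
            value = value.strip()
            if value:
                grouped[key].add(value)
    return {key: ",".join(sorted(values)) for key, values in grouped.items()}


# Hypothetical calls for the same fusion reported by two callers.
calls = [
    {"FusionName": "GENEA--GENEB", "Fusion_Type": "in-frame",
     "BreakpointLocation": "Genic", "annots": "duplication"},
    {"FusionName": "GENEA--GENEB", "Fusion_Type": "in-frame",
     "BreakpointLocation": "Genic", "annots": "caller_b_annotation,duplication"},
]
aggregated = aggregate_annots(calls)
```

Using a set per key means a value reported by both callers appears once, so the two input rows collapse into a single output entry instead of two rows that differ only in annots.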
Hi @jharenza @logstar, I've updated the code to implement the annotation via the annotator API and also added JSON files in my latest comments. Even though the RMTL is annotated by the annotator API, I had to use
@kgaonkar6 Thank you for the suggestion. I wonder if you have any suggestions on how to annotate RMTL without using the
To clarify, I'm reading the The RMTL annotation does in fact come from your code directly, so this wasn't a comment on your code. Sorry for the confusion.
Thank you for the clarification @kgaonkar6! Sorry that I misunderstood. Regarding fusion calls that have no associated ENSG ID, I am also not sure how to handle them better. Thank you for pointing out this issue! I will keep it in mind when reviewing this PR. For fusion calls that have associated ENSG IDs, I think adding the ENSG IDs with
Thank you for the updates @kgaonkar6 !
The updated module runs well, and the results are reproduced identically.
Following are my specific suggestions.
Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>
Thanks @logstar, I'll re-request review once I have rerun the module with your suggested changes above.
Agree with this, thanks!
Thank you for the updates @kgaonkar6 !
The code updates since the last review look good to me.
The module runs well in the Docker image, and the results are reproduced identically. There is also no duplicated line in the result files anymore.
@kgaonkar6 I am still not seeing the ENSG ID in the final tables. Can you add this column to the tables?
@jharenza sorry about that, they are in the results files now.
LGTM now!
🚨 merge after #50
Purpose/implementation Section
What scientific question is your analysis addressing?
Using `FusionName_Fusion_Type` as an alteration ID, we will generate tables with counts and frequencies per cancer_group and cohort.
What was your approach?
The code is adapted from https://github.com/PediatricOpenTargets/OpenPedCan-analysis/tree/f70645b6c7e4eb15ea29e45e9ebf0adeb5798b9b/analyses/snv-frequencies by @logstar.
Given an alteration dataframe with `Kids_First_Biospecimen_ID` and `Alt_ID`, get_cg_ch_mut_freq_tbl() gets the counts and frequencies per cancer_group within cohort.
What GitHub issue does your pull request address?
d3b-center/ticket-tracker-OPC#70
d3b-center/ticket-tracker-OPC#72
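As an illustration of the approach described under "What was your approach?", the per-group counting can be sketched as follows. This is a simplified Python sketch, not a port of get_cg_ch_mut_freq_tbl(): it counts biospecimens rather than patients, and the output column names (`n_altered`, `n_total`, `frequency`) and the example data are assumptions.

```python
from collections import defaultdict


def alteration_frequencies(alterations, samples):
    """Count altered biospecimens per Alt_ID within each (cancer_group, cohort).

    alterations: dicts with Kids_First_Biospecimen_ID and Alt_ID.
    samples: {biospecimen_id: (cancer_group, cohort)} for every sample in scope.
    """
    totals = defaultdict(int)
    for group_key in samples.values():
        totals[group_key] += 1

    altered = defaultdict(set)  # a set ignores duplicated alteration rows
    for row in alterations:
        bs_id = row["Kids_First_Biospecimen_ID"]
        if bs_id in samples:
            altered[(row["Alt_ID"], samples[bs_id])].add(bs_id)

    table = []
    for (alt_id, group_key), bs_ids in sorted(altered.items()):
        cancer_group, cohort = group_key
        table.append({
            "Alt_ID": alt_id,
            "cancer_group": cancer_group,
            "cohort": cohort,
            "n_altered": len(bs_ids),
            "n_total": totals[group_key],
            "frequency": len(bs_ids) / totals[group_key],
        })
    return table


# Hypothetical samples and fusion calls; Alt_ID follows FusionName_Fusion_Type.
samples = {
    "BS_1": ("Hypothetical group", "PBTA"),
    "BS_2": ("Hypothetical group", "PBTA"),
    "BS_3": ("Other group", "PBTA"),
}
alterations = [
    {"Kids_First_Biospecimen_ID": "BS_1", "Alt_ID": "GENEA--GENEB_in-frame"},
    {"Kids_First_Biospecimen_ID": "BS_1", "Alt_ID": "GENEA--GENEB_in-frame"},
    {"Kids_First_Biospecimen_ID": "BS_3", "Alt_ID": "GENEA--GENEB_in-frame"},
]
freq_table = alteration_frequencies(alterations, samples)
```

The denominator is every sample in the group, not only the altered ones, which is why the sample-to-group table is passed separately from the alteration rows.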
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
`Reciprocal_exists` is for all fusions; should this be restricted to kinase fusions?
`annots` and `Breakpoint Location` are not in the putative-onco-fusion file; should we add these back?
Is there anything that you want to discuss further?
I wanted to make sure that Gene_Symbol is separated so that each fused gene gets its own row. Should there be columns specifying the position of the gene? It seems that if we add Gene1A, Gene1B, Gene2A, and Gene2B, we have the same information in wide format and also in long format, with one row per fused gene.
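To make the wide-versus-long question concrete, here is a small Python sketch of one option: keep only the long format and record the gene's position in an extra column. The column name `Gene_Position` and the example row are hypothetical.

```python
GENE_COLUMNS = ("Gene1A", "Gene1B", "Gene2A", "Gene2B")


def fusions_to_long(rows):
    """Emit one row per fused gene, tagging which wide column it came from."""
    long_rows = []
    for row in rows:
        for position in GENE_COLUMNS:
            gene = row.get(position)
            if not gene:
                continue  # skip positions that are empty for this fusion
            long_rows.append({
                "FusionName": row["FusionName"],
                "Gene_Symbol": gene,
                "Gene_Position": position,  # hypothetical column name
            })
    return long_rows


wide_rows = [
    {"FusionName": "GENEA--GENEB", "Gene1A": "GENEA", "Gene1B": "GENEB",
     "Gene2A": None, "Gene2B": ""},
]
long_rows = fusions_to_long(wide_rows)
```

With a position column in the long table, the four wide columns carry no additional information, so the result table would not need both.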
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes
Results
What types of results are included (e.g., table, figure)?
tables
What is your summary of the results?
The fusion calls (currently PBTA only, from `dev`), which we will update once #42 goes in.
Reproducibility Checklist
Documentation Checklist
`README` and it is up to date.
`analyses/README.md` and the entry is up to date.