Fusion frequency tables #49

kgaonkar6 · 2021-07-07T14:26:26Z

🚨 merge after #50

Purpose/implementation Section

What scientific question is your analysis addressing?

Using FusionName_Fusion_Type as an alteration ID, we will generate tables with counts and frequencies per cancer_group and cohort.

What was your approach?

The code is adapted from https://github.com/PediatricOpenTargets/OpenPedCan-analysis/tree/f70645b6c7e4eb15ea29e45e9ebf0adeb5798b9b/analyses/snv-frequencies by @logstar

Given an alteration data frame with Kids_First_Biospecimen_ID and Alt_ID, get_cg_ch_mut_freq_tbl() computes the counts and frequencies per cancer_group within each cohort.
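The counting step can be illustrated with a minimal Python sketch (the module itself is written in R, and this is not get_cg_ch_mut_freq_tbl()). It assumes that the frequency is the number of distinct biospecimens carrying an Alt_ID divided by all biospecimens in the same cancer_group and cohort. The function names, the `samples` lookup, and the example IDs are hypothetical, and the real R function may define its denominators differently.

```python
from collections import defaultdict


def make_alt_id(fusion_name, fusion_type):
    # Alteration ID of the form FusionName_Fusion_Type
    return f"{fusion_name}_{fusion_type}"


def cg_cohort_freq_table(alterations, samples):
    """Count and frequency of each Alt_ID per cancer_group within cohort.

    alterations: iterable of (Kids_First_Biospecimen_ID, Alt_ID) pairs
    samples: dict Kids_First_Biospecimen_ID -> (cohort, cancer_group)
    returns: dict (Alt_ID, cohort, cancer_group) -> (n_altered, n_total, freq)
    """
    # Denominator: distinct biospecimens in each (cohort, cancer_group)
    totals = defaultdict(set)
    for bs_id, group in samples.items():
        totals[group].add(bs_id)

    # Numerator: distinct biospecimens carrying each alteration, so a
    # fusion reported twice for one biospecimen is counted once
    carriers = defaultdict(set)
    for bs_id, alt in alterations:
        if bs_id in samples:  # ignore biospecimens without metadata
            cohort, cancer_group = samples[bs_id]
            carriers[(alt, cohort, cancer_group)].add(bs_id)

    table = {}
    for (alt, cohort, cancer_group), ids in carriers.items():
        n_total = len(totals[(cohort, cancer_group)])
        table[(alt, cohort, cancer_group)] = (len(ids), n_total, len(ids) / n_total)
    return table


samples = {
    "BS_1": ("PBTA", "HGG"),
    "BS_2": ("PBTA", "HGG"),
    "BS_3": ("PBTA", "LGG"),
    "BS_4": ("PBTA", "LGG"),
}
kiaa_braf = make_alt_id("KIAA1549--BRAF", "in-frame")
fgfr3_tacc3 = make_alt_id("FGFR3--TACC3", "in-frame")
alterations = [
    ("BS_3", kiaa_braf),
    ("BS_3", kiaa_braf),  # duplicate call, counted once
    ("BS_4", kiaa_braf),
    ("BS_1", fgfr3_tacc3),
]
table = cg_cohort_freq_table(alterations, samples)
print(table[(kiaa_braf, "PBTA", "LGG")])  # (2, 2, 1.0)
print(table[(fgfr3_tacc3, "PBTA", "HGG")])  # (1, 2, 0.5)
```

Counting sets of biospecimen IDs rather than rows keeps duplicate calls from inflating the counts.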

What GitHub issue does your pull request address?

d3b-center/ticket-tracker-OPC#70
d3b-center/ticket-tracker-OPC#72

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Reciprocal_exists is computed for all fusions; should this be restricted to kinase fusions?
annots and Breakpoint Location are not in the putative-onco-fusion file; should we add these back?

Is there anything that you want to discuss further?

I wanted to make sure that Gene_Symbol is separated so that each fused gene is in its own row. Should there also be columns specifying the position of the gene? It seems that if we add Gene1A, Gene1B, Gene2A, and Gene2B, we would have the same information in both wide format and long format (one row per fused gene).
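A minimal Python sketch of the wide-to-long reshaping being discussed: each wide fusion record becomes one row per fused gene, and a hypothetical Gene_Position column records which wide column (Gene1A, Gene1B, Gene2A, Gene2B) the gene came from. The function name and the Gene_Position column are assumptions for illustration, not the module's actual output schema.

```python
POSITION_COLS = ("Gene1A", "Gene1B", "Gene2A", "Gene2B")


def fusion_to_long(record):
    """Turn one wide fusion record into one row per fused gene.

    The Gene_Position column keeps which wide column each gene came from,
    so the wide layout can be rebuilt from the long one.
    """
    rows = []
    for col in POSITION_COLS:
        symbol = record.get(col)
        if symbol:  # skip empty partner slots
            rows.append({
                "FusionName": record["FusionName"],
                "Gene_Symbol": symbol,
                "Gene_Position": col,
            })
    return rows


wide = {
    "FusionName": "KIAA1549--BRAF",
    "Gene1A": "KIAA1549",
    "Gene1B": "BRAF",
    "Gene2A": None,
    "Gene2B": None,
}
for row in fusion_to_long(wide):
    print(row["Gene_Symbol"], row["Gene_Position"])
# KIAA1549 Gene1A
# BRAF Gene1B
```

Because the position is kept as a value in the long table, the four wide columns would duplicate information that the long table already carries.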

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

tables

What is your summary of the results?

The fusion calls are currently PBTA only (from dev); we will update them once #42 goes in.

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

jharenza · 2021-07-07T14:38:19Z

Reciprocal_exists is computed for all fusions; should this be restricted to kinase fusions?

yes

annots and Breakpoint Location are not in the putative-onco-fusion file; should we add these back?

either yes, or create a put-onco-fusion file with annotations - probably the former to keep redundancy at a minimum

logstar

Thank you for preparing this module @kgaonkar6 !

The code and documentation look good to me.

I wonder if you could add missing Ensembl (ENSG) IDs to the result tables by joining ensg-hugo-rmtl-v1-mapping.tsv, perhaps by revising the following procedure for adding RMTL. For example, the first line of the TSV file has gene symbol LINC01019 and a missing ENSG ID, but the mapping ENSG00000248118 LINC01019 exists in ensg-hugo-rmtl-v1-mapping.tsv.

https://github.com/PediatricOpenTargets/OpenPedCan-analysis/blob/4f4e7f0e0d006c077af6b716ac2a16a0efdd4138/analyses/fusion-frequencies/01-fusion-frequencies.R#L228-L241

Adding ENSG IDs will also add more Gene_full_names and Protein_RefSeq_ID, because they are joined by ENSG IDs.

https://github.com/PediatricOpenTargets/OpenPedCan-analysis/blob/4f4e7f0e0d006c077af6b716ac2a16a0efdd4138/analyses/fusion-frequencies/01-fusion-frequencies.R#L265-L268
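The suggested fix amounts to a left join on gene symbol. The Python sketch below assumes the mapping TSV has columns named ensg_id and gene_symbol (the column names are an assumption; check the real header) and reads a tiny inline TSV so it runs on its own. Like a left join, it emits one row per matching ENSG ID and keeps unmatched symbols with a missing ID. The module does this in R; the function names here are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_symbol_to_ensg(tsv_handle):
    """Map gene symbol -> sorted list of ENSG IDs (a symbol may map to several)."""
    mapping = defaultdict(set)
    for rec in csv.DictReader(tsv_handle, delimiter="\t"):
        mapping[rec["gene_symbol"]].add(rec["ensg_id"])
    return {symbol: sorted(ids) for symbol, ids in mapping.items()}


def add_missing_ensg(rows, symbol_to_ensg):
    """Left-join ENSG IDs onto rows that lack one, matching on Gene_Symbol."""
    out = []
    for row in rows:
        if row.get("ensg_id"):  # already annotated, keep as-is
            out.append(row)
            continue
        ids = symbol_to_ensg.get(row["Gene_Symbol"])
        if not ids:  # no mapping: keep the row with a missing ID
            out.append({**row, "ensg_id": None})
        else:  # one output row per matching ENSG ID
            out.extend({**row, "ensg_id": ensg} for ensg in ids)
    return out


mapping_tsv = io.StringIO("ensg_id\tgene_symbol\nENSG00000248118\tLINC01019\n")
symbol_to_ensg = load_symbol_to_ensg(mapping_tsv)
fusion_rows = [
    {"Gene_Symbol": "LINC01019", "ensg_id": None},
    {"Gene_Symbol": "HYPOTHETICAL_GENE", "ensg_id": None},
]
annotated = add_missing_ensg(fusion_rows, symbol_to_ensg)
for row in annotated:
    print(row["Gene_Symbol"], row["ensg_id"])
# LINC01019 ENSG00000248118
# HYPOTHETICAL_GENE None
```

Once ensg_id is filled in, columns keyed by ENSG ID (such as Gene_full_name and Protein_RefSeq_ID) can be joined in the same way.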

analyses/fusion-frequencies/01-fusion-frequencies.R

analyses/fusion-frequencies/README.md

analyses/fusion-frequencies/run-frequencies.sh

analyses/fusion-frequencies/utils/freq_counts.R


kgaonkar6 · 2021-07-07T17:52:17Z

Thanks for the reviews !

@jharenza, I've updated reciprocal_exists so that it is restricted to fusions where either gene (Gene1A or Gene1B) is a kinase, i.e. has "Yes" or "No" in DomainRetainedGene1A or DomainRetainedGene1B.
I'll add code to retain annots and add Breakpoint location to the script that generates the putative-onco-fusion file, in a staggered PR from #40.
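A minimal Python sketch of one reading of this rule, under stated assumptions: a fusion is treated as a kinase fusion when DomainRetainedGene1A or DomainRetainedGene1B is "Yes" or "No"; a fusion A--B is reciprocal when B--A is also called in the same biospecimen; and non-kinase fusions get a missing value (None). These definitions and the function name are assumptions for illustration, not the module's exact R logic.

```python
KINASE_DOMAIN_VALUES = ("Yes", "No")


def annotate_reciprocal(fusions):
    """Set reciprocal_exists for kinase fusions only (modifies rows in place).

    Kinase fusion (assumed): DomainRetainedGene1A or DomainRetainedGene1B
    is "Yes" or "No". Other fusions get None.
    Reciprocal (assumed): B--A is called in the same biospecimen as A--B.
    """
    called = {(f["Kids_First_Biospecimen_ID"], f["FusionName"]) for f in fusions}
    for f in fusions:
        is_kinase = (
            f.get("DomainRetainedGene1A") in KINASE_DOMAIN_VALUES
            or f.get("DomainRetainedGene1B") in KINASE_DOMAIN_VALUES
        )
        if not is_kinase:
            f["reciprocal_exists"] = None
            continue
        gene_a, gene_b = f["FusionName"].split("--", 1)
        reverse = f"{gene_b}--{gene_a}"
        f["reciprocal_exists"] = (f["Kids_First_Biospecimen_ID"], reverse) in called
    return fusions


calls = [
    {"Kids_First_Biospecimen_ID": "BS_1", "FusionName": "FGFR3--TACC3",
     "DomainRetainedGene1A": "Yes", "DomainRetainedGene1B": None},
    {"Kids_First_Biospecimen_ID": "BS_1", "FusionName": "TACC3--FGFR3",
     "DomainRetainedGene1A": None, "DomainRetainedGene1B": "Yes"},
    {"Kids_First_Biospecimen_ID": "BS_2", "FusionName": "EWSR1--FLI1",
     "DomainRetainedGene1A": None, "DomainRetainedGene1B": None},
]
for f in annotate_reciprocal(calls):
    print(f["FusionName"], f["reciprocal_exists"])
# FGFR3--TACC3 True
# TACC3--FGFR3 True
# EWSR1--FLI1 None
```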

@logstar, thanks for the comment updates and for pointing out the missing ENSG IDs; I've now updated the gene ID matching code.

kgaonkar6 · 2021-07-07T20:11:21Z

@jharenza the annots column differs slightly between the Arriba annotation (which uniquely has duplication/translocation/deletion values) and the StarFusion annotation when a fusion is called by both. This will add another row, made unique by the values in annots; is that acceptable?

Example:
Arriba ["INTERCHROMOSOMAL[chr7--chr2]"],translocation
StarFusion [INTERCHROMOSOMAL[chr7--chr2]]

Or should I look into aggregating them by creating a vector of the values and taking the unique set?

jharenza · 2021-07-07T20:36:12Z

I think aggregating all of those per fusion breakpoints/type would be better.
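A minimal Python sketch of the aggregation, using the two example annots strings from this thread. It assumes a simple tokenizer (a word optionally followed by one bracketed suffix) is enough to split the Arriba and StarFusion formats, and it groups by biospecimen, FusionName, and Fusion_Type by default; breakpoint columns could be added to `key_cols`. The tokenizer, function names, and the GENEA--GENEB fusion are hypothetical.

```python
import re

# One annotation token: a word optionally followed by a single bracketed
# suffix, e.g. "translocation" or "INTERCHROMOSOMAL[chr7--chr2]"
ANNOT_TOKEN = re.compile(r"[A-Za-z0-9_]+(?:\[[^\]]*\])?")

KEY_COLS = ("Kids_First_Biospecimen_ID", "FusionName", "Fusion_Type")


def parse_annots(raw):
    """Extract annotation tokens, ignoring list brackets, quotes and commas."""
    return ANNOT_TOKEN.findall(raw or "")


def aggregate_annots(records, key_cols=KEY_COLS):
    """Merge caller-specific annots into one sorted, de-duplicated string per key."""
    merged = {}
    for rec in records:
        key = tuple(rec[col] for col in key_cols)
        merged.setdefault(key, set()).update(parse_annots(rec["annots"]))
    return {key: ",".join(sorted(tokens)) for key, tokens in merged.items()}


records = [
    # Arriba-style value from the example above
    {"Kids_First_Biospecimen_ID": "BS_1", "FusionName": "GENEA--GENEB",
     "Fusion_Type": "in-frame",
     "annots": '["INTERCHROMOSOMAL[chr7--chr2]"],translocation'},
    # StarFusion-style value for the same fusion
    {"Kids_First_Biospecimen_ID": "BS_1", "FusionName": "GENEA--GENEB",
     "Fusion_Type": "in-frame",
     "annots": "[INTERCHROMOSOMAL[chr7--chr2]]"},
]
aggregated = aggregate_annots(records)
print(aggregated[("BS_1", "GENEA--GENEB", "in-frame")])
# INTERCHROMOSOMAL[chr7--chr2],translocation
```

Collapsing to a set before joining means a fusion called by both callers yields one row instead of two rows that differ only in annots.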


kgaonkar6 · 2021-07-14T15:00:22Z

Just wanted to add that the PR is updated with annots aggregated across the non-caller-specific columns in #50, and Breakpoint location is added as well.

kgaonkar6 · 2021-07-29T17:52:15Z

Hi @jharenza @logstar, I've updated the code to implement the annotation via the annotator API and also added JSON files in my latest commits.

Even though RMTL is annotated by the annotator API, I had to use ensg-hugo-rmtl-mapping.tsv to get the ensg_id; let me know if there is a better way to do this. Thanks!

logstar · 2021-07-29T17:58:04Z

@kgaonkar6 Thank you for the suggestion. I wonder if you have any suggestions on how to annotate RMTL without using the ensg-hugo-rmtl-mapping.tsv, so I could improve it accordingly.

kgaonkar6 · 2021-07-29T20:56:06Z

To clarify, I'm reading ensg-hugo-rmtl-mapping.tsv to add the ensg_id column to the fusion calls. This might be an issue specific to fusion calls, since they don't have Ensembl IDs, so I wanted to mention it in case it is confusing why the code reads ensg-hugo-rmtl-mapping.tsv.

The RMTL annotation does in fact come directly from your code, so this wasn't a comment on your code. Sorry for the confusion.

logstar · 2021-07-29T21:22:45Z

Thank you for the clarification @kgaonkar6 ! Sorry that I misunderstood.

Regarding fusion calls that have no associated ENSG ID, I am also not sure how to better handle them. Thank you for pointing out this issue! I will keep this issue in mind when reviewing this PR.

For fusion calls that have associated ENSG IDs, I think adding the ENSG IDs with ensg-hugo-rmtl-mapping.tsv is the best practice now, because the module developers would be able to carefully handle module specific issues like the one you are encountering.

logstar

Thank you for the updates @kgaonkar6 !

The updated module runs well, and the results are reproduced identically.

Following are my specific suggestions.

analyses/README.md

analyses/fusion-frequencies/01-fusion-frequencies.R

analyses/fusion-frequencies/README.md

analyses/fusion-frequencies/01-fusion-frequencies.R

analyses/fusion-frequencies/run-frequencies.sh


kgaonkar6 · 2021-07-30T20:34:50Z

Thanks @logstar , I'll re-request review once I have rerun the module with your suggested changes above.

jharenza · 2021-07-30T20:38:59Z

For fusion calls that have associated ENSG IDs, I think adding the ENSG IDs with ensg-hugo-rmtl-mapping.tsv is the best practice now, because the module developers would be able to carefully handle module specific issues like the one you are encountering.

Agree with this, thanks!

logstar

Thank you for the updates @kgaonkar6 !

The code updates since the last review look good to me.

The module runs well in the Docker image, and the results are reproduced identically. There is also no duplicated line in the result files anymore.

jharenza · 2021-07-30T21:26:42Z

@kgaonkar6 I am still not seeing the ENSG id in the final tables - can you add this column to the tables?

kgaonkar6 · 2021-07-30T21:43:14Z

@jharenza sorry about that, they are in the results files now.

jharenza

LGTM now!

kgaonkar6 and others added 4 commits July 7, 2021 09:33

fusion tables json

db23585

comment update

d701f3a

gene position

b7ac76b

Update README.md

4f4e7f0

jharenza requested review from jharenza and logstar July 7, 2021 14:37

logstar suggested changes Jul 7, 2021

kgaonkar6 and others added 13 commits July 7, 2021 12:58

Update analyses/fusion-frequencies/01-fusion-frequencies.R

5e595f4

Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>

Update analyses/fusion-frequencies/01-fusion-frequencies.R

4c546eb

Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>

Update analyses/fusion-frequencies/01-fusion-frequencies.R

f0c8cfa

Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>

Update analyses/fusion-frequencies/README.md

cf30b62

Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>

Update analyses/fusion-frequencies/README.md

3566415

Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>

Update analyses/fusion-frequencies/run-frequencies.sh

0b0c6a6

Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>

Update analyses/fusion-frequencies/utils/freq_counts.R

648c98c

Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>

Update analyses/fusion-frequencies/README.md

df74d81

Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>

remove the filter for is.na(rmtl)

a200ceb

Merge branch 'kgaonkar6/fusion_freq' of https://github.com/PediatricOpenTargets/OpenPBTA-analysis into kgaonkar6/fusion_freq

63d1170

rerun with ensg update

65f08ce

add kinase domain and reciprocal cols

3310c66

reciprocal exitst update

e2d9f08

kgaonkar6 added 2 commits July 7, 2021 19:52

Merge branch 'add_bk_loc_annots' of https://github.com/PediatricOpenTargets/OpenPBTA-analysis into kgaonkar6/fusion_freq

d0af788

annots and bkloc added

9066369

logstar mentioned this pull request Jul 12, 2021

Annotate SNV table with mutation frequencies #45

Merged

5 tasks

kgaonkar6 added 2 commits July 14, 2021 15:03

update with input parms

0843d39

remove old files

a54d2a2

kgaonkar6 and others added 2 commits July 29, 2021 12:48

rerun without large files

e807e09

Update README.md

46f68eb

kgaonkar6 requested review from jharenza and logstar July 29, 2021 17:49

logstar suggested changes Jul 30, 2021

kgaonkar6 and others added 5 commits July 30, 2021 14:21

Update analyses/README.md

820fbf4

Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>

Update analyses/fusion-frequencies/README.md

45cdab9

Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>

Update analyses/fusion-frequencies/run-frequencies.sh

f08d642

Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>

Update analyses/fusion-frequencies/run-frequencies.sh

86097ee

Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>

Update analyses/fusion-frequencies/01-fusion-frequencies.R

6445db7

Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>

rerun after unique and rm .json

e5ada2c

kgaonkar6 requested a review from logstar July 30, 2021 21:02

logstar approved these changes Jul 30, 2021

add ensg id

f3052b8

jharenza approved these changes Jul 30, 2021

Merge branch 'dev' into kgaonkar6/fusion_freq

9a838d8

jharenza merged commit af898d0 into dev Jul 30, 2021

This was referenced Jul 30, 2021

Proposed Analysis: Create fusion frequency tables d3b-center/ticket-tracker-OPC#70

Closed

Proposed Analysis: Create JSON files for fusion tables d3b-center/ticket-tracker-OPC#72

Closed

kgaonkar6 deleted the kgaonkar6/fusion_freq branch July 30, 2021 23:30

logstar mentioned this pull request Aug 10, 2021

Updated analysis: update fusiong_filtering module to handle gene symbols that have no mapped ENSG ID d3b-center/ticket-tracker-OPC#153

Closed

logstar mentioned this pull request Dec 16, 2021

Fusion data missing ENSGs d3b-center/ticket-tracker-OPC#264

Closed
