Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fusion frequency tables #49

Merged
merged 54 commits into from
Jul 30, 2021
Merged

Fusion frequency tables #49

merged 54 commits into from
Jul 30, 2021

Conversation

kgaonkar6
Copy link

@kgaonkar6 kgaonkar6 commented Jul 7, 2021

🚨 merge after #50

Purpose/implementation Section

What scientific question is your analysis addressing?

UsingFusionName_Fusion_Type as a alteration ID we will generate tables with counts and frequencies per cancer_group and cohort.

What was your approach?

The code is adapted from https://github.com/PediatricOpenTargets/OpenPedCan-analysis/tree/f70645b6c7e4eb15ea29e45e9ebf0adeb5798b9b/analyses/snv-frequencies by @logstar

Given a alteration dataframe with Kids_First_Biospecimen_ID and Alt_ID get_cg_ch_mut_freq_tbl() gets the counts and frequencies per cancer_group within cohort.

What GitHub issue does your pull request address?

d3b-center/ticket-tracker-OPC#70
d3b-center/ticket-tracker-OPC#72

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

  • Reciprocal_exists is for all fusions should this be restricted to Kinase?
  • annots and Breakpoint Location are not in putation-onco-fusion file should we add these back??

Is there anything that you want to discuss further?

Wanted to make sure, that the Gene_Symbol is separated as each gene fused in new row, so should there be columns specifying the position of the gene? Because it seems if we add Gene1A,Gene1B, Gene2A, Gene2B we have the same information in wide and also in long formation having a row per gene fused.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

tables

What is your summary of the results?

The fusion calls ( currently it's PBTA only from dev) which we will updated once #42 goes in.

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@jharenza jharenza requested review from jharenza and logstar July 7, 2021 14:37
@jharenza
Copy link
Member

jharenza commented Jul 7, 2021

Reciprocal_exists is for all fusions should this be restricted to Kinase?

yes

annots and Breakpoint Location are not in putation-onco-fusion file should we add these back??

either yes, or create a put-onco-fusion file with annotations - probably the former to keep redundancy at a minimum

Copy link

@logstar logstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for preparing this module @kgaonkar6 !

The code and documentation look good to me.

I wonder if you could add missing Ensembl (ENSG) IDs to the result tables by joining ensg-hugo-rmtl-v1-mapping.tsv, perhaps by revising the following procedure for adding RMTL. For example, the first line of the TSV file has gene symbol LINC01019 and ENSG ID missing, but ENSG00000248118 LINC01019 mapping exits in ensg-hugo-rmtl-v1-mapping.tsv.

https://github.com/PediatricOpenTargets/OpenPedCan-analysis/blob/4f4e7f0e0d006c077af6b716ac2a16a0efdd4138/analyses/fusion-frequencies/01-fusion-frequencies.R#L228-L241

Adding ENSG IDs will also add more Gene_full_names and Protein_RefSeq_ID, because they are joined by ENSG IDs.

https://github.com/PediatricOpenTargets/OpenPedCan-analysis/blob/4f4e7f0e0d006c077af6b716ac2a16a0efdd4138/analyses/fusion-frequencies/01-fusion-frequencies.R#L265-L268

analyses/fusion-frequencies/01-fusion-frequencies.R Outdated Show resolved Hide resolved
analyses/fusion-frequencies/01-fusion-frequencies.R Outdated Show resolved Hide resolved
analyses/fusion-frequencies/01-fusion-frequencies.R Outdated Show resolved Hide resolved
analyses/fusion-frequencies/README.md Outdated Show resolved Hide resolved
analyses/fusion-frequencies/README.md Outdated Show resolved Hide resolved
analyses/fusion-frequencies/README.md Outdated Show resolved Hide resolved
analyses/fusion-frequencies/run-frequencies.sh Outdated Show resolved Hide resolved
analyses/fusion-frequencies/utils/freq_counts.R Outdated Show resolved Hide resolved
kgaonkar6 and others added 13 commits July 7, 2021 12:58
Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>
Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>
Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>
Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>
Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>
Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>
Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>
Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>
@kgaonkar6
Copy link
Author

Thanks for the reviews !

@jharenza,I've update the reciprocal_exitsts to be TRUE if either 1 gene in (Gene1A or Gene1B) is kinase and will thus have "Yes" or "No" in DomainRetainedGene1A or DomainRetainedGene1B.
I'll add the code to retain annots and add Breakpoint location code to the script generating the putative-onco-fusion in
a staggered PR from #40

@logstar, thanks for the updates on comments and the missing ENSG ids, I've updated the gene id match code now.

@kgaonkar6
Copy link
Author

@jharenza the annots column is slightly difference in arriba annotation (uniquely has duplication/translocation/deletion values) and StarFusion annotation if Fusion is called in both. This will add another row uniqued by the values in annots, is that acceptable?

Example:
Arriba ["INTERCHROMOSOMAL[chr7--chr2]"],translocation
StarFuson [INTERCHROMOSOMAL[chr7--chr2]]

Or should I look into aggregating it by creating a vector and unique it

@jharenza
Copy link
Member

jharenza commented Jul 7, 2021

@jharenza the annots column is slightly difference in arriba annotation (uniquely has duplication/translocation/deletion values) and StarFusion annotation if Fusion is called in both. This will add another row uniqued by the values in annots, is that acceptable?

Example:
Arriba ["INTERCHROMOSOMAL[chr7--chr2]"],translocation
StarFuson [INTERCHROMOSOMAL[chr7--chr2]]

Or should I look into aggregating it by creating a vector and unique it

I think aggregating all of those per fusion breakpoints/type would be better.

@kgaonkar6
Copy link
Author

Just wanted add that the PR is updated with the aggregated annots aggreagted at non-caller specific columns in #50 as well as Breakpoint location is added.

@kgaonkar6
Copy link
Author

Hi @jharenza @logstar I've updated the code to implement the annotation via annotator API and also added json files in my latest comments.

Even though the RMTL is annotated by the annotator api had to use ensg-hugo-rmtl-mapping.tsv to gather the ensg_id , let me know if there is a better way to do this. Thanks!

@logstar
Copy link

logstar commented Jul 29, 2021

Hi @jharenza @logstar I've updated the code to implement the annotation via annotator API and also added json files in my latest comments.

Even though the RMTL is annotated by the annotator api had to use ensg-hugo-rmtl-mapping.tsv to gather the ensg_id , let me know if there is a better way to do this. Thanks!

@kgaonkar6 Thank you for the suggestion. I wonder if you have any suggestions on how to annotate RMTL without using the ensg-hugo-rmtl-mapping.tsv, so I could improve it accordingly.

@kgaonkar6
Copy link
Author

@kgaonkar6 Thank you for the suggestion. I wonder if you have any suggestions on how to annotate RMTL without using the ensg-hugo-rmtl-mapping.tsv, so I could improve it accordingly.

To clarify, I'm reading the ensg-hugo-rmtl-mapping.tsv to add the ensg_id column to the fusion calls. This might be an issue specific for fusion calls since we don't have ENSEMBL ids so wanted to mention it since it might be confusing why I'm reading ensg-hugo-rmtl-mapping.tsv in the code.

The RMTL annotation does in-fact come from your code directly so wasn't a comment on your code. Sorry for the confusion.

@logstar
Copy link

logstar commented Jul 29, 2021

@kgaonkar6 Thank you for the suggestion. I wonder if you have any suggestions on how to annotate RMTL without using the ensg-hugo-rmtl-mapping.tsv, so I could improve it accordingly.

To clarify, I'm reading the ensg-hugo-rmtl-mapping.tsv to add the ensg_id column to the fusion calls. This might be an issue specific for fusion calls since we don't have ENSEMBL ids so wanted to mention it since it might be confusing why I'm reading ensg-hugo-rmtl-mapping.tsv in the code.

The RMTL annotation does in-fact come from your code directly so wasn't a comment on your code. Sorry for the confusion.

Thank you for the clarification @kgaonkar6 ! Sorry that I misunderstood.

Regarding fusion calls that have no associated ENSG ID, I am also not sure how to better handle them. Thank you for pointing out this issue! I will keep this issue in mind when reviewing this PR.

For fusion calls that have associated ENSG IDs, I think adding the ENSG IDs with ensg-hugo-rmtl-mapping.tsv is the best practice now, because the module developers would be able to carefully handle module specific issues like the one you are encountering.

Copy link

@logstar logstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the updates @kgaonkar6 !

The updated module runs well, and the results are reproduced identically.

Following are my specific suggestions.

analyses/README.md Outdated Show resolved Hide resolved
analyses/fusion-frequencies/01-fusion-frequencies.R Outdated Show resolved Hide resolved
analyses/fusion-frequencies/README.md Outdated Show resolved Hide resolved
kgaonkar6 and others added 5 commits July 30, 2021 14:21
Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>
Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>
Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>
Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>
Co-authored-by: Yuanchao Zhang <logstar@users.noreply.github.com>
@kgaonkar6
Copy link
Author

kgaonkar6 commented Jul 30, 2021

Thanks @logstar , I'll re-request review once I have rerun the module with your suggested changes above.

@jharenza
Copy link
Member

For fusion calls that have associated ENSG IDs, I think adding the ENSG IDs with ensg-hugo-rmtl-mapping.tsv is the best practice now, because the module developers would be able to carefully handle module specific issues like the one you are encountering.

Agree with this, thanks!

@kgaonkar6 kgaonkar6 requested a review from logstar July 30, 2021 21:02
Copy link

@logstar logstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the updates @kgaonkar6 !

The code updates since the last review look good to me.

The module runs well in the Docker image, and the results are reproduced identically. There is also no duplicated line in the result files anymore.

@jharenza
Copy link
Member

@kgaonkar6 I am still not seeing the ENSG id in the final tables - can you add this column to the tables?

@kgaonkar6
Copy link
Author

@jharenza sorry about that, they are in the results files now.

Copy link
Member

@jharenza jharenza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants