This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Added new fusion summary module #410

Merged

dmiller15
Contributor

updated analysis readme and ci config

Purpose/implementation Section

What scientific question is your analysis addressing?

In biospecimens from ependymoma and non-ATRT embryonal tumors, which ones have fusions of interest?

What was your approach?

We filtered the fusions file down to the biospecimens identified as belonging to each population. We then reduced each filtered file using a list of explicit filters (specific fusions) and generic filters (any fusion containing a particular gene) for fusions shown to be relevant to the cancers in that population. The results were summarized in one TSV per population.
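
The explicit and generic filters described above can be sketched as follows. This is a minimal Python illustration, not the R code in this pull request: the sample IDs, the example calls, and the choice of `MN1` as a generic gene are assumptions made for the example, and `is_of_interest` is a hypothetical helper name.

```python
# Hypothetical (biospecimen ID, fusion name) pairs; not data from this PR.
calls = [
    ("BS_0001", "C11orf95--RELA"),
    ("BS_0002", "MN1--BEND2"),
    ("BS_0002", "KIAA1549--BRAF"),
    ("BS_0003", "EWSR1--FLI1"),
]

# Explicit filter: exact fusion names to keep.
EXPLICIT_FUSIONS = {"C11orf95--RELA"}
# Generic filter: keep any fusion that involves one of these genes (assumed list).
GENERIC_GENES = {"MN1"}


def is_of_interest(fusion_name):
    """Return True if the fusion passes the explicit or the generic filter."""
    if fusion_name in EXPLICIT_FUSIONS:
        return True
    partners = fusion_name.split("--")
    return any(gene in GENERIC_GENES for gene in partners)


# Keep only the calls that match one of the two filters.
filtered = [(sample, fusion) for sample, fusion in calls if is_of_interest(fusion)]
```

Splitting on `--` before matching means a generic gene is compared against whole gene symbols rather than substrings, so a gene such as `MN1` would not accidentally match a longer symbol that merely contains it.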

What GitHub issue does your pull request address?

#398

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

The filters deserve particular attention, since their values were mostly gleaned from the ticket; there may be better ways to perform the filtering. The final summary tables also aren't set in stone, particularly in how we represent the generic fusions, where we accept any fusion containing a particular gene. I simply left each generic fusion found as its own column. The ticket chose to aggregate these values, but the utility of that is up for discussion.

Is there anything that you want to discuss further?

No.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes. I'm confident the tables are presenting the information requested in the ticket.

Results

What types of results are included (e.g., table, figure)?

TSV tables.

What is your summary of the results?

C11orf95--RELA is nearly universal in ependymoma tumors. Very few biospecimens have the fusions of interest. Almost none of the samples have more than one fusion.

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.
  • This analysis is recorded in the table in analyses/README.md.

updated analysis readme and ci config
@sjspielman
Member

Hi @dmiller15, thanks for getting started on this analysis. I see that in the originating issue #398, the analysis requested is to search all samples for either classification or reclassification (my emphasis added):

... To generate files that contain information about the presence or absence of specific fusions or genes participating in fusions to be used in generating subtype labels ...

It looks like you've subset the data to only certain histologies before searching for fusions of interest. Can you update the code to search all samples, even those that have already been subtyped? Thanks!

@dmiller15
Contributor Author

Thanks for the input @sjspielman. I've added files for each set of fusions that don't filter the biospecimens beforehand.

Member

@jaclyn-taroni jaclyn-taroni left a comment

Hi @dmiller15 - thanks for this contribution! This is in good shape and is well-documented. I had some comments about some overarching design decisions in addition to the line comments I left:

  • I would remove the filtering by short_histology and broad_histology now that you include the files without this filtering.
  • There are 924 samples in the pbta-fusion-recurrently-fused-genes-bysample.tsv, which is a binary matrix that contains information about the presence or absence of a recurrently fused gene in an RNA-seq sample. This file is very similar to what we want here. The main difference is that the columns of that file are data-driven (e.g., based on the number of samples they appeared in) and here we are specifying the fusions upfront. The files produced here have under 40 samples, and if I followed correctly, I believe this is due to the inclusion of only the samples (Kids_First_Biospecimen_ID) that have at least one of the fusions or genes that are being specified. We want to include all samples. Here's where @kgaonkar6 starts creating the pbta-fusion-recurrently-fused-genes-bysample.tsv matrix for your reference:
    # binary matrix for recurrent fusions found in SAMPLE per broad_histology
  • In some cases, it's not clear that the 5'/3' ordering of the genes matters, so MN1--BEND2 and BEND2--MN1 may be equivalent for the purposes of these files. I've asked @jharenza to weigh in.
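
If the two orderings do turn out to be equivalent for these files, one way to collapse them is to sort the partner genes before comparing. The sketch below is only an illustration of that idea in Python; `unordered_key` is a hypothetical helper, and whether 5'/3' order should be ignored at all is still the open question raised above.

```python
def unordered_key(fusion_name, sep="--"):
    """Return a fusion label that ignores 5'/3' partner order.

    The partner genes are sorted alphabetically, so both orderings of the
    same gene pair map to one label.
    """
    return sep.join(sorted(fusion_name.split(sep)))


# Both orderings of the same pair collapse to a single key.
same = unordered_key("MN1--BEND2") == unordered_key("BEND2--MN1")
```

The original, ordered name can still be kept in a separate column so that no information is lost if orientation later proves to matter.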

Review threads on analyses/fusion-summary/01-fusion-summary.R: 4 resolved (2 of them outdated)
added RELA gene filtering
no longer drop levels of samples
@dmiller15
Contributor Author

The files produced here have under 40 samples, and if I followed correctly, I believe this is due to the inclusion of only the samples (Kids_First_Biospecimen_ID) that have at least one of the fusions or genes that are being specified. We want to include all samples.

@sjspielman With regard to the above, I no longer drop the levels when making the table. You can inspect the newly generated outputs and see 787 samples, which is the number of unique biospecimens available in pbta-fusion-putative-oncogenic.tsv.
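
The effect of keeping every sample can be sketched in Python as a presence/absence matrix built from the full sample list rather than from the filtered calls. This is an illustration under assumptions, not the R implementation in this PR: the sample IDs and hits are made up, and `presence_matrix` is a hypothetical helper.

```python
import csv
import io

# Hypothetical inputs: every biospecimen in the fusion file, the fusions
# specified upfront, and the (sample, fusion) pairs that passed filtering.
all_samples = ["BS_0001", "BS_0002", "BS_0003"]
columns = ["C11orf95--RELA", "MN1--BEND2"]
hits = {("BS_0001", "C11orf95--RELA")}


def presence_matrix(samples, fusions, observed):
    """Build one row per sample, with 1/0 for each fusion of interest.

    Iterating over the full sample list keeps samples that have none of
    the fusions; they appear as all-zero rows instead of being dropped.
    """
    rows = []
    for sample in samples:
        row = {"Kids_First_Biospecimen_ID": sample}
        for fusion in fusions:
            row[fusion] = int((sample, fusion) in observed)
        rows.append(row)
    return rows


matrix = presence_matrix(all_samples, columns, hits)

# Serialize as TSV (written to an in-memory buffer here for illustration).
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["Kids_First_Biospecimen_ID"] + columns,
    delimiter="\t",
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(matrix)
tsv_text = buffer.getvalue()
```

With this shape, the row count of the output equals the number of unique biospecimens in the input, which is the check described above.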

Member

@jaclyn-taroni jaclyn-taroni left a comment

👍 looks good to me - thank you for the changes @dmiller15 !

@jaclyn-taroni
Member

Realized after approving that we no longer needed the demographic file - so I removed that in bba4e78. I'll merge once CI finishes!
