Added new fusion summary module #410

dmiller15 · 2020-01-07T20:32:01Z

updated anaylsis readme and ci config

Purpose/implementation Section

What scientific question is your analysis addressing?

In biospecimens from ependymoma and non-ATRT embryonal tumors, which ones have fusions of interest?

What was your approach?

Using the biospecimens that were identified to be in one or the other population, we filtered the fusions file. That filtered fusion file was simplified using a list of explicit and generic filters for fusions that have been shown to have relevance in the cancer of that population. The results were summarized in a TSV for each population.
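To make the two kinds of filter concrete, here is a minimal Python sketch; it is not the module's R code. The sample IDs, the fusion calls, and the contents of `EXPLICIT_FUSIONS` and `GENERIC_GENES` are illustrative assumptions, not the lists taken from the ticket. An explicit filter matches a full fusion name, while a generic filter accepts any fusion containing a given gene.

```python
# Hypothetical fusion calls as (biospecimen ID, "GeneA--GeneB") pairs.
calls = [
    ("BS_001", "C11orf95--RELA"),
    ("BS_002", "MN1--BEND2"),
    ("BS_003", "KIAA1549--BRAF"),
    ("BS_004", "RELA--PTPRK"),
]

# Illustrative filter lists (assumptions; the real values come from the ticket).
EXPLICIT_FUSIONS = {"C11orf95--RELA"}  # match the full fusion name
GENERIC_GENES = {"MN1", "RELA"}        # match any fusion containing the gene


def is_fusion_of_interest(fused_gene):
    """Return True if the fusion passes the explicit or the generic filter."""
    if fused_gene in EXPLICIT_FUSIONS:
        return True
    partners = fused_gene.split("--")
    return any(gene in GENERIC_GENES for gene in partners)


# Keep only the calls that pass either filter.
filtered = [(sample, fusion) for sample, fusion in calls
            if is_fusion_of_interest(fusion)]
```

Splitting on `--` before testing the generic genes avoids substring matches, so a gene named `RELA` would not accidentally match a longer gene symbol that merely contains those letters.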

What GitHub issue does your pull request address?

#398

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

The filters deserve a particularly close look, since those values were mostly gleaned from the ticket; there may be better ways to perform the filtering. The final summary tables also aren't set in stone, particularly in how we represent the generic fusions, where we accept any fusion containing a particular gene. I simply left each generic fusion that was found as its own column. The ticket chose to aggregate these values, but the utility of that is up for discussion.
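The two representations of generic fusions can be compared with a small Python sketch. The sample IDs and the `MN1--GENE_X` fusion are hypothetical, and the binary per-gene flag is only one plausible reading of how the ticket aggregates the values.

```python
# Hypothetical per-sample hits for a generic MN1 filter.
hits = {
    "BS_001": ["MN1--BEND2"],
    "BS_002": ["MN1--GENE_X", "MN1--BEND2"],
    "BS_003": [],
}

# Option 1: one column per distinct fusion that was found.
fusion_columns = sorted({f for fusions in hits.values() for f in fusions})
per_fusion = {
    sample: [int(col in fusions) for col in fusion_columns]
    for sample, fusions in hits.items()
}

# Option 2: one aggregated column per gene (assumed here to be a 0/1 flag).
per_gene = {
    sample: int(any("MN1" in f.split("--") for f in fusions))
    for sample, fusions in hits.items()
}
```

The per-fusion layout keeps the partner gene visible but its column set depends on the data; the per-gene layout has a fixed set of columns at the cost of hiding the partner.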

Is there anything that you want to discuss further?

No.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes. I'm confident the tables are presenting the information requested in the ticket.

Results

What types of results are included (e.g., table, figure)?

TSV tables.

What is your summary of the results?

C11orf95--RELA is nearly universal in ependymoma tumors. Very few biospecimens have the fusions of interest. Almost none of the samples have more than one fusion.

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.
This analysis is recorded in the table in analyses/README.md.

sjspielman · 2020-01-07T22:06:52Z

Hi @dmiller15, thanks for getting started on this analysis. In the originating issue #398, I see that the analysis requested is to search all samples, for either classification or reclassification (my emphasis added):

... To generate files that contain information about the presence or absence of specific fusions or genes participating in fusions to be used in generating subtype labels ...

It looks like you've subset the data to only certain histologies before searching for fusions of interest. Can you update the code to search all samples, even those that have already been subtyped? Thanks!

dmiller15 · 2020-01-08T14:20:39Z

Thanks for the input @sjspielman. I've added files for each set of fusions that don't filter the biospecimens beforehand.

jaclyn-taroni

Hi @dmiller15 - thanks for this contribution! This is in good shape and is well-documented. I had some comments about some overarching design decisions in addition to the line comments I left:

I would remove the filtering by short_histology and broad_histology now that you include the files without this filtering.
There are 924 samples in the pbta-fusion-recurrently-fused-genes-bysample.tsv, which is a binary matrix that contains information about the presence or absence of a recurrently fused gene in an RNA-seq sample. This file is very similar to what we want here. The main difference is that the columns of that file are data-driven (e.g., based on the number of samples they appeared in) and here we are specifying the fusions upfront. The files produced here have under 40 samples, and if I followed correctly, I believe this is due to the inclusion of only the samples (Kids_First_Biospecimen_ID) that have at least one of the fusions or genes that are being specified. We want to include all samples. Here's where @kgaonkar6 starts creating the pbta-fusion-recurrently-fused-genes-bysample.tsv matrix for your reference:

OpenPBTA-analysis/analyses/fusion_filtering/05-recurrent-fusions-per-histology.R

Line 121 in 7917a7f

# binary matrix for recurrent fusions found in SAMPLE per broad_histology
In some cases, it's not clear that the 5'/3' ordering of the genes matters, so MN1--BEND2 and BEND2--MN1 may be equivalent for the purposes of these files. I've asked @jharenza to weigh in.
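If the 5'/3' ordering turns out not to matter for a given fusion, one way to treat both orientations as equivalent is to compare an order-insensitive key. This Python sketch is an assumption about how that could be done, not what the module does, and `unordered_key` is a hypothetical helper name.

```python
def unordered_key(fused_gene):
    """Return a key that is identical for A--B and B--A."""
    return "--".join(sorted(fused_gene.split("--")))


# Both orientations collapse to the same key.
same = unordered_key("MN1--BEND2") == unordered_key("BEND2--MN1")
```

The key is only suitable for matching; the original orientation should still be kept wherever the ordering carries meaning.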

analyses/fusion-summary/01-fusion-summary.R

added RELA gene filtering no longer drop levels of samples

dmiller15 · 2020-01-09T15:26:42Z

The files produced here have under 40 samples, and if I followed correctly, I believe this is due to the inclusion of only the samples (Kids_First_Biospecimen_ID) that have at least one of the fusions or genes that are being specified. We want to include all samples.

@sjspielman With regard to the above, I no longer drop the levels when making the table. You can inspect the newly generated outputs and see 787 samples, which is the number of unique biospecimens available in pbta-fusion-putative-oncogenic.tsv.
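The effect of keeping every sample can be sketched in Python (the module itself is written in R). The idea is to build the rows from the full list of biospecimen IDs rather than from the filtered hits, so samples without any fusion of interest remain as all-zero rows. The IDs and hits below are made up for illustration.

```python
# Every biospecimen in the input file (hypothetical IDs).
all_samples = ["BS_001", "BS_002", "BS_003", "BS_004"]

# Filtered fusions of interest; only two samples have a hit.
hits = [("BS_001", "C11orf95--RELA"), ("BS_003", "MN1--BEND2")]

# Columns are the fusions that were found.
columns = sorted({fusion for _, fusion in hits})

# Start from all samples so that samples without hits keep an all-zero row.
matrix = {sample: {col: 0 for col in columns} for sample in all_samples}
for sample, fusion in hits:
    matrix[sample][fusion] = 1

# Rows that would have been lost if the table were built from hits alone.
zero_rows = [s for s, row in matrix.items() if not any(row.values())]
```

A quick sanity check is that the number of rows equals the number of unique biospecimen IDs in the input.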

jaclyn-taroni

👍 Looks good to me - thank you for the changes, @dmiller15!

jaclyn-taroni · 2020-01-09T17:21:26Z

Realized after approving that we no longer needed the demographic file - so I removed that in bba4e78. I'll merge once CI finishes!

added new fusion summary module

fda31f6

updated anaylsis readme and ci config

added files without biospecimen filter

95994d4

dmiller15 and others added 2 commits January 8, 2020 18:06

change relabel of column names

0a2729d

Merge branch 'master' into feat/fusion-summary-init

7fe0813

jaclyn-taroni reviewed Jan 8, 2020

jaclyn-taroni mentioned this pull request Jan 9, 2020

Add chr22q loss variable to ATRT molecular subtyping #414

Merged

3 tasks

removed demographic filtering

69b4568

added RELA gene filtering no longer drop levels of samples

jaclyn-taroni approved these changes Jan 9, 2020

jaclyn-taroni added 3 commits January 9, 2020 12:02

Merge branch 'master' into feat/fusion-summary-init

e7cac2e

No longer need demographic file

bba4e78

Merge remote-tracking branch 'dmiller15/feat/fusion-summary-init' int…

76a1f5d

…o feat/fusion-summary-init

jaclyn-taroni merged commit a991195 into AlexsLemonade:master Jan 9, 2020

jaclyn-taroni mentioned this pull request Jan 10, 2020

Proposed Analysis: Fusion files specifically for consumption by molecular subtyping analyses #398

Closed

dmiller15 deleted the feat/fusion-summary-init branch January 10, 2020 16:15

jaclyn-taroni mentioned this pull request Jan 23, 2020

Proposed Analysis: Molecularly subtype ependymoma tumors #245

Closed
