Planned Analysis: Filter and Annotate Fusions #39

jharenza · 2019-08-07T12:49:55Z

Here, we will filter potential artifacts, filter fusions observed in normal tissue, retain high-confidence calls, and annotate with several databases to create a final list of putative driver fusions.
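A minimal Python sketch of the three filtering steps listed above: artifact removal, removal of fusions observed in normal tissue, and retention of high-confidence calls. The `FusionCall` record, its field names, and the read-support cutoffs are hypothetical illustrations, not the schema or thresholds of the actual pipeline. Annotation is left out.

```python
from dataclasses import dataclass


@dataclass
class FusionCall:
    # Hypothetical record; field names are illustrative only.
    sample: str
    gene1: str
    gene2: str
    junction_reads: int   # reads spanning the fusion junction
    spanning_reads: int   # read pairs bridging the two partners

    @property
    def name(self) -> str:
        return f"{self.gene1}--{self.gene2}"


def filter_fusions(calls, artifact_names, normal_names,
                   min_junction_reads=1, min_total_support=3):
    """Drop artifacts, fusions seen in normal tissue, and low-support calls."""
    kept = []
    for call in calls:
        if call.name in artifact_names:    # 1. known artifacts
            continue
        if call.name in normal_names:      # 2. observed in normal tissue
            continue
        total = call.junction_reads + call.spanning_reads
        if call.junction_reads < min_junction_reads or total < min_total_support:
            continue                       # 3. not high-confidence (hypothetical cutoffs)
        kept.append(call)
    return kept
```

Applying the filters in this order keeps each one simple. A call that fails any single check is dropped regardless of the others.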

jharenza · 2019-08-07T12:50:29Z

@kgaonkar6 and I have created a workflow for this; will update soon!

jaclyn-taroni · 2019-08-19T12:24:34Z

Is this pipeline described in AlexsLemonade/OpenPBTA-manuscript#21?

jharenza · 2019-08-19T12:28:40Z

> Is this pipeline described in AlexsLemonade/OpenPBTA-manuscript#21?

It is not yet described there. Do you think we should add it to the methods section or to the analysis/results section? We created a workflow (and @kgaonkar6 created an R package) to annotate fusion gene partners as tumor suppressor gene (TSG), oncogene, kinase, transcription factor (TF), or receptor; add the expression of each gene; and filter out artifacts, fusions found in normal tissues, etc., to arrive at a high-confidence list of putative driver fusions. It is probably more of a method, but we weren't going to add it until this PR was finished; we are still making tweaks.
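To illustrate the partner annotation described above, here is a small Python sketch that tags each partner with the gene categories it belongs to and attaches its expression value. The reference gene sets and the expression dictionary are hypothetical stand-ins for the curated databases and expression data the workflow uses. Apart from BRAF, the gene names are placeholders.

```python
# Illustrative reference sets; a real run would load curated gene lists.
REFERENCE_SETS = {
    "TSG": {"GENE_T"},
    "oncogene": {"BRAF", "GENE_O"},
    "kinase": {"BRAF"},
    "TF": {"GENE_F"},
    "receptor": {"GENE_R"},
}


def partner_categories(gene, reference=REFERENCE_SETS):
    """Return the sorted list of categories whose gene set contains `gene`."""
    return sorted(cat for cat, genes in reference.items() if gene in genes)


def annotate_fusion(gene1, gene2, expression=None, reference=REFERENCE_SETS):
    """Annotate both partners; expression values (e.g. TPM) are optional."""
    expression = expression or {}
    return {
        "fusion": f"{gene1}--{gene2}",
        "gene1_categories": partner_categories(gene1, reference),
        "gene2_categories": partner_categories(gene2, reference),
        "gene1_expression": expression.get(gene1),
        "gene2_expression": expression.get(gene2),
    }
```

For example, `annotate_fusion("KIAA1549", "BRAF", {"BRAF": 12.5})` reports `["kinase", "oncogene"]` for the second partner, an empty list for the first, and `None` for the missing expression value.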

jaclyn-taroni · 2019-08-19T12:37:47Z

> It is probably more of a method, but we weren't going to add it until this PR was finished; we are still making tweaks.

Ah, sounds good. This is the order I would expect. Are the fusion prioritization steps described in AlexsLemonade/OpenPBTA-manuscript#21 then upstream, i.e., were they already applied to produce the fusion TSV files?

jharenza · 2019-08-19T12:40:30Z

They are all performed after the TSV files are downloaded.

jharenza · 2019-08-28T14:31:15Z

@jaclyn-taroni @cgreene: seeking advice on this PR. We plan to create a package to do the annotations and prioritization, but it currently has some bugs. We were thinking of creating code for the PR that would use the new tool and output the results (TXT file and figures). In the meantime, for this PR, would you rather we contribute the entirety of the code as it is in this repo: https://github.com/d3b-center/fusion_filtering_pipeline? It has been a work in progress for several months, so it may be a lot to go through for the purposes of the PR. cc: @kgaonkar6

jaclyn-taroni · 2019-08-28T15:28:48Z

It would be great to have that as a reusable analysis workflow. Sounds like that's your goal with creating a package. If you're open to it, we could put that code through code review, as it is often helpful to have some fresh eyes on a piece of work when the goal is to make something more generalizable/reusable.

Before we figure out the mechanics of getting it through review and which repository, etc., I have a few questions, the most important of which is: broadly, what does this pipeline do?

Follow-up questions: What are the inputs to the pipeline? Can you make the files you are using as input public?

jaclyn-taroni · 2019-08-28T15:31:26Z

To clarify, this is very helpful:

> Here, we will filter potential artifacts, filter fusions observed in normal tissue, retain high-confidence calls, and annotate with several databases to create a final list of putative driver fusions.

I'm wondering about things like where the information on fusions observed in normal tissue is coming from.

jharenza · 2019-08-28T16:01:25Z

Good point - I think a code review would be helpful. The goals of the package would be:

- annotation
- prioritization of candidate fusions

Inputs are the fusion output files from Arriba and STAR-Fusion. While we are currently using only these two algorithms, we have run 4 other algorithms in the past and plan to add the ability to accept some of their output files as input to this package.
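Because the two callers write different TSV layouts, a first step is to bring both into one record shape. A minimal Python sketch follows. The column names (`#FusionName`, `JunctionReadCount`, `SpanningFragCount`, `LeftBreakpoint`, `RightBreakpoint` for STAR-Fusion; `#gene1`, `gene2`, `breakpoint1`, `breakpoint2`, `split_reads1`, `split_reads2`, `discordant_mates` for Arriba) are assumptions based on the tools' usual output headers and should be checked against the versions in use. The shared record keys are hypothetical, and partner order is kept as each caller reports it.

```python
import csv


def read_star_fusion(handle):
    """Yield common records from a STAR-Fusion TSV (assumed column names)."""
    for row in csv.DictReader(handle, delimiter="\t"):
        gene1, gene2 = row["#FusionName"].split("--", 1)
        yield {
            "caller": "STAR-Fusion",
            "gene1": gene1,
            "gene2": gene2,
            "breakpoint1": row["LeftBreakpoint"],
            "breakpoint2": row["RightBreakpoint"],
            "support": int(row["JunctionReadCount"]) + int(row["SpanningFragCount"]),
        }


def read_arriba(handle):
    """Yield common records from an Arriba TSV (assumed column names)."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield {
            "caller": "Arriba",
            "gene1": row["#gene1"],
            "gene2": row["gene2"],
            "breakpoint1": row["breakpoint1"],
            "breakpoint2": row["breakpoint2"],
            "support": (int(row["split_reads1"]) + int(row["split_reads2"])
                        + int(row["discordant_mates"])),
        }
```

Both readers accept an open file handle (or any iterable of lines), so records from the two callers can be concatenated and passed to the same downstream filters.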

There are a host of annotation tools and databases in use. For removing fusions found in normal tissue, we are using Fusion Annotator, and Arriba has its own blacklist. Now that I am writing this, I think we should also remove from the STAR-Fusion calls any fusions present in the Arriba blacklist. There are a lot of pieces to this, so you will see. Hope to have the PR submitted today or tomorrow with what we have to date.
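Removing STAR-Fusion calls that fall in the Arriba blacklist could look like the following Python sketch. It deliberately simplifies the blacklist format. Each usable line is assumed to hold two tab-separated regions written as `chrom:pos` or `chrom:start-end`, and any other entry type the real file may contain is skipped. The helper names are hypothetical, and breakpoints are assumed to be STAR-Fusion-style `chrom:position:strand` strings.

```python
def _strip_chr(chrom):
    # Normalize chromosome names so 'chr7' and '7' compare equal.
    return chrom[3:] if chrom.startswith("chr") else chrom


def parse_region(text):
    """'7:100-200' or 'chr7:150' -> (chrom, start, end)."""
    chrom, _, span = text.partition(":")
    start, _, end = span.partition("-")
    return _strip_chr(chrom), int(start), int(end or start)


def parse_breakpoint(text):
    """'chr7:150:+' or '7:150' -> (chrom, position); strand is ignored."""
    chrom, pos = text.split(":")[:2]
    return _strip_chr(chrom), int(pos)


def in_region(point, region):
    chrom, pos = point
    region_chrom, start, end = region
    return chrom == region_chrom and start <= pos <= end


def load_blacklist(lines):
    """Keep only coordinate-pair entries; other entry types are skipped here."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2 or ":" not in fields[0] or ":" not in fields[1]:
            continue
        entries.append((parse_region(fields[0]), parse_region(fields[1])))
    return entries


def is_blacklisted(breakpoint1, breakpoint2, entries):
    """True if the breakpoint pair falls in any entry, in either orientation."""
    p1, p2 = parse_breakpoint(breakpoint1), parse_breakpoint(breakpoint2)
    return any(
        (in_region(p1, r1) and in_region(p2, r2))
        or (in_region(p1, r2) and in_region(p2, r1))
        for r1, r2 in entries
    )
```

With records holding `breakpoint1`/`breakpoint2` strings, the filter is a single comprehension: `[c for c in calls if not is_blacklisted(c["breakpoint1"], c["breakpoint2"], entries)]`. Checking both orientations matters because a blacklist entry need not list the partners in the same order as the caller.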

jaclyn-taroni · 2019-08-28T17:03:37Z

> There are a lot of pieces to this, so you will see. Hope to have the PR submitted today or tomorrow with what we have to date.

What will the planned PR consist of? Will it be some wrapper script that calls the code in https://github.com/d3b-center/fusion_filtering_pipeline? As you state, there's quite a bit of code in that repository. It would be infeasible to review it well all at once. Is the plan to submit a draft pull request and that's where we'll discuss splitting it up (per https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/CONTRIBUTING.md#size-and-composition-of-pull-requests)?

jharenza · 2019-08-28T17:17:01Z

Yes, the plan is to submit a bash script that calls all of those scripts in the correct order. I also just realized that we annotated the Arriba fusions so that they are annotated consistently with the STAR-Fusion calls. Rather than having users reproduce that via this PR (which requires a 7 GB database download), we will release the annotated fusions in V3. Will try to get this released today with @yuankunzhu.
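A driver script like the one described above runs each step in a fixed order and halts when one fails. To keep all examples in one language, the same control flow is sketched in Python with `subprocess.run`. The script names in `EXAMPLE_STEPS` are placeholders, not the actual files in the d3b-center repository.

```python
import subprocess


def run_pipeline(steps):
    """Run each command in order, stopping at the first non-zero exit code.

    This mirrors a bash driver script run with `set -e`.
    """
    for i, cmd in enumerate(steps, start=1):
        print(f"[step {i}/{len(steps)}] {' '.join(cmd)}", flush=True)
        completed = subprocess.run(cmd)
        if completed.returncode != 0:
            raise RuntimeError(f"step {i} failed with exit code {completed.returncode}")
    return len(steps)


# Hypothetical step list; substitute the pipeline's real scripts and arguments.
EXAMPLE_STEPS = [
    ["Rscript", "hypothetical_step1.R"],
    ["Rscript", "hypothetical_step2.R"],
]
```

Failing fast matters here because later steps read the outputs of earlier ones; continuing after a failure would silently filter stale or missing files.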

jaclyn-taroni · 2019-08-28T17:24:03Z

Okay. Having the order will be helpful in figuring out next steps. If more context is needed, we can discuss here or on the pull request.

syzheng · 2019-10-04T21:47:26Z

> To clarify, this is very helpful:
>
> > Here, we will filter potential artifacts, filter fusions observed in normal tissue, retain high-confidence calls, and annotate with several databases to create a final list of putative driver fusions.
>
> I'm wondering about things like where the information on fusions observed in normal tissue is coming from.

It can come from multiple resources. For instance, TCGA normal samples have been analyzed for fusions, but the best source might be GTEx. One issue with using normal samples for filtering is that they should be analyzed with the same pipeline used for the cancer samples, so as to minimize tool-introduced artifacts.
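Following this suggestion, a normal-tissue panel can be derived by counting how many distinct normal samples (for example, GTEx samples run through the same callers as the tumors) carry each fusion. The Python sketch below assumes calls arrive as `(sample_id, fusion_name)` pairs. The `min_samples` threshold is a hypothetical parameter, not a value used by the project.

```python
from collections import defaultdict


def normal_fusion_set(normal_calls, min_samples=2):
    """Return fusion names seen in at least `min_samples` distinct normal samples.

    `normal_calls` is an iterable of (sample_id, fusion_name) pairs produced by
    the same fusion-calling pipeline that was run on the tumor samples.
    """
    samples_per_fusion = defaultdict(set)
    for sample_id, fusion in normal_calls:
        samples_per_fusion[fusion].add(sample_id)  # a set counts each sample once
    return {fusion for fusion, samples in samples_per_fusion.items()
            if len(samples) >= min_samples}


def remove_normal_fusions(tumor_calls, normal_set):
    """Drop tumor (sample_id, fusion_name) pairs whose fusion is in the normal panel."""
    return [(sample, fusion) for sample, fusion in tumor_calls if fusion not in normal_set]
```

Counting distinct samples rather than raw calls keeps a single noisy normal sample with duplicate calls from pushing a fusion into the panel.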

jharenza · 2019-12-02T21:54:46Z

closed with #294 #300 #277 #267

jharenza added the "in progress" and "good first issue" labels Aug 7, 2019

jaclyn-taroni removed the "good first issue" label Aug 15, 2019

This was referenced Aug 29, 2019:

- Fusion filtering #89 (closed)
- (PR 1 of 5) Fusion Analysis: filtering #92 (merged)

This was referenced Sep 19, 2019:

- PR 2 of 5 Genereal filtering for fusion calls #113 (closed)
- Adding 02-fusion-filtering.R #115 (merged)

jharenza mentioned this issue Oct 2, 2019:

- Planned Analysis: Oncoprint showing landscape of genetic lesions across PBTA. #6 (closed)

kgaonkar6 mentioned this issue Oct 8, 2019:

- 03 GTEx/cohort Fusion annotation #151 (merged)


jaclyn-taroni added the "fusion" and "transcriptomic" labels Oct 26, 2019

This was referenced Nov 15, 2019:

- Updates: run merged script #267 (merged)
- 04 project specific filtering #277 (merged)

jharenza mentioned this issue Nov 25, 2019:

- Planned data release: V11 #287 (closed)


This was referenced Nov 26, 2019:

- 04 project specific filtering #294 (closed)
- add updated genelistreference.tct and results #300 (merged)

jharenza closed this as completed Dec 2, 2019
