Update fusion-summary to include union of biospecimen IDs in fusion callers (AlexsLemonade#478)

* Add note about putative oncogenic fusions single caller

* Add in biospecimens in either caller

Change to notebook for documentation purposes

* Run notebook instead

Also +x

* Remove Rscript

* Minor typo, formatting fixes

* Add in the original caller files to modules at a glance

* Add in 'missing fusions'

To better match former behavior

* Will embryonal step pass in CI?

* Revert "Will embryonal step pass in CI?"

This reverts commit b81379d.

* Skip ependymoma steps in CI

* Forgot the variable in CI

* Forgot to replace NA with 0

* Apply @jashapiro right_join suggestion

* Apply suggestions from code review

Co-Authored-By: jashapiro <jashapiro@gmail.com>

Co-authored-by: jashapiro <jashapiro@gmail.com>
jaclyn-taroni and jashapiro authored Jan 27, 2020
1 parent e6a165e commit 0e642ef
Showing 10 changed files with 4,280 additions and 1,640 deletions.
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -135,7 +135,7 @@ jobs:

- run:
name: Fusion Summary
command: ./scripts/run_in_ci.sh bash "analyses/fusion-summary/run-new-analysis.sh"
command: OPENPBTA_TESTING=1 ./scripts/run_in_ci.sh bash "analyses/fusion-summary/run-new-analysis.sh"

- run:
name: Molecular subtyping - Non-MB/Non-ATRT Embryonal tumors
2 changes: 1 addition & 1 deletion analyses/README.md
@@ -20,7 +20,7 @@ Note that _nearly all_ modules use the harmonized clinical data file (`pbta-hist
| [`create-subset-files`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/create-subset-files) | All files | This module contains the code to create the subset files used in continuous integration | All subset files for continuous integration
| [`focal-cn-file-preparation`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/focal-cn-file-preparation) | `pbta-cnv-cnvkit.seg.gz` <br> `pbta-cnv-controlfreec.tsv.gz` <br> `pbta-gene-expression-rsem-fpkm.polya.rds` <br> `pbta-gene-expression-rsem-fpkm.stranded.rds` | Maps from copy number variant caller segments to gene identifiers; will eventually be updated to use consensus copy number calls ([#186](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/186))| `results/cnvkit_annotated_cn_autosomes.tsv.bz2` <br> `results/cnvkit_annotated_cn_x_and_y.tsv.bz2` <br> `results/controlfreec_annotated_cn_autosomes.tsv.bz2` <br> `results/controlfreec_annotated_cn_x_and_y.tsv.bz2`
| [`fusion_filtering`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/fusion_filtering) | `pbta-fusion-arriba.tsv.gz` <br> `pbta-fusion-starfusion.tsv.gz` | Standardizes, filters, and prioritizes fusion calls | `results/pbta-fusion-putative-oncogenic.tsv` <br> `results/pbta-fusion-recurrent-fusion-byhistology.tsv` <br> `results/pbta-fusion-recurrent-fusion-bysample.tsv` (included in data download)
| [`fusion-summary`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/fusion-summary)| `pbta-histologies.tsv` <br> `pbta-fusion-putative-oncogenic.tsv` | Generate summary tables from fusion files ([#398](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/398)) | `results/fusion_summary_embryonal_foi.tsv` <br> `results/fusion_summary_ependymoma_foi.tsv`
| [`fusion-summary`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/fusion-summary)| `pbta-histologies.tsv` <br> `pbta-fusion-putative-oncogenic.tsv` <br> `pbta-fusion-arriba.tsv.gz` <br> `pbta-fusion-starfusion.tsv.gz` | Generate summary tables from fusion files ([#398](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/398)) | `results/fusion_summary_embryonal_foi.tsv` <br> `results/fusion_summary_ependymoma_foi.tsv`
| [`gene-set-enrichment-analysis`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/gene-set-enrichment-analysis) | `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` | *In progress*. Updated gene set enrichment analysis with appropriate RNA-seq expression data | `results/gsva_scores_stranded.tsv` <br> `results/gsva_scores_polya.tsv` <br> for stranded, polya expression data respectively
| [`immune-deconv`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/immune-deconv) | `pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` | Immune/Stroma characterization across PBTA (part of [#15](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/15)) | `results/deconv-output.RData`
| [`independent-samples`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/independent-samples) | `pbta-histologies.tsv` | Generates independent specimen lists for WGS/WXS samples | `results/independent-specimens.wgs.primary.tsv` <br> `results/independent-specimens.wgs.primary-plus.tsv` <br> `results/independent-specimens.wgswxs.primary.tsv` <br> `results/independent-specimens.wgswxs.primary-plus.tsv` (included in data download)
129 changes: 0 additions & 129 deletions analyses/fusion-summary/01-fusion-summary.R

This file was deleted.

186 changes: 186 additions & 0 deletions analyses/fusion-summary/01-fusion-summary.Rmd
@@ -0,0 +1,186 @@
---
title: "Generate Fusion Summary Files"
output: html_notebook
author: Daniel Miller (D3b) and Jaclyn Taroni (CCDL)
date: January 2020
params:
is_ci: 0
---

Generate fusion summary files specifically for consumption by molecular subtyping analyses.

## Set up

```{r}
# if running in CI, we need to skip the EPN steps
# (any value of is_ci other than 1 is treated as "not in CI")
running_in_ci <- params$is_ci == 1
```

### Libraries and functions

```{r}
library(tidyverse)
```

```{r}
#' Generate filtered fusion frame
#' @param df Unfiltered fusion data frame
#' @param bioid Character vector of biospecimen IDs
#' @param fuses Character vector of explicit fusion names
#' @param genes Character vector of gene names
#' @return the filtered fusion data frame
filterFusion <- function(df, bioid, fuses, genes) {
  if (!missing(bioid)) {
    df <- filter(df, Sample %in% bioid)
  }
  # scalar conditions, so use the short-circuiting && rather than &
  if (!missing(fuses) && !missing(genes)) {
    df <- filter(df, FusionName %in% fuses |
                   Gene1A %in% genes |
                   Gene2A %in% genes |
                   Gene1B %in% genes |
                   Gene2B %in% genes)
  } else if (!missing(fuses)) {
    df <- filter(df, FusionName %in% fuses)
  } else if (!missing(genes)) {
    df <- filter(df,
                 Gene1A %in% genes |
                 Gene2A %in% genes |
                 Gene1B %in% genes |
                 Gene2B %in% genes)
  }
  return(df %>% select(Sample, FusionName))
}

#' Generate a data frame of fusion counts
#' @param fuseDF Filtered fusion data frame
#' @param bioid Character vector of biospecimen IDs that should be included in the final table
#' @return Data frame that contains fusion counts
prepareOutput <- function(fuseDF, bioid) {
  fuseDF %>%
    # one column per fusion, counting occurrences per sample
    reshape2::dcast(Sample ~ FusionName, fun.aggregate = length,
                    value.var = "FusionName") %>%
    # keep every requested biospecimen, including those with no fusions of interest
    right_join(data.frame(Sample = bioid, stringsAsFactors = FALSE),
               by = "Sample") %>%
    replace(is.na(.), 0) %>%
    rename(Kids_First_Biospecimen_ID = Sample)
}
```
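
As a rough illustration of the matching rule in `filterFusion()`, here is a minimal base-R sketch (no dplyr) that assumes the input carries the same `Sample`, `FusionName`, `Gene1A`, `Gene2A`, `Gene1B`, and `Gene2B` columns. The helper `keepFusion()`, the biospecimen IDs, and the `GENEX`/`GENEY` partner genes are hypothetical toy values, not real data. A row is kept when its fusion name is an exact match *or* any of the four gene columns contains a gene of interest.

```r
# Hypothetical toy fusion calls; column names mirror those filterFusion() uses
fusions <- data.frame(
  Sample = c("BS_A", "BS_B", "BS_C"),
  FusionName = c("C11orf95--RELA", "GENEX--GENEY", "GENEX--FOXR2"),
  Gene1A = c("C11orf95", "GENEX", "GENEX"),
  Gene2A = c("RELA", "GENEY", "FOXR2"),
  Gene1B = NA_character_,
  Gene2B = NA_character_,
  stringsAsFactors = FALSE
)

# Hypothetical base-R analogue of the "fuses and genes" branch of filterFusion():
# keep a row if the fusion name matches exactly OR any gene column hits a gene of interest
keepFusion <- function(df, fuses, genes) {
  geneHit <- df$Gene1A %in% genes | df$Gene2A %in% genes |
    df$Gene1B %in% genes | df$Gene2B %in% genes
  df[df$FusionName %in% fuses | geneHit, c("Sample", "FusionName")]
}

kept <- keepFusion(fusions, fuses = "C11orf95--RELA", genes = "FOXR2")
print(kept$Sample)
# [1] "BS_A" "BS_C"
```

Because `%in%` returns `FALSE` for `NA`, empty gene columns simply never produce a match.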

### Read in data

```{r}
dataDir <- file.path("..", "..", "data")
#' The putative oncogenic fusion file is what we'll use to check for the
#' presence or absence of the fusions.
putativeOncogenicDF <-
  read_tsv(file.path(dataDir, "pbta-fusion-putative-oncogenic.tsv"))
#' However, some biospecimens are not represented in this filtered, prioritized
#' file but *are* present in the original files -- this will cause them to be
#' "missing" in the final files for consumption which could mislead analysts.
arribaDF <- read_tsv(file.path(dataDir, "pbta-fusion-arriba.tsv.gz"))
starfusionDF <- read_tsv(file.path(dataDir, "pbta-fusion-starfusion.tsv.gz"))
```

### Output

```{r}
resultsDir <- "results"
if (!dir.exists(resultsDir)) {
  dir.create(resultsDir)
}
ependFile <- file.path(resultsDir, "fusion_summary_ependymoma_foi.tsv")
embryFile <- file.path(resultsDir, "fusion_summary_embryonal_foi.tsv")
```

## Fusions and genes of interest

Taken from [`AlexsLemonade/OpenPBTA-analysis#245`](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/245) and [`AlexsLemonade/OpenPBTA-analysis#251`](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/251), respectively.

```{r}
#' **Filters**
#'
#' *Fusion filters*
#' 1: Exact matches to a list of fusions common in ependymoma tumors,
#' as well as any fusion with RELA as a partner gene
ependFuses <- c(
  "C11orf95--MAML2",
  "C11orf95--RELA",
  "C11orf95--YAP1",
  "LTBP3--RELA",
  "PTEN--TAS2R1",
  "YAP1--FAM118B",
  "YAP1--MAMLD1",
  "YAP1--MAMLD2"
)
ependGenes <- c(
  "RELA"
)
#' 2: Exact matches to a list of fusions common in embryonal tumors,
#' as well as any fusion with one of the listed genes as a partner gene
embryFuses <- c(
  "CIC--NUTM1",
  "MN1--BEND2",
  "MN1--CXXC5"
)
embryGenes <- c(
  "FOXR2",
  "MN1",
  "TTYH1"
)
```

### Filter putative oncogenic fusions list

```{r}
allFuseEpend <- filterFusion(df = putativeOncogenicDF,
                             fuses = ependFuses,
                             genes = ependGenes)
allFuseEmbry <- filterFusion(df = putativeOncogenicDF,
                             fuses = embryFuses,
                             genes = embryGenes)
```

Get the biospecimen IDs that are present in *either* caller file (Arriba or STAR-Fusion).
Fusions in the putative oncogenic fusion file can be retained even if only one of the two callers reports them: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/8fba1753608d8ac0aa3d5d7d63c480b8f00ff0e9/analyses/fusion_filtering/04-project-specific-filtering.Rmd#L242
Because we use the putative oncogenic file here, any sample that is in either caller file but has no fusion relevant to the subtyping tickets is not _missing_; instead, it has no evidence of the relevant fusions.

```{r}
specimensUnion <- union(arribaDF$tumor_id, starfusionDF$tumor_id)
```
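
To make the effect of taking the union concrete, the following base-R sketch (an illustration, not the notebook's own code) uses hypothetical biospecimen IDs: `BS_B` is called by both tools, while `BS_A` and `BS_C` are each reported by only one. It approximates the `right_join()` plus `replace(is.na(.), 0)` steps in `prepareOutput()` with `merge(..., all.y = TRUE)`, so every biospecimen in the union gets a row, and those without a fusion of interest get `0` rather than being absent.

```r
# Hypothetical stand-ins for the tumor_id columns of the two caller files
arribaIDs <- c("BS_A", "BS_B")
starfusionIDs <- c("BS_B", "BS_C")

# Biospecimens present in either caller file
specimensUnion <- union(arribaIDs, starfusionIDs)
# Restricting to specimens in both callers would drop BS_A and BS_C
specimensBoth <- intersect(arribaIDs, starfusionIDs)

# Toy wide table of fusion counts: only BS_A carries the fusion of interest
counts <- data.frame(Sample = "BS_A", `MN1--BEND2` = 1, check.names = FALSE)

# Base-R analogue of right_join(): keep every specimen in the union
out <- merge(counts,
             data.frame(Sample = specimensUnion, stringsAsFactors = FALSE),
             by = "Sample", all.y = TRUE)
# Specimens with no fusion of interest get 0 instead of NA
out[is.na(out)] <- 0
print(out)
```

Taking the intersection instead would silently shrink the table, which is the "missing biospecimen" problem the union is meant to avoid.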

#### Write non-MB, non-ATRT embryonal fusions to file

```{r}
allFuseEmbry <- allFuseEmbry %>%
  prepareOutput(specimensUnion)
```

```{r}
# Are there any missing fusions?
setdiff(embryFuses, colnames(allFuseEmbry))
```

```{r}
allFuseEmbry %>%
  mutate(
    `CIC--NUTM1` = 0,
    `MN1--BEND2` = 0
  ) %>%
  write_tsv(embryFile)
```
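
The chunk above hard-codes a zero column for each fusion of interest that `setdiff()` reports as absent. One possible alternative, sketched below in base R with a hypothetical toy table, derives those columns from `setdiff()` directly, so a fusion is zero-filled only when it truly has no column. A hard-coded `mutate()` would overwrite real counts if one of those fusions appeared in a later data release.

```r
# Hypothetical toy output of prepareOutput(): only one embryonal fusion was observed
allFuseEmbry <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_A", "BS_B"),
  `MN1--CXXC5` = c(1, 0),
  check.names = FALSE
)
embryFuses <- c("CIC--NUTM1", "MN1--BEND2", "MN1--CXXC5")

# Fusions of interest that no biospecimen carries, and therefore have no column
missingFuses <- setdiff(embryFuses, colnames(allFuseEmbry))

# Add each missing fusion as an all-zero column; observed counts are left untouched
for (fusion in missingFuses) {
  allFuseEmbry[[fusion]] <- 0
}
print(missingFuses)
# [1] "CIC--NUTM1" "MN1--BEND2"
```

The same pattern could, in principle, replace the hard-coded ependymoma columns as well.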

#### Write ependymoma fusions to file

```{r}
if (!running_in_ci) {
  allFuseEpend %>%
    prepareOutput(specimensUnion) %>%
    mutate(
      `C11orf95--YAP1` = 0,
      `LTBP3--RELA` = 0,
      `PTEN--TAS2R1` = 0,
      `YAP1--MAMLD2` = 0
    ) %>%
    write_tsv(ependFile)
}
```
2,102 changes: 2,102 additions & 0 deletions analyses/fusion-summary/01-fusion-summary.nb.html


6 changes: 4 additions & 2 deletions analyses/fusion-summary/README.md
@@ -1,12 +1,14 @@
# Fusion Summary

This module generates summary files for fusions of interest present in biospecimens taken from:

1. Ependymoma tumors
2. Embryonal tumors not from ATRT or MB

To genereate the tables simply run:
To generate the tables run:

```
./run-new-analysis.sh
bash run-new-analysis.sh
```

## General Use