Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

VAF Cutoff Filter Experiment for SNV callers #173

Merged
merged 176 commits into from
Oct 29, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
176 commits
Select commit Hold shift + click to select a range
37a78fb
Set up the set up
Sep 23, 2019
a25be13
Add circle CI test and Docker config
Sep 23, 2019
4a40580
Add some more comments
Sep 23, 2019
8a4d33b
Merge remote-tracking branch 'upstream/master' into cansav09/snv-call…
Sep 23, 2019
e704c0b
Set up Rprojroot for circle CI test to work better
Sep 23, 2019
56014dd
Fixing Circle CI file.
Sep 23, 2019
03a0770
Change read_tsv to data.table::fread for big file
Sep 23, 2019
391fb52
read in the .gz file
Sep 24, 2019
93cb818
push plot function changes
Sep 24, 2019
908aff9
Fix an error
Sep 25, 2019
ee08152
Merge branch 'master' into cansav09/snv-caller_set_up
cansavvy Sep 26, 2019
6f18dff
Add missing package to Dockerfile
Sep 26, 2019
38cd490
Reduce cosmic file to only the brain sample mutations
Sep 26, 2019
b9d8333
Update README with changes to cosmic file
Sep 26, 2019
a7c5495
re-updated Dockerfile
Sep 26, 2019
efde7ef
Ran a linter on set up script
Sep 26, 2019
e2b43a8
Merge branch 'master' into cansav09/snv-caller_set_up
cansavvy Sep 26, 2019
f4c7534
Comment out of date
cansavvy Sep 26, 2019
83d001a
Get rid of old WGX/WXS bed file set up
Sep 26, 2019
30664ec
Merge branch 'master' into cansav09/snv-caller_set_up
cansavvy Sep 27, 2019
374c6a1
Incorporate initial PR suggestions from @jashapiro and @cbethell
Sep 27, 2019
e99d0b4
Push a working bash script
Sep 27, 2019
5cb619e
Add bash script to circle CI
Sep 27, 2019
5e79092
Add usage section in README and change name of script
Sep 30, 2019
0b9f4e6
Merge branch 'master' into cansav09/snv_calculations
cansavvy Sep 30, 2019
83cad86
Add some more comments
Sep 30, 2019
f383869
Merge remote-tracking branch 'origin/cansav09/snv_calculations' into …
Sep 30, 2019
2e908a1
Correct a couple things in the README
Sep 30, 2019
1146afc
Get rid of remnant comment
Sep 30, 2019
abb7dda
Fix a typo!
Sep 30, 2019
76ec476
Merge branch 'master' into cansav09/snv_calculations
cansavvy Sep 30, 2019
20f5d4c
Add Usage to the TOC
Sep 30, 2019
b63c69f
Merge branch 'master' into cansav09/snv_calculations
cansavvy Sep 30, 2019
cf9999c
Add more documentation to the README
Oct 1, 2019
068b08c
Update Circle CI
Oct 1, 2019
e9a5bde
Push more exact bash script
Oct 1, 2019
efd2264
Fix a couple issues with handling metadata file path
Oct 1, 2019
e732516
Get rid of dev remnants
Oct 1, 2019
bab4319
Couple changes for readability
Oct 1, 2019
655bc00
Merge branch 'master' into cansav09/snv_calculations
cansavvy Oct 1, 2019
748e677
Add some things to README and get rid of part of bash that isn't there
Oct 1, 2019
9511997
Merge remote-tracking branch 'origin/cansav09/snv_calculations' into …
Oct 1, 2019
d25cc45
Found dumb mistake
Oct 1, 2019
ff50ff9
Changed [*] to [+]
Oct 1, 2019
26cc61d
I mean [*] to a [@] which makes more sense.
Oct 1, 2019
f0916eb
Merge branch 'master' into cansav09/snv_calculations
cansavvy Oct 2, 2019
0c0a023
Fix some of the overwrite handling
Oct 3, 2019
4666b00
Merge remote-tracking branch 'origin/cansav09/snv_calculations' into …
Oct 3, 2019
0fa591d
Add template and report script
Oct 3, 2019
84c0f4b
Add run_eval to bash script
Oct 3, 2019
cc5494c
Merge branch 'master' into cansav09/snv-add-eval
cansavvy Oct 4, 2019
fb6ec2a
Get rid of dev remnants
Oct 4, 2019
7e43739
Make better handling of strategy that is called but not there
Oct 4, 2019
1acd00b
Get rid of dev remnant again
Oct 4, 2019
5cce1fe
Get rid of stray `\`
Oct 4, 2019
bffb002
File got misplaced
Oct 4, 2019
b2b2eb4
Get rid of cosmic mutations without proper coordinates
Oct 4, 2019
8800d8d
Fixed a wrinkle with the COSMIC file
Oct 4, 2019
2999c65
Merge branch 'master' into cansav09/snv-add-eval
cansavvy Oct 4, 2019
79fcdaa
re-run a linter on everything.
Oct 4, 2019
4460630
Merge remote-tracking branch 'origin/cansav09/snv-add-eval' into cans…
Oct 4, 2019
689583a
Couple more minor touches
Oct 4, 2019
983eefb
Make set up files not run if they are already existing
cansavvy Oct 7, 2019
fb1a6ad
Fix handling of COSMIC file creation
cansavvy Oct 7, 2019
e67e51e
Incorporate @jashapiro 's suggestions
Oct 8, 2019
87eba81
Add a few more @jashapiro suggestions
Oct 8, 2019
7c0d87c
Merge branch 'master' into cansav09/snv-add-eval
cansavvy Oct 8, 2019
f4cb768
Circle CI does not have a kitematic directory. Get rid
Oct 8, 2019
6d9f09f
Merge remote-tracking branch 'origin/cansav09/snv-add-eval' into cans…
Oct 8, 2019
86b2930
Make warning instead of stop
Oct 8, 2019
c0446e4
Missing `ggplot2::`
Oct 8, 2019
d3647e0
Remove reference files after use to try to reduce memory usage
Oct 8, 2019
435c517
Make one big mutate
Oct 8, 2019
aa466e0
Dumb extra comma
Oct 8, 2019
2ef5558
Add VAF_FILTER option and it's circle CI component
Oct 9, 2019
d2330fd
get rid of typo
Oct 9, 2019
ca4414f
Re-fix Circle CI file
Oct 9, 2019
c123506
Fix WXS if statement
Oct 9, 2019
3faab85
Fix default Circle CI option
Oct 9, 2019
1912d0e
Make indels come last in the barplot
Oct 9, 2019
9eaa124
Fix order of barplot graph
Oct 9, 2019
251366a
Merge branch 'master' into cansav09/snv-add-eval
cansavvy Oct 9, 2019
e5dee1d
Add the comparison notebook
Oct 9, 2019
ccc0c27
Merge branch 'cansav09/snv-add-eval' into snv-comparison
Oct 9, 2019
91c8d99
Add initial comparison notebook
Oct 9, 2019
905bb2a
Notebook adjustments
Oct 17, 2019
ba40502
Fixed dumb label mix-up problem
Oct 17, 2019
1919b05
Further honing things
Oct 17, 2019
0470405
Added the other plots except that one combo plot
Oct 18, 2019
89df186
Add rendered version too
Oct 18, 2019
a1fdac4
Rough draft of all plots here
Oct 18, 2019
eb5ffa1
Get rid of upsettR's blank plot
Oct 18, 2019
288fca2
Refresh the notebook
Oct 18, 2019
0ab7171
Run linter
Oct 18, 2019
5a73511
Merge remote-tracking branch 'origin/snv-comparison' into snv-comparison
Oct 18, 2019
8a942fe
Change name to reflect the data better
Oct 18, 2019
55607e7
Add to CircleCI and Dockerfile
Oct 18, 2019
595bf1e
Merge branch 'master' into snv-comparison
cansavvy Oct 18, 2019
0e186ab
Try to fix busted CircleCI command
Oct 18, 2019
eef394f
Merge remote-tracking branch 'origin/snv-comparison' into snv-comparison
Oct 18, 2019
97815dc
Attempt to fix Dockerfile build problem
Oct 18, 2019
465c69a
Attempt to fix docker build
Oct 21, 2019
d77cf90
Some Docker probs fixed
Oct 21, 2019
b242df2
Push latest notebook and Dockerfile
Oct 21, 2019
e7c3598
Dockerfile appears to be building right.
Oct 21, 2019
bd462d3
Couple more things to finish up Dockerfile adds
Oct 21, 2019
86282aa
development thing was left
Oct 21, 2019
a483e11
Add Template for results discussion
Oct 22, 2019
38f0751
Updates to notebook. Function and etc.
Oct 22, 2019
1e02b71
Refresh notebook
Oct 22, 2019
f836095
Add png saves to each plot
Oct 22, 2019
cd2643a
Added options to calculate_vaf_tmb.R script
Oct 22, 2019
ed4d8f9
Add set up for vaf_filter experiment
Oct 22, 2019
b9d2519
Push separate notebook for vaf_filter exp
Oct 23, 2019
5617f44
Changes to README
Oct 23, 2019
fc5dba9
Merge branch 'master' into snv-comparison
cansavvy Oct 23, 2019
8486fd1
Add more documentation
Oct 23, 2019
cbffd8e
Push VAF cutoff experiment notebook
Oct 23, 2019
d11223c
Update title of vaf notebook
Oct 23, 2019
334e365
Update notebook
Oct 23, 2019
4690bc4
Fix path problem and also extend no_region option
Oct 23, 2019
dbdb9c4
Merge remote-tracking branch 'upstream/master' into snv-comparison
Oct 23, 2019
70e8c35
Fix typo
Oct 23, 2019
518c0d6
Added some documentation
Oct 23, 2019
b23dafa
Prep consensus mutation file saving
Oct 23, 2019
ae6f9fa
Merge remote-tracking branch 'upstream/master' into snv-comparison
Oct 24, 2019
fcd8436
Fix sex chr mislabeling in COSMIC file per @jashapiro 's suggestion
Oct 24, 2019
880d1f0
Get rid of regional analysis. Linter the notebooks Add changes to plots
Oct 24, 2019
6d01ac7
Updated COSMIC file
Oct 24, 2019
a4a6867
Streamline this branch
Oct 25, 2019
5c5eda7
Revert unneeded changes
Oct 25, 2019
b6f86b7
Clean up README
Oct 25, 2019
ccc2d7c
Add --no_region option to README
Oct 25, 2019
dbe7d1e
Merge branch 'snv-caller_revamp' into vaf_filter_experiment
Oct 25, 2019
454158a
Refresh notebook
Oct 25, 2019
5fd84cd
Add these packages to the Dockerfile
Oct 25, 2019
2813d8e
Merge remote-tracking branch 'jaclyn-taroni/master' into snv-caller_r…
Oct 25, 2019
5f5cc22
Merge remote-tracking branch 'jaclyn-taroni/master' into vaf_filter_e…
Oct 25, 2019
48fb2cd
Fix Dockerfile dup problem
Oct 25, 2019
2321453
Get rid of dumb spaces
Oct 25, 2019
dd896fa
edit readme remnant
Oct 25, 2019
c4e0475
Push factor re-level change
Oct 25, 2019
0433e09
Add to CircleCI
Oct 25, 2019
c16127e
refresh R notebook
Oct 25, 2019
29e38d8
Merge remote-tracking branch 'upstream/master' into snv-caller_revamp
Oct 25, 2019
bfd9c5b
Make saving to RDS an option
Oct 25, 2019
9df9084
Add rds saving
Oct 25, 2019
5730483
Rid of dev remnant
Oct 25, 2019
646bfd8
Merge remote-tracking branch 'upstream/master' into vaf_filter_experi…
Oct 25, 2019
d62091a
Merge remote-tracking branch 'upstream/master' into vaf_filter_experi…
Oct 25, 2019
5949068
Fix CircleCI to be it's own test for this
Oct 25, 2019
ee2b0a9
Tidy up spacing
Oct 25, 2019
c6736b2
Move "Add your analysis here" to where it was supposed to be
Oct 25, 2019
40db6f8
Get rid of error causing indent
Oct 25, 2019
16932ca
Update docs
Oct 25, 2019
e89fea7
Merge branch 'master' into rds_option_snv
cansavvy Oct 25, 2019
6691cc4
Get rid of some things that are not needed
Oct 25, 2019
6988103
Make README more parallel to this change
Oct 25, 2019
a8157ca
Neaten up the nb
Oct 25, 2019
c559ce5
switch tsv/rds logic for 01-calc to reflect default
Oct 25, 2019
95366e1
Make region file also save to RDS
Oct 25, 2019
c8de0a9
change to warnings and then default to rds
Oct 25, 2019
1df6185
Merge branch 'rds_option_snv' into vaf_filter_experiment
Oct 28, 2019
43bfc3d
Change notebook to RDS
Oct 28, 2019
c086a03
Have notebook call script to create vaf files
Oct 28, 2019
1c45fb5
Set ggpairs' cardinality_threshold setting to NULL
Oct 28, 2019
ac383f4
Upgroup data.frame so function works again
Oct 28, 2019
8c37735
Try to fix plot saving error
Oct 28, 2019
ca0e883
Forget about saving those plots
Oct 28, 2019
e1e9e32
Merge remote-tracking branch 'upstream/master' into vaf_filter_experi…
Oct 28, 2019
00e0f90
Run through linter
Oct 28, 2019
a07c16d
Update html and fix dir ref
Oct 28, 2019
8bc4392
Change comment.
Oct 28, 2019
42c8c60
Show the name of the bash script
Oct 28, 2019
c3d9a09
Get rid of sample variance section and refresh notebook
Oct 29, 2019
6a58ea8
Merge branch 'master' into vaf_filter_experiment
cansavvy Oct 29, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -69,10 +69,14 @@ jobs:
################################
#### Add your analysis here ####
################################

- run:
name: SNV Caller Evaluation
command: OPENPBTA_VAF_CUTOFF=0.5 ./scripts/run_in_ci.sh bash analyses/snv-callers/run_caller_evals.sh

- run:
name: SNV Caller Analysis
command: OPENPBTA_VAF_CUTOFF=0.5 ./scripts/run_in_ci.sh bash analyses/snv-callers/run_caller_evals.sh
name: SNV Caller VAF Cutoff Experiment
command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('analyses/snv-callers/vaf_cutoff_experiment.Rmd', clean = TRUE)"

deploy:
machine:
Expand Down
20 changes: 20 additions & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -78,5 +78,25 @@ RUN apt-get update -qq && apt-get -y --no-install-recommends install \
glmnet \
glmnetUtils

# Install java and rJava for some of the snv plotting comparison packages
RUN apt-get -y update && apt-get install -y \
default-jdk \
r-cran-rjava \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/

# Install for SNV comparison plots
RUN R -e "devtools::install_github('hms-dbmi/UpSetR', ref = 'd5f8f9a3a4d14df95eb823eba37f0285bec3953d', dependencies = TRUE)"
RUN R -e "devtools::install_github('const-ae/ggupset', ref = '7a33263cc5fafdd72a5bfcbebe5185fafe050c73', dependencies = TRUE)"

# GGally and its required packages
RUN apt-get update -qq && apt-get -y --no-install-recommends install \
&& install2.r --error \
lattice \
rpart \
class \
MASS \
GGally

#### Please install your dependencies here
#### Add a comment to indicate what analysis it is required for
39 changes: 39 additions & 0 deletions analyses/snv-callers/scripts/run_caller_evals_vaf_filter_exp.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
#!/bin/bash
# C. Savonen
# CCDL for ALSF 2019

# Purpose:Run TMB calculations at various VAF cutoffs for vaf_cutoff_experiment.Rmd

cd ../..

# Set this so the whole loop stops if there is an error
set -e
set -o pipefail

# VAF cutoffs to use
vaf_cutoff=("0.10" "0.20" "0.30" "0.40")

# Same files as originally run
datasets=("strelka2" "mutect2" "lancet" "vardict")

# WGS files for each caller
wgs_files=("WGS.hg38.strelka2.unpadded.bed" "WGS.hg38.mutect2.unpadded.bed" "WGS.hg38.lancet.300bp_padded.bed" "WGS.hg38.vardict.100bp_padded.bed")

for ((i=0;i<${#vaf_cutoff[@]};i++));
do
for ((j=0;j<${#datasets[@]};j++));
do
echo "Processing dataset: ${datasets[$j]}"
Rscript analyses/snv-callers/scripts/01-calculate_vaf_tmb.R \
--label ${datasets[$j]} \
--output analyses/snv-callers/results/vaf_cutoff_data/cutoff_${vaf_cutoff[$i]} \
--file_format rds \
--maf data/pbta-snv-${datasets[$j]}.vep.maf.gz \
--metadata data/pbta-histologies.tsv \
--bed_wgs data/${wgs_files[$j]} \
--bed_wxs data/WXS.hg38.100bp_padded.bed \
--vaf_filter ${vaf_cutoff[$i]} \
--overwrite \
--no_region
done
done
236 changes: 236 additions & 0 deletions analyses/snv-callers/vaf_cutoff_experiment.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,236 @@
---
title: "VAF Cutoffs and TMB Calculation"
output:
html_notebook:
toc: TRUE
toc_float: TRUE
author: C. Savonen for ALSF CCDL
date: 2019
---

**Purpose:** How do the TMB calculations change with adjustments to the VAF cutoff
filter? More specifically, if we increase VAF filter cutoff, do the callers' TMB stats
relate to each other more?

### Summary of Findings:

cansavvy marked this conversation as resolved.
Show resolved Hide resolved
VarDict is the least related to the other callers, but increasing the VAF filter cutoff does somewhat
recover how its TMB stats relate to the other callers' TMB stats.
Regardless, VarDict's data should still be dropped as it is too aberrant from the other callers as
well as overly sensitive.

#### Usage

To run this from the command line, use:
```
Rscript -e "rmarkdown::render('analyses/snv-callers/vaf_cutoff_experiment.Rmd',
clean = TRUE)"
```

_This assumes you are in the top directory of the repository._

## Setup

#### Packages and functions

Read in set up script.

```{r}
if (!("GGally" %in% installed.packages())) {
install.packages("GGally")
}
if (!("ggupset" %in% installed.packages())) {
install.packages("ggupset")
}
# Magrittr pipe
`%>%` <- dplyr::`%>%`
```

Set up output directories.

```{r}
base_results_dir <- "results"
base_plots_dir <- "plots"
```

Make new directories for the comparison analysis.

```{r}
vaf_cutoff_results_dir <- file.path(base_results_dir, "vaf_cutoff_data")

# Make caller specific plots folder
if (!dir.exists(vaf_cutoff_results_dir)) {
dir.create(vaf_cutoff_results_dir)
}
```

Function for creating a TMB correlation matrix amongst the callers for a particular cutoff.

```{r}
cor_matrix_plot <- function(cutoff) {
cansavvy marked this conversation as resolved.
Show resolved Hide resolved
# For a given VAF cutoff, calculate TMB correlations amongst the callers and plot it using
# GGally::ggpairs
#
# Args:
# cutoff: The VAF cutoff set to analyze
#
# Returns:
# A correlation matrix plot amongst the callers

# Isolate the TMB stats for this cutoff
cutoff_df <- tmb_filter_long %>%
dplyr::filter(vaf_cutoff == cutoff) %>%
dplyr::select("lancet", "mutect2", "strelka2", "vardict")

# Make the plot
cor_plot <- GGally::ggpairs(cutoff_df, mapping = ggplot2::aes(alpha = 0.05)) +
ggplot2::theme_classic() +
ggplot2::ggtitle(paste0("VAF cutoff: ", cutoff))

# Return the plot
return(cor_plot)
}
```

Function for correlating TMB statistics for each VAF cutoff set.

```{r}
cor_by_vaf <- function(cutoff = 0.1) {
# For a given VAF cutoff, isolate the TMB statistics and correlate them.
#
# Args:
# cutoff: the VAF cutoff to calculate the correlation by.
#
# Returns:
# R values for Spearman's correlation amongst the callers

cutoff_df <- tmb_filter_long %>%
dplyr::filter(vaf_cutoff == cutoff) %>%
dplyr::select(-Tumor_Sample_Barcode, -vaf_cutoff)

# Do correlations between callers
cor_res <- cor(cutoff_df, method = "spearman", use = "na.or.complete")

# This is to remove redundancy as upper correlation matrix == lower
cor_res[upper.tri(cor_res, diag = TRUE)] <- NA

# Format into long format data.frame
cor_df <- reshape2::melt(cor_res, na.rm = TRUE, value.name = "cor") %>%
dplyr::mutate(
compare = paste0(Var1, "-", Var2),
cutoff
)

# Return a data.frame ready for being added to the bigger data.frame for plotting.
return(cor_df)
}
```

## Run the data with different VAF filter cutoffs

We need to re-calculate TMB statistics with various VAF filter cutoffs.
We will run this script to do it: `system("bash scripts/run_caller_evals_vaf_filter_exp.sh")`

```{r include=FALSE}
system("bash scripts/run_caller_evals_vaf_filter_exp.sh")
```

## Import data

Get the lists of each type of file.

```{r}
# Make a list of the vaf filter experiment tmb files
filter_tmb_files <- list.files(vaf_cutoff_results_dir,
pattern = "_tmb.rds$", recursive = TRUE,
full.names = TRUE
)
```

Take a look at the TMBs from different VAF filter cutoffs.

```{r}
filter_tmb_files
```

Convert the VAF cutoffs into their own vector.

```{r}
vaf_filter <- stringr::word(filter_tmb_files, sep = "/", 3)
vaf_filter <- as.numeric(gsub("cutoff_", "", vaf_filter))
vaf_filter
```

Get caller information as a vector.

```{r}
caller_filter <- stringr::word(filter_tmb_files, sep = "/", 4)
caller_filter <- gsub("_tmb.rds", "", caller_filter)
caller_filter
```

Read in the data.
Name each with their caller and VAF filter cutoff.

```{r include=FALSE}
filter_tmb_list <- lapply(filter_tmb_files, function(file_name) {
readr::read_rds(file_name) %>% dplyr::ungroup()
})
names(filter_tmb_list) <- paste0(caller_filter, "_", vaf_filter)
```

Turn this into a data frame.

```{r}
tmb_filter_df <- dplyr::bind_rows(filter_tmb_list, .id = "NAME") %>%
dplyr::mutate(
caller = stringr::word(NAME, sep = "_", 1),
vaf_cutoff = as.numeric(stringr::word(NAME, sep = "_", 2))
)
```

Make this into a long form data.frame for plotting purposes.

```{r}
tmb_filter_long <- tmb_filter_df %>%
dplyr::distinct(Tumor_Sample_Barcode, caller, tmb, vaf_cutoff) %>%
tidyr::spread(caller, tmb)
```

## TMB correlations across callers

First we'll make a series of corrletion matrix plots.

```{r}
lapply(unique(vaf_filter), cor_matrix_plot)
```

Let's make a summary plot of the Spearman's correlations.
Get correlations by vaf cutoff.

```{r}
# Run this on each set
cor_by_vaf_df <- do.call(
"rbind.data.frame",
lapply(unique(vaf_filter), cor_by_vaf)
)
```

Plot the correlations and the vaf_cutoff per each caller combination.

```{r}
ggplot2::ggplot(cor_by_vaf_df, ggplot2::aes(x = reorder(compare, -cor), fill = as.factor(cutoff), y = cor)) +
ggplot2::geom_point(shape = 21, size = 2.5, stroke = .75, color = "black") +
ggplot2::theme_classic() +
ggupset::scale_x_mergelist(sep = "-") +
ggupset::axis_combmatrix(sep = "-") +
ggplot2::xlab("") +
ggplot2::ylab("TMB Spearman Correlation") +
ggplot2::scale_fill_brewer("VAF Filter Cutoff", palette = "YlGnBu")
```

## Session Info

```{r}
sessionInfo()
```
3,300 changes: 3,300 additions & 0 deletions analyses/snv-callers/vaf_cutoff_experiment.nb.html

Large diffs are not rendered by default.