This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 67
PR 2 of 4 for SNV Consensus: Merge callers script #184
Merged
jaclyn-taroni
merged 180 commits into
AlexsLemonade:master
from
cansavvy:merge_callers_script
Oct 30, 2019
Merged
Changes from 173 commits
Commits
Show all changes
180 commits
Select commit
Hold shift + click to select a range
37a78fb
Set up the set up
a25be13
Add circle CI test and Docker config
4a40580
Add some more comments
8a4d33b
Merge remote-tracking branch 'upstream/master' into cansav09/snv-call…
e704c0b
Set up Rprojroot for circle CI test to work better
56014dd
Fixing Circle CI file.
03a0770
Change read_tsv to data.table::fread for big file
391fb52
read in the .gz file
93cb818
push plot function changes
908aff9
Fix an error
ee08152
Merge branch 'master' into cansav09/snv-caller_set_up
cansavvy 6f18dff
Add missing package to Dockerfile
38cd490
Reduce cosmic file to only the brain sample mutations
b9d8333
Update README with changes to cosmic file
a7c5495
re-updated Dockerfile
efde7ef
Ran a linter on set up script
e2b43a8
Merge branch 'master' into cansav09/snv-caller_set_up
cansavvy f4c7534
Comment out of date
cansavvy 83d001a
Get rid of old WGX/WXS bed file set up
30664ec
Merge branch 'master' into cansav09/snv-caller_set_up
cansavvy 374c6a1
Incorporate initial PR suggestions from @jashapiro and @cbethell
e99d0b4
Push a working bash script
5cb619e
Add bash script to circle CI
5e79092
Add usage section in README and change name of script
0b9f4e6
Merge branch 'master' into cansav09/snv_calculations
cansavvy 83cad86
Add some more comments
f383869
Merge remote-tracking branch 'origin/cansav09/snv_calculations' into …
2e908a1
Correct a couple things in the README
1146afc
Get rid of remnant comment
abb7dda
Fix a typo!
76ec476
Merge branch 'master' into cansav09/snv_calculations
cansavvy 20f5d4c
Add Usage to the TOC
b63c69f
Merge branch 'master' into cansav09/snv_calculations
cansavvy cf9999c
Add more documentation to the README
068b08c
Update Circle CI
e9a5bde
Push more exact bash script
efd2264
Fix a couple issues with handling metadata file path
e732516
Get rid of dev remnants
bab4319
Couple changes for readability
655bc00
Merge branch 'master' into cansav09/snv_calculations
cansavvy 748e677
Add some things to README and get rid of part of bash that isn't there
9511997
Merge remote-tracking branch 'origin/cansav09/snv_calculations' into …
d25cc45
Found dumb mistake
ff50ff9
Changed [*] to [+]
26cc61d
I mean [*] to a [@] which makes more sense.
f0916eb
Merge branch 'master' into cansav09/snv_calculations
cansavvy 0c0a023
Fix some of the overwrite handling
4666b00
Merge remote-tracking branch 'origin/cansav09/snv_calculations' into …
0fa591d
Add template and report script
84c0f4b
Add run_eval to bash script
cc5494c
Merge branch 'master' into cansav09/snv-add-eval
cansavvy fb6ec2a
Get rid of dev remnants
7e43739
Make better handling of strategy that is called but not there
1acd00b
Get rid of dev remnant again
5cce1fe
Get rid of stray `\`
bffb002
File got misplaced
b2b2eb4
Get rid of cosmic mutations without proper coordinates
8800d8d
Fixed a wrinkle with the COSMIC file
2999c65
Merge branch 'master' into cansav09/snv-add-eval
cansavvy 79fcdaa
re-run a linter on everything.
4460630
Merge remote-tracking branch 'origin/cansav09/snv-add-eval' into cans…
689583a
Couple more minor touches
983eefb
Make set up files not run if they are already existing
cansavvy fb1a6ad
Fix handling of COSMIC file creation
cansavvy e67e51e
Incorporate @jashapiro 's suggestions
87eba81
Add a few more @jashapiro suggestions
7c0d87c
Merge branch 'master' into cansav09/snv-add-eval
cansavvy f4cb768
Circle CI does not have a kitematic directory. Get rid
6d9f09f
Merge remote-tracking branch 'origin/cansav09/snv-add-eval' into cans…
86b2930
Make warning instead of stop
c0446e4
Missing `ggplot2::`
d3647e0
Remove reference files after use to try to reduce memory usage
435c517
Make one big mutate
aa466e0
Dumb extra comma
2ef5558
Add VAF_FILTER option and it's circle CI component
d2330fd
get rid of typo
ca4414f
Re-fix Circle CI file
c123506
Fix WXS if statement
3faab85
Fix default Circle CI option
1912d0e
Make indels come last in the barplot
9eaa124
Fix order of barplot graph
251366a
Merge branch 'master' into cansav09/snv-add-eval
cansavvy e5dee1d
Add the comparison notebook
ccc0c27
Merge branch 'cansav09/snv-add-eval' into snv-comparison
91c8d99
Add initial comparison notebook
905bb2a
Notebook adjustments
ba40502
Fixed dumb label mix-up problem
1919b05
Further honing things
0470405
Added the other plots except that one combo plot
89df186
Add rendered version too
a1fdac4
Rough draft of all plots here
eb5ffa1
Get rid of upsettR's blank plot
288fca2
Refresh the notebook
0ab7171
Run linter
5a73511
Merge remote-tracking branch 'origin/snv-comparison' into snv-comparison
8a942fe
Change name to reflect the data better
55607e7
Add to CircleCI and Dockerfile
595bf1e
Merge branch 'master' into snv-comparison
cansavvy 0e186ab
Try to fix busted CircleCI command
eef394f
Merge remote-tracking branch 'origin/snv-comparison' into snv-comparison
97815dc
Attempt to fix Dockerfile build problem
465c69a
Attempt to fix docker build
d77cf90
Some Docker probs fixed
b242df2
Push latest notebook and Dockerfile
e7c3598
Dockerfile appears to be building right.
bd462d3
Couple more things to finish up Dockerfile adds
86282aa
development thing was left
a483e11
Add Template for results discussion
38f0751
Updates to notebook. Function and etc.
1e02b71
Refresh notebook
f836095
Add png saves to each plot
cd2643a
Added options to calculate_vaf_tmb.R script
ed4d8f9
Add set up for vaf_filter experiment
b9d2519
Push separate notebook for vaf_filter exp
5617f44
Changes to README
fc5dba9
Merge branch 'master' into snv-comparison
cansavvy 8486fd1
Add more documentation
cbffd8e
Push VAF cutoff experiment notebook
d11223c
Update title of vaf notebook
334e365
Update notebook
4690bc4
Fix path problem and also extend no_region option
dbdb9c4
Merge remote-tracking branch 'upstream/master' into snv-comparison
70e8c35
Fix typo
518c0d6
Added some documentation
b23dafa
Prep consensus mutation file saving
ae6f9fa
Merge remote-tracking branch 'upstream/master' into snv-comparison
fcd8436
Fix sex chr mislabeling in COSMIC file per @jashapiro 's suggestion
880d1f0
Get rid of regional analysis. Linter the notebooks Add changes to plots
6d01ac7
Updated COSMIC file
a4a6867
Streamline this branch
5c5eda7
Revert unneeded changes
b6f86b7
Clean up README
ccc2d7c
Add --no_region option to README
2813d8e
Merge remote-tracking branch 'jaclyn-taroni/master' into snv-caller_r…
c706e16
Merge branch 'snv-caller_revamp' into compare_callers_nb
a8e10c6
Update CircleCI
57e3e2f
Get rid of region analysis to save on memory usage
93ff0e0
Add documentation about comparison to README
271dc2b
Merge remote-tracking branch 'upstream/master' into compare_callers_nb
c167dfb
Merge remote-tracking branch 'upstream/master' into compare_callers_nb
7dcd5d6
Linter and switch to rds
6d0a51d
Upload results
406e821
refresh notebook and results
9ce4c41
Updated notebook
76bca09
Change name of bash script and make it run the notebook too
6910580
Update README
49fddf7
column problem *should* be fixed
aa91345
Merge branch 'master' into compare_callers_nb
cansavvy b66fb62
Push refreshed notebook and results
38d220d
Merge remote-tracking branch 'origin/compare_callers_nb' into compare…
a5fdddc
Add a bit more info in the README
d635d06
Merge branch 'master' into compare_callers_nb
cansavvy 609b319
attempt to fix CircleCI's lack of grid package
80c920c
Add grid to Dockerfile explicitly
30584b0
Made some things a tad less janky
7153398
Push refreshed notebook. Get rid of grid install from Dockerfile
38c254e
Try older version of UpSetR for shiggles
45e9834
Get rid of `library(grid)` so we know if the version of UpSetR has it…
29381a0
Test CRAN installation fo UpSetR
d6a8af1
beginning to reformulate script organization for consensus file creation
9a11173
Push reorganized scripts and notebooks
1bccb5c
renaming and reorganzing and styling
6961855
suppressWarnings for coercing variables. We expect that
912f182
get rid of some development remnants
3bbafa0
Streamline the PR
574e1e4
Get rid of dummy file
2c3ea05
Streamline even more
d71d15b
Get rid of superfluous pound signs
12bb2ca
Actually add merge callers script
35f7b70
get rid of outdated source(functions) command
c192773
Make --overwrite actually functional
f8d7a66
re-run linter after those adds
e8e9460
Merge branch 'master' into merge_callers_script
cansavvy 3cb1245
Get rid of outdated `-w`
f13bea0
Merge remote-tracking branch 'origin/merge_callers_script' into merge…
8d322a3
Fix example
8d55eb9
Found an outdated comment
a323dfa
Address the outdated comments spotted by @jaclyn-taroni
b61d162
Merge remote-tracking 'upstream/master' into merge_callers_script
jaclyn-taroni acab221
Make sure #190 changes are included
jaclyn-taroni File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,304 @@ | ||
# Merge the caller VAF and TMB files | ||
# | ||
# 2019 | ||
# | ||
# C. Savonen for ALSF - CCDL | ||
# | ||
# Purpose: Merge callers' TMB and VAF files into total files with a column `caller` | ||
# to designate their origin. | ||
|
||
# Files Output: | ||
# "all_callers_vaf.<file_format>" - contains all the VAF file information for all callers. | ||
# "all_callers_tmb.<file_format>" - contains all the TMB file information for all callers. | ||
# "mutation_id_list.<file_format>" - a full list of the mutations that can be | ||
# used for an UpSetR graph | ||
# "callers_per_mutation.<file_format>" - contains a breakdown for each mutation of what callers | ||
# called it. Will be used to identify the consensus mutations. | ||
|
||
# Option descriptions | ||
# --vaf : Parent folder containing the vaf and tmb files for each folder. | ||
# <caller_name>_vaf.<file_format> | ||
# <caller_name>_tmb.<file_format> | ||
# --file_format: What type of file format were the vaf and tmb files saved as? Options are | ||
# "rds" or "tsv". Default is "rds". | ||
# --output : Where you would like the output from this script to be stored. | ||
# --overwrite : If TRUE, will overwrite any reports of the same name. Default is | ||
# FALSE | ||
# | ||
# | ||
# Command line example: | ||
# | ||
# Rscript 03-merge_callers.R \ | ||
# -v results \ | ||
# -o strelka2 \ | ||
# -s wxs \ | ||
# -w | ||
# | ||
# Establish base dir | ||
root_dir <- rprojroot::find_root(rprojroot::has_dir(".git")) | ||
|
||
# Magrittr pipe | ||
`%>%` <- dplyr::`%>%` | ||
|
||
# Load library: | ||
library(optparse) | ||
|
||
#--------------------------------Set up options--------------------------------# | ||
# Set up optparse options | ||
option_list <- list( | ||
make_option( | ||
opt_str = c("-v", "--vaf"), type = "character", | ||
default = NULL, help = "Path to folder with the output files | ||
from 01-calculate_vaf_tmb. Should include the VAF, TMB, and | ||
region TSV files", | ||
metavar = "character" | ||
), | ||
make_option( | ||
opt_str = c("-f", "--file_format"), type = "character", default = "rds", | ||
help = "What type of file format were the vaf and tmb files saved as? | ||
Options are 'rds' or 'tsv'. Default is 'rds'.", | ||
metavar = "character" | ||
), | ||
make_option( | ||
opt_str = c("-o", "--output"), type = "character", | ||
default = NULL, help = "Path to folder where you would like the | ||
output from this script to be stored.", | ||
metavar = "character" | ||
), | ||
make_option( | ||
opt_str = c("--overwrite"), action = "store_true", | ||
default = FALSE, help = "If TRUE, will overwrite any reports of | ||
the same name. Default is FALSE", | ||
metavar = "character" | ||
) | ||
) | ||
|
||
# Parse options | ||
opt <- parse_args(OptionParser(option_list = option_list)) | ||
|
||
########################### Check options specified ############################ | ||
# Bring along the file suffix. Make to lower. | ||
file_suffix <- tolower(opt$file_format) | ||
|
||
# Check that the file format is supported | ||
if (!(file_suffix %in% c("rds", "tsv"))) { | ||
warning("Option used for file format (-f) is not supported. Only 'tsv' or 'rds' | ||
files are supported. Defaulting to rds.") | ||
opt$file_format <- "rds" | ||
file_suffix <- "rds" | ||
} | ||
|
||
# Normalize this file path | ||
opt$vaf <- file.path(root_dir, opt$vaf) | ||
|
||
# Check the output directory exists | ||
cansavvy marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if (!dir.exists(opt$vaf)) { | ||
stop(paste("Error:", opt$vaf, "does not exist")) | ||
} | ||
|
||
# Exclude the non-caller directories | ||
caller_dirs <- grep("vaf_cutoff|consensus", | ||
dir(opt$vaf, full.names = TRUE), | ||
invert = TRUE, | ||
value = TRUE | ||
) | ||
|
||
# Print this out to check | ||
message("Will merge all VAF and TMB files in these folders: \n", paste0(caller_dirs, "\n")) | ||
|
||
# Get a list of vaf files | ||
vaf_files <- sapply(caller_dirs, | ||
list.files, | ||
pattern = paste0("_vaf.", file_suffix), | ||
recursive = TRUE, full.names = TRUE | ||
) | ||
|
||
# Print this out to check | ||
message("Merging these VAF files: \n", paste0(vaf_files, "\n")) | ||
|
||
# Get a list of tmb files | ||
tmb_files <- sapply(caller_dirs, | ||
list.files, | ||
pattern = paste0("_tmb.", file_suffix), | ||
recursive = TRUE, full.names = TRUE | ||
) | ||
|
||
# Print this out to check | ||
message("Merging these TMB files: \n", paste0(tmb_files, "\n")) | ||
|
||
################################### Set Up ##################################### | ||
# Set and make the plots directory | ||
opt$output <- file.path(root_dir, opt$output) | ||
|
||
# Make caller specific plots folder | ||
cansavvy marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if (!dir.exists(opt$output)) { | ||
dir.create(opt$output, recursive = TRUE) | ||
} | ||
|
||
# Declare output file paths | ||
all_vaf_file <- file.path(opt$output, "all_callers_vaf.rds") | ||
all_tmb_file <- file.path(opt$output, "all_callers_tmb.rds") | ||
mut_id_file <- file.path(opt$output, "mutation_id_list.rds") | ||
call_per_mut_file <- file.path(opt$output, "callers_per_mutation.rds") | ||
|
||
##################### Check for files if overwrite is FALSE #################### | ||
# If overwrite is set to FALSE, check if these exist before continuing | ||
if (!opt$overwrite) { | ||
# Make a list of the output files | ||
output_files <- c(all_vaf_file, all_tmb_file, mut_id_file, call_per_mut_file) | ||
|
||
# Find out which of these exist | ||
existing_files <- file.exists(output_files) | ||
|
||
# If all files exist; stop | ||
if (all(existing_files)) { | ||
stop(cat( | ||
"Stopping; --overwrite is not being used and all output files already exist | ||
in the designated --output directory." | ||
)) | ||
} | ||
# If some files exist, print a warning: | ||
if (any(existing_files)) { | ||
warning(cat( | ||
"Some output files already exist and will not be overwritten unless you use --overwrite: \n", | ||
paste0(output_files[which(existing_files)], "\n") | ||
)) | ||
} | ||
} | ||
|
||
########################### Make Master VAF file ############################### | ||
# If the file exists or the overwrite option is not being used, do not write the | ||
# merged VAF file. | ||
if (file.exists(all_vaf_file) && !opt$overwrite) { | ||
# Stop if this file exists and overwrite is set to FALSE | ||
warning(cat( | ||
"The merged VAF file already exists: \n", | ||
all_vaf_file, "\n", | ||
"Use --overwrite if you want to overwrite it." | ||
)) | ||
} else { | ||
# Get the caller names | ||
caller_names <- stringr::word(vaf_files, sep = "/", -2) | ||
|
||
# Read in vaf files for all callers | ||
if (opt$file_format == "tsv") { | ||
vaf_list <- lapply(vaf_files, readr::read_tsv) | ||
} else { | ||
vaf_list <- lapply(vaf_files, readr::read_rds) | ||
} | ||
|
||
# Read in the other files to match the first | ||
vaf_list <- lapply(vaf_list, function(df) { | ||
# Make it so it is more easily combined with the other files | ||
df %>% | ||
# Attempt to make numeric columns where that doesn' kick back an "NA" | ||
dplyr::mutate_at(dplyr::vars(which(!is.na(as.numeric(t(df[1, ]))))), as.numeric) %>% | ||
# Aliquot id sometimes contains letters and sometimes numbers across the callers | ||
dplyr::mutate( | ||
aliquot_id = as.character(aliquot_id), | ||
variant_qual = as.character(variant_qual) | ||
) %>% | ||
# Turn these columns into characters because otherwise they cause trouble. | ||
dplyr::mutate_at(dplyr::vars(dplyr::contains("AF", ignore.case = FALSE)), as.character) %>% | ||
# Get rid of the few if any duplicate entries. | ||
dplyr::distinct(mutation_id, .keep_all = TRUE) | ||
}) | ||
|
||
# Carry over the callers' names | ||
names(vaf_list) <- caller_names | ||
|
||
# Print progress message | ||
message("Saving master VAF file to: \n", all_vaf_file) | ||
|
||
# Combine and save VAF file | ||
vaf_df <- suppressWarnings(dplyr::bind_rows(vaf_list, .id = "caller")) %>% | ||
cansavvy marked this conversation as resolved.
Show resolved
Hide resolved
|
||
dplyr::mutate(caller = factor(caller)) %>% | ||
# Write to RDS file | ||
readr::write_rds(all_vaf_file) | ||
} | ||
########################### Make Master TMB file ############################### | ||
# If the file exists or the overwrite option is not being used, do not write the | ||
# merged TMB file. | ||
if (file.exists(all_tmb_file) && !opt$overwrite) { | ||
# Stop if this file exists and overwrite is set to FALSE | ||
warning(cat( | ||
"The merged TMB file already exists: \n", | ||
all_tmb_file, "\n", | ||
"Use --overwrite if you want to overwrite it." | ||
)) | ||
} else { | ||
if (opt$file_format == "tsv") { | ||
tmb_list <- lapply(tmb_files, readr::read_tsv) | ||
} else { | ||
tmb_list <- lapply(tmb_files, readr::read_rds) | ||
} | ||
|
||
# Carry over the callers' names | ||
names(tmb_list) <- caller_names | ||
|
||
# Print progress message | ||
message("Saving master TMB file to: \n", all_tmb_file) | ||
|
||
# Combine and save TMB file | ||
tmb_df <- dplyr::bind_rows(tmb_list, .id = "caller") %>% | ||
dplyr::mutate(caller = factor(caller)) %>% | ||
readr::write_rds(all_tmb_file) | ||
} | ||
############################# Make mutation id list ############################ | ||
# If the file exists or the overwrite option is not being used, do not write mutation id file. | ||
if (file.exists(mut_id_file) && !opt$overwrite) { | ||
# Stop if this file exists and overwrite is set to FALSE | ||
warning(cat( | ||
"The mutation id list file already exists: \n", | ||
mut_id_file, "\n", | ||
"Use --overwrite if you want to overwrite it." | ||
)) | ||
} else { | ||
mutation_id_list <- lapply(vaf_list, function(caller) caller$mutation_id) | ||
|
||
# Print progress message | ||
message("Saving: \n", mut_id_file) | ||
|
||
readr::write_rds(mutation_id_list, mut_id_file) | ||
} | ||
############################# Callers per mutation df ########################## | ||
# If the file exists or the overwrite option is not being used, do not write the | ||
# callers per mutation file. | ||
if (file.exists(call_per_mut_file) && !opt$overwrite) { | ||
# Stop if this file exists and overwrite is set to FALSE | ||
warning(cat( | ||
"The mutation id list file already exists: \n", | ||
call_per_mut_file, "\n", | ||
"Use --overwrite if you want to overwrite it." | ||
)) | ||
} else { | ||
# Make a string that says what callers call each mutation | ||
callers_per_mutation <- tapply(vaf_df$caller, | ||
vaf_df$mutation_id, | ||
paste0, | ||
collapse = "-" | ||
) %>% | ||
# Make into a data.frame | ||
as.data.frame() %>% | ||
tibble::rownames_to_column("mutation_id") | ||
|
||
# Obtain the median VAF for each mutation | ||
vaf_med <- tapply( | ||
vaf_df$vaf, | ||
vaf_df$mutation_id, | ||
median | ||
) %>% | ||
# Make into a data.frame | ||
as.data.frame() %>% | ||
tibble::rownames_to_column("mutation_id") | ||
|
||
# Print progress message | ||
message("Saving: \n", call_per_mut_file) | ||
|
||
# Join the median VAF and the callers that call that mutation into one data.frame | ||
callers_per_mutation <- callers_per_mutation %>% | ||
dplyr::inner_join(vaf_med, by = "mutation_id") %>% | ||
# Make column names more sensible | ||
dplyr::rename(caller_combo = "..x", median_vaf = "..y") %>% | ||
readr::write_rds(call_per_mut_file) | ||
} |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this
-w
correspond to?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used to have
--overwrite
have a short flag but then I deleted it. Forgot to delete it from the example.