Part 6 of n: SNV Callers - Run evaluations script #144

cansavvy · 2019-10-03T17:50:11Z

Purpose/implementation

This is the fifth PR in a series that prepares a pipeline to perform an initial analysis and calculation of each MAF file for each SNV caller. At this point, the SNV callers that will be evaluated are MuTect2, Strelka2, VarDict, and Lancet. (VarDict and Lancet are temporarily missing WXS samples.)

What these scripts do:
02-run_eval.R is called from the run_caller_evals.sh bash script and makes plots from the files created in 01-calculate_vaf_tmb.R. It uses functions from util/plot_functions.R.

Here is a sample report. It is made from a subset of the Strelka2 data. Wherever this sample report says "11111.tsv", picture the algorithm's name in its place, e.g. "Strelka2".

Issue

For SNV caller comparison #103 and Tumor Mutation Burden #3 and sort of #11

Directions for reviewers

Main things I am looking for advice on:

What do you think about the general structure of the script and its options?
What do you think about how it checks for the files it needs? Is there a better way to do this?
How about the report itself? Are there things that should be changed?
How about the plots? Things that can be tweaked?
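
For the file-checking question, one option is to check inputs in the bash wrapper before any R script starts, and to report every missing file at once rather than stopping at the first. The sketch below is only an illustration of that idea: the helper name check_required_files and the example file names are hypothetical, and it is not how 02-run_eval.R currently does its checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every missing input at once, then fail.
check_required_files() {
  missing=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then   # -s: file exists and is not empty
      echo "Missing or empty file: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Demonstration with throwaway files (names are made up for illustration).
demo_dir=$(mktemp -d)
echo "placeholder" > "$demo_dir/example_vaf.tsv"

if check_required_files "$demo_dir/example_vaf.tsv"; then
  echo "all inputs present"
fi

if ! check_required_files "$demo_dir/example_vaf.tsv" "$demo_dir/example_tmb.tsv"; then
  echo "at least one input is missing"
fi

rm -rf "$demo_dir"
```

Collecting all missing files before failing means a single run shows everything that still needs to be generated.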

Results

No results just yet, because we don't yet have all the data.

Docker and continuous integration

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
(From Part 4 of n: SNV Caller Analysis: Set Up Script #126, no changes have been made to the Dockerfile.)
This analysis has been added to continuous integration.
(It is called from the bash script that is already in the CI test)

PR Checklist

Run a linter
Set the seed (if applicable)
Comments and/or documentation up to date
Double check your paths

jashapiro

This looks good, aside from the fact that I couldn't fully test it due to memory limitations in generating the setup files. 😞

It might be nice to add a bit more feedback in the vaf_tmb script so we can see exactly when it fails. It might be possible to tweak things to cut down memory usage, but that can also be saved for a later iteration.

In the meantime, could you include an example result file (or all of them) in the PR? I think it would help to see one even if the data are not finalized.

Suggestion:
Add

set -e
set -o pipefail

to bash script to make it fail if any of the substeps fail.
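
As a minimal sketch of what those two settings change, the snippet below uses stand-in commands (`false | cat`, `echo "step 1"`) rather than the actual steps in run_caller_evals.sh. Each case runs in its own `bash -c` so the settings do not leak between them.

```shell
#!/usr/bin/env bash
# Illustrative only: `false | cat` stands in for a failing substep in a pipeline.

# Without pipefail, a pipeline reports the exit status of its last command.
status_default=$(bash -c 'false | cat; echo $?')

# With pipefail, the pipeline reports the failure of any command in it.
status_pipefail=$(bash -c 'set -o pipefail; false | cat; echo $?')

# With set -e, the script stops at the first failing command,
# so the second echo is never reached.
output_errexit=$(bash -c 'set -e; echo "step 1"; false; echo "step 2"' || true)

echo "default: $status_default, pipefail: $status_pipefail"
echo "errexit output: $output_errexit"
```

Without either setting, a failed substep can go unnoticed and the later steps run on missing or partial files.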

analyses/snv-callers/run_caller_evals.sh

analyses/snv-callers/scripts/00-set_up.R

analyses/snv-callers/scripts/02-run_eval.R

analyses/snv-callers/util/wrangle_functions.R

cansavvy · 2019-10-08T18:14:07Z

In the meantime, could you include an example result file (or all of them) in the PR? I think it would help to see one even if the data are not finalized.

I had included a sample report in my PR intro. See above.

Or do you also want the TMB, VAF, and Region file samples?

Here are example output files:
example_files.zip

jashapiro · 2019-10-08T18:37:52Z

I had included a sample report in my PR intro. See above

Sorry, I had missed that!

Looking at it, a couple more comments:
For the mutation plots, consider sorting del and ins to the beginning or end; their alphabetical order doesn't make much sense.

Consider adding a median line to the TMB plot? Also, I'm going to re-up my call for a sina plot there rather than a jitter. For the why, look at your plot with the lonely Ependymoma point way out to the left. It is hard to know at a glance which data that one goes with. Or maybe add color by tissue (with no legend).

cansavvy · 2019-10-08T19:19:10Z

Consider adding a median line to the TMB plot? Also, I'm going to re-up my call for a sina plot there rather than a jitter. For the why, look at your plot with the lonely Ependymoma point way out to the left. It is hard to know at a glance which data that one goes with. Or maybe add color by tissue (with no legend).

Can I make a motion that we go back and change the aesthetics of the plots when we have the real data? For example, there isn't only one Ependymoma point in reality, but my tester dataset is only plotting 3 samples. So a lot of these plots will look different after the "real" data is run anyway.

In hopeful preparation for you being okay with this, I've made an issue to track this.

analyses/snv-callers/util/wrangle_functions.R

Candace Savonen and others added 30 commits September 23, 2019 13:54

Set up the set up

37a78fb

Add circle CI test and Docker config

a25be13

Add some more comments

4a40580

Merge remote-tracking branch 'upstream/master' into cansav09/snv-caller_set_up

8a4d33b

Set up Rprojroot for circle CI test to work better

e704c0b

Fixing Circle CI file.

56014dd

Change read_tsv to data.table::fread for big file

03a0770

read in the .gz file

391fb52

push plot function changes

93cb818

Fix an error

908aff9

Merge branch 'master' into cansav09/snv-caller_set_up

ee08152

Add missing package to Dockerfile

6f18dff

Reduce cosmic file to only the brain sample mutations

38cd490

Update README with changes to cosmic file

b9d8333

re-updated Dockerfile

a7c5495

Ran a linter on set up script

efde7ef

Merge branch 'master' into cansav09/snv-caller_set_up

e2b43a8

Comment out of date

f4c7534

Get rid of old WGX/WXS bed file set up

83d001a

Merge branch 'master' into cansav09/snv-caller_set_up

30664ec

Incorporate initial PR suggestions from @jashapiro and @cbethell

374c6a1

Push a working bash script

e99d0b4

Add bash script to circle CI

5cb619e

Add usage section in README and change name of script

5e79092

Merge branch 'master' into cansav09/snv_calculations

0b9f4e6

Add some more comments

83cad86

Merge remote-tracking branch 'origin/cansav09/snv_calculations' into cansav09/snv_calculations

f383869

Correct a couple things in the README

2e908a1

Get rid of remnant comment

1146afc

Fix a typo!

abb7dda

cansavvy added 2 commits October 7, 2019 12:00

Make set up files not run if they are already existing

983eefb

Fix handling of COSMIC file creation

fb1a6ad

jashapiro reviewed Oct 7, 2019

Candace Savonen and others added 5 commits October 8, 2019 14:26

Incorporate @jashapiro 's suggestions

e67e51e

Add a few more @jashapiro suggestions

87eba81

Merge branch 'master' into cansav09/snv-add-eval

7c0d87c

Circle CI does not have a kitematic directory. Get rid

f4cb768

Merge remote-tracking branch 'origin/cansav09/snv-add-eval' into cansav09/snv-add-eval

6d9f09f

Candace Savonen added 2 commits October 8, 2019 14:43

Make warning instead of stop

86b2930

Missing ggplot2::

c0446e4

cansavvy mentioned this pull request Oct 8, 2019

OpenPBTA: Adjust plot aesthetics after "real" data has been run cansavvy/openpbta-notebook-concept#3

Closed

Remove reference files after use to try to reduce memory usage

d3647e0

cansavvy commented Oct 8, 2019

Candace Savonen and others added 10 commits October 8, 2019 16:19

Make one big mutate

435c517

Dumb extra comma

aa466e0

Add VAF_FILTER option and it's circle CI component

2ef5558

get rid of typo

d2330fd

Re-fix Circle CI file

ca4414f

Fix WXS if statement

c123506

Fix default Circle CI option

3faab85

Make indels come last in the barplot

1912d0e

Fix order of barplot graph

9eaa124

Merge branch 'master' into cansav09/snv-add-eval

251366a

jashapiro approved these changes Oct 9, 2019

jaclyn-taroni merged commit 02b3588 into AlexsLemonade:master Oct 9, 2019

cansavvy mentioned this pull request Oct 22, 2019

Proposed Analysis: Getting a consensus set of SNV mutations #161

Closed

cansavvy deleted the cansav09/snv-add-eval branch October 29, 2019 19:26
