Part 6 of n: SNV Callers - Run evaluations script #144
…cansav09/snv_calculations
This looks good, aside from the fact that I couldn't fully test it due to memory limitations in generating the setup files. 😞
It might be nice to add a bit more feedback in the vaf_tmb script so we can see exactly when it fails. It might be possible to tweak things to cut down memory usage, but that can also be saved for a later iteration.
In the meantime, could you include an example result file (or all of them) in the PR? I think it would help to see one even if the data are not finalized.
Suggestion: add the following to the bash script so that it fails if any of the substeps fail:
set -e
set -o pipefail
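To illustrate what these two settings do, here is a minimal sketch. The temporary script and its `false | cat` pipeline are hypothetical stand-ins for a failing substep; they are not the actual contents of run_caller_evals.sh.

```shell
# Write a tiny stand-in script (hypothetical; not the real run_caller_evals.sh).
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/bash
set -e           # exit as soon as any command returns a non-zero status
set -o pipefail  # a pipeline fails if any stage fails, not only the last one

false | cat              # stand-in for a substep whose first stage fails
echo "later step ran"    # not reached when both settings are active
EOF

# Run the stand-in and record its exit status without stopping this demo.
if output=$(bash "$demo"); then status=0; else status=$?; fi
rm -f "$demo"
echo "exit status: $status, output: '$output'"

# For contrast: without pipefail, the same pipeline reports the status of `cat`.
plain_status=$(bash -c 'false | cat; echo $?')
echo "pipeline status without pipefail: $plain_status"
```

With both settings, the stand-in script stops at the failing pipeline and returns a non-zero status, so a CI job calling it would fail too. Without `pipefail`, the failure of the first stage is hidden behind the successful `cat`.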
I had included a sample report in my PR intro (see above). Or do you also want the TMB, VAF, and Region file samples? Here are example output files:
…av09/snv-add-eval
Sorry, I had missed that! Looking at it, a couple more comments: Consider adding a median line to the TMB plot? Also, I'm going to re-up my call for a sina plot there rather than a jitter. For the why, look at your plot with the lonely Ependymoma point way out to the left. It is hard to know at a glance which data that one goes with. Or maybe add color by tissue (with no legend).
Can I make a motion that we go back and change the aesthetics of the plots when we have the real data? For example, there isn't only one Ependymoma point in reality, but my tester dataset only plots 3 samples. So a lot of these plots will look different after the "real" data are run anyway. In hopeful preparation for you being okay with this, I've made an issue to track this.
Purpose/implementation
This is the fifth PR in a series that prepares a pipeline to perform an initial analysis and calculation of each MAF file for each SNV caller. At this point, the SNV callers that will be evaluated are MuTect2, Strelka2, VarDict, and Lancet. (VarDict and Lancet are temporarily missing WXS samples.)
What these scripts do:
02-run_eval.R is called from the run_caller_evals.sh bash script and makes plots from the files created in 01-calculate_vaf_tmb.R. It uses functions from util/plot_functions.R.
Here is a sample report. It is made from a subset of the Strelka2 data. Wherever this sample report says "11111.tsv", picture the algorithm's name in its place, e.g. "Strelka2".
Issue
For SNV caller comparison #103 and Tumor Mutation Burden #3 and sort of #11
Directions for reviewers
Main things I am looking for advice on:
Results
No results just yet, because we don't yet have all the data.
Docker and continuous integration
(From Part 4 of n: SNV Caller Analysis: Set Up Script #126, no changes have been made to the Dockerfile.)
(It is called from the bash script that is already in the CI test)
PR Checklist