Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Part 6 of n: SNV Callers - Run evaluations script #144

Merged
merged 82 commits into from
Oct 9, 2019

Conversation

cansavvy
Copy link
Collaborator

@cansavvy cansavvy commented Oct 3, 2019

Purpose/implementation

This is the fifth PR in a series that are preparing a pipeline to perform an initial analysis and calculation of each MAF file for each SNV caller. At this point, the SNV callers that will be evaluated are MuTect2, Strelka2, VarDict, and Lancet. (VarDict and Lancet are temporarily missing WXS samples).

What these scripts do:
02-run_eval.R is called from the run_caller_evals.sh bash script and makes plots from the files created in 01-calculate_vaf_tmb.R. It uses functions from util/plot_functions.R.

Here is a sample report. It is made of a subset of the Strelka2 data. Just picture that wherever this sample report says "11111.tsv" that would be replaced with whatever the algorithm's name is e.g. "Strelka2".

Issue

For SNV caller comparison #103 and Tumor Mutation Burden #3 and sort of #11

Directions for reviewers

Main things I am looking for advice on:

  • What do you think about the general structure of the script and its options?
  • What do you think about how it checks for the files it needs? Is there a better way to do this?
  • How about the report itself? Is there things that should be changed?
  • How about the plots? Things that can be tweaked?

Results

No results just yet, because we don't yet have all the data.

Docker and continuous integration

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
    (From Part 4 of n: SNV Caller Analysis: Set Up Script #126, no changes have been made to the Dockerfile.)
  • This analysis has been added to continuous integration.
    (It is called from the bash script that is already in the CI test)

PR Checklist

  • Run a linter
  • Set the seed (if applicable)
  • Comments and/or documentation up to date
  • Double check your paths

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, aside from the fact that I couldn't fully test it due to memory limitations in generating the setup files. 😞

It might be nice to add a bit more feedback in the vaf_tmb script so we can see exactly when it fails. it might be possible to tweak things to cut down memory usage, but that can also be saved for a later iteration.

In the mean time, could you include an example result file (or all of them) in the PR? I think it would help to see one even if the data are not finalized.

Suggestion:
Add

set -e
set -o pipefail

to bash script to make it fail if any of the substeps fail.

analyses/snv-callers/run_caller_evals.sh Outdated Show resolved Hide resolved
analyses/snv-callers/scripts/00-set_up.R Outdated Show resolved Hide resolved
analyses/snv-callers/scripts/02-run_eval.R Outdated Show resolved Hide resolved
analyses/snv-callers/scripts/02-run_eval.R Show resolved Hide resolved
analyses/snv-callers/scripts/02-run_eval.R Outdated Show resolved Hide resolved
analyses/snv-callers/util/wrangle_functions.R Outdated Show resolved Hide resolved
@cansavvy
Copy link
Collaborator Author

cansavvy commented Oct 8, 2019

In the mean time, could you include an example result file (or all of them) in the PR? I think it would help to see one even if the data are not finalized.

I had included a sample report in my PR intro. See above

Or do you also want the TMB, VAF, and Region file samples?

Here are example output files:
example_files.zip

@jashapiro
Copy link
Member

I had included a sample report in my PR intro. See above

Sorry, I had missed that!

Looking at it, a couple more comments:
For the mutation plots, consider sorting del and ins to the beginning or end; their alphabetical order doesn't make much sense.

Consider adding a median line to the TMB plot? Also, I'm going to re-up my call for a sina plot there rather than a jitter. For the why, look at your plot with the lonely Ependymoma point way out to the left. It is hard to know at a glance which data that one goes with. Or maybe add color by tissue (with no legend).

@cansavvy
Copy link
Collaborator Author

cansavvy commented Oct 8, 2019

Consider adding a median line to the TMB plot? Also, I'm going to re-up my call for a sina plot there rather than a jitter. For the why, look at your plot with the lonely Ependymoma point way out to the left. It is hard to know at a glance which data that one goes with. Or maybe add color by tissue (with no legend).

Can I make a motion that we will go back and change aesthetics of the plots when we have the real data? Because for example, there isn't only one Epnedymoma point in reality, but my tester dataset is only plotting 3 samples. So a lot of these plots will look different after the "real" data is ran anyway.

In hopeful preparation for you being okay with this, I've made an issue to track this.

@jaclyn-taroni jaclyn-taroni merged commit 02b3588 into AlexsLemonade:master Oct 9, 2019
@cansavvy cansavvy deleted the cansav09/snv-add-eval branch October 29, 2019 19:26
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants