PR 1 of 3 TMB revamp: Separate TMB calculations as "coding only" and "all" #307

cansavvy · 2019-12-03T15:36:03Z

Purpose/implementation Section

What scientific question is your analysis addressing?

Currently, the TMB statistics are calculated using all mutations, but we should also have separate TMB statistics for coding mutations only.

What was your approach?

For the "all mutations" TMB, only Strelka and Mutect calls are used, whereas the "coding only" TMB uses the consensus file of Strelka, Lancet, and Mutect calls. This was discussed in #305.

This required borrowing some code from 02-merge-callers and moving split_mnv into its own function file (analyses/snv-callers/util/split_mnv.R), since it is now used by more than one script.
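For readers less familiar with MNV handling, here is a minimal sketch of the general idea of splitting a multi-nucleotide variant (MNV) into per-base SNVs. It is written in Python for illustration only and is not the R split_mnv function in this PR. The tuple layout and the name split_mnv_sketch are hypothetical, and the sketch assumes equal-length reference and alternate alleles.

```python
from typing import List, Tuple

# Hypothetical record layout: (chromosome, 1-based start position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def split_mnv_sketch(variant: Variant) -> List[Variant]:
    """Split an MNV into one SNV per differing base.

    Assumes ref and alt have equal length (a substitution, not an indel).
    Positions where ref and alt agree are skipped, since nothing changes there.
    """
    chrom, start, ref, alt = variant
    if len(ref) != len(alt):
        raise ValueError("MNV splitting assumes equal-length ref and alt alleles")
    snvs: List[Variant] = []
    for offset, (ref_base, alt_base) in enumerate(zip(ref, alt)):
        if ref_base != alt_base:
            snvs.append((chrom, start + offset, ref_base, alt_base))
    return snvs


print(split_mnv_sketch(("chr1", 100, "AT", "GC")))
# [('chr1', 100, 'A', 'G'), ('chr1', 101, 'T', 'C')]
```

Skipping unchanged positions is a design choice of this sketch; whether the real function keeps or drops them is not stated here.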

I will add the associated documentation changes to PR #304.

What GitHub issue does your pull request address?

#305

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

- Is the original data-handling flow for MNVs and SNVs maintained? I am particularly worried about the integrity of the last chunk of data wrangling in 03-calculate_tmb.R.
- Are IGR and Silent the only Variant_Classification groups that should be considered non-coding?
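A minimal sketch of the coding-only filter that the second question refers to, assuming (as the question does) that IGR and Silent are the only excluded classes. It is written in Python for illustration; representing mutations as dicts and the names NON_CODING_CLASSES and coding_only are hypothetical, while Variant_Classification and Hugo_Symbol are standard MAF columns.

```python
from typing import Dict, List

# The open question in this PR: are these the only classes to treat as non-coding?
NON_CODING_CLASSES = {"IGR", "Silent"}


def coding_only(mutations: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep mutations whose Variant_Classification is not in NON_CODING_CLASSES."""
    return [m for m in mutations if m["Variant_Classification"] not in NON_CODING_CLASSES]


muts = [
    {"Hugo_Symbol": "TP53", "Variant_Classification": "Missense_Mutation"},
    {"Hugo_Symbol": "Unknown", "Variant_Classification": "IGR"},
    {"Hugo_Symbol": "EGFR", "Variant_Classification": "Silent"},
]
print(len(coding_only(muts)))  # 1
```

Note that the MAF format also defines classes such as Intron, 3'UTR, and 5'UTR, which a filter built only on IGR and Silent would keep.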

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

No new tables yet, but some downstream analyses, such as tmb_compare_tcga, will need to be updated. Should mutational_signatures be updated as well?

Results

What types of results are included (e.g., table, figure)?

No new results yet.

Reproducibility Checklist

No new packages were needed here.

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

jashapiro

Looks good. My main question is about the coding TMB definition. I'm sure you don't want to weigh in on those definitions again, but it seems logical that the denominator there should be the total coding length, not the sequenced length. Am I mistaken? If not, you will need to add some logic to recalculate the denominator, but that could be combined with the target BED files to filter to just the coding mutations, without having to look at the Variant_Classification column at all.
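To make the denominator question concrete: TMB is expressed as mutations per megabase, so the same mutation count yields a different TMB depending on whether the denominator is the sequenced length or the coding length. A minimal Python sketch; the function name tmb_per_mb and all sizes below are hypothetical.

```python
def tmb_per_mb(mutation_count: int, region_bp: int) -> float:
    """Tumor mutation burden: mutations per megabase of the region considered."""
    if region_bp <= 0:
        raise ValueError("region size must be positive")
    return mutation_count / (region_bp / 1_000_000)


# Hypothetical sample with 50 coding mutations.
sequenced_bp = 50_000_000  # hypothetical sequenced (target) length
coding_bp = 35_000_000     # hypothetical coding length within that target

print(tmb_per_mb(50, sequenced_bp))  # 1.0
print(tmb_per_mb(50, coding_bp))     # larger, because the denominator shrinks
```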

analyses/snv-callers/scripts/03-calculate_tmb.R

jashapiro · 2019-12-03T15:56:34Z


  wgs_genome_size <- sum(wgs_bed[, 3] - wgs_bed[, 2])
  wxs_exome_size <- sum(wxs_bed[, 3] - wxs_bed[, 2])


Do we need different BED files for coding regions? Should we intersect the coding regions in the GTF with the WGS and WXS BED files?

cansavvy · 2019-12-03

Ah, yeah, I hadn't thought of this. It should change. I think what you're describing above is what we should do for the coding-only TMBs, but for the all-mutations TMB we should use the intersection of Mutect's and Strelka's BED files and sum its length. Does that sound right to you?
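A minimal sketch of the proposal above: intersect two BED region sets (e.g. Mutect's and Strelka's) per chromosome and sum the lengths of the result. It is in Python for illustration only; in practice a tool such as bedtools intersect would do this. Intervals are assumed to be BED-style (0-based start, exclusive end), and all function names are hypothetical. Merging first also guards against double-counting overlapping intervals, which a plain sum of end minus start (as in the quoted snippet) would do if a BED file contained overlaps.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # BED-style: 0-based start, exclusive end


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two interval lists (merged first) with a two-pointer sweep."""
    a, b = merge(a), merge(b)
    i = j = 0
    out: List[Interval] = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def intersect_bed(a: Dict[str, List[Interval]], b: Dict[str, List[Interval]]) -> Dict[str, List[Interval]]:
    """Per-chromosome intersection; chromosomes present in only one input are dropped."""
    return {chrom: intersect(a[chrom], b[chrom]) for chrom in a.keys() & b.keys()}


def total_bp(regions: Dict[str, List[Interval]]) -> int:
    """Total length of all intervals, valid because the intervals no longer overlap."""
    return sum(end - start for ivs in regions.values() for start, end in ivs)


mutect = {"chr1": [(0, 100), (200, 300)], "chr2": [(0, 50)]}
strelka = {"chr1": [(50, 250)], "chr3": [(0, 10)]}
print(total_bp(intersect_bed(mutect, strelka)))  # 100
```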

jashapiro · 2019-12-03

That does sound correct. I hadn't thought about that either, but yes, we can't call mutations outside the intersection of the two.

cansavvy · 2019-12-03

Follow-up question: which GTF file do we use?

cansavvy · 2019-12-03

The Lancet methods list GENCODE 31, but I don't see a GENCODE version listed for the other callers. I will dig deeper.

jaclyn-taroni · 2019-12-03

GENCODE 27 is what is currently included in the data download.

jashapiro · 2019-12-03

The other callers didn't restrict calls to coding regions, so there are no GTFs for those.
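A minimal sketch of the first step of the coding-region approach discussed in this thread: pull CDS records out of a GENCODE-style GTF and convert them to BED-style intervals. GTF coordinates are 1-based and inclusive, while BED is 0-based and half-open, so only the start shifts. Whether CDS is the right feature, and which GENCODE release to use, are the open questions here. The function name is hypothetical, and the input is an in-memory list of lines rather than a real file.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def gtf_cds_to_bed(lines: Iterable[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Collect CDS features from GTF lines as 0-based, half-open intervals per chromosome."""
    regions: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, feature = fields[0], fields[2]
        start, end = int(fields[3]), int(fields[4])
        if feature == "CDS":
            # GTF start is 1-based inclusive; BED start is 0-based. End is unchanged.
            regions[chrom].append((start - 1, end))
    return dict(regions)


example = [
    "##description: toy example",
    "chr1\tHAVANA\texon\t11\t40\t.\t+\t.\tgene_id \"G1\";",
    "chr1\tHAVANA\tCDS\t21\t30\t.\t+\t0\tgene_id \"G1\";",
]
print(gtf_cds_to_bed(example))  # {'chr1': [(20, 30)]}
```

CDS records from overlapping transcripts overlap heavily, so these intervals would need to be merged before being intersected with the WGS/WXS BED files and summed into a coding-length denominator.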

cansavvy · 2019-12-03

Per our in-person discussion, I will make the TMB denominator fixes in a subsequent PR so that this PR doesn't grow out of control.

analyses/snv-callers/scripts/03-calculate_tmb.R

analyses/snv-callers/util/split_mnv.R

@jashapiro

* initial restructure
* Add in info from SNV consensus README
* Update outdated text
* Get rid of unnecessary 00-setup.R script
* Take out 00-setup call from bash script
* Add output summary, streamline wording
* Make @jashapiro suggestions except MNV description
* Updates to README to reflect #307's changes
* clarify Lancet statement and put link
* Update mutation comparison explanation
* Update analyses/snv-callers/README.md

Co-Authored-By: jashapiro <jashapiro@gmail.com>

jashapiro

Looks good (pending PRs 2 and 3)


Candace Savonen added 3 commits December 2, 2019 16:17

Beginning of adding in the coding_only file

639ef9b

coding vs all file both write

afdc29d

Make all mutations tmb only from strelka and mutect2

b33a2b5

cansavvy pushed a commit to cansavvy/OpenPBTA-analysis that referenced this pull request Dec 3, 2019

Updates to README to reflect AlexsLemonade#307's changes

c6e556b

Merge branch 'master' into tmb-coding-vs-all

b7c076b

jashapiro reviewed Dec 3, 2019


Push @jashapiro suggestions except denominator stuff

ca482ca

cansavvy marked this pull request as ready for review December 3, 2019 16:45

cansavvy changed the title ~~Draft PR: Separate TMB calculations as "coding only" and "all"~~ PR 1 of 3 TMB revamp: Separate TMB calculations as "coding only" and "all" Dec 3, 2019

Candace Savonen and others added 2 commits December 3, 2019 12:09

Fix object calling error

516bcaf

Merge branch 'master' into tmb-coding-vs-all

ddddddc

jashapiro approved these changes Dec 3, 2019


Candace Savonen and others added 6 commits December 3, 2019 13:23

attempt to fix data.frame error

b950c20

Merge remote-tracking branch 'origin/tmb-coding-vs-all' into tmb-coding-vs-all

290c7af

copy = TRUE to fix errors

1a1da32

Fix the NA rows that popped up but shouldn't have

877863a

one copy = TRUE was missing :(

c34093b

Merge branch 'master' into tmb-coding-vs-all

c5b8103

jaclyn-taroni merged commit bb0f46c into AlexsLemonade:master Dec 3, 2019

jashapiro mentioned this pull request Dec 4, 2019

Delete accidentally committed file with bad path #310

Merged


cansavvy mentioned this pull request Dec 4, 2019

PR 2 of 3 TMB calculation revamp #311

Merged


jharenza mentioned this pull request Dec 12, 2019

Planned data release: V12 #326

Closed


cansavvy mentioned this pull request Dec 12, 2019

PR 3 of 3: TMB Calculation Revamp Update the README #333

Merged

cansavvy deleted the tmb-coding-vs-all branch December 19, 2019 13:36
