Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

PR 1 of 3 TMB revamp: Separate TMB calculations as "coding only" and "all" #307

Merged
merged 13 commits into from
Dec 3, 2019

Conversation

cansavvy
Copy link
Collaborator

@cansavvy cansavvy commented Dec 3, 2019

Purpose/implementation Section

What scientific question is your analysis addressing?

Currently the TMB stats calculated use all mutations, but we should also have a separate TMB stats for coding mutations only.

What was your approach?

Strelka and Mutect are only used for the "all mutations" TMB, but the consensus file of Strelka, Lancet, and Mutect is used for the "coding only" mutations this was discussed on #305

This required some borrowing from the 02-merge-callers, and functionalization of split_mnv into its own script since it is now used by more than one script

I'm going to add the doc changes that are associated with this to PR #304

What GitHub issue does your pull request address?

#305

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

-Is the original data handling flow of MNVs and SNVs maintained? I am particularly worried about the integrity of the last chunk of data wrangling on 03-calculate_tmb.R.

  • Are the Variant Classification groups of IGR and Silent the only ones that are considered as non-coding?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

No new tables yet, but this will require some downstream analyses like tmb_compare_tcga to be updated. Potentially mutational_signatures should be updated as well?

Results

What types of results are included (e.g., table, figure)?

No new results yet.

Reproducibility Checklist

No new packages were needed here.

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

cansavvy pushed a commit to cansavvy/OpenPBTA-analysis that referenced this pull request Dec 3, 2019
Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The main question I have is the TMB coding definition. I am sure you don't want to weigh into those definitions again, but it would seem logical that the denominator there would be the total coding length, not the sequenced length. Am I mistaken there? If not, you will need to add some logic to recalculate the denominator, but that could be combined with the target bed files to filter to just the coding mutations, without having to look at the Variant_Classification column at all.

Comment on lines 180 to 181
wgs_genome_size <- sum(wgs_bed[, 3] - wgs_bed[, 2])
wxs_exome_size <- sum(wxs_bed[, 3] - wxs_bed[, 2])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need different bed files for coding regions? Intersect the gtf codons with the wgs and wxs bed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Yea, I hadn't thought of this. It should change. I think what your describing above is what we should do for the coding only TMBs, but for the all mutations we should use the intersection of mutect and strelka's BED files and add that up. Does that sound right to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does sound correct. I hadn't thought about that either... But yes, we can't call mutations outside the intersection of the two.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up question is, what gtf file do we use?

Copy link
Collaborator Author

@cansavvy cansavvy Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lancet methods list Gencode 31 as being used but I don't see the other callers list what Gencode version. I will dig deeper.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gencode 27 is what is included in the data download at the moment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other callers didn't restrict to coding regions, so no GTFs for those...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our in person discussion, I will make the TMB denominator fixes in a subsequent PR so this PR doesn't get too out of control.

analyses/snv-callers/scripts/03-calculate_tmb.R Outdated Show resolved Hide resolved
analyses/snv-callers/util/split_mnv.R Show resolved Hide resolved
@cansavvy cansavvy marked this pull request as ready for review December 3, 2019 16:45
@cansavvy cansavvy changed the title Draft PR: Separate TMB calculations as "coding only" and "all" PR 1 of 3 TMB revamp: Separate TMB calculations as "coding only" and "all" Dec 3, 2019
jaclyn-taroni pushed a commit that referenced this pull request Dec 3, 2019
* initial restructure

* Add in info from SNV consensus README

* Update outdated text

* Get rid of unnecessary 00-setup.R script

* Take out 00-setup call from bash script

* Add output summary, streamline wording

* Make @jashapiro suggestions except MNV description

* Updates to README to reflect #307 s changes

* clarify Lancet statement and put link

* Update mutation comparison explanation

* Update analyses/snv-callers/README.md

Co-Authored-By: jashapiro <jashapiro@gmail.com>
Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good (pending PRs 2 and 3)

@jaclyn-taroni jaclyn-taroni merged commit bb0f46c into AlexsLemonade:master Dec 3, 2019
@cansavvy cansavvy mentioned this pull request Dec 4, 2019
2 tasks
@jharenza jharenza mentioned this pull request Dec 12, 2019
5 tasks
@cansavvy cansavvy deleted the tmb-coding-vs-all branch December 19, 2019 13:36
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants