This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Updated analysis: function calculate_tmb returns unfiltered mutation counts #724

Closed
yuankunzhu opened this issue Jul 13, 2020 · 5 comments
@yuankunzhu
Collaborator

yuankunzhu commented Jul 13, 2020

What analysis module should be updated and why?

  • Module: tmb_functions.R in snv-callers/scripts.
  • calculate_tmb counts every mutation in the input MAF without filtering, which inflates the mutation counts (and therefore the TMB) in the WGS coding-region output.
  • A quick look at the TMB TSV files (pbta-snv-mutation-tmb-*.tsv) shows that mutation_count in the coding-region file is identical to the total mutation count in the all-mutations file:
$ head pbta-snv-mutation-tmb-*.tsv | column -s $'\t' -t
==> pbta-snv-mutation-tmb-all.tsv <==
Tumor_Sample_Barcode                      experimental_strategy  short_histology  mutation_count  region_size  tmb
BS_1Q524P3B                               WGS                    HGAT             4168            2923762389   1.4255604407804017
BS_BD4RQ1G0                               WGS                    Schwannoma       241             2923762389   0.0824280389222833
BS_80X7AVCP                               WGS                    Embryonal Tumor  708             2923762389   0.24215374090031774
BS_1JMTTKMK                               WGS                    Medulloblastoma  3116            2923762389   1.0657500800076132
BS_9DM8H1RX                               WGS                    HGAT             638             2923762389   0.218211986856501
BS_4XPPZTGG                               WGS                    Other            245             2923762389   0.08379613915335854
BS_3J4T2YYW                               WGS                    Medulloblastoma  326             2923762389   0.11150016883263218
BS_A1DV9T7G                               WGS                    Neurofibroma     4357            2923762389   1.4902031766987067
BS_X7QJCVJB                               WGS                    Medulloblastoma  1992            2923762389   0.6813139150754702
==> pbta-snv-mutation-tmb-coding.tsv <==
Tumor_Sample_Barcode                      experimental_strategy  short_histology  mutation_count  region_size  tmb
BS_1Q524P3B                               WGS                    HGAT             4168            35717401     116.69382103137907
BS_BD4RQ1G0                               WGS                    Schwannoma       241             35717401     6.747411436795191
BS_80X7AVCP                               WGS                    Embryonal Tumor  708             35717401     19.822270942950187
BS_1JMTTKMK                               WGS                    Medulloblastoma  3116            35717401     87.24039019524405
BS_9DM8H1RX                               WGS                    HGAT             638             35717401     17.86244189491839
BS_4XPPZTGG                               WGS                    Other            245             35717401     6.8594016681112935
BS_3J4T2YYW                               WGS                    Medulloblastoma  326             35717401     9.127203852262374
BS_A1DV9T7G                               WGS                    Neurofibroma     4357            35717401     121.98535946106492
BS_X7QJCVJB                               WGS                    Medulloblastoma  1992            35717401     55.77113519541917
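
The comparison above can also be done programmatically. The following is a minimal sketch in base R that hard-codes the first three rows of each excerpt (rather than reading the real files) and checks two things: whether the per-sample counts are identical between the two outputs, and whether the reported tmb matches mutation_count / (region_size / 1000000). The data frame names tmb_all and tmb_coding and the column identical_counts are made up for this illustration.

```r
# Values copied from the first three rows of the two TSV excerpts above.
# tmb_all / tmb_coding are illustrative names, not objects from the repository.
samples <- c("BS_1Q524P3B", "BS_BD4RQ1G0", "BS_80X7AVCP")

tmb_all <- data.frame(
  Tumor_Sample_Barcode = samples,
  mutation_count = c(4168, 241, 708),
  region_size = 2923762389
)

tmb_coding <- data.frame(
  Tumor_Sample_Barcode = samples,
  mutation_count = c(4168, 241, 708),
  region_size = 35717401
)

# Join on sample ID so the two counts sit side by side.
both <- merge(
  tmb_all, tmb_coding,
  by = "Tumor_Sample_Barcode",
  suffixes = c("_all", "_coding")
)

# The coding region is a small subset of the genome, so identical counts
# for every sample suggest the region filter was never applied.
both$identical_counts <- both$mutation_count_all == both$mutation_count_coding

# Recompute TMB (mutations per megabase) for the coding region.
both$tmb_coding <- both$mutation_count_coding / (both$region_size_coding / 1000000)

print(both[, c("Tumor_Sample_Barcode", "mutation_count_all",
               "mutation_count_coding", "identical_counts", "tmb_coding")])
```

With the real outputs, the two data frames would presumably be read from pbta-snv-mutation-tmb-all.tsv and pbta-snv-mutation-tmb-coding.tsv instead of being typed in by hand.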

What changes need to be made? Please provide enough detail for another participant to make the update.

A preliminary investigation suggests that mutation_count should be calculated from filt_maf_df rather than sample_maf_df in the code below, but this needs a closer look and testing.

tmb <- sample_maf_df %>%
  dplyr::group_by(
    #TODO: Make this column passing stuff more flexible with some tidyeval maybe
    Tumor_Sample_Barcode = tumor_sample_barcode,
    experimental_strategy,
    short_histology
  ) %>%
  # Count number of mutations for that sample
  dplyr::summarize(
    mutation_count = dplyr::n(),
    region_size = bed_size,
    tmb = mutation_count / (region_size / 1000000)
  )
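
As a rough illustration of the proposed change (not the fix that was eventually merged), the sketch below runs the same summary on filt_maf_df instead of sample_maf_df. The toy data, the in_region flag used to build filt_maf_df, and the bed_size value are all assumptions made up for this example; how the real function derives filt_maf_df is not shown in this issue.

```r
library(dplyr)

# Toy stand-in for sample_maf_df: five mutations across two samples.
# `in_region` is a hypothetical flag marking mutations inside the BED region.
sample_maf_df <- data.frame(
  tumor_sample_barcode = c("S1", "S1", "S1", "S2", "S2"),
  experimental_strategy = "WGS",
  short_histology = c("HGAT", "HGAT", "HGAT", "Other", "Other"),
  in_region = c(TRUE, FALSE, TRUE, FALSE, TRUE)
)

# Assumed filtering step: keep only mutations that fall inside the region.
filt_maf_df <- dplyr::filter(sample_maf_df, in_region)

# Hypothetical region size in base pairs (2 Mb).
bed_size <- 2000000

# Same summary as above, but counting rows of the filtered data frame.
tmb <- filt_maf_df %>%
  dplyr::group_by(
    Tumor_Sample_Barcode = tumor_sample_barcode,
    experimental_strategy,
    short_histology
  ) %>%
  dplyr::summarize(
    mutation_count = dplyr::n(),
    region_size = bed_size,
    tmb = mutation_count / (region_size / 1000000),
    .groups = "drop"
  )

print(tmb)
```

One thing to check when testing a change like this: a sample with no mutations inside the region has no rows in the filtered data frame, so it drops out of the grouped summary entirely instead of appearing with a count of 0.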

What input data should be used? Which data were used in the version being updated?

# for PBTA
data/pbta-snv-strelka2.vep.maf.gz
data/pbta-snv-mutect2.vep.maf.gz
data/pbta-histologies.tsv

# for TCGA
data/pbta-tcga-snv-strelka2.vep.maf.gz
data/pbta-tcga-snv-mutect2.vep.maf.gz
data/pbta-tcga-manifest.tsv

# BED
data/gencode.v27.primary_assembly.annotation.gtf.gz
data/WXS.hg38.100bp_padded.bed
scratch/intersect_strelka_mutect_WGS.bed

When do you expect the revised analysis will be completed?

Who will complete the updated analysis?

@jaclyn-taroni
Member

Excellent find @yuankunzhu - thank you! As discussed via Slack, @yuankunzhu will file the bug fix this afternoon and we can have @cansavvy and @jashapiro take a look. The CCDL team can look at some of the downstream analyses and fix the plotting outlined in that Notion document.

@yuankunzhu
Collaborator Author

Sounds good, @jaclyn-taroni. I just filed a simple PR for this at #727.

@cansavvy
Collaborator

Is there anything left to address for this issue or can we close it?

@yuankunzhu
Collaborator Author

I think @jashapiro wants to make sure everything looks OK in the downstream analysis with the updated function from the commit here: #727 (review).

But yes, I'm OK with closing it, since my initial intent was just to raise the counting issue.

In addition, we could consider opening a separate ticket for the plotting axes alignment, with a more specific description, if we decide that is also an issue worth addressing.

@jaclyn-taroni jaclyn-taroni removed their assignment Aug 3, 2020
@cansavvy
Collaborator

cansavvy commented Aug 3, 2020

I have the axes alignment bit tracked here: cansavvy/openpbta-notebook-concept#9

So we can close this.

I can also copy the issue I linked above over to this repository, but I didn't want to clutter up the issues here.
