This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Updated analysis: PBTA vs TCGA TMB analysis #556

Closed
jharenza opened this issue Feb 25, 2020 · 2 comments

@jharenza
Collaborator

jharenza commented Feb 25, 2020

What analysis module should be updated and why?

PBTA vs TCGA TMB analysis, in the works by @cansavvy in #548, based on #551 and/or #257.

What changes need to be made? Please provide enough detail for another participant to make the update.

We are seeing higher TMB in pediatric than in adult cancers, and we suspect that the variant-calling behavior of Lancet on WXS samples is complicating the results.

What input data should be used? Which data were used in the version being updated?

We are trying to figure that out now via #548: TCGA or PCAWG?

When do you expect the revised analysis will be completed?

Hopefully, soon!

Who will complete the updated analysis?

A combination of CCDL and D3b.

Relevant Literature

@cansavvy @jaclyn-taroni @jashapiro @yuankunzhu, adding some background I found to this issue, from one of the earlier papers to identify an association between PMS2 mutations and hypermutation. Of note, they apply additional filtering, and I am not yet sure (I still have to read up on it) whether TCGA had also done this, which could help explain why we see higher TMB in pediatric samples with some algorithms.

Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden
Chalmers ZR, Connelly CF, Fabrizio D, Gay L, Ali SM, Ennis R, et al. Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden. Genome Med. 2017 Apr 19;9(1):34.

Abstract
Background
High tumor mutational burden (TMB) is an emerging biomarker of sensitivity to immune checkpoint inhibitors and has been shown to be more significantly associated with response to PD-1 and PD-L1 blockade immunotherapy than PD-1 or PD-L1 expression, as measured by immunohistochemistry (IHC). The distribution of TMB and the subset of patients with high TMB has not been well characterized in the majority of cancer types.

Methods
In this study, we compare TMB measured by a targeted comprehensive genomic profiling (CGP) assay to TMB measured by exome sequencing and simulate the expected variance in TMB when sequencing less than the whole exome. We then describe the distribution of TMB across a diverse cohort of 100,000 cancer cases and test for association between somatic alterations and TMB in over 100 tumor types.

Results
We demonstrate that measurements of TMB from comprehensive genomic profiling are strongly reflective of measurements from whole exome sequencing and model that below 0.5 Mb the variance in measurement increases significantly. We find that a subset of patients exhibits high TMB across almost all types of cancer, including many rare tumor types, and characterize the relationship between high TMB and microsatellite instability status. We find that TMB increases significantly with age, showing a 2.4-fold difference between age 10 and age 90 years. Finally, we investigate the molecular basis of TMB and identify genes and mutations associated with TMB level. We identify a cluster of somatic mutations in the promoter of the gene PMS2, which occur in 10% of skin cancers and are highly associated with increased TMB.

Conclusions
These results show that a CGP assay targeting ~1.1 Mb of coding genome can accurately assess TMB compared with sequencing the whole exome. Using this method, we find that many disease types have a substantial portion of patients with high TMB who might benefit from immunotherapy. Finally, we identify novel, recurrent promoter mutations in PMS2, which may be another example of regulatory mutations contributing to tumorigenesis.

TMB calculations
Panel
TMB was defined as the number of somatic, coding, base substitution, and indel mutations per megabase of genome examined. All base substitutions and indels in the coding region of targeted genes, including synonymous alterations, are initially counted before filtering as described below. Synonymous mutations are counted in order to reduce sampling noise. While synonymous mutations are not likely to be directly involved in creating immunogenicity, their presence is a signal of mutational processes that will also have resulted in nonsynonymous mutations and neoantigens elsewhere in the genome. Non-coding alterations were not counted. Alterations listed as known somatic alterations in COSMIC and truncations in tumor suppressor genes were not counted, since our assay genes are biased toward genes with functional mutations in cancer. Alterations predicted to be germline by the somatic-germline-zygosity algorithm were not counted. Alterations that were recurrently predicted to be germline in our cohort of clinical specimens were not counted. Known germline alterations in dbSNP were not counted. Germline alterations occurring with two or more counts in the ExAC database were not counted. To calculate the TMB per megabase, the total number of mutations counted is divided by the size of the coding region of the targeted territory. The nonparametric Mann–Whitney U-test was subsequently used to test for significance in difference of means between two populations.
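To make the counting rule quoted above concrete, here is a minimal Python sketch of how the exclusions and the per-megabase division fit together. The `Variant` record and its field names are hypothetical (they are not from the paper or from OpenPBTA); each boolean simply stands in for one of the annotations the paper describes, and the sketch assumes those annotations have already been computed upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical per-variant annotations; field names are illustrative only.
    is_coding: bool
    is_synonymous: bool
    in_cosmic: bool            # listed as a known somatic alteration in COSMIC
    is_tsg_truncation: bool    # truncation in a tumor suppressor gene
    predicted_germline: bool   # e.g. flagged by a somatic-germline-zygosity model
    recurrent_germline: bool   # recurrently predicted germline in the cohort
    in_dbsnp_germline: bool    # known germline alteration in dbSNP
    exac_count: int            # number of times the alteration is seen in ExAC


def counts_toward_tmb(v: Variant) -> bool:
    """Apply the exclusion rules described in the quoted methods."""
    if not v.is_coding:
        return False  # non-coding alterations are not counted
    # Synonymous coding variants ARE counted (to reduce sampling noise),
    # so is_synonymous is deliberately not checked here.
    if v.in_cosmic or v.is_tsg_truncation:
        return False  # assay genes are biased toward known cancer mutations
    if v.predicted_germline or v.recurrent_germline or v.in_dbsnp_germline:
        return False
    if v.exac_count >= 2:
        return False  # germline alterations seen two or more times in ExAC
    return True


def tmb_per_mb(variants, coding_territory_mb: float) -> float:
    """Counted mutations divided by the coding territory examined, in Mb."""
    if coding_territory_mb <= 0:
        raise ValueError("coding territory must be positive")
    n = sum(1 for v in variants if counts_toward_tmb(v))
    return n / coding_territory_mb
```

The order of the checks does not matter here, since every rule is an exclusion; keeping each rule on its own line just makes it easy to compare against the paper's list.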

Alterations predicted to be germline by the somatic-germline-zygosity algorithm were not counted. Alterations that were recurrently predicted to be germline in our cohort of clinical specimens were not counted. Known germline alterations in dbSNP were not counted. Germline alterations occurring with two or more counts in the ExAC database were not counted.
Here, they remove germline and predicted germline alterations, including known germline alterations in dbSNP and alterations seen two or more times in ExAC.

WES of 29 samples that also had panel sequencing
WES was performed on 29 samples as previously described for which CGP had also been performed. Briefly, tumors were sequenced using Agilent’s exome enrichment kit (Sure Select V4; with >50% of baits above 25× coverage). The matched blood-derived DNA was also sequenced. Base calls and intensities from the Illumina HiSeq 2500 were processed into FASTQ files using CASAVA. The paired-end FASTQ files were aligned to the genome (to UCSC’s hg19 GRCh37) with BWA (v0.5.9). Duplicate paired-end sequences were removed using Picard MarkDuplicates (v1.35) to reduce potential PCR bias. Aligned reads were realigned for known insertion/deletion events using SRMA (v0.1.155). Base quality scores were recalibrated using the Genome Analysis Toolkit (v1.1-28). Somatic substitutions were identified using MuTect (v1.1.4). Mutations were then filtered against common single-nucleotide polymorphisms (SNPs) found in dbSNP (v132), the 1000 Genomes Project (Feb 2012), a 69-sample Complete Genomics data set, and the Exome Sequencing Project (v6500).

That last step is the extra filtering step.
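A rough Python sketch of that extra step: take the union of several common-SNP resources and drop any somatic call that matches. The `Site` key and the `filter_common_snps` helper are hypothetical; the sketch assumes every resource has already been loaded and normalized to the same (chrom, pos, ref, alt) representation on the same reference build. The paper does not say whether matching was by position or by allele, so exact allele matching here is an assumption.

```python
from typing import Iterable, NamedTuple


class Site(NamedTuple):
    # Hypothetical variant key; assumes left-normalized alleles on one build.
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_common_snps(calls: Iterable[Site], *snp_sets: set) -> list:
    """Drop calls that appear in ANY of the supplied common-SNP resources."""
    common = set().union(*snp_sets)  # e.g. dbSNP, 1000 Genomes, ESP sites
    return [c for c in calls if c not in common]
```

Because the resources are combined with a union, adding another database can only remove more calls, which is why this step would be expected to lower TMB relative to a pipeline without it.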

TCGA
TCGA data were obtained from public repositories. For this analysis, we used the somatic called variants as determined by TCGA as the raw mutation count. We used 38 Mb as the estimate of the exome size. For the downsampling analysis, we simulated the observed number of mutations/Mb 1000 times using the binomial distribution at whole exome TMB = 100 mutations/Mb, 20 mutations/Mb, and 10 mutations/Mb and did this for megabases of exome sequenced ranging from 0–10 Mb. Melanoma TCGA data were obtained from dbGap accession number phs000452.v1.p1.
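For the TCGA numbers, TMB is simply the raw somatic mutation count divided by 38 Mb. The downsampling description is brief, so the Python sketch below is only one plausible reading of it, not the authors' code: assume the exome carries `true_tmb * 38` mutations and that each one falls inside the sequenced subset independently with probability `sequenced_mb / 38`, so the observed count is binomial. The function names are hypothetical, and the binomial draw is done as a sum of Bernoulli trials to stay within the standard library.

```python
import random

EXOME_MB = 38.0  # exome size estimate used for the TCGA calculation


def tcga_tmb(raw_mutation_count: int, exome_mb: float = EXOME_MB) -> float:
    """TMB as the raw somatic mutation count divided by the exome size (Mb)."""
    return raw_mutation_count / exome_mb


def simulate_observed_tmb(true_tmb: float, sequenced_mb: float, n_sims: int = 1000,
                          exome_mb: float = EXOME_MB, seed: int = 0) -> list:
    """Simulate observed mutations/Mb when only part of the exome is sequenced.

    Assumed model: the exome carries n = round(true_tmb * exome_mb) mutations,
    each captured with probability p = sequenced_mb / exome_mb, so the
    observed count is Binomial(n, p) and observed TMB = count / sequenced_mb.
    """
    if not 0 < sequenced_mb <= exome_mb:
        raise ValueError("sequenced_mb must be in (0, exome_mb]")
    rng = random.Random(seed)
    n = round(true_tmb * exome_mb)
    p = sequenced_mb / exome_mb
    observed = []
    for _ in range(n_sims):
        # One Binomial(n, p) draw as a sum of n Bernoulli(p) trials.
        hits = sum(1 for _ in range(n) if rng.random() < p)
        observed.append(hits / sequenced_mb)
    return observed
```

Under this model the spread of the observed TMB shrinks as `sequenced_mb` grows, which matches the paper's point that variance rises sharply for very small targeted regions.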

For the PPTC PDX paper, we removed germline variants using a panel of normals, since we had tumor-only samples. However, we do note in that paper that our TMB values are somewhat higher, likely because this approach did not remove all germline variants. In OpenPBTA, the paired normal samples should remove these, but I wonder whether more germline variants are still coming through for some reason.
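For context on why a panel of normals can leave germline variants behind, here is a minimal Python sketch of the general idea. It is not the PPTC pipeline: the helper names and the `min_normals` threshold are hypothetical, and sites are assumed to be hashable keys (such as chrom/pos/ref/alt tuples) already normalized across samples.

```python
from collections import Counter
from typing import Hashable, Iterable


def build_panel_of_normals(normal_call_sets: Iterable[set], min_normals: int = 2) -> set:
    """Return sites called in at least `min_normals` distinct normal samples."""
    counts = Counter()
    for calls in normal_call_sets:
        counts.update(set(calls))  # count each site at most once per normal
    return {site for site, n in counts.items() if n >= min_normals}


def filter_tumor_only(tumor_calls: Iterable[Hashable], pon: set) -> list:
    """Drop tumor-only calls that appear in the panel of normals."""
    return [site for site in tumor_calls if site not in pon]
```

A rare germline variant carried only by the patient never reaches `min_normals` in the panel, so it survives the filter and is counted as somatic; a matched normal from the same patient does not have this blind spot.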

@yuankunzhu
Collaborator

@cansavvy, we are considering this filtering step:

Mutations were then filtered against common single-nucleotide polymorphisms (SNPs) found in dbSNP (v132), the 1000 Genomes Project (Feb 2012), a 69-sample Complete Genomics data set, and the Exome Sequencing Project (v6500)

@cansavvy
Collaborator

cansavvy commented Jan 7, 2021

I think this issue was largely addressed by the more specific follow-up issues #668, #724, and #726 (all now closed), so I don't think it is relevant anymore. I believe it can be closed.


4 participants