This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Part 2 #932 Checking recurrence in combined snv calls #948

Closed
wants to merge 49 commits into from

Conversation

Collaborator

@kgaonkar6 kgaonkar6 commented Feb 23, 2021

⚠️ merge after #947

Purpose/implementation Section

What scientific question is your analysis addressing?

In #932 we want to gather hotspots that are recurrent in our dataset. These recurrent sites are annotated by whether the site occurs in the MSKCC cancer hotspot database or the gene is in the Cosmic Census gene list.

What was your approach?

Recurrent sites are counted using an independent sample set (primary samples; if no primary is found, any secondary sample is added). I'm also using the following columns to calculate recurrence, so that it is equivalent to the mafs used in the plotVaf() and plottiTv() functions.

      Chromosome = Chromosome with the mutation
      Start_Position = Genomic start position of the mutation
      End_Position = Genomic end position of the mutation
      Amino_Acid_Position = Amino acid position extracted from Protein_position maf column
      Hugo_Symbol = gene symbol
      type = gene annotation type
      gnomad_AF_common = gnomad_AF > 0.001 
      hotspot_database = site found in MSKCC cancer hotspot database or gene in Cancer Census gene list
      Variant_Classification = VEP variant classification
      dbSNP_RS = dbSNP ids
      Reference_Allele = Reference allele at site
      Tumor_Seq_Allele2 = Tumor allele detected at the site
      HGVSp_Short = short protein change annotation 
      Variant_Type = variant type description (SNP, Indel)
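
The counting described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code in this PR: the keys `participant_id` and `sample_type`, the helper names, the `min_samples=2` cutoff, and the toy rows are all assumptions made for the example, and only a subset of the listed columns is used as the site key.

```python
from collections import defaultdict

# Columns that identify a site (a subset of the list above, for brevity).
SITE_KEY = ("Chromosome", "Start_Position", "End_Position",
            "Reference_Allele", "Tumor_Seq_Allele2")


def independent_samples(samples):
    """Pick one sample per participant: a primary if present, else any other.

    `samples` is a list of dicts with hypothetical keys `participant_id`,
    `Tumor_Sample_Barcode` and `sample_type`.
    """
    chosen = {}
    for s in samples:
        pid = s["participant_id"]
        current = chosen.get(pid)
        # Keep the first sample seen, but let a primary replace a non-primary.
        if current is None or (
            s["sample_type"] == "primary" and current["sample_type"] != "primary"
        ):
            chosen[pid] = s
    return {s["Tumor_Sample_Barcode"] for s in chosen.values()}


def count_recurrence(maf_rows, keep_barcodes, min_samples=2):
    """Count distinct independent samples carrying each site.

    `min_samples` is an assumed definition of "recurrent" (seen in >= 2 samples).
    """
    carriers = defaultdict(set)
    for row in maf_rows:
        if row["Tumor_Sample_Barcode"] in keep_barcodes:
            site = tuple(row[c] for c in SITE_KEY)
            carriers[site].add(row["Tumor_Sample_Barcode"])
    return {site: len(b) for site, b in carriers.items() if len(b) >= min_samples}


# Toy data (made up): participant P1 has two samples, P2 has one.
samples = [
    {"participant_id": "P1", "Tumor_Sample_Barcode": "S1a", "sample_type": "recurrence"},
    {"participant_id": "P1", "Tumor_Sample_Barcode": "S1b", "sample_type": "primary"},
    {"participant_id": "P2", "Tumor_Sample_Barcode": "S2", "sample_type": "recurrence"},
]


def make_row(barcode, chrom, pos, ref, alt):
    """Build a minimal MAF-like row for the toy example."""
    return {"Tumor_Sample_Barcode": barcode, "Chromosome": chrom,
            "Start_Position": pos, "End_Position": pos,
            "Reference_Allele": ref, "Tumor_Seq_Allele2": alt}


maf_rows = [
    make_row("S1a", "chr1", 100, "A", "T"),
    make_row("S1b", "chr1", 100, "A", "T"),
    make_row("S2", "chr1", 100, "A", "T"),
    make_row("S1b", "chr2", 200, "G", "C"),
]

keep = independent_samples(samples)
recurrent = count_recurrence(maf_rows, keep)
```

Counting distinct barcodes per site, rather than rows, keeps a participant with several tumor samples from inflating the recurrence of a site.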

What GitHub issue does your pull request address?

#932

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

The per-gene counts in plotVaf() will differ from those in results/snv_recurrence.tsv because of the unique VAFs that we find in each Tumor_Sample_Barcode, which are not used in the recurrence counts.

Is there anything that you want to discuss further?

Additional validation for sites not in hotspot databases: through a Slack conversation we will be deciding whether to do additional checks using TCGA data or to add a condition that 2 or more callers call these unique sites. How should we go about this? Should it be a new PR that only does validation of these sites?
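
For the second option, a "2 or more callers" condition might look something like the sketch below. It is only an illustration of the idea: the function name, the `(site, caller)` input shape, the site strings, and the caller names are hypothetical placeholders, not anything taken from the analysis.

```python
from collections import defaultdict


def sites_with_caller_support(calls, min_callers=2):
    """Return sites reported by at least `min_callers` distinct callers.

    `calls` is an iterable of (site, caller) pairs; both are hypothetical
    stand-ins for however the per-caller MAFs are keyed.
    """
    callers_by_site = defaultdict(set)
    for site, caller in calls:
        callers_by_site[site].add(caller)
    return {site for site, callers in callers_by_site.items()
            if len(callers) >= min_callers}


# Toy input: sites and caller names are placeholders, not results.
calls = [
    ("chr1:100:A>T", "caller_A"),
    ("chr1:100:A>T", "caller_B"),
    ("chr2:200:G>C", "caller_A"),
    ("chr2:200:G>C", "caller_A"),  # duplicate rows from one caller count once
]
supported = sites_with_caller_support(calls)
```

Collecting callers into a set per site means repeated rows from the same caller do not count as independent support.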

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

table and QC figures

What is your summary of the results?

192 sites are found to be recurrent, of which 146 are in Cosmic Census genes, 24 are in both the MSKCC cancer hotspot database and the Cosmic Census gene list, and 22 are novel recurrent sites.

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

Collaborator Author

kgaonkar6 commented Feb 24, 2021

I checked whether the novel sites overlap TCGA using curatedTCGAdata in the script here, and 2 novel sites seem to overlap curatedTCGA mutation calls. rs200148949 doesn't seem to be a very rare mutation in dbSNP (ALFA), but in gnomad these sites are very rare (T=0.0002131).

rs200148949 overlaps ACC and KIHC cancer samples in TCGA https://cancer.sanger.ac.uk/cosmic/mutation/overview?id=94757344
MAF:
T=0.033155/298 (ALFA)
T=0.004586/539 (ExAC)

rs190462445 overlaps Testicular Germ Cell Tumors (I didn't find any mutation data on Cosmic website though https://cancer.sanger.ac.uk/cosmic/study/overview?study_id=665)
MAF:
C=0.0097/87 (ALFA)
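
To put the frequencies quoted above side by side, here is a small sketch that applies the 0.001 cutoff from the gnomad_AF_common definition in the PR description to each reported value. Two things are assumptions here: that the gnomad value (T=0.0002131) belongs to rs200148949, which is inferred only from the shared T allele, and that it is reasonable to apply the gnomad cutoff to ALFA and ExAC frequencies for comparison.

```python
# Threshold taken from the gnomad_AF_common definition in the PR description.
COMMON_AF_THRESHOLD = 0.001

# Frequencies quoted in this comment, keyed by (rsID, source).
# Assigning the gnomAD value to rs200148949 is an assumption (shared T allele).
reported_af = {
    ("rs200148949", "ALFA"): 0.033155,
    ("rs200148949", "ExAC"): 0.004586,
    ("rs200148949", "gnomAD"): 0.0002131,
    ("rs190462445", "ALFA"): 0.0097,
}


def is_common(af, threshold=COMMON_AF_THRESHOLD):
    """True when an allele frequency exceeds the 'common' cutoff."""
    return af > threshold


flags = {key: is_common(af) for key, af in reported_af.items()}

for (rsid, source), common in flags.items():
    label = "common" if common else "rare"
    print(f"{rsid}\t{source}\t{reported_af[(rsid, source)]}\t{label}")
```

Under these assumptions only the gnomad value falls below the cutoff, which matches the observation that the ALFA and ExAC frequencies look less rare than the gnomad one.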

@kgaonkar6 kgaonkar6 closed this Mar 1, 2021
@kgaonkar6 kgaonkar6 deleted the recurrence-check branch May 13, 2021 17:17