This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Part 1 #819 Combine snv per caller and filter to scavenge hotspots #947

Closed
wants to merge 54 commits into from


kgaonkar6
Collaborator

kgaonkar6 commented Feb 22, 2021

⚠️ I had to update the SQL database creation step to include all columns from all callers, so I opened a separate PR, which should be merged before this one.

Purpose/implementation Section

In general, #819 asks whether our data contain mutations at hotspots (sites in the MSKCC cancer hotspots database or other known hotspot sites) that the consensus calls in the snv-callers module missed because only 1 or 2 of the 4 callers made the call, owing to caller limitations.
In this PR I use the strelka2, mutect2, vardict, and lancet calls and gather those that overlap MSKCC hotspots or known TERT promoter mutations.

What scientific question is your analysis addressing?

Gather calls from each caller and keep those whose Amino_Acid_Position is a hotspot for the given gene, or that overlap a hotspot genomic region.

What was your approach?

The SQL database of all callers is created using 01-setup_db.py from the snv-callers module, by @jashapiro and @cansavvy.
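A minimal sketch of how calls from a per-caller database could be stacked into one table, using Python's built-in sqlite3 module. The table names (strelka, mutect, vardict, lancet), the column list, and the toy rows are hypothetical and not taken from 01-setup_db.py. It also illustrates why consistent columns across callers matter: a UNION ALL fails unless every caller table returns the same number of columns.

```python
import sqlite3

# Hypothetical table names, one per caller; the real schema written by
# 01-setup_db.py may use different table and column names.
CALLERS = ["strelka", "mutect", "vardict", "lancet"]


def combine_callers(conn, columns):
    """Stack rows from every per-caller table, tagging each row with its caller.

    UNION ALL requires every SELECT to return the same number of columns,
    so each caller table must expose all of `columns`.
    """
    col_list = ", ".join(columns)
    query = " UNION ALL ".join(
        f"SELECT '{caller}' AS caller, {col_list} FROM {caller}" for caller in CALLERS
    )
    return conn.execute(query).fetchall()


def demo():
    """Build a toy in-memory database (not real data) and combine it."""
    conn = sqlite3.connect(":memory:")
    cols = ["Tumor_Sample_Barcode", "Hugo_Symbol", "Protein_position", "IMPACT"]
    for caller in CALLERS:
        col_defs = ", ".join(f"{c} TEXT" for c in cols)
        conn.execute(f"CREATE TABLE {caller} ({col_defs})")
    # The same toy call reported by two callers
    conn.execute("INSERT INTO strelka VALUES ('S1', 'BRAF', '600/766', 'MODERATE')")
    conn.execute("INSERT INTO lancet VALUES ('S1', 'BRAF', '600/766', 'MODERATE')")
    rows = combine_callers(conn, cols)
    conn.close()
    return rows


rows = demo()
print(len(rows))  # 2
```

Keeping a caller column on every row preserves which caller(s) reported each call after the tables are merged.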

01-combine-snv.Rmd filters and combines the calls from all callers using the following criteria:

Filtering for hotspot-overlapping mutations

  • IMPACT is HIGH, MODERATE, or MODIFIER ('HIGH|MODERATE|MODIFIER'), to remove any LOW-impact mutations at a given amino acid position in the hotspot database
    AND
  • Hugo_Symbol %in% c(hotspot_database_amino_acid$Hugo_Symbol,hotspot_database_genomic$Hugo_Symbol)
    AND
  • (Amino acid hotspot overlap, matching Amino_Acid_Position against the MSKCC cancer hotspot database
    OR
  • TERT promoter region overlap, using a genomic region overlap filtering strategy)
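The criteria above can be sketched as a single predicate. This is an illustration in Python on plain dicts, not the R code in 01-combine-snv.Rmd; the shapes of the hotspot inputs (a set of (gene, position) pairs and a list of (gene, chrom, start, end) regions) are assumptions, and the TERT and BRAF coordinates below are placeholders, not real genomic positions.

```python
# Field names follow MAF columns; the hotspot inputs are simplified,
# hypothetical structures.
KEEP_IMPACT = {"HIGH", "MODERATE", "MODIFIER"}


def is_hotspot_call(call, aa_hotspots, region_hotspots):
    """Return True if a call passes the IMPACT, gene and hotspot-overlap filters.

    aa_hotspots:     set of (Hugo_Symbol, amino acid position) pairs
    region_hotspots: list of (Hugo_Symbol, chrom, start, end) closed intervals
    """
    # IMPACT filter: drop LOW-impact calls
    if call["IMPACT"] not in KEEP_IMPACT:
        return False
    # Gene filter: the gene must appear in either hotspot table
    genes = {gene for gene, _ in aa_hotspots} | {r[0] for r in region_hotspots}
    if call["Hugo_Symbol"] not in genes:
        return False
    # Amino acid hotspot match on (gene, position)
    aa_hit = (call["Hugo_Symbol"], call["Amino_Acid_Position"]) in aa_hotspots
    # Genomic overlap: closed intervals overlap if each starts before the other ends
    region_hit = any(
        chrom == call["Chromosome"]
        and call["Start_Position"] <= end
        and call["End_Position"] >= start
        for _, chrom, start, end in region_hotspots
    )
    return aa_hit or region_hit


# Toy usage (placeholder coordinates, NOT the real TERT promoter or BRAF locus)
aa = {("BRAF", 600)}
regions = [("TERT", "chr5", 1000, 1100)]
braf = {"Hugo_Symbol": "BRAF", "IMPACT": "MODERATE", "Amino_Acid_Position": 600,
        "Chromosome": "chr7", "Start_Position": 1, "End_Position": 1}
tert = {"Hugo_Symbol": "TERT", "IMPACT": "MODIFIER", "Amino_Acid_Position": None,
        "Chromosome": "chr5", "Start_Position": 1050, "End_Position": 1050}
print(is_hotspot_call(braf, aa, regions), is_hotspot_call(tert, aa, regions))
```

The TERT call has no amino acid position (promoter mutations are non-coding), so it can only pass through the genomic region branch.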

What GitHub issue does your pull request address?

#819

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

I checked the paper for source code: it is at https://github.com/taylor-lab/hotspots. They also derive Amino_Acid_Position, which I use for matching, from the MAF column Protein_position (https://github.com/taylor-lab/hotspots/blob/733c727bd4b9f7c1a7f4508b9a467b2f31cacf33/funcs.R#L602), so the two should be consistent. My only concern is that only 2 indels are captured as overlapping MSKCC hotspots; does that sound OK?
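Since matching depends on how Protein_position is reduced to a single Amino_Acid_Position, here is a hypothetical parser that keeps the first residue number. It assumes values like 600/766 or 600-601/766 and is not a port of the linked funcs.R code, which may handle edge cases differently. For indels that span a range of residues, the choice of which residue to keep determines which hotspots they can match.

```python
import re


def amino_acid_position(protein_position):
    """Return the first residue number in a MAF Protein_position value, or None.

    Assumes values such as '600/766' (SNV) or '600-601/766' (indel range),
    where the part after '/' is the protein length.
    """
    if not protein_position:
        return None
    span = protein_position.split("/")[0]  # '600-601/766' -> '600-601'
    match = re.search(r"\d+", span)        # first integer in the span
    return int(match.group()) if match else None


for value in ["600/766", "600-601/766", "-/766", ""]:
    print(repr(value), amino_acid_position(value))
```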

Is there anything that you want to discuss further?

Is matching on Hugo_Symbol + Amino_Acid_Position between our data and the MSKCC database sufficient (since in most cases subtyping/hotspot definitions seem to be amino acid sites only), or should I do a liftover from hg19 to hg38 for a genomic region overlap?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

RDS

What is your summary of the results?

The combined hotspot-overlap MAF RDS contains 1955 calls, counted per tumor sample per caller (593 unique sites).
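To make the two counts concrete: a call is counted once per tumor sample per caller, while a unique site collapses all of those records. A toy sketch, assuming a site is identified by chromosome, position, ref, and alt alleles (the records are made up, not results from this PR):

```python
from collections import namedtuple

# Assumption: a site is identified by chromosome, position, ref and alt.
Call = namedtuple("Call", "caller sample chrom pos ref alt")


def summarize(calls):
    """Return (number of per-sample, per-caller calls, number of unique sites)."""
    unique_sites = {(c.chrom, c.pos, c.ref, c.alt) for c in calls}
    return len(calls), len(unique_sites)


# Toy records, not results from this PR
toy = [
    Call("strelka", "S1", "chr7", 100, "A", "T"),
    Call("mutect", "S1", "chr7", 100, "A", "T"),   # same site, second caller
    Call("lancet", "S2", "chr7", 100, "A", "T"),   # same site, another sample
    Call("vardict", "S1", "chr5", 200, "G", "A"),  # a different site
]
print(summarize(toy))  # (4, 2)
```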

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

kgaonkar6 changed the title from "Combine snv per caller and filter for hotspots" to "Part1 #932 Combine snv per caller and filter for hotspots" Feb 22, 2021
kgaonkar6 changed the title from "Part1 #932 Combine snv per caller and filter for hotspots" to "Part 1 #932 Combine snv per caller and filter for hotspots" Feb 24, 2021
kgaonkar6 changed the title from "Part 1 #932 Combine snv per caller and filter for hotspots" to "Part 1 #819 Combine snv per caller and filter to scavenge hotspots" Mar 1, 2021
kgaonkar6 added the "ready for review" label Mar 2, 2021
kgaonkar6 removed the "ready for review" label Mar 12, 2021
@kgaonkar6
Collaborator Author

Please close if #954 seems more reasonable for gathering the hotspots. Thanks!

@jaclyn-taroni
Member

Closing in favor of #956

kgaonkar6 deleted the combine-snv branch May 13, 2021 17:17