This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

#819 Part2 : Scavenge back hotspots to add to consensus calls #961

Merged May 13, 2021 (144 commits)


kgaonkar6
Collaborator

@kgaonkar6 kgaonkar6 commented Mar 18, 2021

🚨 merge #1050 before this PR

Purpose/implementation Section

What scientific question is your analysis addressing?

Combine hotspot calls from each caller (strelka, mutect, vardict, and lancet) and merge them with the consensus maf file.

What was your approach?

I gathered the filtered RDS files from scratch/hotspot-detection and filtered the combined hotspot calls down to unique calls that are not present in the consensus maf generated by the snv-caller module.
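For reviewers, a minimal base R sketch of this filtering step. The toy data frames and the choice of key columns (standard MAF column names) are assumptions for illustration, not the code in the notebook.

```r
# Toy data: column names follow MAF conventions, values are made up.
hotspot_calls <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2"),
  Chromosome           = c("chr1", "chr7", "chr7"),
  Start_Position       = c(100L, 200L, 200L),
  Reference_Allele     = c("A", "C", "C"),
  Tumor_Seq_Allele2    = c("T", "G", "G"),
  stringsAsFactors     = FALSE
)
consensus_maf <- data.frame(
  Tumor_Sample_Barcode = "S1",
  Chromosome           = "chr7",
  Start_Position       = 200L,
  Reference_Allele     = "C",
  Tumor_Seq_Allele2    = "G",
  stringsAsFactors     = FALSE
)

# Assumption: these columns together identify one call in one sample.
key_cols <- c("Tumor_Sample_Barcode", "Chromosome", "Start_Position",
              "Reference_Allele", "Tumor_Seq_Allele2")

# Build one string key per row so calls can be compared across tables.
make_key <- function(df) {
  do.call(paste, c(unname(as.list(df[key_cols])), sep = "|"))
}

# Keep only hotspot calls whose key is absent from the consensus maf.
in_consensus     <- make_key(hotspot_calls) %in% make_key(consensus_maf)
not_in_consensus <- hotspot_calls[!in_consensus, , drop = FALSE]
```

In this toy example the S1 call on chr7 is dropped because the consensus maf already contains it; the other two calls remain as candidates to scavenge back.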

This dataframe now contains the unique calls that need to be scavenged back. However, the following columns hold read support and quality values that can differ between callers, since they are calculated as part of the variant calling process. The values in these columns are replaced by their mean so that each call is represented by a single row.
unique_cols <- c("t_depth",
"n_depth",
"t_ref_count",
"n_ref_count",
"t_alt_count",
"n_alt_count",
"caller",
"vcf_qual")
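A minimal base R sketch of collapsing per-caller rows into one row per call. The toy data and key columns are assumptions. The description does not say how the non-numeric caller column is handled, so joining the caller names into one string is purely an assumption made here.

```r
# Toy data: the same call in S1 is reported by two callers.
calls <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2"),
  Chromosome           = c("chr1", "chr1", "chr7"),
  Start_Position       = c(100L, 100L, 200L),
  t_depth              = c(40, 50, 30),
  t_alt_count          = c(8, 12, 6),
  vcf_qual             = c(60, 80, 50),
  caller               = c("strelka", "mutect", "lancet"),
  stringsAsFactors     = FALSE
)

# Assumption: these columns identify a call; a real maf would also use alleles.
key_cols     <- c("Tumor_Sample_Barcode", "Chromosome", "Start_Position")
numeric_cols <- c("t_depth", "t_alt_count", "vcf_qual")

# Mean of each numeric column within each call.
means <- aggregate(calls[numeric_cols], by = calls[key_cols], FUN = mean)

# Assumption: keep track of contributing callers as a sorted, comma-separated string.
callers <- aggregate(
  calls["caller"],
  by  = calls[key_cols],
  FUN = function(x) paste(sort(unique(x)), collapse = ",")
)

# One row per call, with averaged support and the list of callers.
collapsed <- merge(means, callers, by = key_cols)
```

Averaging hides disagreement between callers, so keeping the caller list (or a count of callers) alongside the means may make the collapsed rows easier to review.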

Now that we have these scavenged hotspots, I merge them with the consensus maf, which is the final result required from this analysis.
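A minimal base R sketch of this merge, assuming it amounts to stacking the scavenged rows under the consensus rows. The toy data and the helper fill_missing are hypothetical; columns present in only one table are filled with NA in the other.

```r
# Toy data: the scavenged table carries an extra column (caller).
consensus_maf <- data.frame(
  Tumor_Sample_Barcode = "S1",
  Hugo_Symbol          = "TP53",
  Start_Position       = 100L,
  t_depth              = 40,
  stringsAsFactors     = FALSE
)
scavenged_hotspots <- data.frame(
  Tumor_Sample_Barcode = "S2",
  Hugo_Symbol          = "BRAF",
  Start_Position       = 200L,
  t_depth              = 35.5,
  caller               = "lancet",
  stringsAsFactors     = FALSE
)

# Hypothetical helper: add any missing columns as NA and fix the column order.
fill_missing <- function(df, cols) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- NA
  }
  df[cols]
}

all_cols <- union(names(consensus_maf), names(scavenged_hotspots))

# Stack consensus calls and scavenged hotspots into one table.
combined_maf <- rbind(
  fill_missing(consensus_maf, all_cols),
  fill_missing(scavenged_hotspots, all_cols)
)
```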

What GitHub issue does your pull request address?

#819

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

  • Currently, I believe taking the mean values of the read support and quality columns is OK; please let me know if you would handle it another way.
  • Some hotspots are supported by fewer than 5 tumor alt reads (t_alt_count < 5), which might need review.
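To make the second point easier to review, a small base R sketch of how the low-support calls could be listed. The toy values and the name min_alt_reads are assumptions; the threshold of 5 comes from the question above.

```r
# Toy data: averaged alt counts can be non-integer after taking the mean.
scavenged_hotspots <- data.frame(
  Hugo_Symbol      = c("BRAF", "H3F3A", "TP53"),
  t_alt_count      = c(12, 3, 4.5),
  stringsAsFactors = FALSE
)

min_alt_reads <- 5  # hypothetical name for the threshold under discussion

# Calls with fewer than min_alt_reads supporting reads in the tumor.
low_support <- scavenged_hotspots[
  scavenged_hotspots$t_alt_count < min_alt_reads, , drop = FALSE
]
```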

Is there anything that you want to discuss further?

The final result of #819 would be a maf-format file with the scavenged hotspots merged into the consensus maf file, but because it is larger than the size limit it won't get pushed to the remote. I believe this file will eventually be part of the data download, though?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

table

What is your summary of the results?

Hotspots will be added to consensus maf calls in a next step where we gather hotspot calls missed in the consensus maf.

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@jharenza
Collaborator

Based on offline discussion, we created a hotspots only maf to include all hotspots instead of those newly recovered. As for subtyping with the hotspots-only maf: this will be a case-by-case decision, as some modules require kinase domain SNVs or duplications, which require the consensus maf.

Member

@jashapiro jashapiro left a comment


Based on offline discussion, we created a hotspots only maf to include all hotspots instead of those newly recovered.

I see that there was an update to the output file in 7a11201, but I don’t see any accompanying code updates.

From what I can see, the following code is still filtering out consensus SNVs from the output, leaving only the new sites:

https://github.com/kgaonkar6/OpenPBTA-analysis/blob/c4b55df02a4817091912f61405f62f9d51a04594/analyses/hotspots-detection/01-create-hotspot-maf.Rmd#L100-L106

@kgaonkar6 kgaonkar6 requested review from jashapiro and jharenza and removed request for jharenza May 12, 2021 17:23
Collaborator

@jharenza jharenza left a comment


This looks good to merge. Note: the one-caller-only table is now in the notebook for easy viewing.

@jharenza
Collaborator

I am going to merge this so we can get to #959

@jharenza jharenza merged commit e760c46 into AlexsLemonade:master May 13, 2021
jaclyn-taroni added a commit to jaclyn-taroni/OpenPBTA-analysis that referenced this pull request May 13, 2021
@kgaonkar6 kgaonkar6 deleted the combine_hotspots_consensu branch May 13, 2021 17:19
jaclyn-taroni added a commit that referenced this pull request May 13, 2021
Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org>
Labels
merge next, ready for review

6 participants