#819 Part2 : Scavenge back hotspots to add to consensus calls #961
…A-analysis into recurrence-snv
Based on offline discussion, we created a hotspots-only maf that includes all hotspots instead of only those newly recovered. As for subtyping with the hotspots-only maf: this will be a case-by-case decision, as some modules require kinase domain SNVs or duplications, which require the consensus maf.
Based on offline discussion, we created a hotspots only maf to include all hotspots instead of those newly recovered.
I see that there was an update to the output file in 7a11201, but I don’t see any accompanying code updates.
From what I can see, the following code is still filtering out consensus SNVs from the output, leaving only the new sites:
…ar6/OpenPBTA-analysis into combine_hotspots_consensu
This reverts commit ea4f636.
This looks good to be merged. Note: the one caller only table is now in the notebook for easy viewing.
I am going to merge this so we can get to #959
🚨 merge #1050 before this PR
Purpose/implementation Section
What scientific question is your analysis addressing?
Combine hotspots from each caller (strelka, mutect, vardict and lancet) and merge them with the consensus maf file.
What was your approach?
I gathered the filtered RDS files from scratch/hotspot-detection and filtered the combined hotspot calls down to calls that are not present in the consensus maf generated as part of the snv-caller module. This dataframe now contains the unique calls that need to be scavenged back. However, the following columns hold read support and quality values that can differ between callers, since they are calculated as part of the variant calling process. The values in these columns are replaced by their mean to create one unique row per call.
unique_cols <- c("t_depth",
                 "n_depth",
                 "t_ref_count",
                 "n_ref_count",
                 "t_alt_count",
                 "n_alt_count",
                 "caller",
                 "vcf_qual")
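The filtering and collapsing steps described above can be sketched in base R roughly as follows. This is only an illustration and not the code in this PR: the toy data frames, the key columns in key_cols, and the decision to concatenate caller names (since caller is not numeric and cannot be averaged) are all assumptions, and only two of the columns in unique_cols are shown.

```r
# Toy inputs with hypothetical values; real MAF files have many more columns.
hotspots <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S1", "S2"),
  Chromosome           = c("chr7", "chr7", "chr7", "chr17"),
  Start_Position       = c(100, 100, 200, 300),
  Reference_Allele     = c("A", "A", "C", "G"),
  Tumor_Seq_Allele2    = c("T", "T", "G", "A"),
  t_depth              = c(40, 60, 30, 50),
  t_alt_count          = c(10, 14, 6, 20),
  caller               = c("strelka", "mutect", "lancet", "vardict"),
  stringsAsFactors     = FALSE
)
consensus <- data.frame(
  Tumor_Sample_Barcode = "S2",
  Chromosome           = "chr17",
  Start_Position       = 300,
  Reference_Allele     = "G",
  Tumor_Seq_Allele2    = "A",
  stringsAsFactors     = FALSE
)

# Assumed set of columns that identifies one call in one sample.
key_cols <- c("Tumor_Sample_Barcode", "Chromosome", "Start_Position",
              "Reference_Allele", "Tumor_Seq_Allele2")

# Build one string key per row so the two tables can be compared.
make_key <- function(df) do.call(paste, c(df[key_cols], sep = "|"))

# Step 1: keep only hotspot calls that are absent from the consensus maf.
scavenged <- hotspots[!make_key(hotspots) %in% make_key(consensus), ]

# Step 2: average the caller-specific numeric columns per call.
numeric_cols <- c("t_depth", "t_alt_count")
collapsed <- aggregate(scavenged[numeric_cols],
                       by = scavenged[key_cols],
                       FUN = mean)

# Step 3 (assumption): record which callers supported each call.
callers <- aggregate(scavenged["caller"],
                     by = scavenged[key_cols],
                     FUN = function(x) paste(sort(unique(x)), collapse = ","))
collapsed <- merge(collapsed, callers, by = key_cols)
```

In this toy example the call at position 100 is reported by two callers and ends up as a single row with averaged depth and alt counts, while the call already present in the consensus table is dropped.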
Now that we have these scavenged hotspots, I merge them with the consensus maf, which is the final result required from this analysis.
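As a rough sketch of that final merge, assuming the scavenged table may carry columns the consensus maf lacks (or the reverse), the two can be padded to a shared column set and then stacked. The data frames, values, and the pad() helper below are hypothetical and only illustrate the idea.

```r
# Toy inputs with hypothetical values; real MAF files have many more columns.
consensus <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S2"),
  Chromosome           = c("chr17", "chr7"),
  Start_Position       = c(300, 400),
  t_depth              = c(50, 80),
  stringsAsFactors     = FALSE
)
scavenged <- data.frame(
  Tumor_Sample_Barcode = "S1",
  Chromosome           = "chr7",
  Start_Position       = 100,
  t_depth              = 50,
  caller               = "mutect,strelka",
  stringsAsFactors     = FALSE
)

# Add any column present in one table but not the other, filled with NA,
# so that rbind() sees identical column sets in the same order.
all_cols <- union(names(consensus), names(scavenged))
pad <- function(df) {
  for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
  df[all_cols]
}

# Stack consensus calls and scavenged hotspots into one table.
combined <- rbind(pad(consensus), pad(scavenged))
```

Rows that came from the consensus maf have NA in the caller column here, which keeps the origin of each row visible after the merge.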
What GitHub issue does your pull request address?
#819
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Is there anything that you want to discuss further?
The final result of #819 would be a maf format file with the scavenged hotspots merged into the consensus maf file, but because it is larger than the limit it won't get pushed to remote. I believe eventually this file will be part of the download data, though?
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes
Results
What types of results are included (e.g., table, figure)?
table
What is your summary of the results?
Hotspots will be added to consensus maf calls in a next step where we gather hotspot calls missed in the consensus maf.
Reproducibility Checklist
Documentation Checklist
README and it is up to date.
analyses/README.md and the entry is up to date.