Update MAF fields for v19 SNV consensus #1033
I have also added the updated consensus comparison notebook here, now based on v19 data. It shows the radically different VarDict files as noted here: #990 (comment) I am not sure whether this justifies changing the consensus algorithm... We could probably go to 3 of 4 callers now, but that would still not compare easily to the TCGA data, for which we don't have VarDict results.
Hi @jashapiro - I think the column deletions and additions look good here, but I just had a question: do you want to rerun the notebook noted below with the new files added as of today?
I have also added here now the updated consensus comparison notebook based on v19 data. It shows the radically different VarDict files as noted here: #990 (comment)
Yes, I am rerunning it as we speak. It was during the rerun where the issue of the |
Just confirming that the consensus results did not change. (For some reason, the md5s of the files changed, but diff finds no changes, so it must be some subtle whitespace alterations.) The comparison results are still running though.
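The check described above (checksums differ while the content looks the same) can be sketched roughly as follows. This is only an illustration and not the procedure used in this PR: the helper names are hypothetical, and the normalization (dropping trailing whitespace and trailing blank lines) is an assumption about what "subtle whitespace alterations" might cover.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of raw bytes."""
    return hashlib.md5(data).hexdigest()


def normalize_whitespace(data: bytes) -> bytes:
    """Strip trailing whitespace (including carriage returns) from every
    line and drop trailing blank lines. This is deliberately lossy and is
    only meant for deciding whether two files differ in whitespace alone."""
    lines = [line.rstrip() for line in data.splitlines()]
    while lines and not lines[-1]:
        lines.pop()
    return b"\n".join(lines)


def compare_contents(old: bytes, new: bytes) -> dict:
    """Compare two file contents byte-for-byte and after normalization."""
    return {
        "raw_md5_equal": md5_hex(old) == md5_hex(new),
        "normalized_equal": normalize_whitespace(old) == normalize_whitespace(new),
    }
```

If `raw_md5_equal` is false while `normalized_equal` is true, the two files differ only in the whitespace that the normalization removes.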
…piro/OpenPBTA-analysis into jashapiro/update-SNV-consensus-v19
LGTM! I re-ran this and am approving because it works with the newly formatted MAFs in v19 👍
Just wanted to document that I re-ran this module to incorporate the update from here, and the consensus files are now in the release-v19-20210423 S3 bucket.
Purpose/implementation Section
What scientific question is your analysis addressing?
In the updated MAF files for v19, some of the columns have changed (see #990 (comment)). Mostly this is the removal of columns we were not using, but this broke the consensus script.
There are also some new columns that are unique to specific callers. #990 (comment)
What was your approach?
I modified the consensus script (specifically the building of the SNV database) to only specify column types for the fields that are universal across callers, and to only store those columns, disregarding the columns that are specific to individual callers.
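As a rough illustration of that approach, here is a minimal sketch using only the Python standard library. It is not the actual consensus script: the function name `load_common_columns`, the table name, the particular subset of columns, and the SQLite types (including treating HotspotAllele as TEXT) are all assumptions made for the example.

```python
import csv
import sqlite3

# Illustrative subset of columns shared by all callers, with assumed types.
# The real script's column list and types may differ.
COMMON_COLS = [
    ("Hugo_Symbol", "TEXT"),
    ("Chromosome", "TEXT"),
    ("Start_Position", "INTEGER"),
    ("Tumor_Sample_Barcode", "TEXT"),
    ("HotspotAllele", "TEXT"),  # column added in v19; type is an assumption
]


def load_common_columns(con, table, handle):
    """Create `table` with only the shared columns and fill it from a
    tab-delimited MAF file handle, ignoring caller-specific columns."""
    names = [name for name, _ in COMMON_COLS]
    col_defs = ", ".join(f'"{name}" {sql_type}' for name, sql_type in COMMON_COLS)
    con.execute(f'CREATE TABLE "{table}" ({col_defs})')

    # Skip leading comment lines such as "#version 2.4".
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")

    # Every shared column is required; extra columns are simply not stored.
    missing = [name for name in names if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"MAF is missing required columns: {missing}")

    placeholders = ", ".join("?" for _ in names)
    rows = ([row[name] for name in names] for row in reader)
    con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
```

With this design, a file that carries extra caller-specific columns loads without changes, while a file lacking any of the shared columns fails early with a clear error rather than producing a partially filled table.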
The only added column is HotspotAllele. While the approach taken means this script will work with extra columns, it does now mean that the HotspotAllele column is required, so this will fail with the current CI files until those have been updated to v19.
What GitHub issue does your pull request address?
#990
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Is there anything that you want to discuss further?
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes.
Results
What types of results are included (e.g., table, figure)?
New consensus and TMB files will be uploaded to S3 soon.
What is your summary of the results?
Reproducibility Checklist
Documentation Checklist
README and it is up to date.
analyses/README.md and the entry is up to date.