This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Update MAF fields for v19 SNV consensus #1033

Merged


jashapiro
Member

Purpose/implementation Section

What scientific question is your analysis addressing?

In the updated v19 MAF files, some of the columns have changed (see #990 (comment)). Most of the changes remove columns we were not using, but they broke the consensus script.

There are also some new columns that are unique to specific callers (see #990 (comment)).
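To see which columns are shared by all callers and which belong to a single caller, one option is to compare the MAF headers directly. The sketch below is a hypothetical helper (`split_columns` is not part of the module) run on toy header lists rather than real MAF headers. Columns that appear in some, but not all, callers end up in neither output list.

```python
from typing import Dict, List, Tuple


def split_columns(
    headers_by_caller: Dict[str, List[str]],
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split MAF column names into columns shared by every caller and
    columns that appear in exactly one caller's MAF."""
    headers = list(headers_by_caller.values())
    # Columns present in every header, kept in the first header's order
    shared = set(headers[0]).intersection(*headers[1:])
    universal = [col for col in headers[0] if col in shared]

    specific: Dict[str, List[str]] = {}
    for caller, header in headers_by_caller.items():
        # Union of all *other* callers' columns
        others = set().union(
            *(h for c, h in headers_by_caller.items() if c != caller)
        )
        specific[caller] = [col for col in header if col not in others]
    return universal, specific


# Toy headers for illustration only; real MAF headers are much longer
toy_headers = {
    "caller_a": ["Hugo_Symbol", "Chromosome", "HotspotAllele", "only_in_a"],
    "caller_b": ["Chromosome", "Hugo_Symbol", "HotspotAllele", "only_in_b"],
}
universal, specific = split_columns(toy_headers)
```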

What was your approach?

I modified the consensus script (specifically, the step that builds the SNV database) to specify column types only for the fields that are universal across callers, and to store only those columns, disregarding the columns that are specific to individual callers.
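A minimal sketch of this approach using Python's standard `csv` and `sqlite3` modules, assuming the database is SQLite and the MAFs are tab-separated with optional leading `#` comment lines. `UNIVERSAL_COLUMNS` and `load_maf` are hypothetical names, and the column list is an illustrative subset, not the module's actual list. Caller-specific columns are never read, so extra columns are tolerated, while a missing universal column raises a `KeyError` during loading.

```python
import csv
import io
import sqlite3
from typing import Dict, TextIO

# Hypothetical subset of universal MAF columns and their SQLite types
UNIVERSAL_COLUMNS: Dict[str, str] = {
    "Chromosome": "TEXT",
    "Start_Position": "INTEGER",
    "End_Position": "INTEGER",
    "Reference_Allele": "TEXT",
    "Tumor_Seq_Allele2": "TEXT",
    "Tumor_Sample_Barcode": "TEXT",
    "HotspotAllele": "TEXT",
}


def load_maf(conn: sqlite3.Connection, table: str, maf: TextIO) -> int:
    """Load only the universal columns of a MAF into `table`; return rows inserted."""
    # Skip leading '#' comment lines such as '#version 2.4'
    lines = (line for line in maf if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    cols = list(UNIVERSAL_COLUMNS)
    col_defs = ", ".join(f"{c} {t}" for c, t in UNIVERSAL_COLUMNS.items())
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_defs})")
    placeholders = ", ".join("?" for _ in cols)
    # Only the universal columns are pulled from each row; a missing one
    # raises KeyError, extra caller-specific columns are ignored
    rows = ([row[c] for c in cols] for row in reader)
    cur = conn.executemany(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})", rows
    )
    return cur.rowcount


# Usage with a tiny in-memory MAF (toy data, not real calls)
demo_header = "\t".join(list(UNIVERSAL_COLUMNS) + ["caller_specific_field"])
demo_row = "\t".join(["chr1", "100", "100", "A", "T", "sample_1", "N", "x"])
demo_maf = io.StringIO(f"#version 2.4\n{demo_header}\n{demo_row}\n")
conn = sqlite3.connect(":memory:")
n_loaded = load_maf(conn, "demo_calls", demo_maf)
```

Declaring types only for the shared columns keeps the table schema identical across callers, which is what makes later cross-caller joins straightforward.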

The only added column is HotspotAllele. With this approach, the script tolerates extra columns, but the HotspotAllele column is now required, so the script will fail with the current CI files until those have been updated to v19.
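Checking the header before loading turns a missing required column into a clear error message, which could make failures on pre-v19 CI files easier to diagnose. This is a hypothetical sketch; `missing_columns` and `require_columns` are not part of the module, and the headers are toy examples.

```python
from typing import Iterable, List


def missing_columns(header: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names absent from a MAF header, in order."""
    present = set(header)
    return [col for col in required if col not in present]


def require_columns(header: Iterable[str], required: Iterable[str]) -> None:
    """Raise a ValueError naming every missing required column."""
    missing = missing_columns(header, required)
    if missing:
        raise ValueError(
            "MAF is missing required column(s): " + ", ".join(missing)
        )


# Hypothetical example: a pre-v19-style header that lacks HotspotAllele
old_header = ["Chromosome", "Start_Position", "Tumor_Sample_Barcode"]
required = ["Chromosome", "Start_Position", "HotspotAllele"]
```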

What GitHub issue does your pull request address?

#990

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes.

Results

What types of results are included (e.g., table, figure)?

New consensus and TMB files will be uploaded to S3 soon.

What is your summary of the results?

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

jashapiro and others added 30 commits September 10, 2019 14:47
@jashapiro jashapiro added the snv Related to or requires SNV data label Apr 27, 2021
@jashapiro jashapiro changed the title from Jashapiro/update snv consensus v19 to Update MAF fields for SNV consensus module Apr 27, 2021
@jashapiro
Member Author

I have now also added the updated consensus comparison notebook based on v19 data. It shows the radically different VarDict files, as noted here: #990 (comment)

I am not sure whether this justifies changing the consensus algorithm. We could probably switch to requiring 3 of 4 callers now, but that still would not compare easily with the TCGA data, for which we don't have VarDict results.
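A k-of-n caller vote can be sketched as follows. This is a generic illustration, not the module's consensus implementation; the caller names and variant keys are toy data. It also shows why the TCGA comparison is awkward: a fixed `min_callers` threshold means something different when one caller's results are absent.

```python
from collections import Counter
from typing import Dict, Hashable, Set


def consensus(calls_by_caller: Dict[str, Set[Hashable]], min_callers: int) -> Set[Hashable]:
    """Return the variant keys reported by at least `min_callers` callers."""
    votes: Counter = Counter()
    for calls in calls_by_caller.values():
        votes.update(set(calls))  # one vote per caller per variant
    return {variant for variant, n in votes.items() if n >= min_callers}


# Toy variant keys (chromosome, position, ref, alt); not real calls
v1 = ("chr1", 100, "A", "T")
v2 = ("chr2", 5, "G", "C")
toy_calls = {
    "caller_a": {v1, v2},
    "caller_b": {v1},
    "caller_c": {v1},
    "caller_d": {v2},
}
three_of_four = consensus(toy_calls, min_callers=3)
```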

@jashapiro jashapiro changed the title from Update MAF fields for SNV consensus module to Update MAF fields for v19 SNV consensus Apr 27, 2021
@jashapiro jashapiro mentioned this pull request Apr 28, 2021
@jharenza jharenza requested a review from kgaonkar6 April 28, 2021 18:38
@jharenza jharenza self-requested a review April 28, 2021 18:46
Collaborator

@jharenza jharenza left a comment


Hi @jashapiro, I think the column deletions and additions look good here, but I just had a question: do you want to rerun the notebook noted below with the new files added as of today?

I have also added here now the updated consensus comparison notebook based on v19 data. It shows the radically different VarDict files as noted here: #990 (comment)

@jashapiro
Member Author


Yes, I am rerunning it as we speak. It was during this rerun that the experimental_strategy change issue came up (#1026 (comment)). I don't think any of the consensus results should change, but I will be checking that.

@jashapiro
Member Author

Just confirming that the consensus results did not change. (For some reason, the MD5 checksums of the files changed, but diff finds no changes, so the difference must be some subtle whitespace alteration.) The comparison results are still running, though.
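If the output files are gzip-compressed (an assumption; the comment does not say), one common reason for changed MD5 checksums with identical content is that the gzip header stores a modification timestamp. The sketch below, using only the standard `gzip` and `hashlib` modules and a toy payload, compresses the same bytes with two different header timestamps and compares by decompressed content instead of by checksum. The helper names are hypothetical.

```python
import gzip
import hashlib


def md5_hex(data: bytes) -> str:
    """MD5 checksum of raw bytes, as a hex string."""
    return hashlib.md5(data).hexdigest()


def same_content(gz_a: bytes, gz_b: bytes) -> bool:
    """Compare two gzip payloads by their decompressed bytes, ignoring header metadata."""
    return gzip.decompress(gz_a) == gzip.decompress(gz_b)


# Toy payload; the same bytes compressed with two different header timestamps
payload = b"Chromosome\tStart_Position\nchr1\t100\n"
first = gzip.compress(payload, mtime=0)
second = gzip.compress(payload, mtime=1)
```

Here `md5_hex(first)` and `md5_hex(second)` differ even though `same_content(first, second)` is true.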

Collaborator

@kgaonkar6 kgaonkar6 left a comment


LGTM! I re-ran it and am approving because it works with the newly formatted MAFs in v19 👍

@kgaonkar6
Collaborator

Just wanted to document that I re-ran this module to incorporate the update from here, and the consensus files are now in the release-v19-20210423 S3 bucket.

> git remote add jashapiro https://github.com/jashapiro/OpenPBTA-analysis.git
> git remote -v
jashapiro	https://github.com/jashapiro/OpenPBTA-analysis.git (fetch)
jashapiro	https://github.com/jashapiro/OpenPBTA-analysis.git (push)
origin	https://github.com/kgaonkar6/OpenPBTA-analysis.git (fetch)
origin	https://github.com/kgaonkar6/OpenPBTA-analysis.git (push)
upstream	https://github.com/AlexsLemonade/OpenPBTA-analysis.git (fetch)
upstream	https://github.com/AlexsLemonade/OpenPBTA-analysis.git (push)

> git fetch jashapiro
remote: Enumerating objects: 611, done.
remote: Counting objects: 100% (379/379), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 611 (delta 360), reused 379 (delta 360), pack-reused 232
Receiving objects: 100% (611/611), 15.16 MiB | 1.16 MiB/s, done.
Resolving deltas: 100% (392/392), completed with 132 local objects.
....

> git checkout -b jashapiro/update-SNV-consensus-v19 jashapiro/jashapiro/update-SNV-consensus-v19
Branch jashapiro/update-SNV-consensus-v19 set up to track remote branch jashapiro/update-SNV-consensus-v19 from jashapiro.

@kgaonkar6 kgaonkar6 mentioned this pull request May 3, 2021
@jashapiro jashapiro requested a review from kgaonkar6 May 4, 2021 20:09
@jashapiro jashapiro merged commit 0ed37a0 into AlexsLemonade:master May 5, 2021
@jashapiro jashapiro deleted the jashapiro/update-SNV-consensus-v19 branch May 5, 2021 13:31
Labels: review before release, snv (Related to or requires SNV data)
3 participants