Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify the correct sample name for empty VCFs instead of default #334

Closed
arostamianfar opened this issue Aug 14, 2020 · 5 comments
Closed

Comments

@arostamianfar
Copy link
Contributor

Describe the issue:
DeepVariant currently outputs default as the sample name for empty VCFs (related to the fix in #186 ). Ideally, the sample_name should be transferred from the original BAM file, but if that is too difficult to implement (since there are no examples), I think it would also be ok if we let the user customize this ID. One suggestion is that it can use the value provided by the --sample_name flag (currently, used for make_examples, but I think it can also be reused for postprocess_variants).

Using default causes issues in pipelines where the VCF is used downstream of DeepVariant (e.g. merging the VCF with other callers; or even within DeepVariant in a pipeline that scatters calling across multiple chromosomes and tries to gather them and some of those VCFs are empty).

Setup

  • DeepVariant version: v0.10.0
  • Installation method (Docker, built from source, etc.): docker
@pichuan
Copy link
Collaborator

pichuan commented Aug 14, 2020

Thanks for the suggestion @arostamianfar . Sounds like a good suggestion and should be easy to do : adding a flag to postprocess_variants. If sample_name is specified for run_deepvariant.py, we'll use it for both.

I'll file an internal issue to track this, and we should be able to have this in our next release.

@gunjanbaid
Copy link
Contributor

The functionality has been implemented in the internal codebase and will be available in the next release.

@pichuan
Copy link
Collaborator

pichuan commented Aug 21, 2020

Hi @arostamianfar

Like @gunjanbaid said, in our internal codebase, I added the --sample_name flag to postprocess_variants as well, which will allow users to pass in a different sample_name at that stage.

But, I do want to point out that even with the current version (0.10.0), if you pass in --sample_name in the one-step command like Quick Start, you should be already able to get the VCF output with the specified sample_name.

Because the --sample_name specified in the make_examples stage does get pass on to call_variants, and then to postprocess_variants.

I tried with exactly the steps in https://github.com/google/deepvariant/blob/r0.10/docs/deepvariant-quick-start.md , and added --sample_name:

sudo docker run \
  -v "${INPUT_DIR}":"/input" \
  -v "${OUTPUT_DIR}:/output" \
  google/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WGS \
  --ref=/input/ucsc.hg19.chr20.unittest.fasta \
  --reads=/input/NA12878_S1.chr20.10_10p1mb.bam \
  --regions "chr20:10,000,000-10,010,000" \
  --output_vcf=/output/output.vcf.gz \
  --output_gvcf=/output/output.g.vcf.gz \
  --num_shards=1 \
  --sample_name=FOOBAR

After running this command (with the current version, 0.10.0), I do get FOOBAR set as my sample name.

$ zcat quickstart-output/output.vcf.gz | grep FOOBAR
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  FOOBAR

Just an FYI, and hopefully I didn't misunderstand the issue you brought up. But, in the new code (which we plan to release soon), the --sample_name flag is added to postprocess_variants as well.

@arostamianfar
Copy link
Contributor Author

Thank you for the clarification, @pichuan ! However, my issue isn't with changing the sample_name, but with empty VCF files. In the above example, if you change regions to --regions "chr20:1,000-10,000", you'd get default in the resulting empty VCF file.

I believe passing sample_name to postprocess_variants would fix this problem. Thanks for fixing this!

@pichuan
Copy link
Collaborator

pichuan commented Aug 25, 2020

Ah yes, thanks @arostamianfar for providing the --regions "chr20:1,000-10,000" example and I can now reproduce the issue. It seems like my previous internal "fix" hasn't really fixed it, so I'll send in another fix, which will come out in the upcoming release :)

pichuan added a commit that referenced this issue Sep 3, 2020
…his to run_deepvariant.py as well. This addresses #334.

PiperOrigin-RevId: 327440165
pichuan added a commit that referenced this issue Sep 3, 2020
…e for empty VCFs became `default` if we don't specify it.

PiperOrigin-RevId: 328370355
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants