pileup variant for identical genomes #271

Closed
lagphase opened this issue Feb 23, 2024 · 6 comments
Labels: bug (Something isn't working)

Comments

@lagphase

Hi,

I have high-quality ONT reads from a single cell that passed all quality checks. I assembled these reads into a genome, then ran Clair3 on the reads against the assembled genome to test Clair3's accuracy. So theoretically I shouldn't see any SNPs?

In addition to the required flags, my command included --include_all_ctgs --no_phasing_for_fa --haploid_precise --call_snp_only

The output is:

[WARNING] No variant found, output empty vcf file
[WARNING] Copying pileup.vcf.gz to /home/vdpham/Documents/dorado_basecalling/augWGS_2023/barcode11/porechop-flye-medaka-wf-alignment-clair3/clair3_haploidprecise/merge_output.vcf.gz

[INFO] Finish calling, output file: /home/vdpham/Documents/dorado_basecalling/augWGS_2023/barcode11/porechop-flye-medaka-wf-alignment-clair3/clair3_haploidprecise/merge_output.vcf.gz

So it's good that no variant was found? But when I opened the merge_output.vcf file, it looked like this:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
contig_1 58 . G . 11.56 RefCall P GT:GQ:DP:AD:AF 0/0:11:62:42:0.6774
contig_1 239 . C . 29.69 RefCall P GT:GQ:DP:AD:AF 0/0:29:63:54:0.8571
contig_1 966 . C . 21.30 RefCall P GT:GQ:DP:AD:AF 0/0:21:71:48:0.6761
contig_1 969 . C . 7.85 RefCall P GT:GQ:DP:AD:AF 0/0:7:71:39:0.5493
contig_1 970 . T . 26.99 RefCall P GT:GQ:DP:AD:AF 0/0:26:71:61:0.8592
contig_1 1211 . C . 18.24 RefCall P GT:GQ:DP:AD:AF 0/0:18:73:50:0.6849
contig_1 1214 . C . 16.75 RefCall P GT:GQ:DP:AD:AF 0/0:16:73:42:0.5753
contig_1 1215 . A . 23.37 RefCall P GT:GQ:DP:AD:AF 0/0:23:73:58:0.7945
contig_1 1219 . C . 17.63 RefCall P GT:GQ:DP:AD:AF 0/0:17:73:43:0.5890
contig_1 1222 . T . 23.90 RefCall P GT:GQ:DP:AD:AF 0/0:23:73:59:0.8082
contig_1 1227 . C . 24.19 RefCall P GT:GQ:DP:AD:AF 0/0:24:74:59:0.7973
contig_1 1231 . G . 23.76 RefCall P GT:GQ:DP:AD:AF 0/0:23:74:40:0.5405

How should I interpret this result? No variant found but there are pileup variants? What are pileup variants and are they true variants?

Thank you.

@aquaskyline
Member

These are RefCall, meaning that they are not variants. You can simply ignore the RefCall records.

On the other hand, you have not enabled RefCall output, yet Clair3 is showing these records. That is a bug in Clair3, and we have already identified it. The bug doesn't affect the correctness of the called variants, but it causes RefCall records to appear in the final VCF when no variant is called in the full-alignment calling stage.

A fix is scheduled for v1.0.6. Before that, ignoring all lines with the RefCall tag solves your problem.
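As an illustration of the workaround above, here is a minimal Python sketch that drops RefCall records from VCF text. It is not part of Clair3; the function name `drop_refcall` is hypothetical, and it assumes the FILTER value sits in the seventh whitespace-separated column, as in the records quoted in this thread.

```python
def drop_refcall(lines):
    """Yield header lines and every record whose FILTER column is not RefCall.

    Assumption: columns are separated by whitespace (tabs in a real VCF,
    spaces in the excerpt pasted in this thread) and FILTER is column 7.
    """
    for line in lines:
        if line.startswith("#"):
            # Header and column-name lines are always kept.
            yield line
            continue
        fields = line.split()
        if len(fields) > 6 and fields[6] == "RefCall":
            # Reference call, not a variant: skip it.
            continue
        yield line


# Three lines copied from the VCF excerpts in this thread.
sample = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE",
    "contig_1 58 . G . 11.56 RefCall P GT:GQ:DP:AD:AF 0/0:11:62:42:0.6774",
    "contig_1 1495 . GA G 6.93 PASS P GT:GQ:DP:AD:AF 0/1:6:78:33,27:0.3462",
]

for kept_line in drop_refcall(sample):
    print(kept_line)
```

For a compressed file such as merge_output.vcf.gz, the same function could be fed lines from `gzip.open(path, "rt")`.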

aquaskyline added the bug label on Feb 23, 2024
@lagphase
Author

Hi,

Thanks for your response. I looked at the merge_output.vcf file again and there are 45 variants that are PASS rather than RefCall. See below for an example (an extended excerpt of the one in my previous post).

Most of the PASS variants have a QUAL score below 10, but some are between 11 and 18. Should I ignore them because of the low QUAL scores?

##contig=<ID=contig_1,length=1337779>

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
contig_1 58 . G . 11.56 RefCall P GT:GQ:DP:AD:AF 0/0:11:62:42:0.6774
contig_1 239 . C . 29.69 RefCall P GT:GQ:DP:AD:AF 0/0:29:63:54:0.8571
contig_1 966 . C . 21.30 RefCall P GT:GQ:DP:AD:AF 0/0:21:71:48:0.6761
contig_1 969 . C . 7.85 RefCall P GT:GQ:DP:AD:AF 0/0:7:71:39:0.5493
contig_1 970 . T . 26.99 RefCall P GT:GQ:DP:AD:AF 0/0:26:71:61:0.8592
contig_1 1211 . C . 18.24 RefCall P GT:GQ:DP:AD:AF 0/0:18:73:50:0.6849
contig_1 1214 . C . 16.75 RefCall P GT:GQ:DP:AD:AF 0/0:16:73:42:0.5753
contig_1 1215 . A . 23.37 RefCall P GT:GQ:DP:AD:AF 0/0:23:73:58:0.7945
contig_1 1219 . C . 17.63 RefCall P GT:GQ:DP:AD:AF 0/0:17:73:43:0.5890
contig_1 1222 . T . 23.90 RefCall P GT:GQ:DP:AD:AF 0/0:23:73:59:0.8082
contig_1 1227 . C . 24.19 RefCall P GT:GQ:DP:AD:AF 0/0:24:74:59:0.7973
contig_1 1231 . G . 23.76 RefCall P GT:GQ:DP:AD:AF 0/0:23:74:40:0.5405
contig_1 1236 . T . 27.41 RefCall P GT:GQ:DP:AD:AF 0/0:27:74:55:0.7432
contig_1 1495 . GA G 6.93 PASS P GT:GQ:DP:AD:AF 0/1:6:78:33,27:0.3462
contig_1 1713 . C CT 6.61 PASS P GT:GQ:DP:AD:AF 0/1:6:81:53,14:0.1728
contig_1 2007 . C . 21.95 RefCall P GT:GQ:DP:AD:AF 0/0:21:87:57:0.6552
contig_1 2008 . T . 24.87 RefCall P GT:GQ:DP:AD:AF 0/0:24:87:74:0.8506
contig_1 2436 . G . 20.41 RefCall P GT:GQ:DP:AD:AF 0/0:20:91:68:0.7473

@aquaskyline
Member

I think you should ignore them, since they are low-quality indels.
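To check which PASS records are indels and how they fall relative to a QUAL cutoff, something like the following Python sketch could help. The helper names and the `min_qual` value of 10 are illustrative assumptions, not a threshold recommended by Clair3. The sketch also assumes every record has a numeric QUAL and at least seven whitespace-separated columns, as in the excerpt above.

```python
def is_indel(ref, alt):
    """True when any ALT allele differs in length from REF."""
    return any(len(a) != len(ref) for a in alt.split(",") if a != ".")


def select_pass(lines, min_qual=0.0):
    """Return (pos, kind, qual) for PASS records with QUAL >= min_qual.

    Assumption: QUAL is numeric and FILTER is the seventh column.
    """
    selected = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        _chrom, pos, _id, ref, alt, qual, flt = line.split()[:7]
        if flt != "PASS" or float(qual) < min_qual:
            continue
        kind = "indel" if is_indel(ref, alt) else "snp"
        selected.append((int(pos), kind, float(qual)))
    return selected


# Records copied from the VCF excerpt in this thread.
records = [
    "contig_1 1495 . GA G 6.93 PASS P GT:GQ:DP:AD:AF 0/1:6:78:33,27:0.3462",
    "contig_1 1713 . C CT 6.61 PASS P GT:GQ:DP:AD:AF 0/1:6:81:53,14:0.1728",
    "contig_1 2007 . C . 21.95 RefCall P GT:GQ:DP:AD:AF 0/0:21:87:57:0.6552",
]

print(select_pass(records))               # all PASS records, labeled snp or indel
print(select_pass(records, min_qual=10))  # only PASS records at or above the cutoff
```

Both PASS records in this excerpt are indels with QUAL below 7, so neither survives a cutoff of 10.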

@aquaskyline
Member

Fixed in v1.0.6

@bethsampher

bethsampher commented May 7, 2024

Hi @aquaskyline ,
I was still having problems with this and noticed the fix only applies when --pileup_only is specified (see diff) and not when no variants are found with the full-alignment model. Please could you fix this?
Thank you,
Beth

@aquaskyline
Member

@bethsampher Please send your log file and show some VCF records that can describe your problem.
