-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WARNING: AD must list numbers #42
Comments
Greetings @ireneIta and thanks for using SNPGenie! Sorry for the issue. Indeed the AD entry seems correct, so it is likely an issue with file encoding. I'd be happy to give it a try myself if you can send a reproducible example (reference / gtf / VCF file combo) — here or private email is fine. It is correct that SNPGenie does not reverse complement the data, and one run of SNPGenie gives results for one strand only — although it is strand-AWARE in the sense that, if there are minus-strand entries in the GTF file, SNPGenie will correctly label those sites as genic and codons from two genes overlapping on opposite strands will still be flagged as such. It's a bit of a clunky fix, and an artifact of having been originally design for viral genomes, but the practical result is that SNPGenie must be run once for each strand with separate input for each strand. Let me know if that helps! |
Dear Chase, thank you also for the clarification on the fw/revcom and for confirming that SNPGenie must be run separately for the two strands. |
Ah, I see! This appears to be due to the merging of the VCF files — if you look at the "sample" columns at the right-most side of the output, the merging have forced each sample to have a record for each site that is variable in even one sample, so that the majority of records are "blank" in the form of "./.:.:.:.:.". Apparently I did not deal with this case for triallelic sites. If it it possible to run SNPGenie in a different mode on every individual VCF, that would circumvent the problem. Another quicker fix worth trying is to manually FIND/REPLACE, on this line only, as follows: FIND: ./.:.:.:.:. I can try to code for this case, but it will be several days before I can get around to it. Please let me know! |
thank you for the hints! I'll keep you posted. |
Please forgive my dropping the ball on this! Did you find a solution that works? If you'd like to to give it a go, I did receive your files but please also provide the exact command-line arguments you ran with them. But perhaps it's already been solved! Best, |
Dear Chase, best, |
Hi Chase,
thank you for this useful tool!
I am having some issues with a set of data. I got a warning, and SNPGenie terminates. I am running SNPGenie with --vcfformat=4; I got the vcf by merging individual vcfs with bcftools.
The warning is
WARNING: the line is:
NC_005967.2 4009 . T A,G 687.24 . BaseQRankSum=-0.486;Dels=0;FS=0;HRun=1;HaplotypeScore=1.6229;MQ=13.77;MQ0=7;MQRankSum=4.129;QD=7.29;ReadPosRankSum=-1.301;DP=1156;AF=1,0.5;AN=64;AC=38,21 GT:AD:DP:GQ:PL 1/1:36,25:61:9.09:477,9,0,.,.,.
For vcfformat==4, AD must list numbers of reads for two SNPs as follows: ,,. SNPGenie TERMINATED.
AD is actually a list of numbers (in the warning= 36,25), so I don't get the message... Could you please help me find out what's the problem?
I have another question: in the SNPGenie_parameters.txt file that is generated over the run, it is reported "COMPLEMENT MODE", which is "Yes" when running the software on "fw" data and is "No" when running it on revcom data. However, to my understanding (and according to the results of the few run), SNPGenie does not reverse-complement the data, so the results are only relative to one strand. Am I correct?
thank you very much!!
best wishes
Irene
The text was updated successfully, but these errors were encountered: