Michelle L. Gaynor, Jacob B. Landis, Tim K. O’Connor, Robert G. Laport, Jeff J. Doyle, Douglas E. Soltis, José Miguel Ponciano, and Pamela S. Soltis
nQuack is a modified statistical framework to predict ploidy level based on sequence data. We build upon nQuire, the Gaussian mixture model approach of Weiß et al. (2018) for estimating ploidy level, which was originally written as a C executable.
Here we provide expanded tools and implementations to improve site-based heterozygosity inferences of ploidal level.
nQuack provides data preparation guidance and tools to decrease noise in input data. These include a maximum sequence coverage quantile filter and a sequence error-based filter, which remove biallelic sites that are likely not representative of copy number variance in the nuclear genome. Unlike other methods, which use both alleles, we consider only the frequency of allele A or allele B at each site. To learn more about best practices, see our Data Preparation guide.
Our model improves upon the nQuire framework by extending it to higher ploidal levels (pentaploid and hexaploid), correcting the augmented likelihood calculation, implementing more suitable distributions, and allowing additional ‘fixed’ models. We also decrease model selection errors by relying on the Bayesian information criterion (BIC) rather than likelihood ratio tests. To learn more about these methods, see our Model Options guide.
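For reference, the standard definition of BIC is given below. Here k is the number of free parameters in a fitted mixture model, n is the number of observations, and L-hat is the maximized likelihood. This is the textbook formula only; how nQuack counts k and n for each model is described in the Model Options guide, not here.

```latex
\mathrm{BIC} = k \ln(n) - 2 \ln(\hat{L})
```

Among candidate models fitted to the same data, the one with the lowest BIC is preferred. Unlike a likelihood ratio test, the comparison does not require the candidate models to be nested.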
We provide 32 ways to estimate the likelihood of a mixture of models with the expectation maximization algorithm: 8 expectation maximization implementations with 4 model types each. In total, nQuack offers 128 models.
To assess the utility of this method, we examined 513,792 models based on both simulated and real samples. Before using this method, we suggest you read our manuscript and consider the many limitations of a pattern-based approach to determining ploidal level.
install.packages("devtools")
devtools::install_github("mgaynor1/nQuack")
If you are working on your personal computer, make sure samtools is installed and callable as "samtools" from the terminal. If you are working on a cluster, you may need to symbolically link samtools locally. Though the installation location may differ, here is how I make samtools callable locally on UF's amazing HiPerGator Slurm cluster:
mkdir bin
cd bin
ln -s /apps/samtools/1.15/bin/samtools samtools
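To confirm that the link works, the check below is a minimal sketch rather than part of nQuack. It assumes you run it from the directory that contains the `bin` folder created above (that is, after `cd ..`) and that putting that folder on your `PATH` is acceptable on your system. `check_tool` is a hypothetical helper name introduced only for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of nQuack or samtools):
# succeeds when the named command resolves on the current PATH.
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Assumption: the current directory is the parent of the bin/ folder
# that holds the samtools symbolic link.
PATH="$PWD/bin:$PATH"
export PATH

if check_tool samtools; then
    echo "samtools found at: $(command -v samtools)"
else
    echo "samtools is not callable from this shell" >&2
fi
```

The `PATH` change lasts only for the current shell session. To keep it, you would add the equivalent `export` line to your shell startup file.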
For implementation, see our Basic Example article.
Gaynor ML, Landis JB, O’Connor TK, Laport RG, Doyle JJ, Soltis DE, Ponciano JM, and Soltis PS. 2024. nQuack: An R package for predicting ploidy level from sequence data using site-based heterozygosity. Applications in Plant Sciences 12(4):e11606. doi: 10.1002/aps3.11606
- If you have sequence data with known ploidal level for a mixed-ploidy system, let us know. We would love to collaborate with you. To be included in v2.0, please send me an email at shellyleegaynor at gmail.