Keep duplicates "contigs hitting multiple probes" #328
Howdy, what types of data are you inputting? If loci are close to one another in your assemblies, it might be worthwhile to follow the "harvesting loci from genomes" approach (e.g. Tutorial 3) and reduce the flanking distance sliced around the "core" of each UCE locus identified within a given contig. Then, input those genome slices to the normal approach. Just keep in mind that if the loci are VERY close to one another, you are not getting an independent-ish draw from the genome, because the loci are physically linked.
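As a rough illustration of why shrinking the slice helps, here is a minimal Python sketch. It assumes you already have the start and end coordinates of each UCE probe match on a contig; `UceHit`, `slice_windows`, and `overlapping_pairs` are hypothetical helpers written for this example, not part of phyluce. It pads each match by a flank, clamps the window to the contig ends, and reports pairs of loci whose windows would overlap.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


# Hypothetical record of one UCE probe match on a contig (0-based, end-exclusive).
@dataclass
class UceHit:
    locus: str
    start: int
    end: int


Window = Tuple[str, int, int]  # (locus, window_start, window_end)


def slice_windows(hits: List[UceHit], contig_len: int, flank: int) -> List[Window]:
    """Pad each match by `flank` bp on both sides, clamped to the contig ends."""
    windows = []
    for hit in sorted(hits, key=lambda h: h.start):
        start = max(0, hit.start - flank)
        end = min(contig_len, hit.end + flank)
        windows.append((hit.locus, start, end))
    return windows


def overlapping_pairs(windows: List[Window]) -> List[Tuple[str, str]]:
    """Return every pair of loci whose sliced windows share at least one base."""
    pairs = []
    for (a, a_start, a_end), (b, b_start, b_end) in combinations(windows, 2):
        if a_start < b_end and b_start < a_end:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Two hypothetical matches 780 bp apart on a 3,000 bp contig.
    hits = [UceHit("uce-1", 500, 620), UceHit("uce-2", 1400, 1520)]
    for flank in (500, 200):
        windows = slice_windows(hits, contig_len=3000, flank=flank)
        print(flank, windows, overlapping_pairs(windows))
```

With these two matches, a 500 bp flank makes the windows overlap, while a 200 bp flank keeps them separate. Loci that only separate at very small flanks are still physically linked, so they should not be treated as independent draws.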
Thank you, that's a very useful suggestion! I am working with contigs assembled in SPAdes from raw next-generation sequencing data, trying to identify which UCEs I have represented. Luckily I have many UCEs from all over the genome and they are not ALL very close to one another, but there are definitely some that are close enough together that they get assembled into one contig and then hit multiple probes. Having them all dropped messes up my analysis, because it underestimates locus representation across taxa. I will try the "harvesting loci from genomes" approach and see if that solves my issue!
Hi there, I have a duplicates file from the --keep-duplicates flag. However, I'm confused about how to automate retrieving the contigs that map to multiple UCEs. Because I am working with very small genomes, many of my UCEs seem to be close enough together that a single assembled contig covers several of them, but I would still like to include these loci in my downstream analysis rather than just dropping them. I'm not sure of the most efficient way to do this. I see you have a script for the opposite case: phyluce_assembly_parse_duplicates_file.py retrieves contigs listed under "probes hitting multiple contigs", whereas I need those listed under "contigs hitting multiple probes". I've tried editing this script to look at contigs hitting multiple probes instead, but I keep getting blank output files.
Would appreciate any advice!
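One way to automate this, sketched in Python under some assumptions: rather than editing phyluce_assembly_parse_duplicates_file.py, invert the probe-to-contig matches so they are grouped by contig, then keep the contigs matched by more than one locus. The input format (a list of `(taxon, contig, locus)` tuples), the function names, and the taxon/contig names in the example are all hypothetical; how you extract those tuples from the duplicates file or match output depends on its actual layout, which is not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Hypothetical input: one (taxon, contig, locus) tuple per probe match,
# e.g. parsed from your own match output. The parsing step is not shown
# because the exact file layout is an assumption here.
Match = Tuple[str, str, str]


def contigs_hitting_multiple_probes(
    matches: Iterable[Match],
) -> Dict[str, Dict[str, List[str]]]:
    """Map taxon -> contig -> sorted loci, keeping only contigs that match
    two or more distinct UCE loci."""
    loci_by_contig: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for taxon, contig, locus in matches:
        loci_by_contig[(taxon, contig)].add(locus)
    result: Dict[str, Dict[str, List[str]]] = {}
    for (taxon, contig), loci in loci_by_contig.items():
        if len(loci) > 1:
            result.setdefault(taxon, {})[contig] = sorted(loci)
    return result


def expand_to_loci(
    multi: Dict[str, Dict[str, List[str]]],
) -> Iterator[Tuple[str, str, str]]:
    """Yield (taxon, locus, contig) so a shared contig can be assigned to
    every locus it matched instead of being dropped."""
    for taxon, contigs in multi.items():
        for contig, loci in contigs.items():
            for locus in loci:
                yield taxon, locus, contig


if __name__ == "__main__":
    matches = [
        ("taxonA", "NODE_1", "uce-10"),
        ("taxonA", "NODE_1", "uce-11"),
        ("taxonA", "NODE_2", "uce-12"),
        ("taxonB", "NODE_7", "uce-10"),
        ("taxonB", "NODE_7", "uce-10"),
    ]
    multi = contigs_hitting_multiple_probes(matches)
    print(multi)  # {'taxonA': {'NODE_1': ['uce-10', 'uce-11']}}
    for record in expand_to_loci(multi):
        print(record)
```

Assigning one contig to several loci means those loci share sequence, so the independence caveat about closely spaced loci applies here too.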