design genome analysis workflow #3

mafeeney · 2022-05-06T12:17:25Z

something, MUMmer, bedtools, primer design

widdowquinn · 2022-05-06T12:28:57Z

Assuming we start with a set of genomes that can be divided into pathogens and non-pathogens, we need to:

divide the genomes into the correct groups, e.g. using MLST/other markers, maybe presence/absence of effectors/toxins (Galaxy)
perform pairwise genome comparisons of pathogens against each other, and of pathogens against non-pathogens, with MUMmer (Galaxy); comparing a single reference pathogen genome against all the other genomes may be enough, because the set arithmetic below is done on regions of that reference
use BEDtools or similar to identify regions common to all pathogens, i.e. the intersection of the regions of a reference pathogen genome that align to every other pathogen genome (Galaxy)
use BEDtools or similar to identify regions that are common to all pathogens but also present in at least one non-pathogen; these will be discarded as they are not diagnostic of the pathogens (Galaxy)
use a primer design tool to design primers to the reference pathogen genome, and keep only those that amplify a region unique to/diagnostic of pathogens (Galaxy)
test the designed primers in silico to ensure they amplify all the known pathogen genomes (Galaxy)
test the designed primers in silico to ensure they do not amplify any known non-pathogens (Galaxy)
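
To make the set arithmetic in the two BEDtools steps concrete, here is a minimal pure-Python sketch. It assumes each alignment has already been reduced to half-open `(start, end)` intervals on the reference pathogen genome (as in BED format); all function names are hypothetical and only for illustration. In the actual workflow `bedtools intersect` and `bedtools subtract` would do this job.

```python
from typing import List, Tuple

# Half-open [start, end) interval on the reference pathogen genome (BED-style).
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions covered by both a and b."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Regions covered by a but not by b."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    for start, end in a:
        cursor = start
        for b_start, b_end in b:
            if b_end <= cursor:
                continue
            if b_start >= end:
                break
            if b_start > cursor:
                out.append((cursor, b_start))
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            out.append((cursor, end))
    return out


def diagnostic_regions(
    pathogen_alignments: List[List[Interval]],
    nonpathogen_alignments: List[List[Interval]],
) -> List[Interval]:
    """Reference regions aligned by every pathogen and by no non-pathogen."""
    if not pathogen_alignments:
        return []
    core = merge(pathogen_alignments[0])
    for alignment in pathogen_alignments[1:]:
        core = intersect(core, alignment)
    shared = [iv for alignment in nonpathogen_alignments for iv in alignment]
    return subtract(core, shared)
```

Each inner list holds the reference regions that one other genome aligns to. Primers would then be designed only inside the intervals that `diagnostic_regions` returns.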

The remaining primer sets after this process are candidate diagnostic primers that positively amplify pathogens, but not non-pathogens. We can then…

test the candidate primers against the RefSeq genome database at NCBI to ensure there is no wider off-target amplification (NCBI)

The last step might be a stretch goal.
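
For the two local in silico tests (primers must amplify every pathogen and no non-pathogen), a naive sketch of the check is below. It is only an illustration under strong assumptions: exact primer matches, uppercase `ACGT` sequences held in memory as strings, and a hypothetical `max_len` cutoff for product size. A real in silico PCR tool would allow mismatches and handle multi-contig assemblies.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicons(genome: str, fwd: str, rev: str, max_len: int = 1000) -> List[int]:
    """Lengths of exact-match products for a primer pair, on either strand.

    max_len is an assumed upper bound on product size, not a value from
    the workflow description.
    """
    rev_site = reverse_complement(rev)  # where the reverse primer anneals
    lengths: List[int] = []
    for template in (genome, reverse_complement(genome)):
        start = template.find(fwd)
        while start != -1:
            # Nearest reverse-primer site downstream of the forward primer.
            end = template.find(rev_site, start + len(fwd))
            if end != -1:
                length = end + len(rev_site) - start
                if length <= max_len:
                    lengths.append(length)
            start = template.find(fwd, start + 1)
    return lengths


def is_diagnostic(
    fwd: str, rev: str, pathogens: List[str], nonpathogens: List[str]
) -> bool:
    """True if the pair amplifies every pathogen and no non-pathogen."""
    hits_all_pathogens = all(amplicons(g, fwd, rev) for g in pathogens)
    hits_any_nonpathogen = any(amplicons(g, fwd, rev) for g in nonpathogens)
    return hits_all_pathogens and not hits_any_nonpathogen
```

A primer pair that passes `is_diagnostic` on the local genome sets is what would go forward to the wider RefSeq check.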
