Published at PeerJ https://peerj.com/articles/4892/
BLAST-based validation of metagenomic sequence assignments. Bazinet AL, Ondov BD, Sommer DD, Ratnayake S. PeerJ. 2018 May 28;6:e4892. doi: 10.7717/peerj.4892. eCollection 2018.
Preprint available at https://doi.org/10.1101/181636
Bazinet, A. L., Ondov, B. D., Sommer, D. D., & Ratnayake, S. (2017). BLAST-based validation of metagenomic sequence assignments. bioRxiv:181636.
-
git clone https://github.com/bioforensics/blast-validate.git
-
Install KronaTools
Install from source:
git clone https://github.com/marbl/Krona
cd Krona/KronaTools
./install.pl --prefix <directory that is in your PATH>
./updateTaxonomy.sh
Install using Bioconda:
conda install krona
-
Install NCBI BLAST+
-
Optional: download and install the ART read simulator (needed for parameter optimization)
1.) Simulate reads with ART.
Example:
art_illumina -ss HS25 -i B_anthracis_Ames.fa -p -l 250 -f 10 -m 868 -s 408 -o ba_ART_sim -1 /usr/local/packages/art_bin_MountRainier/Illumina_profiles/HiSeq2500L250R1.txt -2 /usr/local/packages/art_bin_MountRainier/Illumina_profiles/HiSeq2500L250R2.txt
-p = paired-end
-l = length of reads to simulate (250 bp)
-f = fold-coverage (10x)
-m = mean fragment length
-s = fragment length standard deviation
-o = output prefix
-1 / -2 = custom quality profiles for R1 and R2
2.) Convert FASTQ files to FASTA format and combine R1 and R2 into a single file.
Example:
perl scripts/convertFastqToFasta.pl < ba_ART_sim1.fq > ba_ART_sim1.fa; perl scripts/convertFastqToFasta.pl < ba_ART_sim2.fq > ba_ART_sim2.fa; cat ba_ART_sim1.fa ba_ART_sim2.fa > ba_ART_sim.fasta
3.) BLAST all reads against the NCBI nt database.
Example:
blastn -query ba_ART_sim.fasta -task blastn -db /your/path/to/NCBI/nt -outfmt 7 > ba_ART_sim.blast
4.) Run parameter sweep.
Example:
perl scripts/blastValidate.pl -e 0,-1,-2,-4,-8,-16,-32,-64,-128 -b 0,1,2,4,8,16,32,64,128 -x -t 1392 ba_ART_sim.fasta ba_ART_sim.blast
5.) (Optional) Perform steps 1-4 for a near neighbor genome (e.g., a B. cereus genome).
6.) Evaluate results of parameter sweep.
Example:
perl scripts/evaluate_parameter_sweep.pl 1392 /path/to/B_anthracis/LCA/files [/path/to/near neighbor (B. cereus)/LCA/files]
1.) BLAST reads against the NCBI nt database.
Example:
blastn -query subway_analysis/SRR1748708_Kraken_Ba.fasta -task blastn -db /your/path/to/NCBI/nt -outfmt 7 > subway_analysis/SRR1748708_Kraken_Ba.blast
2.) Run the BLAST-validate script using parameter values previously determined to be optimal.
Example:
perl scripts/blastValidate.pl -e -64 -b 8 -x -t 1392 subway_analysis/SRR1748708_Kraken_Ba.fasta subway_analysis/SRR1748708_Kraken_Ba.blast