A tool for identifying the bacterial hosts of phages from the genomic sequences of bacteriophages and bacteria. Host4Phage uses the bacterial CRISPR-Cas system for this purpose. The tool supports multithreading.
Host4Phage uses other available tools:
- PILER-CR (Reference, Source)
- CRT (Reference, Source)
- MinCED (Source)
- CRISPRDetect (Reference, Source)
- Kmer-db (Reference, Source)

All of the above tools are called from the `tool/bin` folder.
- To run `host4phage.py` you'll need Python 3.8.8 or greater.
- Python dependencies: tqdm (`pip install tqdm`) and joblib (`pip install joblib`). For details, see the tqdm and joblib project pages.
- The CRT and MinCED tools require the Java Runtime Environment.
- The CRISPRDetect tool requires the following tools: `clustalw`, `water`, `seqret`, `RNAfold`, `cd-hit-est`, and `blastn`. For details, see the CRISPRDetect documentation.
- Input files must have a FASTA extension (`*.fasta`, `*.fna`, `*.fa`).
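As an illustration of the extension requirement, the following minimal sketch collects candidate input files from a directory. The helper name `find_fasta_files` is hypothetical and not part of Host4Phage, and matching extensions exactly as written (case-sensitive) is an assumption; the tool's own file discovery may differ.

```python
from pathlib import Path

# Extensions listed in this README as accepted for input files.
FASTA_EXTENSIONS = {".fasta", ".fna", ".fa"}


def find_fasta_files(directory):
    """Return the sorted FASTA files found directly inside `directory`.

    Hypothetical helper for illustration only: extensions are matched
    exactly as written (assumption), and subdirectories are not searched.
    """
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix in FASTA_EXTENSIONS
    )
```

Running such a check on a genome directory before starting a long job is a cheap way to spot files that would otherwise be ignored because of an unexpected extension.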
The tool uses two subcommands, `spacers` and `compare`:
- `spacers` identifies and extracts spacers.
- `compare` finds sequences common to hosts and bacteriophages by using k-mers.
Host4Phage with the `spacers` subcommand can be called from the command line as follows (quick usage):
python tool/host4phage.py spacers -i host_20_test -o output_spacers/piler -m piler
Host4Phage with the `compare` subcommand can be called from the command line as follows (quick usage):
python tool/host4phage.py compare -s output_spacers -v virus_20_test -o output_compare
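The two quick-usage commands can be chained into one workflow. The sketch below is a hypothetical wrapper, not part of Host4Phage: it builds one `spacers` command per method, writing each to its own subdirectory of `output_spacers`, and then a single `compare` command that receives the parent directory through `-s`. Only the flags shown in this README are used. The function names and the `dry_run` switch are assumptions introduced for illustration.

```python
import subprocess
import sys

# Method names accepted by the -m parameter, as listed in this README.
METHODS = ["piler", "crt", "minced", "crisprdetect"]


def spacers_command(host_dir, output_root, method, script="tool/host4phage.py"):
    """Build the argument list for one `spacers` run (hypothetical helper)."""
    return [sys.executable, script, "spacers",
            "-i", host_dir, "-o", f"{output_root}/{method}", "-m", method]


def compare_command(spacers_root, virus_dir, output_dir, script="tool/host4phage.py"):
    """Build the argument list for the `compare` run (hypothetical helper)."""
    return [sys.executable, script, "compare",
            "-s", spacers_root, "-v", virus_dir, "-o", output_dir]


def run_pipeline(host_dir, virus_dir, spacers_root="output_spacers",
                 compare_dir="output_compare", methods=METHODS, dry_run=True):
    """Run `spacers` once per method, then `compare` on the combined results.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = [spacers_command(host_dir, spacers_root, m) for m in methods]
    commands.append(compare_command(spacers_root, virus_dir, compare_dir))
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
    return commands
```

For example, `run_pipeline("host_20_test", "virus_20_test")` prints the five commands without executing them; passing `dry_run=False` would run them in order.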
Parameters - `spacers` subcommand:

Name | Requiredness | Description |
---|---|---|
`-input` / `-i` | obligatory | Directory with bacterial genomes. Files must have a FASTA extension (`*.fasta`, `*.fna`, `*.fa`). |
`-method` / `-m` | obligatory | Method for CRISPR sequence identification: `piler`, `crt`, `minced`, or `crisprdetect`. |
`-threads` / `-t` | optional | Number of threads. Defaults to the number of processor threads on the user's computer. |
`-output` / `-o` | optional | Directory in which two subdirectories are created: `output`, containing the result files of the selected method, and `fasta`, containing the extracted spacers. Defaults to a directory named `spacers`. |
Parameters - `compare` subcommand:

Name | Requiredness | Description |
---|---|---|
`-spacers` / `-s` | obligatory | Directory with extracted spacers. Results from all methods for identifying CRISPR sequences can be combined in two ways. The first is to pass a directory whose subdirectories hold the result files (e.g., if the `output_spacers` directory contains subdirectories with spacers for all methods, `-s output_spacers` is enough). The second is to pass the paths to the results of each method separately in a single command. Files with spacers must have a FASTA extension (`*.fasta`, `*.fna`, `*.fa`). |
`-virus` / `-v` | obligatory | Directory with bacteriophage genomes. Files must have a FASTA extension (`*.fasta`, `*.fna`, `*.fa`). |
`-k` | optional | Length of k-mers. Viral genomes and CRISPR spacers found in hosts are divided into sequences of the given length. Defaults to k = 18. |
`-threads` / `-t` | optional | Number of threads. Defaults to the number of processor threads on the user's computer. |
`-output` / `-o` | optional | Directory in which a file with the .CSV extension is created. Defaults to a directory named `comparison`. The file contains the number of common k-mers for each bacterial and bacteriophage species. |
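To make the idea of "common k-mers" concrete, here is a minimal conceptual sketch in plain Python. It is not how Host4Phage computes its results: the tool delegates k-mer comparison to Kmer-db. The function names are hypothetical, and the choices to use overlapping k-mers, count each distinct k-mer once, and ignore reverse complements are assumptions made for illustration.

```python
def kmers(sequence, k=18):
    """Return the set of all overlapping k-mers in `sequence`.

    Sequences shorter than k yield an empty set.
    """
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def count_common_kmers(spacers, phage_genome, k=18):
    """Count distinct k-mers shared by a host's spacers and a phage genome.

    Conceptual illustration only (assumption): reverse complements are
    ignored and each shared k-mer is counted once.
    """
    spacer_kmers = set()
    for spacer in spacers:
        spacer_kmers |= kmers(spacer, k)
    return len(spacer_kmers & kmers(phage_genome, k))
```

In this simplified picture, a host whose spacers share more k-mers with a phage genome is a stronger candidate host for that phage.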
You can also find the description of the parameters by running `python tool/host4phage.py --help`.