Host4Phage

A tool for identifying the bacterial hosts of phages from the genomic sequences of bacteriophages and bacteria. Host4Phage uses the bacterial CRISPR-Cas system for this purpose. The tool supports multithreading.

1. Tools used by Host4Phage

Host4Phage relies on the following third-party tools:

PILER-CR --> Reference | Source
CRT --> Reference | Source
MinCED --> Source
CRISPRDetect --> Reference | Source
Kmer-db --> Reference | Source

All of the above-mentioned tools are called from the tool/bin folder.

2. Requirements

  • To run host4phage.py you'll need Python 3.8.8 or greater.
  • Python dependencies: tqdm (pip install tqdm) and joblib (pip install joblib). See also: tqdm, joblib.
  • The CRT and MinCED tools require a Java Runtime Environment.
  • The CRISPRDetect tool requires the following tools: clustalw water seqret RNAfold cd-hit-est blastn. See CRISPRDetect for details.
  • Input files must have a FASTA extension (*.fasta, *.fna, *.fa).
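
The requirements above can be checked before a run with a short script. This is a minimal sketch, not part of Host4Phage: the helper name check_environment is hypothetical, and the executable names looked up on PATH (for example java) are assumptions based on the tool names listed above.

```python
import importlib.util
import shutil
import sys

# Minimum Python version and Python modules as listed in the requirements.
MIN_PYTHON = (3, 8, 8)
PYTHON_MODULES = ["tqdm", "joblib"]

# Assumption: each tool is installed under exactly this executable name.
EXTERNAL_TOOLS = {
    "CRT / MinCED": ["java"],
    "CRISPRDetect": ["clustalw", "water", "seqret", "RNAfold", "cd-hit-est", "blastn"],
}


def check_environment():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:3] < MIN_PYTHON:
        problems.append("Python %d.%d.%d or greater is required" % MIN_PYTHON)
    for module in PYTHON_MODULES:
        # find_spec returns None when a top-level module cannot be imported.
        if importlib.util.find_spec(module) is None:
            problems.append(f"missing Python module: {module} (pip install {module})")
    for needed_by, executables in EXTERNAL_TOOLS.items():
        for exe in executables:
            if shutil.which(exe) is None:
                problems.append(f"'{exe}' not found on PATH (needed by {needed_by})")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Missing CRISPRDetect dependencies only matter when the crisprdetect method is selected, so the script reports problems as warnings rather than aborting.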

3. Description and usage

The tool uses two subcommands: spacers and compare.

  • The spacers subcommand identifies and extracts CRISPR spacers from bacterial genomes.
  • The compare subcommand finds sequences shared between hosts and bacteriophages by comparing k-mers.

The spacers subcommand can be run from the command line as follows (quick usage):
python tool/host4phage.py spacers -i host_20_test -o output_spacers/piler -m piler

The compare subcommand can be run from the command line as follows (quick usage):
python tool/host4phage.py compare -s output_spacers -v virus_20_test -o output_compare
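
The two quick-usage commands can be chained so that every CRISPR identification method writes into its own subdirectory of output_spacers and a single compare run consumes them all. The following is a sketch under assumptions: build_commands and run_pipeline are hypothetical helpers, not part of Host4Phage, and they use only the flags shown in the commands above. By default the script prints the commands without executing them.

```python
import subprocess
import sys

# Method names accepted by the -m option of the spacers subcommand.
METHODS = ["piler", "crt", "minced", "crisprdetect"]


def build_commands(hosts_dir, viruses_dir,
                   spacers_root="output_spacers", compare_out="output_compare"):
    """Build one spacers command per method, followed by one compare command."""
    commands = [
        [sys.executable, "tool/host4phage.py", "spacers",
         "-i", hosts_dir, "-o", f"{spacers_root}/{method}", "-m", method]
        for method in METHODS
    ]
    commands.append(
        [sys.executable, "tool/host4phage.py", "compare",
         "-s", spacers_root, "-v", viruses_dir, "-o", compare_out]
    )
    return commands


def run_pipeline(hosts_dir, viruses_dir, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for command in build_commands(hosts_dir, viruses_dir):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_pipeline("host_20_test", "virus_20_test")
```

Call run_pipeline(..., dry_run=False) from the repository root to actually execute the steps.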

Parameters - spacers subcommand:

| Name | Requirement | Description |
|---|---|---|
| -input/-i | required | Directory path with bacterial genomes. Files must have a FASTA extension (*.fasta, *.fna, *.fa). |
| -method/-m | required | Method for CRISPR sequence identification: piler, crt, minced or crisprdetect. |
| -threads/-t | optional | Number of threads. Defaults to the number of processor threads on the user's computer. |
| -output/-o | optional | Directory path where two subdirectories will be created: output, containing the result files of the selected method, and fasta, containing the extracted spacers. By default, a directory named spacers is created. |
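
Because input files are recognised by their extension, it can help to list which files in a directory would qualify before starting a run. This is an illustrative sketch only: has_fasta_extension and find_fasta_files are hypothetical helpers, and the exact, case-sensitive match is an assumption, since the text above does not say how Host4Phage treats upper-case extensions.

```python
from pathlib import Path

# Extensions listed in the parameter description above.
FASTA_EXTENSIONS = {".fasta", ".fna", ".fa"}


def has_fasta_extension(filename):
    """Return True if the file name ends with one of the listed extensions."""
    # Assumption: exact, case-sensitive match on the final suffix.
    return Path(filename).suffix in FASTA_EXTENSIONS


def find_fasta_files(directory):
    """Return the sorted FASTA files located directly inside `directory`."""
    return sorted(
        path for path in Path(directory).iterdir()
        if path.is_file() and has_fasta_extension(path.name)
    )
```

For example, find_fasta_files("host_20_test") would list the genomes in the directory used by the quick-usage command.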

Parameters - compare subcommand:

| Name | Requirement | Description |
|---|---|---|
| -spacers/-s | required | Directory path with extracted spacers. Results from all CRISPR identification methods can be combined in two ways: (1) pass a directory whose subdirectories hold the result files (e.g., if the output_spacers directory contains subdirectories with spacers for all methods, -s output_spacers is enough); (2) pass the paths to the results of each method separately in a single command. Spacer files must have a FASTA extension (*.fasta, *.fna, *.fa). |
| -virus/-v | required | Directory path with bacteriophage genomes. Files must have a FASTA extension (*.fasta, *.fna, *.fa). |
| -k | optional | Length of k-mers. Viral genomes and CRISPR spacers found in hosts are split into sequences of the given length. Default: k = 18. |
| -threads/-t | optional | Number of threads. Defaults to the number of processor threads on the user's computer. |
| -output/-o | optional | Directory path where a file with the .CSV extension will be created. By default, the directory is named comparison. The file contains the number of common k-mers for each bacterial and bacteriophage species. |
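
The idea behind the k-mer comparison can be illustrated in a few lines of plain Python. This is a conceptual sketch, not the tool's implementation: Host4Phage delegates the comparison to Kmer-db, and the functions below are hypothetical. The sketch counts distinct shared k-mers and ignores reverse complements; whether Kmer-db does the same is not stated here.

```python
def kmers(sequence, k=18):
    """Return the set of all length-k substrings of `sequence` (upper-cased)."""
    sequence = sequence.upper()
    # A sequence shorter than k yields an empty range and thus an empty set.
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def count_common_kmers(spacers, phage_genome, k=18):
    """Count distinct k-mers shared by a host's spacers and one phage genome."""
    spacer_kmers = set()
    for spacer in spacers:
        spacer_kmers |= kmers(spacer, k)
    return len(spacer_kmers & kmers(phage_genome, k))


if __name__ == "__main__":
    # Toy sequences with a small k so the overlap is easy to follow.
    print(count_common_kmers(["ACGTAC"], "TTACGTTT", k=4))
```

In the toy call, the only 4-mer shared by the spacer and the phage sequence is ACGT. A higher count for a host and phage pair means more spacer content matches that phage.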


You can also find the description of the parameters by using python tool/host4phage.py --help.