Ligand-binding site prediction tool based on machine learning.
- Java 1.8 or newer for execution
- PyMOL 1.7.x for viewing visualizations
P2RANK requires no installation. Binary packages can be downloaded from the project website.
- Download: http://siret.ms.mff.cuni.cz/p2rank
- Source code: https://github.com/rdk/p2rank
- Datasets: https://github.com/rdk/p2rank-datasets
prank predict -f test_data/1fbl.pdb # predict pockets on a single pdb file
See more usage examples below...
To compile P2RANK you need Gradle (https://gradle.org/). Build with ./make.sh or gradle assemble.
P2RANK makes predictions by scoring and clustering points on the protein's Connolly surface. The ligandability score of each point is determined by a machine-learning model trained on a dataset of known protein-ligand complexes.
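The scoring-and-clustering idea can be illustrated with a toy sketch in Python. This is not P2RANK's implementation: the score threshold, the distance cutoff, the use of single-linkage clustering, and summing point scores into a pocket score are all assumptions made for illustration, and the point scores are supplied by hand in place of a trained model.

```python
import math
from collections import defaultdict


def cluster_points(points, scores, min_score=0.5, cutoff=3.0):
    """Group high-scoring points into pockets (toy single-linkage clustering).

    points: list of (x, y, z) tuples; scores: list of floats.
    min_score and cutoff are illustrative values, not P2RANK defaults.
    """
    # Keep only points whose score passes the (assumed) threshold.
    kept = [i for i, s in enumerate(scores) if s >= min_score]

    # Union-find over kept points: link any two closer than `cutoff`.
    parent = {i: i for i in kept}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a_idx, a in enumerate(kept):
        for b in kept[a_idx + 1:]:
            if math.dist(points[a], points[b]) <= cutoff:
                parent[find(a)] = find(b)

    # Collect clusters and score each one by the sum of its point scores.
    clusters = defaultdict(list)
    for i in kept:
        clusters[find(i)].append(i)
    pockets = [(sum(scores[i] for i in members), sorted(members))
               for members in clusters.values()]

    # Rank pockets from highest to lowest score.
    return sorted(pockets, key=lambda p: p[0], reverse=True)


# Hand-made points and scores standing in for a scored Connolly surface.
points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (20, 0, 0), (21, 0, 0), (50, 0, 0)]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.1]
ranked = cluster_points(points, scores)
```

Here the three points near the origin form the top-ranked pocket, the two points around x = 20 form the second, and the low-scoring point is discarded before clustering.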
Slides: http://bit.ly/p2rank_slides (somewhat dated overview)
The following commands can be executed in the installation directory.
prank help
prank predict test.ds # run on whole dataset (containing list of pdb files)
prank predict -f test_data/1fbl.pdb # run on single pdb file
prank predict -f test_data/1fbl.pdb.gz # run on single gzipped pdb file
prank predict -threads 8 test.ds # specify no. of working threads for parallel processing
prank predict -o output_here test.ds # explicitly specify output directory
prank predict -c predict2.groovy test.ds # specify configuration file (predict2.groovy uses different prediction model and combination of parameters)
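A dataset file for commands like the ones above could be generated with a small script. The sketch below assumes that a dataset file is a plain list with one structure path per line; that layout, the helper name write_dataset, and the file names in the demonstration are assumptions, so compare the result with the bundled test.ds before relying on it.

```python
import tempfile
from pathlib import Path


def write_dataset(ds_path, structure_dir, patterns=("*.pdb", "*.pdb.gz")):
    """Write a dataset file listing structure files, one path per line.

    The one-path-per-line layout is an assumption, not a documented format.
    """
    structure_dir = Path(structure_dir)
    files = sorted({p for pat in patterns for p in structure_dir.glob(pat)})
    Path(ds_path).write_text("".join(f"{p}\n" for p in files))
    return files


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("1fbl.pdb", "2abc.pdb.gz", "notes.txt"):
        (Path(tmp) / name).touch()
    listed = write_dataset(Path(tmp) / "my.ds", tmp)
    listed_names = [p.name for p in listed]
    ds_lines = (Path(tmp) / "my.ds").read_text().splitlines()
```

Only files matching the given patterns are listed, so notes.txt is left out of the dataset.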
...on a file or a dataset with known ligands.
prank eval-predict -f test_data/1fbl.pdb
prank eval-predict test.ds
For each file in the dataset, the program produces a CSV file in the output directory named <pdb_file_name>_predictions.csv, which contains an ordered list of predicted pockets, their scores, the coordinates of their centroids, and lists of PDBSerials of adjacent amino acids and solvent-exposed atoms.
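A minimal sketch of loading such a predictions file in Python follows. It assumes the file has a header row and comma separators; the column names in the sample (name, rank, score) are hypothetical placeholders, so check the header of an actual predictions file for the real ones.

```python
import csv
import io


def read_predictions(handle):
    """Read a predictions CSV into a list of dicts, one per pocket.

    Header names and values are stripped of surrounding whitespace,
    in case the columns are padded for alignment.
    """
    reader = csv.reader(handle)
    header = [h.strip() for h in next(reader)]
    rows = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        rows.append({k: v.strip() for k, v in zip(header, row)})
    return rows


# Hypothetical sample; real files may use different column names.
sample = io.StringIO(
    "name   , rank, score\n"
    "pocket1,    1,  12.5\n"
    "pocket2,    2,   3.1\n"
)
pockets = read_predictions(sample)
top = pockets[0]
```

For a real file, open it with open(path, newline="") and pass the handle to read_predictions. Because the list is already ordered, the first row is the top-ranked pocket.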
If the coordinates of Connolly points that belong to predicted pockets are needed, they can be found in visualizations/data/<pdb_file_name>_points.pdb. There, the "Residue sequence number" field (columns 23-26) of each HETATM record corresponds to the rank of the corresponding pocket (points with value 0 do not belong to any pocket).
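The points file can be grouped by pocket with a few lines of Python. The sketch below relies on the fixed-width PDB layout: columns 23-26 for the residue sequence number, as described above, and the standard columns 31-54 for x, y, z. The helper make_hetatm and the atom and residue names it writes are placeholders used only to build sample input.

```python
from collections import defaultdict


def points_by_pocket(lines):
    """Group Connolly point coordinates by pocket rank.

    Reads fixed-width PDB HETATM records: columns 23-26 hold the residue
    sequence number (the pocket rank here), columns 31-54 hold x, y, z.
    Points with rank 0 belong to no pocket and are skipped.
    """
    pockets = defaultdict(list)
    for line in lines:
        if not line.startswith("HETATM"):
            continue
        rank = int(line[22:26])
        if rank == 0:
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        pockets[rank].append(xyz)
    return dict(pockets)


def make_hetatm(serial, rank, x, y, z):
    """Build a synthetic HETATM line (atom/residue names are placeholders)."""
    return "HETATM%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, "H", "STP", "A", rank, x, y, z)


sample = [
    make_hetatm(1, 1, 10.0, 11.5, -3.25),
    make_hetatm(2, 1, 10.5, 12.0, -3.0),
    make_hetatm(3, 2, 40.0, 0.0, 5.0),
    make_hetatm(4, 0, 99.0, 99.0, 99.0),  # rank 0: not part of any pocket
]
grouped = points_by_pocket(sample)
```

With a real file, pass the open file handle as lines; the result maps each pocket rank to the list of its point coordinates.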
You can override the default params with a custom config file:
prank predict -c config/example.groovy test.ds
prank predict -c example.groovy test.ds
It is also possible to override the default params on the command line using their full names. To see the complete list of params, look into config/default.groovy.
prank predict -seed 151 -threads 8 test.ds
prank predict -c example.groovy -seed 151 -threads 8 test.ds
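When P2RANK is driven from a script, the same overrides can be assembled programmatically. The sketch below assumes the prank launcher can be found by the shell, and uses only the command-line forms shown above (-c for the config file and -name value for param overrides); build_predict_command and run_predict are hypothetical helper names, not part of P2RANK.

```python
import subprocess


def build_predict_command(dataset, config=None, **params):
    """Assemble a `prank predict` command line.

    `params` are passed as `-name value` overrides, mirroring the
    command-line form shown above (e.g. seed=151 -> `-seed 151`).
    """
    cmd = ["prank", "predict"]
    if config is not None:
        cmd += ["-c", config]
    for name, value in params.items():
        cmd += ["-" + name, str(value)]
    cmd.append(dataset)
    return cmd


def run_predict(dataset, config=None, **params):
    """Run the prediction; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_predict_command(dataset, config, **params),
                          check=True)


# Build (but do not run) the equivalent of the last command above.
cmd = build_predict_command("test.ds", config="example.groovy",
                            seed=151, threads=8)
```

Calling run_predict with the same arguments would execute the command and requires a working P2RANK installation.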
In addition to predicting new ligand binding sites, P2RANK is also able to rescore pockets predicted by other methods (Fpocket and ConCavity are supported at the moment).
prank rescore test_data/fpocket.ds
prank rescore fpocket.ds # test_data/ is default 'dataset_base_dir'
prank rescore fpocket.ds -o output_dir # test_output/ is default 'output_base_dir'
prank eval-rescore fpocket-pairs.ds
Fpocket is a widely used open-source ligand binding site prediction program. It is fast, easy to use, and well documented. As such, it was a great inspiration for this project. Fpocket is written in C and is based on a very different algorithm.
Some practical differences:
- Fpocket
  - has a much smaller memory footprint
  - runs faster when executed on a single protein
  - produces a high number of less relevant pockets (and because the default scoring function isn't very effective, the most relevant pockets often don't reach the top)
  - contains the MDpocket algorithm for pocket prediction from molecular trajectories
  - is still better documented
- P2RANK
  - achieves significantly better identification success rates when considering top-ranked pockets
  - produces a smaller number of more relevant pockets
  - speed:
    - slower when running on a single protein (due to JVM startup cost)
    - approximately as fast on average when running on a big dataset on a single core
    - potentially much faster on multi-core machines, due to its parallel implementation
  - higher memory footprint (~750 MB, but it doesn't grow much with more parallel threads)
Both Fpocket and P2RANK have many configurable parameters that influence behaviour of the algorithm and can be tweaked to achieve better results for particular requirements.
This program builds upon software written by other people, either through library dependencies or through code included in its source tree (where no library builds were available). Notably:
- FastRandomForest by Fran Supek (https://code.google.com/archive/p/fast-random-forest/)
- KDTree by Rednaxela (http://robowiki.net/wiki/User:Rednaxela/kD-Tree)
- BioJava (https://github.com/biojava)
- Chemistry Development Kit (https://github.com/cdk)
- Weka (http://www.cs.waikato.ac.nz/ml/weka/)
We welcome any bug reports, enhancement requests, and other contributions. To submit a bug report or enhancement request, please use the GitHub issues tracker. For more substantial contributions, please fork this repo, push your changes to your fork, and submit a pull request with a good commit message.