Finding similarities and redundancy in chemical data sets.
git clone https://github.com/moozeq/dd-redun.git
cd dd-redun
git submodule update --init --recursive
pip3 install -r requirements.txt
- Download the PDBBind database (e.g. CASF-2016) and move its coreset directory to dd-redun/coreset (you may also use the pre-built database from demo/db.smi; in that case skip to step 4):
# move coreset from CASF-2016
mv CASF-2016/coreset .
# or use pre-built database from demo and skip to step 4
cp demo/db.smi .
- For CASF-2016 you may need to remove the 4mme complex, because its ligand causes an error when fingerprints are created:
rm -rf coreset/4mme
- Run the script below (it extracts the SMILES string and ID of each ligand):
for f in `ls coreset/`; do obabel -imol2 coreset/${f}/${f}_ligand.mol2 -osmi | awk '{print $1" "$2}' >> db.smi; done
- The database should now be at dd-redun/db.smi
- For the docking functionality, the whole coreset folder must be located at dd-redun/coreset
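The layout these steps rely on can be checked before running the tools. The following is a minimal sketch, not part of dd-redun: the function names check_coreset and check_smi are hypothetical. It assumes each complex directory holds <id>_ligand.mol2 and <id>_pocket.pdb (the file names used by the loops in this README) and that every line of db.smi holds exactly two fields, a SMILES string and an ID.

```shell
#!/bin/sh
# Sketch only: check_coreset and check_smi are hypothetical helpers,
# not part of dd-redun.

# Print the ID of every complex directory that lacks its ligand or pocket file.
# Usage: check_coreset coreset   (prints nothing when every complex is complete)
check_coreset() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue              # the glob matched nothing
        id=$(basename "$dir")
        [ -f "${dir}${id}_ligand.mol2" ] && [ -f "${dir}${id}_pocket.pdb" ] || echo "$id"
    done
}

# Print the number of lines that do not have exactly two fields (SMILES and ID).
# Usage: check_smi db.smi   (prints 0 when every line is well-formed)
check_smi() {
    awk 'NF != 2 { bad++ } END { print bad + 0 }' "$1"
}
```

A non-zero count from check_smi usually points at a ligand whose conversion produced no title or no SMILES; those entries are worth inspecting before fingerprints are computed.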
- Run the script below (it merges all x_pocket.pdb files into a single database file) or use the pre-built database from demo/prots.pdb:
# generate prots database
for f in `ls coreset/`; do cat coreset/${f}/${f}_pocket.pdb >> prots.pdb; done
# or use pre-built database from demo
cp demo/prots.pdb .
- The database should now be at dd-redun/prots.pdb
- Build G-LoSA using clang or g++:
g++ glosa.cpp -o glosa
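Because the pocket-merging loop above appends with >>, running it a second time duplicates every pocket in prots.pdb. A minimal sketch of a rerun-safe variant follows; merge_pockets is a hypothetical helper name, not part of dd-redun, and it assumes the coreset/<id>/<id>_pocket.pdb layout used above.

```shell
#!/bin/sh
# Sketch only: merge_pockets is a hypothetical helper, not part of dd-redun.
# Usage: merge_pockets <coreset-dir> <output-file>
merge_pockets() {
    : > "$2"                                   # truncate the output so reruns do not duplicate
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue              # the glob matched nothing
        id=$(basename "$dir")
        pocket="${dir}${id}_pocket.pdb"
        if [ -f "$pocket" ]; then
            cat "$pocket" >> "$2"
        else
            echo "skipping $id: no pocket file" >&2
        fi
    done
}

# Example: merge_pockets coreset prots.pdb
```

Iterating over a glob instead of the output of ls also keeps the loop working for directory names that contain spaces.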
- AutoDock Vina
- ODDT
- PDBBind coreset
./redun.py -h
./scorun.py -h
./sredun.py -h
./redun.py db.smi
./scorun.py db.smi [ints]
./sredun.py prots.pdb
cat db.smi | ./redun.py
cat db.smi | ./scorun.py [ints]
cat prots.pdb | ./sredun.py
The project report is available here (in Polish).