Finding redundancies and similarities in SMILES datasets

moozeq/dd-redun

Description

Finding similarities and redundancy in chemical data sets.

Installation

git clone https://github.com/moozeq/dd-redun.git

cd dd-redun
git submodule update --init --recursive
pip3 install -r requirements.txt

Prepare database

Ligands

  1. Download the PDBbind database (e.g. CASF-2016) and move its coreset to dd-redun/coreset (alternatively, use the pre-built database from demo/db.smi and skip to step 4):
    # move coreset from CASF-2016
    mv CASF-2016/coreset .
    
    # or use the pre-built database from demo and skip to step 4
    cp demo/db.smi .
  2. With CASF-2016 you may need to remove the 4mme complex, because its ligand causes an error during fingerprint generation:
    rm -rf coreset/4mme
  3. Run the following loop, which extracts the SMILES string and ID of each ligand:
    for d in coreset/*/; do f=$(basename "$d"); obabel -imol2 "coreset/${f}/${f}_ligand.mol2" -osmi | awk '{print $1" "$2}' >> db.smi; done
  4. The ligand database should now be at dd-redun/db.smi.
  5. For the docking functionality, the whole coreset folder must be located at dd-redun/coreset.
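The extraction loop in step 3 appends to db.smi with >>, so running it twice silently duplicates every entry. The sketch below is a hypothetical helper (check_db_smi is not part of dd-redun) that checks the file for the expected layout, assuming one ligand per line with exactly two whitespace-separated fields, "SMILES ID":

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of dd-redun: sanity-check a db.smi file
# whose lines are assumed to look like "SMILES ID".
check_db_smi() {
    local db="${1:-db.smi}"
    local total malformed dup_ids

    # Number of lines (ligand entries) in the file.
    total=$(awk 'END { print NR }' "$db")
    # Lines that do not have exactly two whitespace-separated fields.
    malformed=$(awk 'NF != 2 { n++ } END { print n + 0 }' "$db")
    # IDs (second field) that occur more than once, e.g. after the
    # extraction loop has been run twice.
    dup_ids=$(awk '{ c[$2]++ } END { for (k in c) if (c[k] > 1) d++; print d + 0 }' "$db")

    echo "entries=$total malformed=$malformed duplicate_ids=$dup_ids"
    # Succeed only if the file looks clean.
    [ "$malformed" -eq 0 ] && [ "$dup_ids" -eq 0 ]
}
```

For example, `check_db_smi db.smi || echo "db.smi needs attention"` prints the counts and flags a problem; if duplicates are reported, deleting db.smi and re-running the loop once is the simplest fix.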

Receptors

  1. Run the following loop, which merges all *_pocket.pdb files into a single database file, or use the pre-built database from demo/prots.pdb:
    # generate prots database
    for d in coreset/*/; do f=$(basename "$d"); cat "coreset/${f}/${f}_pocket.pdb" >> prots.pdb; done
    
    # or use pre-built database from demo
    cp demo/prots.pdb .
  2. The receptor database should now be at dd-redun/prots.pdb.
  3. Build G-LoSA using clang or g++:
    g++ glosa.cpp -o glosa
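The ligand and receptor loops both read per-complex files from coreset/, so a missing file only shows up as an error in the middle of a loop. A minimal sketch that lists such gaps up front, assuming the CASF-2016 naming coreset/<id>/<id>_ligand.mol2 and coreset/<id>/<id>_pocket.pdb used in those loops; check_coreset is a hypothetical helper, not part of dd-redun:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of dd-redun: list complexes in a coreset
# directory that lack the ligand or pocket file read by the build loops.
# Assumes the layout <root>/<id>/<id>_ligand.mol2 and <root>/<id>/<id>_pocket.pdb.
check_coreset() {
    local root="${1:-coreset}"
    local dir id f missing=0

    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        id=$(basename "$dir")              # e.g. coreset/4mme/ -> 4mme
        for f in "${id}_ligand.mol2" "${id}_pocket.pdb"; do
            if [ ! -f "$dir$f" ]; then
                echo "missing: $dir$f"
                missing=$((missing + 1))
            fi
        done
    done

    # Non-zero exit status if any expected file is missing.
    [ "$missing" -eq 0 ]
}
```

Running `check_coreset coreset` prints nothing and succeeds when every complex directory is complete; otherwise it prints one line per missing file and returns a non-zero status.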

Requirements

Main functionality

Scaffolds

Docking

Receptors

Usage

Help

./redun.py -h
./scorun.py -h
./sredun.py -h

With a database file

./redun.py db.smi
./scorun.py db.smi [ints]
./sredun.py prots.pdb

In a pipeline

cat db.smi | ./redun.py
cat db.smi | ./scorun.py [ints]
cat prots.pdb | ./sredun.py
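Because the scripts also read from standard input, db.smi can be preprocessed on the fly. As an illustration, the hypothetical function below (dedup_smiles is not part of dd-redun) keeps only the first line for each SMILES string (the first field). The comparison is purely textual: no canonicalization is done, so two different SMILES strings for the same molecule are both kept.

```shell
#!/usr/bin/env bash
# Hypothetical pre-filter, not part of dd-redun: keep only the first line
# for each distinct SMILES string (first field). Reads the files given as
# arguments, or standard input if there are none. Purely textual match.
dedup_smiles() {
    awk '!seen[$1]++' "$@"
}
```

It slots into the pipeline form above, e.g. `cat db.smi | dedup_smiles | ./redun.py`; it only removes exact textual repeats before the data reaches redun.py.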

Report

The project report is available here (in Polish).

Selected results plots
