
CardioTox net: A robust predictor for hERG channel blockade via deep learning meta ensembling approaches

Abdul Karim, Matthew Lee, Thomas Balle, and Abdul Sattar

This is the companion code for running the models described in the paper submitted to BMC Cheminformatics on 4 January 2021.

Installation

Tested on Ubuntu 20.04 with Python 3.7.7

  1. Install the conda dependency manager: https://docs.conda.io/en/latest/
  2. Create the environment from environment.yml:
conda env create -f environment.yml
  3. Activate the environment:
conda activate cardiotox
  4. Install PyBioMed:
cd PyBioMed
python setup.py install
cd ..
  5. Test the model:
python test.py

This will test the model on two external data sets mentioned in the paper.

Usage

Run Ensemble

Single SMILES String

import cardiotox

smile = "CC(=O)SC1CC2=CC(=O)CCC2(C)C2CCC3C(CCC34CCC(=O)O4)C12"

model = cardiotox.load_ensemble()

model.predict(smile)

Multiple SMILES Strings

import cardiotox

smiles = [
    "CC(=O)SC1CC2=CC(=O)CCC2(C)C2CCC3C(CCC34CCC(=O)O4)C12",
    "CCCCCCCCCC[N+](CC)(CC)CC"
]

model = cardiotox.load_ensemble()

model.predict(smiles)

Run Individual Models

Import the model you want

from cardiotox import DescModel, SVModel, FVModel, FingerprintModel

Run the model the same way as the ensemble

from cardiotox import SVModel

smile = "CCCCCCCCCC[N+](CC)(CC)CC"

model = SVModel()

model.predict(smile)

Run Preprocessing

Each model performs its own preprocessing. When 'predict' is called, preprocessing runs before the model does. To run the two steps separately, call 'preprocess_smile' and pass its result to 'predict_preprocessed'.

from cardiotox import SVModel

smile = "CCCCCCCCCC[N+](CC)(CC)CC"

model = SVModel()

preprocessed_smile = model.preprocess_smile([smile]) # Expects a list of smiles

model.predict_preprocessed(preprocessed_smile)

Pairwise Tanimoto similarity

We made sure that none of the molecules in the two test sets (test set-I, test set-II) are similar to those in the training set or to each other.

[Figure: Pairwise Tanimoto similarity bins]
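As a rough illustration of the similarity check described above, the sketch below computes pairwise Tanimoto similarity and bins the values. It is not the authors' script: it assumes each fingerprint is given as a set of on-bit indices, and the names `tanimoto`, `cross_similarities`, and `bin_counts`, the bin count of 10, and the toy fingerprints are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / union


def cross_similarities(set_a, set_b):
    """All pairwise similarities between fingerprints in set_a and set_b."""
    return [tanimoto(x, y) for x in set_a for y in set_b]


def bin_counts(values, n_bins=10):
    """Count similarity values per equal-width bin on [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v * n_bins), n_bins - 1)  # a value of 1.0 falls in the last bin
        counts[idx] += 1
    return counts


# Toy fingerprints (hypothetical on-bit indices, not real molecules)
train = [{1, 2, 3, 4}, {10, 11, 12}]
test_set = [{1, 2, 7, 8}, {20, 21}]

sims = cross_similarities(train, test_set)
print(max(sims))         # highest train/test similarity
print(bin_counts(sims))  # number of pairs per similarity bin
```

A low maximum and counts concentrated in the lowest bins would indicate that the two sets share few near-duplicate molecules.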

Results

We compared our method with other state-of-the-art methods on test set-I and test set-II, as follows.

Test set-I

| Methods | MCC | NPV | ACC | PPV | SPE | SEN |
|---|---|---|---|---|---|---|
| CardioTox | 0.599 | 0.688 | 0.810 | 0.893 | 0.786 | 0.833 |
| DeepHIT | 0.476 | 0.643 | 0.773 | 0.833 | 0.643 | 0.833 |
| CardPred | 0.193 | 0.643 | 0.614 | 0.760 | 0.571 | 0.633 |
| OCHEM Predictor-I | 0.149 | 0.333 | 0.364 | 1.000 | 1.000 | 0.067 |
| OCHEM Predictor-II | 0.164 | 0.351 | 0.432 | 0.857 | 0.929 | 0.200 |
| Pred-hERG 4.2 | 0.306 | 0.538 | 0.705 | 0.774 | 0.500 | 0.800 |

Test set-II

| Methods | MCC | NPV | ACC | PPV | SPE | SEN |
|---|---|---|---|---|---|---|
| CardioTox | 0.469 | 0.947 | 0.758 | 0.478 | 0.600 | 0.917 |
| DeepHIT | 0.398 | 0.941 | 0.721 | 0.417 | 0.533 | 0.909 |
| CardPred | 0.049 | 0.750 | 0.527 | 0.294 | 0.600 | 0.454 |
| OCHEM Predictor-I | 0.372 | 0.800 | 0.648 | 0.666 | 0.933 | 0.364 |
| OCHEM Predictor-II | 0.310 | 0.794 | 0.632 | 0.571 | 0.900 | 0.364 |
| Pred-hERG 4.2 | 0.146 | 0.813 | 0.580 | 0.320 | 0.433 | 0.727 |
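For readers unfamiliar with the column abbreviations, the sketch below shows how the six metrics (Matthews correlation coefficient, negative predictive value, accuracy, positive predictive value, specificity, sensitivity) follow from a binary confusion matrix using their standard definitions. The function name `classification_metrics` and the counts in the example are hypothetical and are not taken from the paper.

```python
import math


def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "MCC": _safe_div(tp * tn - fp * fn, mcc_den),    # Matthews correlation coefficient
        "NPV": _safe_div(tn, tn + fn),                   # negative predictive value
        "ACC": _safe_div(tp + tn, tp + fp + tn + fn),    # accuracy
        "PPV": _safe_div(tp, tp + fp),                   # positive predictive value
        "SPE": _safe_div(tn, tn + fp),                   # specificity
        "SEN": _safe_div(tp, tp + fn),                   # sensitivity
    }


# Illustrative counts only (hypothetical, not from the paper)
metrics = classification_metrics(tp=40, fp=20, tn=30, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```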

Note: The models are only suitable for SMILES whose Morgan fingerprint has at most 93 bits set to 1.
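The limit in the note can be checked before calling a model. The sketch below is only an illustration: it assumes the Morgan fingerprint is already available as a sequence of 0/1 values (this README does not state the fingerprint radius or length, so generating it is left out), and the helper names `count_on_bits` and `within_limit` are hypothetical.

```python
MAX_ON_BITS = 93  # limit stated in the note above


def count_on_bits(bits):
    """Number of positions set to 1 in a fingerprint given as a 0/1 sequence."""
    return sum(1 for b in bits if b)


def within_limit(bits, max_on_bits=MAX_ON_BITS):
    """True if the fingerprint has no more than max_on_bits bits set to 1."""
    return count_on_bits(bits) <= max_on_bits


# Toy bit vectors (hypothetical, not derived from real molecules)
ok_bits = [1] * 93 + [0] * 100
too_dense_bits = [1] * 94 + [0] * 99

print(within_limit(ok_bits))         # True
print(within_limit(too_dense_bits))  # False
```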