Runsheng Song
runsheng@umail.ucsb.edu
A framework to create Species Sensitivity Distributions (SSD) using pre-trained QSAR models
QSAR models were developed using neural networks in TensorFlow + Keras. Descriptors were calculated using RDKit and Mordred and optimized using tree-based feature selection.
All QSAR models have been cross-validated.
The current toxicity endpoint is LC50.
- Anaconda with Python 2.7
- Linux or macOS is recommended.
- Install RDKit with conda first (this avoids most installation headaches):
conda install -c rdkit rdkit=2017.03.1
- Install QSAR_SSD_Toolbox via pip:
pip install QSAR_SSD_Toolbox
- If some packages are missing, install everything listed in requirements.txt:
pip install -r requirements.txt
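After installing, you can quickly see which dependencies still fail to import. The sketch below is a minimal, stdlib-only helper and is not part of the toolbox. Its module list is an assumption built from the libraries this README names (RDKit, Mordred, TensorFlow, Keras, pandas, SciPy) plus the package itself. requirements.txt remains the authoritative list.

```python
import importlib

# Hypothetical checklist: import names of the libraries this README mentions,
# plus the toolbox itself. requirements.txt is the authoritative source.
REQUIRED = ["rdkit", "mordred", "tensorflow", "keras", "pandas", "scipy",
            "QSAR_SSD_Toolbox"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported in this environment."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:  # ImportError, or a broken install failing at import time
            missing.append(name)
    return missing
```

Run `print(missing_modules(REQUIRED))` inside the conda environment. An empty list means every listed package imported successfully.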
from QSAR_SSD_Toolbox.src.qsar import qsar
SMILEs = 'CCCC'  # the input SMILES string
this_q = qsar("Lepomis Macrochirus")  # name of the species; see below for available species
print(this_q.predict(SMILEs))  # prints the list of predicted LC50 values for the given species
from QSAR_SSD_Toolbox.src.qsar import run_all
SMILEs = ['CCCC']  # the input SMILES must be given as a list
this_q = run_all.run(SMILEs)  # returns a pandas DataFrame of predictions for the input chemicals on the corresponding species
from QSAR_SSD_Toolbox.src.ssd import ssd_generator
from scipy.stats import lognorm
this_ssd = ssd_generator()
this_ssd.generate(this_q, dist=lognorm, run_bootstrap=True, bootstrap_time=1000, display_range=[0.8, 100])  # returns a plot with bootstrap and baseline SSD curves; for more on bootstrapping SSDs, see https://edild.github.io/ssd/
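The sketch below shows roughly what a log-normal SSD with bootstrapping computes. It does not reproduce the internals of `ssd_generator`, which this README does not describe. It fits a normal distribution to log10(LC50) across species and reads off the HC5, the concentration expected to affect 5% of species. It then builds a percentile-bootstrap interval by resampling species with replacement.

Several details are assumptions for illustration only: the function names, the use of the sample standard deviation, the 95% interval, and the fixed seed. The sketch uses only the standard library, so the normal quantile comes from bisection on `math.erf` rather than from SciPy.

```python
import math
import random


def norm_ppf(p, lo=-10.0, hi=10.0, tol=1e-12):
    """Inverse standard-normal CDF via bisection on math.erf (stdlib only)."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_log_normal(lc50s):
    """Hypothetical helper: mean and sample SD of log10(LC50), one value per species."""
    logs = [math.log10(v) for v in lc50s]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / (n - 1))
    return mu, sigma


def hazard_concentration(lc50s, fraction=0.05):
    """HCp: concentration at the `fraction` quantile of the fitted SSD (HC5 by default)."""
    mu, sigma = fit_log_normal(lc50s)
    return 10 ** (mu + norm_ppf(fraction) * sigma)


def bootstrap_hc(lc50s, fraction=0.05, n_boot=1000, seed=0):
    """Hypothetical helper: 95% percentile-bootstrap interval for HCp."""
    if len(set(lc50s)) < 2:
        raise ValueError("need at least two distinct species LC50 values")
    rng = random.Random(seed)
    n = len(lc50s)
    estimates = []
    for _ in range(n_boot):
        # Resample species with replacement and refit the distribution.
        sample = [rng.choice(lc50s) for _ in range(n)]
        if len(set(sample)) < 2:  # skip degenerate resamples with zero spread
            continue
        estimates.append(hazard_concentration(sample, fraction))
    estimates.sort()
    lower = estimates[int(0.025 * (len(estimates) - 1))]
    upper = estimates[int(0.975 * (len(estimates) - 1))]
    return lower, upper
```

For example, species LC50 values of 1, 10 and 100 give mu = 1 and sigma = 1 on the log10 scale, so `hazard_concentration([1.0, 10.0, 100.0])` is 10^(1 - 1.645), about 0.23. Fitting on log10 rather than natural log leaves the quantiles unchanged, because both describe the same log-normal distribution.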
- Other water fleas model:
This model includes experimental LC50 data for several water flea species other than Daphnia magna.
R^2 on test chemicals: 0.61
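The R^2 reported above is presumably the usual coefficient of determination, R^2 = 1 - SS_res / SS_tot, computed on held-out test chemicals. The README does not say whether LC50 values were log-transformed before scoring. The helper below therefore just scores whatever paired observed and predicted values it is given. Its name, `r_squared`, is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_obs = sum(observed) / float(len(observed))
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance; R^2 is undefined")
    return 1.0 - ss_res / ss_tot
```

On held-out data, R^2 can be negative if a model does worse than simply predicting the mean of the observed values.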