Unbiased Curriculum Learning Enhanced Global-Local Graph Neural Network for Protein Thermodynamic Stability Prediction (UCL-GLGNN, ThermoGNN) Bioinformatics 2023
ThermoGNN is a computational biology tool that predicts changes in the thermodynamic stability of protein structures upon point mutations using a Siamese Graph Attention Network. ThermoGNN constructs the residue interaction network around the mutation site and models the impact of the point mutation on the interactions among neighboring residues. It integrates physicochemical properties of amino acids, multiple sequence alignment profiles, and energy scores.
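To make the idea of a residue interaction network around a mutation site concrete, here is a minimal sketch in plain Python. It is not ThermoGNN's actual implementation: the distance cutoff of 8 angstroms, the two-hop radius, the toy coordinates, and the function names `build_contact_graph` and `neighborhood` are all assumptions chosen for illustration.

```python
import math
from collections import deque


def build_contact_graph(coords, cutoff=8.0):
    """Connect residues whose representative atoms lie within `cutoff` angstroms.

    `coords` maps a residue index to an (x, y, z) tuple. The cutoff value is an
    illustrative assumption, not a value taken from ThermoGNN.
    """
    graph = {res: set() for res in coords}
    residues = sorted(coords)
    for i, a in enumerate(residues):
        for b in residues[i + 1:]:
            if math.dist(coords[a], coords[b]) <= cutoff:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def neighborhood(graph, site, hops=2):
    """Return the residues reachable from `site` in at most `hops` edges (BFS)."""
    depth = {site: 0}
    queue = deque([site])
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand beyond the requested radius
        for nxt in graph[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth)


# Toy input: seven residues placed on a line, 5 angstroms apart (hypothetical data).
toy = {i: (5.0 * i, 0.0, 0.0) for i in range(1, 8)}
graph = build_contact_graph(toy, cutoff=8.0)
local = neighborhood(graph, site=4, hops=2)
```

In this toy example each residue only contacts its immediate neighbors, so the two-hop neighborhood of residue 4 contains residues 2 through 6. A real pipeline would read coordinates from the relaxed PDB files instead.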
The processed dataset can be downloaded from Google Drive or Baidu Drive.
Several third-party software packages and Python libraries are required to use ThermoGNN for thermodynamic stability prediction. This section outlines the steps to install them.
Apply for a license and download Rosetta 3.12 from https://www.rosettacommons.org/software/license-and-download.
We recommend installing HH-suite3 via conda. The Uniclust30 database can be downloaded from http://wwwuser.gwdg.de/~compbiol/uniclust/2020_06/.
conda install -c conda-forge -c bioconda hhsuite
We recommend using Anaconda to install PyTorch, PyTorch Geometric, and the other required Python libraries. If you want to train the model on GPU, the version of CUDA mut
conda create -n ThermoGNN python=3.9
source activate ThermoGNN
conda install pytorch torchvision torchaudio cudatoolkit=$YOUR_CUDA_VERSION -c pytorch
conda install pytorch-geometric -c rusty1s -c conda-forge
conda install -c bioconda biopython
pip install -r requirements.txt
pip install wandb # for visualization
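After running the installation commands, it can be useful to confirm that the main libraries are importable from the new environment. The following is a small sketch using only the standard library; the helper name `check_modules` and the choice of which modules to check are assumptions, and the import names (`torch`, `torch_geometric`, `Bio`, `wandb`) are the usual ones for PyTorch, PyTorch Geometric, Biopython, and wandb.

```python
from importlib.util import find_spec


def check_modules(names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: find_spec(name) is not None for name in names}


# Import names of the libraries installed above (assumed list, adjust as needed).
REQUIRED = ["torch", "torch_geometric", "Bio", "wandb"]

if __name__ == "__main__":
    for name, found in check_modules(REQUIRED).items():
        print(f"{name:16s} {'found' if found else 'MISSING'}")
```

Run it inside the activated ThermoGNN environment; any module reported as missing points to an installation step that needs to be repeated.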
- Use Rosetta to refine the structures.
# relax.sh
python ThermoGNN/tools/relax.py -i input-pdb \
    -l mutant_list.txt \
    --rosetta-bin relax.static.linuxgccrelease \
    -o data/pdbs/demo/
input-pdb denotes the directory storing your prepared single-chain PDB structures. mutant_list.txt records the mutations, one per line, in the following format:
1a23A 51 H L
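A line of the mutation list can be parsed with a few lines of Python. This sketch assumes that the first field is a PDB identifier followed by a one-letter chain identifier, and that the remaining fields are the residue position, the wild-type residue, and the mutant residue. That reading is inferred from the example line rather than stated in this document, and the names `Mutation`, `parse_mutation`, and `read_mutations` are hypothetical.

```python
from typing import NamedTuple


class Mutation(NamedTuple):
    pdb_id: str      # e.g. "1a23" (assumed: everything before the last character)
    chain: str       # e.g. "A"    (assumed: last character of the first field)
    position: int    # residue position
    wild_type: str   # one-letter code of the original residue
    mutant: str      # one-letter code of the substituted residue


def parse_mutation(line):
    """Parse one whitespace-separated line such as '1a23A 51 H L'."""
    name, position, wild_type, mutant = line.split()
    return Mutation(name[:-1], name[-1], int(position), wild_type, mutant)


def read_mutations(lines):
    """Parse every non-blank line from an iterable of lines."""
    return [parse_mutation(line) for line in lines if line.strip()]


example = parse_mutation("1a23A 51 H L")
```

Passing an open file handle for the mutation list to `read_mutations` returns one `Mutation` per non-blank line; a malformed line raises a `ValueError`, which surfaces formatting problems before the Rosetta step is started.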
- Generate MSA profiles by hhblits.
# hhblits.sh
for pdb_dir in data/pdbs/demo/*
do
    # skip the literal pattern if the glob matches nothing
    [[ -e "$pdb_dir" ]] || continue
    python ThermoGNN/tools/hhblits.py -i "$pdb_dir" \
        -db hhsuite_db/UniRef30_2020_06 \
        -o data/hhm/demo/ \
        --cpu 40
done
hhsuite_db/UniRef30_2020_06 is the path to the HH-suite database downloaded earlier.
- Generate residue interaction networks, and predict the ddG of candidate proteins and mutations.
python predict.py -l mutant_list.txt \
    --model GAT \
    --split demo  # .pdb in data/pdbs/demo/, .hhm in data/hhm/demo/
By default, the predicted ddG values are written to prediction.csv.
- Train your own ThermoGNN model.
You can modify the hyperparameters in run.sh and then train a new model.
If you find this work useful, you can cite us via the following bibtex:
@article{gong2023unbiased,
author = {Gong, Haifan and Zhang, Yumeng and Dong, Chenhe and Wang, Yue and Chen, Guanqi and Liang, Bilin and Li, Haofeng and Liu, Lanxuan and Xu, Jie and Li, Guanbin},
title = {Unbiased Curriculum Learning Enhanced Global-Local Graph Neural Network for Protein Thermodynamic Stability Prediction},
journal = {Bioinformatics},
pages = {btad589},
year = {2023},
month = {09},
issn = {1367-4811},
doi = {10.1093/bioinformatics/btad589}
}