Graph2GO is a graph-based representation learning method for predicting protein functions. We use both network information and node attributes to improve prediction performance. Protein-protein interaction (PPI) networks and sequence similarity networks are used to construct graphs, over which node attributes are propagated following the definition of graph convolutional networks.
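To make the propagation step concrete, here is a minimal NumPy sketch of a single graph convolutional layer using the commonly cited symmetric normalization of the adjacency matrix. It is an illustration only, not the Graph2GO code: the function names, the toy three-protein graph, and the random weights are all assumptions.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix A."""
    adj_self = adj + np.eye(adj.shape[0])      # add self-loops
    deg = adj_self.sum(axis=1)                 # node degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj_self * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm, features, weights):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weights, 0.0)


# Hypothetical toy graph: 3 proteins with edges 0-1 and 1-2.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = np.eye(3)                           # placeholder node attributes
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 2))              # untrained weights, illustration only

adj_norm = normalize_adjacency(adj)
hidden = gcn_layer(adj_norm, features, weights)
print(hidden.shape)
```

Each row of `hidden` mixes a protein's own attributes with those of its neighbours in the graph, which is the sense in which attributes are "propagated".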
We use amino acid sequence (CT encoding), subcellular location (bag-of-words encoding) and protein domains (bag-of-words encoding) as the node attributes (initial feature representation).
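The two encodings can be sketched as follows, assuming "CT" refers to the conjoint triad method: amino acids are grouped into seven classes, and every window of three consecutive residues is counted, giving a 343-dimensional vector. The bag-of-words encoding is shown as a binary presence vector over a fixed vocabulary. The helper names, the example vocabulary, and the lack of normalization are assumptions; the actual Graph2GO preprocessing may differ.

```python
# Seven amino acid classes commonly used for conjoint triad encoding.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}


def ct_encode(sequence):
    """Count class triads over a sequence; returns a list of 343 counts."""
    counts = [0] * (7 ** 3)
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        if a is None or b is None or c is None:
            continue                      # skip windows with unknown residues
        counts[a * 49 + b * 7 + c] += 1
    return counts


def bow_encode(terms, vocabulary):
    """Binary bag-of-words: 1 if the vocabulary term is present, else 0."""
    present = set(terms)
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical inputs, for illustration only.
ct_vector = ct_encode("AGVC")
location_vector = bow_encode(["Nucleus"], ["Cytoplasm", "Nucleus"])
print(len(ct_vector), sum(ct_vector), location_vector)
```

The same `bow_encode` helper applies to protein domains by swapping in a vocabulary of domain identifiers.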
The auto-encoder part of our model builds on the implementation by T. N. Kipf. You can find the source code here.
If you find Graph2GO useful for your research, please consider citing our work:
@article{10.1093/gigascience/giaa081,
author = {Fan, Kunjie and Guan, Yuanfang and Zhang, Yan},
title = "{Graph2GO: a multi-modal attributed network embedding method for inferring protein functions}",
journal = {GigaScience},
volume = {9},
number = {8},
year = {2020},
month = {08},
issn = {2047-217X},
doi = {10.1093/gigascience/giaa081},
url = {https://doi.org/10.1093/gigascience/giaa081}
}
- Python 3.6
- TensorFlow
- Keras
- networkx
- scipy
- numpy
- pickle
- scikit-learn
- pandas
You can download the data for all six species from here. Please download the datasets and put the data folder in the same path as the src folder.
unzip data.zip
cd src/Graph2GO
python main.py
Note that several parameters can be tuned: --ppi_attributes, --simi_attributes, --species, --thr_ppi, --thr_evalue, etc. Please refer to main.py for a detailed description of all parameters.