Data and code for "Deep learning regression model for antimicrobial peptide design". This repository contains code for training a model to predict the antimicrobial activity of peptides against various bacteria, including E. coli and P. aeruginosa.
GRAMPA (distributed as a CSV file) is a database of peptides and their antimicrobial activity against various bacteria. The database contains the following key columns:
- bacterium: the target bacterium.
- sequence: the sequence of amino acids that make up the peptide.
- strain: the strain of the bacterium, when available.
- value: the minimum inhibitory concentration (MIC) of the peptide against the bacterium.
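Because the same peptide can appear several times for one bacterium (for example, from different source databases or strains), a training target has to be derived from multiple MIC values. The sketch below groups rows by sequence for a single bacterium and averages the MICs in log10 space. It only assumes the bacterium, sequence, and value columns listed above; the helper name `mean_log_mic` and the choice of a log-space mean are illustrative assumptions, not necessarily what `src/train_model.py` does.

```python
import csv
import math
from collections import defaultdict
from statistics import mean


def mean_log_mic(rows, bacterium):
    """Hypothetical helper: return {sequence: mean log10 MIC} for one bacterium.

    `rows` is any iterable of dicts with 'bacterium', 'sequence' and 'value'
    keys, e.g. a csv.DictReader over the GRAMPA CSV. Averaging in log10
    space is an assumed aggregation choice.
    """
    logs_by_sequence = defaultdict(list)
    for row in rows:
        if row["bacterium"] != bacterium:
            continue
        mic = float(row["value"])  # MIC in uM, per the 'unit' column
        if mic > 0:  # log10 is undefined for non-positive values
            logs_by_sequence[row["sequence"]].append(math.log10(mic))
    return {seq: mean(logs) for seq, logs in logs_by_sequence.items()}


def load_mean_log_mic(csv_path, bacterium):
    # Stream the CSV row by row so the whole file never needs to be in memory.
    with open(csv_path, newline="") as f:
        return mean_log_mic(csv.DictReader(f), bacterium)
```

Working in log space keeps a single very large MIC from dominating the average, which is why a log-scale target is a common choice for MIC regression.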
The database also contains the following auxiliary columns:
- database: the database from which the row's information was scraped.
- url_source: a link to the database page from which the row's information was scraped.
- modifications: modifications that have been applied to the sequence.
- unit: the unit of measurement of MIC, always uM.
- is_modified: a binary column stating whether or not the sequence was modified.
- has_unusual_modification: a binary column stating whether or not the sequence was modified in any way other than by C-terminal amidation.
- has_cterminal_amidation: a binary column stating whether or not the sequence was modified by C-terminal amidation.
- datasource_has_modifications: a column stating whether the source database for that row contained modification information. When this column is False, the sequence may have been modified irrespective of the value of is_modified.
To train an E. coli model with a 1:1 ratio of random negative examples for 60 epochs:
```shell
git clone git@github.com:zswitten/Antimicrobial-Peptides.git
cd Antimicrobial-Peptides
pip install -r requirements.txt
python src/train_model.py --negatives=1 --bacterium='E. coli' --epochs=60
```
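The `--negatives` flag controls how many random negative examples are added per real example. The sketch below shows one plausible way to generate them; it is not taken from `src/train_model.py`. It assumes that negatives are uniformly random sequences over the 20 standard amino acids, that their lengths are drawn from the real sequences, and that each is labeled with a fixed high log10 MIC (`log_mic=4.0` is a hypothetical placeholder value).

```python
import random

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_negatives(positive_sequences, ratio=1, log_mic=4.0, seed=0):
    """Hypothetical sketch: return (sequence, log10 MIC) negative examples.

    `ratio` negatives are generated per positive sequence; each negative's
    length is sampled from the positive lengths so the length distribution
    matches, and its label is the assumed constant `log_mic`.
    """
    rng = random.Random(seed)  # seeded for reproducible datasets
    lengths = [len(s) for s in positive_sequences]
    n_negatives = int(round(ratio * len(positive_sequences)))
    negatives = []
    for _ in range(n_negatives):
        length = rng.choice(lengths)
        sequence = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        negatives.append((sequence, log_mic))
    return negatives
```

A ratio of 1 corresponds to the 1:1 setting in the command above. Matching the length distribution keeps the model from separating negatives by length alone.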
A companion notebook contains code for reproducing the figures in the paper.