Peptimizer is a repository of machine learning algorithms for optimizing peptides and, more generally, functional polymers. The codebase is designed to optimize both functionality and synthetic accessibility.
Based on our work on generating novel and highly efficient cell-penetrating peptides (link), we provide a generator-predictor-optimizer framework for the discovery of novel functional polymers. A tutorial notebook demonstrating the usage is presented in Tutorial_CPP.ipynb.
- Generator is trained on the library of polymer sequences in an unsupervised fashion using recurrent neural networks, and is used to sample similar-looking polymers.
- Predictor is a convolutional neural network model trained on sequence-activity relationships. The sequences are represented as matrices of monomer fingerprint bit-vectors. Each fingerprint bit-vector is a topological exploration of the monomer graph, where atoms are treated as nodes and bonds as edges. The trained model is used to estimate the activities of unknown sequences.
- Optimizer is based on genetic algorithms, and optimizes sequences by evaluating single-residue and multi-residue mutations against an objective function.
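
The optimizer step described above can be illustrated with a minimal sketch. This is not the code shipped in the repository: the alphabet, the `toy_objective` function (a stand-in for the trained predictor), and all parameter names and defaults are assumptions made for illustration only.

```python
import random

# Hypothetical monomer alphabet: the 20 canonical amino acids (one-letter codes).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def toy_objective(seq):
    """Stand-in for a trained predictor: fraction of cationic residues (K, R)."""
    return sum(seq.count(aa) for aa in "KR") / len(seq)


def mutate(seq, n_sites, rng):
    """Return a copy of seq with n_sites distinct positions changed to a different monomer."""
    chars = list(seq)
    for pos in rng.sample(range(len(chars)), n_sites):
        chars[pos] = rng.choice([aa for aa in ALPHABET if aa != chars[pos]])
    return "".join(chars)


def optimize(seed, objective, generations=30, pop_size=20, n_keep=5, rng=None):
    """Evolve a population from seed, keeping the n_keep best sequences each generation."""
    rng = rng or random.Random(0)
    population = [seed]
    for _ in range(generations):
        children = []
        while len(population) + len(children) < pop_size:
            parent = rng.choice(population)
            # Single-residue mutation half the time, otherwise a 2- or 3-residue mutation.
            n_sites = 1 if rng.random() < 0.5 else rng.randint(2, 3)
            children.append(mutate(parent, min(n_sites, len(parent)), rng))
        # Elitist selection: parents compete with children, so the best score never drops.
        candidates = set(population + children)
        population = sorted(candidates, key=lambda s: (-objective(s), s))[:n_keep]
    return population[0]


best = optimize("GAGAGAGAGA", toy_objective)
print(best, toy_objective(best))
```

In practice the objective would wrap the predictor's estimated activity, possibly combined with other terms such as sequence length or similarity to the seed; the selection scheme shown here (plain elitism) is only one of several reasonable choices.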
Based on our work on optimization of synthetic accessibility for polymers synthesized using flow chemistry (link), we provide a predictor-optimizer framework. A tutorial notebook demonstrating the usage is presented in Tutorial_Synthesis.ipynb.
- Predictor is trained over experimental synthesis parameters such as pre-synthesized chain, incoming monomer, temperature, flow rate and catalysts. The different variables are represented as fingerprint, continuous and categorical features.
- Optimizer for synthesis is a brute-force optimization code that evaluates single-point mutants of the wild-type sequence for higher theoretical yield.
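
As a rough illustration of the brute-force search described above, the sketch below enumerates every single-point mutant of a wild-type sequence and keeps those that score higher than the wild type. The `toy_yield` function and the `TOY_DIFFICULTY` table are hypothetical stand-ins for the trained synthesis predictor, and the function names are assumptions rather than the repository's API.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical per-residue coupling difficulty; higher means a harder coupling step.
TOY_DIFFICULTY = {"V": 0.3, "I": 0.3, "T": 0.2, "R": 0.2, "H": 0.15}


def toy_yield(seq):
    """Stand-in for a trained predictor: product of per-step coupling efficiencies."""
    result = 1.0
    for aa in seq:
        result *= 1.0 - TOY_DIFFICULTY.get(aa, 0.05)
    return result


def single_point_mutants(wild_type, alphabet=ALPHABET):
    """Yield (position, new_monomer, mutant) for every single substitution."""
    for pos, old in enumerate(wild_type):
        for new in alphabet:
            if new != old:
                yield pos, new, wild_type[:pos] + new + wild_type[pos + 1:]


def rank_mutants(wild_type, predictor):
    """Score every single-point mutant; return those that beat the wild type, best first."""
    baseline = predictor(wild_type)
    scored = [(predictor(m), pos, new, m) for pos, new, m in single_point_mutants(wild_type)]
    better = [entry for entry in scored if entry[0] > baseline]
    return sorted(better, reverse=True)


ranked = rank_mutants("GAVLK", toy_yield)
for score, pos, new, mutant in ranked[:3]:
    print(f"{mutant}  position {pos} -> {new}  predicted yield {score:.3f}")
```

Exhaustive enumeration is affordable here because a sequence of length L over an alphabet of size A has only L × (A − 1) single-point mutants.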
Using gradient activation maps, we provide monomer- and substructure-level insight into the functionality of different sequences. For example, in the case of functionality-based models, this makes it possible to identify the specific monomers (and their substructures) that contribute positively or negatively to the activity.
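
The idea behind gradient-based attribution can be shown on a toy model. The sketch below uses a simple gradient-times-input map, which may differ from the exact attribution method used in the repository; the 4-bit `FINGERPRINTS`, the `WEIGHTS`, and the `tanh` model are all hypothetical and chosen only so that the gradient can be written out by hand.

```python
import math

# Hypothetical 4-bit monomer fingerprints; each bit stands for one substructure.
# Real fingerprints would be much longer.
FINGERPRINTS = {
    "K": [1, 0, 1, 0],
    "G": [0, 0, 0, 1],
    "D": [0, 1, 1, 0],
}

# Hypothetical learned weights, one per fingerprint bit, shared across positions.
WEIGHTS = [0.8, -0.6, 0.1, 0.0]


def encode(seq):
    """Represent a sequence as a (length x bits) matrix of fingerprint rows."""
    return [list(FINGERPRINTS[m]) for m in seq]


def predict(matrix):
    """Toy differentiable model: tanh of the weighted sum over all positions and bits."""
    z = sum(w * x for row in matrix for w, x in zip(WEIGHTS, row))
    return math.tanh(z)


def attributions(matrix):
    """Gradient-times-input map: d(predict)/d(x_ij) * x_ij for position i and bit j."""
    slope = 1.0 - predict(matrix) ** 2  # derivative of tanh at the current input
    return [[slope * w * x for w, x in zip(WEIGHTS, row)] for row in matrix]


seq = "KGD"
bit_map = attributions(encode(seq))          # substructure-level contributions
per_monomer = [sum(row) for row in bit_map]  # monomer-level contributions
for monomer, score in zip(seq, per_monomer):
    print(monomer, round(score, 3))
```

Each entry of `bit_map` is the contribution of one substructure bit at one position, and summing a row gives the contribution of that monomer; positive values push the predicted activity up and negative values push it down.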
The package requires:
Optimization of functionality codebase -
@article{Schissel2020,
author = {Schissel, Carly K and Mohapatra, Somesh and Wolfe, Justin M and Fadzen, Colin M and Bellovoda, Kamela and Wu, Chia-Ling and Wood, Jenna A. and Malmberg, Annika B. and Loas, Andrei and G{\'{o}}mez-Bombarelli, Rafael and Pentelute, Bradley L.},
doi = {10.1101/2020.04.10.036566},
journal = {bioRxiv},
title = {Interpretable Deep Learning for De Novo Design of Cell-Penetrating Abiotic Polymers},
url = {https://www.biorxiv.org/content/10.1101/2020.04.10.036566v1},
year = {2020}
}
Optimization of synthetic accessibility codebase -
@article{Mohapatra2020,
author = {Mohapatra, Somesh and Hartrampf, Nina and Poskus, Mackenzie and Loas, Andrei and G{\'{o}}mez-Bombarelli, Rafael and Pentelute, Bradley L},
doi = {10.1021/acscentsci.0c00979},
issn = {2374-7943},
journal = {ACS Central Science},
month = {nov},
publisher = {American Chemical Society},
title = {{Deep Learning for Prediction and Optimization of Fast-Flow Peptide Synthesis}},
url = {https://doi.org/10.1021/acscentsci.0c00979},
year = {2020}
}
MIT License