Notebook | Description |
---|---|
ESM-MSA sampler | Uses the ESM-MSA model (a transformer-based neural network trained on protein multiple sequence alignments) to generate new protein sequences by iteratively mutating sequences from an input alignment. |
Metrics | Calculates various sequence- and structure-based quality scores for proteins, such as those produced by generative models. |
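
The "iteratively mutating sequences from an input alignment" idea in the sampler row can be illustrated with a toy sketch. This is not the notebook's code and does not call ESM-MSA: the function names (`column_distribution`, `sample_alignment`) are hypothetical, and smoothed column frequencies stand in for the model's prediction of a masked residue. It only shows the general loop of picking a position, hiding it, and resampling it from the rest of the alignment.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(alignment, col, pseudocount=1.0):
    """Hypothetical stand-in for a learned model.

    Returns smoothed residue frequencies for one alignment column.
    ESM-MSA would instead predict the masked residue from the whole alignment.
    """
    counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
    weights = [counts[aa] + pseudocount for aa in AMINO_ACIDS]
    total = sum(weights)
    return [w / total for w in weights]


def sample_alignment(alignment, n_steps=100, seed=0):
    """Iteratively mutate an alignment: pick one position, mask it, resample it."""
    rng = random.Random(seed)
    seqs = [list(s) for s in alignment]
    for _ in range(n_steps):
        row = rng.randrange(len(seqs))
        col = rng.randrange(len(seqs[row]))
        if seqs[row][col] == "-":
            continue  # leave gap positions untouched in this toy version
        # Hide the chosen sequence so the proposal depends only on its context.
        context = [s for i, s in enumerate(seqs) if i != row]
        probs = column_distribution(context, col)
        seqs[row][col] = rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]
    return ["".join(s) for s in seqs]


# Made-up example alignment (equal-length rows, "-" marks a gap).
msa = ["MKT-AYIAK", "MKTLAYLAK", "MRTLGYIAK", "MKS-AYIGK"]
sampled = sample_alignment(msa, n_steps=200, seed=1)
for original, new in zip(msa, sampled):
    print(original, "->", new)
```

Fixing the seed makes a run reproducible. In a real sampler the proposal distribution would come from the trained model rather than from column counts, and choices such as how many positions to mask per step would matter considerably.
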
conda env create --name protein_scoring -f conda_env.yml
conda activate protein_scoring
jupyter lab
- Source data: AlphaFold2-predicted structures, full sequence lists, tables of metrics, tables of experimental results, and phylogenetic trees. The Jupyter notebooks under "notebooks_for_figures" automatically download the data they need from Zenodo; if you want the data for some other purpose, it's available at this link.
- protein_gibbs_sampler: command-line tools for generating new sequences using ESM-MSA sampling (used in the ESM-MSA sampler notebook above).
- Johnson, Sean R., Xiaozhi Fu, Sandra Viknander, Clara Goldin, Sarah Monaco, Aleksej Zelezniak, and Kevin K. Yang. “Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks.” Nature Biotechnology, April 23, 2024. https://doi.org/10.1038/s41587-024-02214-2.