Generating and scoring novel enzyme sequences with a variety of models and metrics

seanrjohnson/protein_scoring

Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks

Colab notebooks

  • ESM-MSA sampler: uses the ESM-MSA model (a transformer-based neural network trained on protein multiple sequence alignments) to generate new protein sequences by iteratively mutating sequences from an input alignment.
  • Metrics: calculates various sequence- and structure-based quality scores for proteins, such as those produced by generative models.
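
To make the two notebook descriptions concrete, here is a minimal Python sketch of the general idea: start from a sequence in an alignment, repeatedly resample one position at a time, then score the result with a simple sequence-based metric. It is not the notebooks' code. The function names (`column_profile`, `sample_sequence`, `percent_identity`, `max_identity`) are hypothetical, and a per-column residue frequency table stands in for the ESM-MSA model, which is an assumption made only so the example runs without model weights. Identity to the closest input sequence is shown as one plausible example of a sequence-based score, not necessarily one the Metrics notebook computes in this form.

```python
import random
from collections import Counter

GAP = "-"

def column_profile(alignment, pseudocount=1.0):
    """Toy stand-in for a learned model: per-column residue weights.

    ASSUMPTION: the real sampler queries ESM-MSA; this frequency table
    only imitates "a distribution over residues at each position".
    """
    alphabet = sorted({aa for seq in alignment for aa in seq if aa != GAP})
    profile = []
    for col in zip(*alignment):
        counts = Counter(aa for aa in col if aa != GAP)
        profile.append([counts[aa] + pseudocount for aa in alphabet])
    return alphabet, profile

def sample_sequence(alignment, seed_index=0, n_steps=100, rng=None):
    """Iteratively mutate one aligned sequence, one position per step."""
    if rng is None:
        rng = random.Random()
    alphabet, profile = column_profile(alignment)
    seq = list(alignment[seed_index])
    # Only non-gap positions of the seed are resampled; gaps stay gaps.
    positions = [i for i, aa in enumerate(seq) if aa != GAP]
    for _ in range(n_steps):
        pos = rng.choice(positions)
        seq[pos] = rng.choices(alphabet, weights=profile[pos], k=1)[0]
    return "".join(seq)

def percent_identity(a, b):
    """Percent identical residues over columns where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def max_identity(query, references):
    """Identity of the query to its closest reference sequence."""
    return max(percent_identity(query, ref) for ref in references)

# Example with a tiny made-up alignment (illustrative data only).
alignment = ["MKTAYIAK", "MKSAYLAK", "MRTA-IAK"]
new_seq = sample_sequence(alignment, n_steps=20, rng=random.Random(0))
print(new_seq, max_identity(new_seq, alignment))
```

Passing a seeded `random.Random` makes a run reproducible. Swapping the frequency table for a real masked language model would change only how the per-position weights are obtained.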

Figures

Setup

conda env create --name protein_scoring -f conda_env.yml

conda activate protein_scoring

jupyter lab

Related data and repositories

  • Source data: AlphaFold2-predicted structures, full sequence lists, tables of metrics, tables of experimental results, and phylogenetic trees. The Jupyter notebooks under "notebooks_for_figures" download the necessary data from Zenodo automatically; the same data is also available at this link for other uses.
  • protein_gibbs_sampler: command line tools for generating new sequences using ESM-MSA sampling (used in the notebook above).

References

  • Johnson, Sean R., Xiaozhi Fu, Sandra Viknander, Clara Goldin, Sarah Monaco, Aleksej Zelezniak, and Kevin K. Yang. “Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks.” Nature Biotechnology, April 23, 2024. https://doi.org/10.1038/s41587-024-02214-2.
