Fast Non-autoregressive Inverse Folding with Discrete Diffusion (NeurIPS MLSB 2023)
This repository provides an implementation of discrete diffusion for inverse protein folding. It includes pretrained models, training routines, and inference scripts for inverse folding experiments.
- Provide code for the designability metric to support reproducibility.
- Configure PMPNN ARM sampling temperature correctly.
Clone the repository, navigate to its root directory, and create a conda environment using the provided YAML file. Activate the environment as follows:
conda env create -f environment.yml
conda activate your-env-name
Within the activated environment and the root directory of the repository, execute:
pip install -e .
For discrete diffusion inference with purity sampling, run:
python experiments/inference_diff.py --sampling_type purity_sample
Refer to configs/clean/inference_diff.yaml for a complete description of the inference arguments.
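For readers unfamiliar with the term, the idea behind purity sampling can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the code in experiments/inference_diff.py: it assumes "purity" means the largest class probability predicted at a masked position, and that each step unmasks the most confident positions first. The names `MASK` and `purity_unmask_step` are hypothetical.

```python
import random

MASK = -1  # hypothetical sentinel for a position that is still masked


def purity_unmask_step(tokens, probs, num_to_unmask, rng=random):
    """Unmask the masked positions whose predicted distributions are most peaked.

    tokens: list of ints, with MASK at undecided positions.
    probs:  per-position categorical distributions (list of lists).
    """
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    # Assumed definition: purity = largest predicted class probability.
    ranked = sorted(masked, key=lambda i: max(probs[i]), reverse=True)
    out = list(tokens)
    for i in ranked[:num_to_unmask]:
        # Sample the token from the predicted distribution at this position.
        out[i] = rng.choices(range(len(probs[i])), weights=probs[i], k=1)[0]
    return out


# Toy example with a 4-letter alphabet; position 2 is the most confident.
tokens = [MASK, 3, MASK, MASK]
probs = [
    [0.25, 0.25, 0.25, 0.25],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.1, 0.6, 0.2, 0.1],
]
step = purity_unmask_step(tokens, probs, num_to_unmask=1, rng=random.Random(0))
```

Here only position 2 is unmasked in the first step, because its predicted distribution is the most peaked; the uniform position 0 is left for a later step.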
To compute designability numbers, run:
python scripts/run_esmfold_csv.py --csv_path your-csv-path ...
passing in the path to the CSV generated by inference_diff.py.
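As a rough illustration of how per-design scores might be summarized into a single designability number, the sketch below computes the fraction of designs under a cutoff. It makes assumptions this README does not confirm: that each design has a self-consistency RMSD (in angstroms) between the refolded structure and the target backbone, and that a 2.0 Å cutoff is used. The function name `designable_fraction` is hypothetical, and the output format of scripts/run_esmfold_csv.py is not assumed here.

```python
def designable_fraction(sc_rmsds, threshold=2.0):
    """Return the fraction of designs with self-consistency RMSD below threshold.

    sc_rmsds:  iterable of per-design RMSD values in angstroms (assumed metric).
    threshold: cutoff in angstroms; 2.0 is an assumed convention.
    """
    scores = list(sc_rmsds)
    if not scores:
        raise ValueError("no scores given")
    return sum(r < threshold for r in scores) / len(scores)


# Toy values, not results from this repository.
example_fraction = designable_fraction([0.8, 1.5, 2.4, 3.1])
```

With the toy values above, two of the four designs fall under the cutoff, so the fraction is 0.5.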
We provide pretrained weights for ProteinMPNN trained on the CATH 4.2 dataset under the weights directory. Both autoregressive (ARM) and discrete diffusion weights are available.
To train an ARM model from scratch, run:
python experiments/train_arm.py ...
To train a discrete diffusion model from scratch, run:
python experiments/train_diff.py ...
Please reach out to johnyang@mit.edu with any questions or concerns.
This project is licensed under the MIT License. Refer to the LICENSE.md file for details.