DomainBed is a PyTorch suite containing benchmark datasets and algorithms for domain generalization, as introduced in In Search of Lost Domain Generalization.
We extend this repo to allow for benchmarking DG algorithms for biological sequences, namely, therapeutic antibodies.
To do so, we replace the backbones with SeqCNN or ESM; the choice between them is controlled by the --is_esm flag of the train script.
Our dataset can be found here:
Before running any tests, make sure you change the path in domainbed/datasets.py to wherever you store the data.
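Before editing the path, it can help to confirm that the CSV is readable from the location you intend to use. The snippet below is a minimal sketch, not part of the repo: `peek_csv` is a hypothetical helper, and `DATA_PATH` assumes the in-repo location named in this README. It makes no assumption about column names and only prints whatever header it finds.

```python
import csv
from pathlib import Path

# Assumed location (the in-repo path named in this README);
# change it to wherever you actually store the data.
DATA_PATH = Path("domainbed/data/antibody_domainbed_dataset.csv")


def peek_csv(path, n_rows=3):
    """Return the header and the first n_rows data rows of a CSV file."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"Dataset not found at {path}; update the path in domainbed/datasets.py"
        )
    with path.open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


if __name__ == "__main__":
    try:
        header, rows = peek_csv(DATA_PATH)
    except FileNotFoundError as err:
        print(err)
    else:
        print("columns:", header)
        for row in rows:
            print(row)
```

If the file is missing, the script prints the error instead of crashing, which is a quick signal that the path in domainbed/datasets.py still needs to be changed.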
Set up an environment with all necessary packages:
conda create --name <env_name> --file requirements.txt
Train any DG baseline from DomainBed on the antibody dataset as follows:
python -m domainbed.scripts.train --dataset AbRosetta --algorithm ERM --output_dir='./some_directory'
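To run several baselines in a row, the command above can be assembled programmatically. The following is a rough sketch under stated assumptions: `build_train_command` is a hypothetical helper, the `./results/<algorithm>` output directories are placeholders, and `--is_esm` is assumed to be a plain on/off switch that selects the ESM backbone when present. IRM and CORAL are algorithm names from the upstream DomainBed repo; whether each one runs unchanged on AbRosetta has not been verified here.

```python
import shlex


def build_train_command(algorithm, output_dir, dataset="AbRosetta", is_esm=False):
    """Build the argument list for one domainbed.scripts.train run."""
    cmd = [
        "python", "-m", "domainbed.scripts.train",
        "--dataset", dataset,
        "--algorithm", algorithm,
        "--output_dir", output_dir,
    ]
    if is_esm:
        # Assumption: --is_esm takes no value and switches the backbone to ESM.
        cmd.append("--is_esm")
    return cmd


if __name__ == "__main__":
    # Print one command per algorithm; nothing is launched here.
    for algorithm in ["ERM", "IRM", "CORAL"]:
        cmd = build_train_command(algorithm, f"./results/{algorithm}")
        print(shlex.join(cmd))
```

The script only prints the commands so they can be inspected first; each list can be passed to `subprocess.run(cmd, check=True)` to actually start training.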
Dataset is available under domainbed/data/antibody_domainbed_dataset.csv
All other instructions from the main DomainBed repo still apply; see the original repo for more details on running sweeps.