Meta-Learning GNN Initializations for Low-Resource Molecular Property Prediction

GSK-AI/meta-learning-qsar

Meta-Learning Initializations for Low Resource Drug Discovery

This repo contains the code accompanying the publication "Meta-Learning Initializations for Low Resource Drug Discovery" (Nguyen et al.).

Instructions

Cloning and setting up your environment

git clone https://github.com/GSK-AI/meta-learning-qsar.git
conda env create --name metalearning --file meta-learning-qsar/environment.yaml
source activate metalearning

Setting PYTHONPATH

cd meta-learning-qsar
export PYTHONPATH=$PYTHONPATH:$(pwd)

Setting OE_LICENSE

This step requires an OpenEye license file and is needed to run src/featurize.py. Replace <path> with the directory that contains the license file.

export OE_LICENSE=<path>/oe_license.txt

Running tests

If an OpenEye license is available, run all tests:

pytest

If the license file is not available, exclude the tests that use the OpenEye OEChem library:

pytest -k "not openeye"
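
The choice between the two commands can also be scripted. The following is a minimal sketch, assuming the license is located through the OE_LICENSE environment variable; the helper name pytest_command is hypothetical and is not part of this repo.

```python
import os


def pytest_command(env=None):
    """Pick the pytest invocation based on whether OE_LICENSE points to a file."""
    env = os.environ if env is None else env
    license_file = env.get("OE_LICENSE", "")
    if license_file and os.path.isfile(license_file):
        # License found: run the full test suite.
        return ["pytest"]
    # No usable license: deselect tests matched by the "openeye" keyword.
    return ["pytest", "-k", "not openeye"]


# Example usage (commented out so that importing this file does not start a test run):
# import subprocess
# subprocess.run(pytest_command(), check=False)
```

The function only checks that the path exists; it does not verify that the license itself is valid.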

Usage

Reproducing experiments with ChEMBL20

Extracting and combining chunked and featurized data

python exp/preprocess.py

Train Baselines, MAML, FOMAML, and ANIL using the provided splits

./exp/train_and_evaluate.sh

Once training is done, generate test statistics on the held-out test tasks by running:

./exp/test.sh

Training on custom data

First, featurize the data, converting SMILES strings to graph representations.

python src/featurize.py \
    --data <csv file> \
    --smiles_col <name of SMILES column> \
    --output_col <name of output columns> \
    --output_path <folder to store featurized data>
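
Before featurizing, it can help to confirm that the CSV file actually contains the columns passed to --smiles_col and --output_col. The following is a minimal sketch using only the standard library; the helper missing_columns and the column names "smiles" and "pIC50" are placeholders for illustration, not names used by src/featurize.py.

```python
import csv
import io


def missing_columns(csv_file, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Hypothetical data: the column names and values below are made up.
example = io.StringIO("smiles,pIC50\nCCO,5.2\nc1ccccc1,6.1\n")
print(missing_columns(example, ["smiles", "pIC50"]))  # prints []
```

For a file on disk, pass an open file handle, for example `open(path, newline="")`, instead of the in-memory example.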

Use src/train_maml.py to kick off MAML training. The two required arguments are --save_path and --source.

python src/train_maml.py \
    --save_path <directory to store checkpoint> \
    --source <directory where training and validation data is stored>
    ...

Use src/validate_maml.py to calculate validation metrics from saved checkpoints. This Python script launches Slurm validation jobs as new checkpoints are found. --monitor_path and --source should be the same as the --save_path and --source passed to src/train_maml.py.

python src/validate_maml.py \
    --monitor_path <directory to store checkpoint> \
    --source <directory where training and validation data is stored>
    ...
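
The idea of reacting to checkpoints as they appear can be illustrated with a simple polling loop. This is only a sketch under stated assumptions and is not how src/validate_maml.py is implemented: the "*.pt" file pattern, the function names, and the handle callback (which stands in for submitting a validation job) are all hypothetical.

```python
import time
from pathlib import Path


def find_new_checkpoints(monitor_path, seen, pattern="*.pt"):
    """Return checkpoint files not yet in `seen`, oldest first, and record them.

    The "*.pt" pattern is an assumption; adjust it to the actual checkpoint names.
    """
    candidates = sorted(
        Path(monitor_path).glob(pattern),
        key=lambda p: (p.stat().st_mtime, p.name),
    )
    new = [p for p in candidates if p.name not in seen]
    seen.update(p.name for p in new)
    return new


def watch(monitor_path, handle, poll_seconds=60.0, max_polls=None):
    """Call `handle(checkpoint)` once for every checkpoint that appears.

    `handle` is a placeholder for whatever launches validation.
    With max_polls=None the loop runs until interrupted.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for checkpoint in find_new_checkpoints(monitor_path, seen):
            handle(checkpoint)
        polls += 1
        time.sleep(poll_seconds)
```

Tracking file names in a set keeps each checkpoint from being handled twice, at the cost of ignoring a checkpoint that is later overwritten under the same name.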

Notes

  • Usage instructions can be found at the top of each file.
  • Description of available arguments for each script can be obtained by using the --help flag.
  • For example usage of these files, see exp/train_and_evaluate.sh and exp/test.sh.
  • src/validate_maml.py calls src/evaluate_transfer_learning.py under the hood, but requires a Slurm cluster. Without one, use src/evaluate_transfer_learning.py directly to evaluate each checkpoint individually.

Contact

For questions, please feel free to reach out via email at cuong.q.nguyen@gsk.com.
