Meta-Learning Initializations for Low Resource Drug Discovery

This repo contains accompanying code for the publication "Meta-Learning Initializations for Low Resource Drug Discovery" (Nguyen et al.).

Instructions

Cloning and setting up your environment

git clone https://github.com/GSK-AI/meta-learning-qsar.git
conda env create --name metalearning --file meta-learning-qsar/environment.yaml
conda activate metalearning

Setting PYTHONPATH

cd meta-learning-qsar
export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$(pwd)"

Setting OE_LICENSE

This step requires the OpenEye license file and is necessary for running src/featurize.py. Change <path> to the appropriate directory.

export OE_LICENSE=<path>/oe_license.txt

Running tests

Run all tests if an OpenEye license is available:

pytest

If the license file is not available, exclude the tests that use the OpenEye OEChem library:

pytest -k "not openeye"
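
The choice between the two invocations above can be scripted. The following is a minimal sketch, assuming a POSIX-compatible shell; the helper names `oe_license_ok` and `select_pytest_cmd` are hypothetical and not part of this repo. It only checks that OE_LICENSE points to a readable file, not that the license itself is valid.

```shell
# Hypothetical helpers (not part of the repo).

# Succeeds when OE_LICENSE is set and points to a readable file.
oe_license_ok() {
    [ -n "${OE_LICENSE:-}" ] && [ -r "${OE_LICENSE}" ]
}

# Prints the pytest command that matches the license state.
select_pytest_cmd() {
    if oe_license_ok; then
        echo 'pytest'
    else
        echo 'pytest -k "not openeye"'
    fi
}
```

Running `eval "$(select_pytest_cmd)"` from the repo root then executes whichever command was selected.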

Usage

Reproducing experiments with ChEMBL20

Extracting and combining chunked and featurized data

python exp/preprocess.py

Train Baselines, MAML, FOMAML, and ANIL using the provided splits

./exp/train_and_evaluate.sh

Once training is done, generate test statistics on held-out test tasks by running

./exp/test.sh

Training on custom data

First, featurize the data from SMILES into a graph representation.

python src/featurize.py \
    --data <csv file> \
    --smiles_col <name of SMILES column> \
    --output_col <name of output columns> \
    --output_path <folder to store featurized data>
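
To make the placeholders concrete, the sketch below writes a tiny input file with one SMILES column and one output column. The file name `my_assay.csv`, the column names `smiles` and `activity`, and the label values are all hypothetical; the exact schema accepted by src/featurize.py is an assumption here and should be confirmed with the --help flag.

```shell
# Hypothetical example input: file name, column names, and labels are made up.
cat > my_assay.csv <<'EOF'
smiles,activity
CCO,0
c1ccccc1,1
CC(=O)O,0
EOF
```

With this file, the placeholders in the command above would become `--data my_assay.csv`, `--smiles_col smiles`, and `--output_col activity`, plus an output folder of your choice for `--output_path`.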

Use src/train_maml.py to kick off MAML training. The two required arguments are --save_path and --source.

python src/train_maml.py \
    --save_path <directory to store checkpoint> \
    --source <directory where training and validation data is stored>
    ...

Use src/validate_maml.py to calculate validation metrics from saved checkpoints. This script launches Slurm validation jobs as new checkpoints are found. --monitor_path and --source should be the same as the --save_path and --source passed to src/train_maml.py.

python src/validate_maml.py  \
    --monitor_path <directory to store checkpoint> \
    --source <directory where training and validation data is stored> 
    ...

Notes

Usage instructions can be found at the top of each file.
Description of available arguments for each script can be obtained by using the --help flag.
For example usage of these files, see exp/train_and_evaluate.sh and exp/test.sh.
src/validate_maml.py calls src/evaluate_transfer_learning.py under the hood, but requires a Slurm cluster. If you are not on a Slurm cluster, use src/evaluate_transfer_learning.py directly to evaluate each checkpoint individually.

Contact

For questions, please feel free to reach out via email at cuong.q.nguyen@gsk.com.
