I would like to adapt the model to the molecular generation task, mainly by
tuning the generator with reinforcement learning (see the train_label_rl.py
script).
Official implementation of Mol-CycleGAN for molecular optimization.
The Keras CycleGAN implementation is based on [tjwei/GANotebooks].
We highly recommend using conda for package management; an environment.yml
file is provided.
The environment can be created by running:
conda env create -f environment.yml
We use the Junction Tree Variational Autoencoder (JT-VAE) implementation as a submodule in the Mol-CycleGAN code. After cloning this repo, execute the following script before running the code:
./scripts/init_repo.sh
We provide the user with all datasets needed to reproduce the aromatic rings experiments.
Downloading all the input data (ZINC 250k dataset and related JT-VAE encodings) can be performed by running:
./scripts/download_input_data.sh
Downloading all the data from aromatic rings experiments (train / test splits of datasets, molecules returned by Mol-CycleGAN and related SMILES) can be performed by running:
./scripts/download_ar_data.sh
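Before running the setup commands above, it can help to check that their prerequisites are present. The following is a minimal sketch, not part of the repository: the helper name `missing_prerequisites` and the `repo_root` parameter are hypothetical, and the command list simply repeats the commands given in this README.

```python
import os
import shutil

# Setup commands exactly as listed in this README, in the order they appear.
SETUP_COMMANDS = [
    ["conda", "env", "create", "-f", "environment.yml"],
    ["./scripts/init_repo.sh"],
    ["./scripts/download_input_data.sh"],
    ["./scripts/download_ar_data.sh"],
]


def missing_prerequisites(commands, repo_root="."):
    """Return the programs or scripts from `commands` that cannot be found.

    Hypothetical helper: relative scripts (starting with "./") are looked up
    under `repo_root`; anything else is looked up on PATH.
    """
    missing = []
    for command in commands:
        program = command[0]
        if program.startswith("./"):
            if not os.path.isfile(os.path.join(repo_root, program)):
                missing.append(program)
        elif shutil.which(program) is None:
            missing.append(program)
    return missing
```

Calling `missing_prerequisites(SETUP_COMMANDS)` from the repository root returns an empty list when conda is on the PATH and all three scripts exist.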
This code is an implementation of CycleGAN for molecular optimization.
Training of the model can be performed by running:
python train.py
with specified training parameters.
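For readers unfamiliar with CycleGAN, the sketch below illustrates the cycle-consistency term that gives the method its name: translating a sample to the other domain and back should recover the original. This is a toy illustration under stated assumptions, not the loss implemented in train.py. The plain float vectors stand in for molecule encodings, the generators `g`, `f`, and `f_bad` are hypothetical, and the adversarial terms of the full objective are omitted.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(batch_x, batch_y, g, f):
    """Cycle-consistency loss for generators g: X -> Y and f: Y -> X.

    Averages the L1 error of the round trips x -> g(x) -> f(g(x)) and
    y -> f(y) -> g(f(y)) over each batch, then sums the two averages.
    """
    forward = [l1_distance(x, f(g(x))) for x in batch_x]
    backward = [l1_distance(y, g(f(y))) for y in batch_y]
    return sum(forward) / len(forward) + sum(backward) / len(backward)


# Toy generators (assumptions for illustration only).
def g(v):
    return [x + 1.0 for x in v]      # X -> Y: shift by +1


def f(v):
    return [x - 1.0 for x in v]      # Y -> X: exact inverse of g


def f_bad(v):
    return [x - 0.5 for x in v]      # Y -> X: imperfect inverse of g
```

With the exact inverse `f`, the loss is zero. With `f_bad`, each round trip is off by 0.5 per coordinate, so the loss is positive and training would push the generators toward being inverses of each other.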
After the model is trained and the test set translation is generated, the JT-VAE code should be used to decode the molecules. This can be done by running:
python decode.py
with specified decoding parameters.
We provide all the data and code needed to reproduce the Aromatic rings
experiment.
- The notebook data/input_data/aromatic_rings/datasets_generator_aromatic_rings.ipynb contains the data factory: the code needed to create the train and test sets used in the experiment.
- Training of the model can be performed by running the script ./scripts/run_aromatic_rings_training.sh, which calls train.py with base parameters set to process the aromatic rings data.
- Decoding the molecules can be performed by running the script ./scripts/run_aromatic_rings_decoding.sh, which calls decode.py with base parameters set to process the aromatic rings data.
- The analysis of the output is provided in the notebook experiments/aromatic_rings.ipynb.
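The training and decoding steps above must run in that order, since decoding operates on the output of training. A minimal sketch of a driver that enforces this is shown below. The function `run_pipeline` and its `dry_run` flag are hypothetical and not part of the repository; only the two script paths come from this README.

```python
import subprocess

# Script paths taken from this README; training must precede decoding.
PIPELINE_STEPS = [
    "./scripts/run_aromatic_rings_training.sh",
    "./scripts/run_aromatic_rings_decoding.sh",
]


def run_pipeline(steps, dry_run=True):
    """Run each script in order and return the list of steps handled.

    With dry_run=True (the default) the scripts are only printed, not
    executed. With dry_run=False, check=True makes subprocess.run raise
    CalledProcessError on the first failing step, so decoding never runs
    after a failed training.
    """
    handled = []
    for script in steps:
        if dry_run:
            print(f"[dry run] would execute: {script}")
        else:
            subprocess.run([script], check=True)
        handled.append(script)
    return handled
```

Running `run_pipeline(PIPELINE_STEPS)` from the repository root prints the two steps without executing them; pass `dry_run=False` to actually run the experiment.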
The code for Mol-CycleGAN was originally written in Python 3, whereas the JT-VAE package is written in Python 2. For ease of use, we downgraded the package versions so that the entire experiment can be run in a single environment.
Since many of those packages are outdated, we strongly recommend using the environment.yml
file provided to construct the working environment.