Roughness of Molecular Property Landscapes and Its Impact on Modellability

This repo contains the Jupyter notebooks used to obtain the results presented in the paper Roughness of Molecular Property Landscapes and Its Impact on Modellability.

Requirements

In addition to python>=3.7 and common scientific libraries (e.g., scipy, numpy, pandas, matplotlib), the following packages are required to run the notebooks:

  • rogi
  • scikit-learn
  • PyTDC (with all dependencies; note that some are not automatically installed by pip, e.g., requests and networkx)
  • rdkit

Full details of the Python environment used are in the environment.yml file.
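Before opening the notebooks, it can help to confirm that the packages listed above are importable in the active environment. The snippet below is a minimal sketch and not part of the repo. It assumes the usual import names (`sklearn` for scikit-learn, `tdc` for PyTDC), and `missing_packages` is a hypothetical helper introduced here for illustration.

```python
from importlib.util import find_spec

# Assumed mapping from package name (as installed) to import name.
REQUIRED = {
    "rogi": "rogi",
    "scikit-learn": "sklearn",
    "PyTDC": "tdc",
    "rdkit": "rdkit",
    # PyTDC dependencies that pip may not install automatically.
    "requests": "requests",
    "networkx": "networkx",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that each module can be located; it does not verify versions, which are pinned in environment.yml.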

Files

This is a description of the folders and files to help you navigate the repo.

Data and Plot Folders

  • data is generated by PyTDC and contains the downloaded TDC datasets
  • oracle is generated by PyTDC and contains a pickle file used by the package
  • chembl_datasets contains the ChEMBL datasets provided in the SI of the paper Exposing the limitations of molecular machine learning with activity cliffs
  • plots contains the plots for all results in the paper
  • landscapes contains 2D and 3D visualizations of the property landscapes

Notebooks

  • toy-examples: results related to the analytical function tests (Figures 1-3).
  • regression: results for all three sets of datasets (ZINC+GuacaMol, TDC, ChEMBL) related to all regression tasks, for ROGI, RMODI, and SARI (Figure 4, Table 1, and related SI Figures).
  • compute_sari: computes SARI scores for all datasets (the output of this notebook is the file regression_sari_scores.csv, which is used by the regression notebook for plotting)
  • classification: results related to all classification tasks, for ROGI and MODI (Figure 5, Table 2, and related SI Figures).
  • binarized_regression: results for the additional classification tasks based on the binarization of the regression datasets (SI Figure 11)
  • convergence: results testing the convergence of ROGI with datasets of increasing size (SI Figures 13-16)
  • landscape_viz: generates 2D and 3D visualizations of the property landscapes.

Other files

  • regression_results.pkl: pickle file storing the results obtained in the notebook regression.ipynb. In the regression notebook, there is a cell to load this pickle file rather than re-running all experiments.
  • classification_results.pkl: pickle file storing the results obtained in the notebook classification.ipynb.
  • binarized_regression_results.pkl: pickle file storing the results obtained in the notebook binarized_regression.ipynb.
  • convergence_results.pkl: pickle file storing the results obtained in the notebook convergence.ipynb.
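The stored results can be reloaded instead of re-running the experiments. The sketch below shows one way to do this under stated assumptions: it is not the cell from the notebooks, `load_cached_results` is a hypothetical helper, and nothing is assumed about the structure of the pickled object beyond it being loadable with the standard `pickle` module.

```python
import pickle
from pathlib import Path


def load_cached_results(path):
    """Return the unpickled object stored at `path`, or None if the file is absent."""
    path = Path(path)
    if not path.exists():
        return None
    with path.open("rb") as fh:
        return pickle.load(fh)


# File name taken from the list above; run from the repo root.
results = load_cached_results("regression_results.pkl")
if results is None:
    print("No cached results found; run the experiment cells in the notebook instead.")
else:
    print(f"Loaded cached results of type {type(results).__name__}.")
```

The same call should work for classification_results.pkl, binarized_regression_results.pkl, and convergence_results.pkl, provided the environment matches the one used to create them.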