Skip to content

Development of easy to use and reproducible ML scripts for chemistry.

Notifications You must be signed in to change notification settings

qle2/McQuade-Chem-ML

 
 

Repository files navigation

McQuade-Chem-ML

Development of easy to use and reproducible ML scripts for chemistry.

Data Sets

Currently we are using three datasets from MoleculeNet.ai: Lipophilicity, FreeSOLV and ESOL and two datasets retreived from tutorials: LogP14k and jak2-pIC50.

Available Models

Our program supports random forest (RF), gradient decent boost (GDB), support vector machines (SVM), Adaboost, and k-nearest neightbors (KNN).

Dependencies

We should all be using the same conda evironment so that we do not run into the issue of "Well it works on my machine". To do this, we will host a .yml file for the shared environment on our repo (mlapp.yml).

  1. Create an conda virual environment from the mlapp.yml file conda env create -f mlapp.yml

  2. Update the virtual environment as necessary using conda install

  3. Update the mlapp.yml file using conda env export > mlapp.yml --no-builds --from-history. Make sure that you add the mlapp.yml file to git, if it not already being watched.

    Note: Sometimes packages cannot be installed from conda, such as descriptastorus. If this is the case, you may need to use pip to install from a github link. See the mlapp.yml file for an example (descriptastorus) for an example of how to account for this in the mlapp.yml file.

    - pip:
        - "git+git://github.com/bp-kelley/descriptastorus.git#egg=descriptastorus"
    
  4. Commit your changes, which include the mlapp.yml file. git commit -m "your commit message here"

Workflow

MLModel Class

This is the overview of our MLModels Python class functions. main.py essentially just iteratively runs this workflow with different input algorithms, data sets and featurization methods.
Alt text

File Tree

About

Development of easy to use and reproducible ML scripts for chemistry.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.6%
  • Shell 0.4%