Releases: ecrl/ecnet

Bug fixes, enhancements

14 Jul 23:05
2653cb2
  • ecnet.Server.remove_outliers and ecnet.tasks.remove_outliers have been removed
    • Detecting outliers can help identify abnormalities in data, but removing them entirely is likely not the right approach for fuel property prediction. Outlier detection will be reintroduced once a viable use for it has been determined.
  • Added the batch_size hyper-parameter, included in the default model configuration and hyper-parameter tuning process
    • Relevant unit tests updated
  • Any missing model configuration variables from config files generated with previous versions of ECNet will now be set to their default values
    • Additional unit tests added
  • Added option to convert SMILES to MDL during PaDEL-based database creation
    • Additional unit test added
  • Added PaDEL-generated databases for all properties
  • ecnet.tasks.limit_inputs.limit_rforest now relies on sklearn.ensemble.RandomForestRegressor as its only dependency
    • limit_rforest now returns list of parameter names/importances instead of a modified DataFrame
    • Server.limit_inputs also returns a list of parameter names/importances
    • Removed the ditto-lib dependency
  • Bug fixes:
    • Server._sets now loads when a PRJ file is opened via ecnet.Server
    • ecnet.utils.data_utils.DataFrame.set_inputs now immediately applies selected inputs to L/V/T sets
    • ParityPlot parity lines now scale to reflect data minimum/maximum
  • More robust unit tests for MultilayerPerceptron, database creation, input parameter limiting
  • All unit tests may now be run individually
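
The default-filling behavior described above for older config files can be sketched as a simple dictionary merge. This is a minimal illustration, not ECNet's implementation: the function name `fill_missing_config`, the keys other than `batch_size`, and all default values are assumptions made for the example.

```python
# Hypothetical defaults; ECNet's real keys and values may differ.
DEFAULT_CONFIG = {
    'batch_size': 32,
    'epochs': 3000,
    'learning_rate': 0.001,
}


def fill_missing_config(loaded, defaults=DEFAULT_CONFIG):
    """Return a copy of `loaded` with any missing keys set to defaults."""
    merged = dict(defaults)  # start from the default values
    merged.update(loaded)    # values read from the file take precedence
    return merged


# A config written by an older version, lacking 'batch_size'
old_config = {'epochs': 500, 'learning_rate': 0.01}
config = fill_missing_config(old_config)
print(config)
```

Values present in the older file are kept, and only the absent `batch_size` entry picks up its default.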

Better MLP validation, moved multiprocessing checks

22 Jun 15:13
807f592
  • Training an MLP with a validation set now uses Keras' early stopping callback to determine the learning cutoff, and preserves the weights from the epoch with the best validation loss
  • Moved multiprocessing.set_start_method to multiprocessed tasks
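
The idea behind early stopping with best-weight preservation can be illustrated in plain Python. The sketch below is not Keras code and not ECNet's training loop: the function name `early_stopping_epochs` and the `patience` parameter are assumptions for illustration. It walks a sequence of per-epoch validation losses, remembers the best epoch, and stops once the loss has failed to improve for `patience` consecutive epochs.

```python
def early_stopping_epochs(val_losses, patience=2):
    """Simulate early stopping over per-epoch validation losses.

    Returns (best_epoch, best_loss, stopped_epoch). In a real model, the
    weights saved at best_epoch would be the ones kept after stopping.
    """
    best_loss = float('inf')
    best_epoch = -1
    stopped_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        stopped_epoch = epoch
        if loss < best_loss:
            # New best: record it and reset the patience counter
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` epochs in a row
    return best_epoch, best_loss, stopped_epoch


losses = [0.9, 0.7, 0.6, 0.65, 0.7, 0.5]
print(early_stopping_epochs(losses, patience=2))
```

With these example losses, training halts before the final (lower) loss is ever seen, which is the usual trade-off when choosing a patience value.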

Removal of conversion functions, slight Server rework

11 Jun 00:30
1e317c2

1.) The following conversions have been removed from ECNet:

  • get_smiles
  • smiles_to_descriptors
  • smiles_to_mdl
  • mdl_to_descriptors

*Note: these were adding clutter, and were not within the main scope of ECNet.

2.) PaDEL-Descriptor is no longer bundled into ECNet

*Note: with the removal of conversion functions, this is no longer needed.

3.) Database creation functions now rely on two separate packages:

*Note: it made sense to create separate packages for interfacing with this software, since a Python interface for generating QSPR descriptors is generally quite handy.

4.) ecnet.tools.database.create_db's arguments have been changed:

>>> ecnet.tools.database.create_db(['CC', 'CCC'], 'my_database.csv', targets=[13, 47])

Construct using alvaDesc:

>>> ecnet.tools.database.create_db(['CC', 'CCC'], 'my_database.csv', targets=[13, 47], backend='alvadesc')

*Note: supplying SMILES strings and targets using lists makes more sense than requiring the user to create a separate file - this change allows the user to choose where the data comes from.

5.) ecnet.tools.project.predict's arguments have been changed:

>>> results = ecnet.tools.project.predict(['CC', 'CCC'], 'my_project.prj')
>>> print(results)
[[13], [47]]

*Note: as with database creation, supplying SMILES strings as a list makes more sense than requiring a separate file.

6.) ecnet.Server has been rearranged a bit:

  • project training has been moved to a separate function at ecnet.tasks.training.train_project
  • various functions have been moved to ecnet.utils.server_utils:
    • creating a project folder structure
    • saving a project as a .prj file
    • opening a .prj file to use
  • task-specific logging messages have been moved to their respective functions in ecnet.tasks

*Note: ecnet.Server needed to be shrunk down, so functions that were clearly utilities were moved into utility files. This should also provide more direct access to the "back-end" of ECNet (bypassing Server usage), allowing greater variation in experimental procedure.

7.) Added a suite of unit tests implemented with the unittest library:

  • in addition to Server unit tests, individual utilities of ECNet are tested
  • added a Python script, /tests/test_all.py, to automatically run all unit tests and report a summary of successes/failures

*Note: it's time for "proper" unit testing, and that means implementing a unit testing package. I'm looking forward to expanding ECNet's tests and introducing more automation into the testing process.
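
A script that runs a set of unittest cases and reports a summary, as described for /tests/test_all.py, might look roughly like the sketch below. The contents of the real script are not shown here, so `ExampleTest` and `run_all` are hypothetical names used only for illustration.

```python
import unittest


class ExampleTest(unittest.TestCase):
    """Stand-in test case; ECNet's actual tests are not reproduced here."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


def run_all(test_cases):
    """Run the given TestCase classes; return (run, failures, errors)."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(case) for case in test_cases
    )
    result = unittest.TextTestRunner(verbosity=1).run(suite)
    return result.testsRun, len(result.failures), len(result.errors)


summary = run_all([ExampleTest])
print('run=%d failures=%d errors=%d' % summary)
```

Collecting the cases into one suite keeps the summary in a single place, while each `TestCase` class can still be run on its own.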

8.) Installation now forces TensorFlow 1.13.1 to be installed

*Note: I've encountered pip install tensorflow installing the 2.0.0 beta, which ECNet does not currently support - we'll make the change when we're ready (and so is Keras)

9.) Changed/added a variety of databases to the /databases/ directory

  • All databases constructed using alvaDesc
  • All SMILES strings have been validated with respect to compound name

*Note: in order to ensure accurate QSPR-descriptor to experimental value correlation, accurate SMILES strings are necessary (assuming descriptors are being generated using them).

Type checking, improved unit testing

03 Jun 21:54
2083718
  • All methods/functions now enforce specific types for arguments, return values
  • calc_r2 function now uses scikit-learn's r2_score function
  • Changed unit testing scheme, now uses unittest library
    • added a suite of unit tests
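
For reference, the coefficient of determination that scikit-learn's r2_score reports for single-output data is 1 - SS_res / SS_tot. The pure-Python sketch below illustrates that formula only; `r_squared` is a hypothetical name, and this is neither ECNet's calc_r2 nor scikit-learn's implementation.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Raises ZeroDivisionError if all values in y_true are identical.
    """
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the true values about their mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


print(r_squared([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]))
```

A perfect prediction gives 1.0, and always predicting the mean of the true values gives 0.0.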

Addition to conversion tools, update to database creation function

30 May 16:50
34f96a5
  • Addition of the "smiles_to_descriptors" function
  • Database creation functions now use the "smiles_to_descriptors" function, bypassing the use of OpenBabel (used for SMILES -> MDL -> descriptors)
  • Updated relevant documentation

Updates to DataFrame, DataPoint classes and their functionality

30 May 14:57
22d549f
  • STRING and GROUP attributes for DataPoints (rows in an ECNet-formatted database) can now be accessed as object attributes. For example:
>>> from ecnet.utils.data_utils import DataFrame
>>> df = DataFrame('my_database.csv')
>>> first_entry = df.data_points[0]
>>> print(first_entry.SMILES)  # SMILES is a STRING column in the supplied database
C
>>> print(getattr(first_entry, 'Compound Name'))  # STRINGs with spaces are obtained like this
Methane
  • Additional STRING columns can be supplied when creating an ECNet-formatted database
  • Fixed issue where YAML package was throwing a loader warning
  • Suppressed TensorFlow warnings about deprecation
  • Updates to documentation
  • Other minor changes
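
The attribute access shown above for STRING columns can be mimicked with `setattr`, which also explains why column names containing spaces need `getattr`. This is a minimal sketch with a hypothetical stand-in class, not ECNet's actual DataPoint implementation.

```python
class DataPoint:
    """Hypothetical stand-in for a database row; not ECNet's class."""

    def __init__(self, strings):
        # Store each STRING column as an attribute named after the column
        for column_name, value in strings.items():
            setattr(self, column_name, value)


row = DataPoint({'SMILES': 'C', 'Compound Name': 'Methane'})
print(row.SMILES)                     # dot access works for valid identifiers
print(getattr(row, 'Compound Name'))  # names with spaces require getattr
```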

Bug fixes, database creation improvements

27 Mar 19:02
f92b742
  • Added "set_spawn_method", fixes multiprocessing on Unix systems
  • Databases can now be constructed with fingerprints instead of descriptors
  • "get_smiles" function now returns an empty string if the molecule is not found on PubChem
  • Slight updates to logging
  • Hyperparameter tuning bug fix
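
A helper that forces the "spawn" start method could be as small as the sketch below. The body of ECNet's own "set_spawn_method" is not shown in these notes, so the function name `use_spawn_start_method` and the use of `force=True` are assumptions for illustration.

```python
import multiprocessing


def use_spawn_start_method():
    """Hypothetical helper: select the 'spawn' start method for new processes.

    force=True replaces any start method that was already set; without it,
    a second call to set_start_method raises RuntimeError.
    """
    multiprocessing.set_start_method('spawn', force=True)
    return multiprocessing.get_start_method()


print(use_spawn_start_method())
```

On Unix systems the default start method has historically been "fork", so a helper like this makes child processes start from a fresh interpreter instead.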

3.0.0 Release

25 Mar 16:36
36848dc
  • Server object refactor
    • Includes API changes
  • Update to ML back end (raw TensorFlow -> Keras)
  • Logging moved to separate module
  • Input descriptor limiting now uses random forest regression, via ditto-lib 1.0.0
  • Implemented ReadTheDocs page
  • Added classes for parity plot generation
  • Updated hyperparameter tuning for ECabc 2.2.2 release
  • Implemented methods for removing outliers, via ditto-lib 1.0.0

Bug fixes, GA improvements, data sorting options, optimizations

07 Feb 21:16
0691961
  • Updated parameter limiting with GA, per 0.6.0 PyGenetics update
  • Fixed bug with MultilayerPerceptron returning "NaN" values
  • Changed default parameter bounds for ABC tuning
  • Added "sort_string" argument for Server.import_data
  • Added "Getting Started" tutorial for new users

Tool integrations, database additions

28 Jan 02:54
7360c08
  • Integrated various tools:
    • Database creation tool (wrappers for Open Babel, PaDEL-Descriptor)
    • Using project tool (supply text w/ molecules, ECNet .prj file)
    • Get SMILES from molecule name (PubChemPy)
    • Convert SMILES to MDL/SDF
    • Convert MDL/SDF to QSPR descriptors
  • Added unit tests for database creation tool, using project tool
  • Removed command line tools (integrated, above)
  • Added various databases:
    • Cloud point
    • Pour point
    • Yield sooting index