Releases · ecrl/ecnet

14 Jul 23:05

tjkessler

3.2.2

2653cb2

Bug fixes, enhancements

ecnet.Server.remove_outliers and ecnet.tasks.remove_outliers have been removed
- While detecting outliers may help identify abnormalities in data, removing them entirely is likely not the right approach for fuel property prediction. Outlier detection will be reintroduced once a viable use case has been determined.
Added the batch_size hyper-parameter, which is included in the default model configuration and the hyper-parameter tuning process
- Relevant unit tests updated
Model configuration variables missing from config files generated by previous versions of ECNet are now set to their default values
- Additional unit tests added
Added option to convert SMILES to MDL during PaDEL-based database creation
- Additional unit test added
Added PaDEL-generated databases for all properties
ecnet.tasks.limit_inputs.limit_rforest now relies on sklearn.ensemble.RandomForestRegressor as its only dependency
- limit_rforest now returns a list of parameter names/importances instead of a modified DataFrame
- Server.limit_inputs also returns a list of parameter names/importances
- Removed the ditto-lib dependency
Bug fixes:
- Server._sets is now loaded when a .prj file is opened via ecnet.Server
- ecnet.utils.data_utils.DataFrame.set_inputs now immediately applies selected inputs to L/V/T sets
- ParityPlot parity lines now scale to reflect data minimum/maximum
More robust unit tests for MultilayerPerceptron, database creation, input parameter limiting
All unit tests may now be run individually
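
As noted above, limit_rforest now relies only on sklearn.ensemble.RandomForestRegressor and returns parameter names with their importances. The following is a minimal sketch of that idea, not ECNet's implementation: the function name rank_inputs, its arguments, and the (name, importance) tuple format are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_inputs(
    X: np.ndarray,
    y: np.ndarray,
    names: Sequence[str],
    num_inputs: int,
    n_estimators: int = 100,
    random_state: int = 0,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the num_inputs most important input parameters.

    Fits a random forest regressor and ranks inputs by its impurity-based
    feature_importances_ (which sum to 1 across all inputs).
    """
    if len(names) != X.shape[1]:
        raise ValueError("need exactly one name per input column")
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
    model.fit(X, y)
    ranked = sorted(
        zip(names, model.feature_importances_), key=lambda pair: pair[1], reverse=True
    )
    # Plain (name, importance) pairs rather than a modified DataFrame
    return [(name, float(importance)) for name, importance in ranked[:num_inputs]]
```

Returning a ranked list leaves it to the caller to decide which columns to keep, for example by passing the selected names back to whatever sets a DataFrame's inputs.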

22 Jun 15:13

tjkessler

3.2.1

807f592

Better MLP validation, moved multiprocessing checks

Training an MLP with a validation set now uses Keras' early stopping callback to decide when to stop training, and preserves the weights from the point of lowest validation loss
Moved multiprocessing.set_start_method calls into the tasks that use multiprocessing
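
Keras' EarlyStopping callback can provide this behaviour (stop once validation loss stops improving, then keep the best weights) via its restore_best_weights option. The framework-free sketch below illustrates the same logic; train_with_early_stopping, train_step, val_loss, and the patience default are hypothetical names chosen for illustration, not ECNet or Keras code.

```python
import copy
from typing import Callable, List, Tuple

Weights = List[float]


def train_with_early_stopping(
    weights: Weights,
    train_step: Callable[[Weights], Weights],
    val_loss: Callable[[Weights], float],
    max_epochs: int = 1000,
    patience: int = 10,
) -> Tuple[Weights, int]:
    """Train until validation loss has not improved for `patience` epochs.

    Returns the weights with the lowest validation loss and the epoch they
    were reached at (0 if the initial weights were never improved on).
    """
    best_loss = val_loss(weights)
    best_weights = copy.deepcopy(weights)
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch in range(1, max_epochs + 1):
        weights = train_step(weights)
        loss = val_loss(weights)
        if loss < best_loss:
            # New best: snapshot the weights so later epochs cannot overwrite them
            best_loss, best_weights, best_epoch = loss, copy.deepcopy(weights), epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # learning cutoff: validation loss has stopped improving
    return best_weights, best_epoch
```

The key design point is that training continues for a few epochs past the best point (the patience window), but the returned weights are the snapshot taken at the lowest validation loss.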

11 Jun 00:30

tjkessler

3.2.0

1e317c2

Removal of conversion functions, slight Server rework

1.) The following conversions have been removed from ECNet:

get_smiles
smiles_to_descriptors
smiles_to_mdl
mdl_to_descriptors

*Note: these were adding clutter, and were not within the main scope of ECNet.

2.) PaDEL-Descriptor is no longer bundled into ECNet

*Note: with the removal of conversion functions, this is no longer needed.

3.) Database creation functions now rely on two separate packages:

PaDELPy (https://github.com/ECRL/PaDELPy) - QSPR descriptor generation using PaDEL-Descriptor
alvaDescPy (https://github.com/ECRL/alvaDescPy) - QSPR descriptor generation using alvaDesc

*Note: it made sense to create separate packages for interfacing with these programs, since a Python interface for generating QSPR descriptors is generally quite handy.

4.) ecnet.tools.database.create_db's arguments have been changed:

Construct using PaDEL-Descriptor (default):

>>> ecnet.tools.database.create_db(['CC', 'CCC'], 'my_database.csv', targets=[13, 47])

Construct using alvaDesc:

>>> ecnet.tools.database.create_db(['CC', 'CCC'], 'my_database.csv', targets=[13, 47], backend='alvadesc')

*Note: supplying SMILES strings and targets as lists makes more sense than requiring the user to create a separate file; this change lets the user choose where the data comes from.
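
Because create_db now takes plain lists, the data can come from any source. A minimal sketch, assuming a CSV file with hypothetical SMILES and Target columns, that builds those lists with the standard library; the helper name load_smiles_and_targets is illustrative and not part of ECNet.

```python
import csv
from typing import List, Tuple


def load_smiles_and_targets(
    path: str, smiles_col: str = "SMILES", target_col: str = "Target"
) -> Tuple[List[str], List[float]]:
    """Hypothetical helper: read SMILES strings and numeric targets from a CSV file."""
    smiles: List[str] = []
    targets: List[float] = []
    with open(path, newline="") as csv_file:
        for row in csv.DictReader(csv_file):  # first row is treated as the header
            smiles.append(row[smiles_col].strip())
            targets.append(float(row[target_col]))
    return smiles, targets


# The resulting lists can then be passed to create_db, e.g.:
# smiles, targets = load_smiles_and_targets('my_data.csv')
# ecnet.tools.database.create_db(smiles, 'my_database.csv', targets=targets)
```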

5.) ecnet.tools.project.predict's arguments have been changed:

>>> results = ecnet.tools.project.predict(['CC', 'CCC'], 'my_project.prj')
>>> print(results)
[[13], [47]]

*Note: for the same reason we switched to lists as inputs for database creation, supplying SMILES strings as a list makes more sense than requiring a separate file.

6.) ecnet.Server has been rearranged a bit:

project training has been moved to a separate function at ecnet.tasks.training.train_project
various functions have been moved to ecnet.utils.server_utils:
- creating a project folder structure
- saving a project as a .prj file
- opening a .prj file to use
task-specific logging messages have been moved to their respective functions in ecnet.tasks

*Note: ecnet.Server needed to be slimmed down, and functions that were clearly utilities were moved into utility modules. This should also provide more direct access to the "back-end" of ECNet (bypassing Server), allowing greater flexibility in experimental procedure.

7.) Added a suite of unit tests implemented with the unittest library:

in addition to Server unit tests, individual utilities of ECNet are tested
added a Python script, /tests/test_all.py, to automatically run all unit tests and report a summary of successes/failures

*Note: it's time for "proper" unit testing, and that means adopting a unit testing framework. I'm looking forward to expanding ECNet's tests and introducing more automation into the testing process.

8.) Installation now forces TensorFlow 1.13.1 to be installed

*Note: I've seen pip install tensorflow install the 2.0.0 beta, which ECNet does not currently support; we'll make the switch when we're ready (and when Keras is, too).

9.) Changed/added a variety of databases to the /databases/ directory

All databases constructed using alvaDesc
All SMILES strings have been validated with respect to compound name
- PubChemPy (https://github.com/mcs07/PubChemPy) is a lifesaver
- Compounds not found on PubChem were validated in-house by an ECRL research assistant

*Note: in order to ensure accurate QSPR-descriptor to experimental value correlation, accurate SMILES strings are necessary (assuming descriptors are being generated using them).

03 Jun 21:54

tjkessler

3.1.2

2083718

Type checking, improved unit testing

All methods/functions now enforce specific types for arguments and return values
calc_r2 function now uses scikit-learn's r2_score function
Changed unit testing scheme, now uses unittest library
- added a suite of unit tests
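
calc_r2 now delegates to scikit-learn's r2_score, which for a single target computes R² = 1 - SS_res / SS_tot. The dependency-free sketch below reproduces that formula for illustration only; it is not ECNet's code, and unlike r2_score it omits sample weights and multi-output targets and simply raises an error when all true values are identical.

```python
from typing import Sequence


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives 1.0, and always predicting the mean of the true values gives 0.0.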

30 May 16:50

tjkessler

3.1.1

34f96a5

Addition to conversion tools, update to database creation function

Addition of the "smiles_to_descriptors" function
Database creation functions now use the "smiles_to_descriptors" function, bypassing Open Babel (previously used for the SMILES -> MDL -> descriptors conversion)
Updated relevant documentation

30 May 14:57

tjkessler

3.1.0

22d549f

Updates to DataFrame, DataPoint classes and their functionality

STRING and GROUP attributes for DataPoints (rows in an ECNet-formatted database) can now be accessed as object attributes. For example:

>>> from ecnet.utils.data_utils import DataFrame
>>> df = DataFrame('my_database.csv')
>>> first_entry = df.data_points[0]
>>> print(first_entry.SMILES)  # SMILES is a STRING column in the supplied database
C
>>> print(getattr(first_entry, 'Compound Name'))  # STRING names containing spaces must be accessed with getattr
Methane

Additional STRING columns can be supplied when creating an ECNet-formatted database
Fixed an issue where the YAML package raised a loader warning
Suppressed TensorFlow deprecation warnings
Updates to documentation
Other minor changes
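
The attribute-style access to STRING and GROUP columns shown above can be achieved by setting one attribute per column name. The class below is a hypothetical sketch of that pattern, not ECNet's DataPoint; it also shows why a column such as 'Compound Name' needs getattr, since a name containing a space is not a valid Python identifier.

```python
from typing import Dict


class Row:
    """Hypothetical stand-in for a database row that exposes columns as attributes."""

    def __init__(self, columns: Dict[str, str]) -> None:
        for name, value in columns.items():
            # setattr accepts any string, including names that contain spaces
            setattr(self, name, value)


row = Row({"SMILES": "C", "Compound Name": "Methane"})
print(row.SMILES)                     # valid identifier, so dot access works
print(getattr(row, "Compound Name"))  # contains a space, so getattr is required
```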

27 Mar 19:02

tjkessler

3.0.1

f92b742

Bug fixes, database creation improvements

Added "set_spawn_method", which fixes multiprocessing on Unix systems
Databases can now be constructed with fingerprints instead of descriptors
"get_smiles" function now returns an empty string if the molecule is not found on PubChem
Slight updates to logging
Hyperparameter tuning bug fix

25 Mar 16:36

tjkessler

3.0.0

36848dc

3.0.0 Release

Server object refactor
- Includes API changes
Update to ML back end (raw TensorFlow -> Keras)
Logging moved to separate module
Input descriptor limiting now uses random forest regression, via ditto-lib 1.0.0
Implemented ReadTheDocs page
Added classes for parity plot generation
Updated hyperparameter tuning for ECabc 2.2.2 release
Implemented methods for removing outliers, via ditto-lib 1.0.0

07 Feb 21:16

tjkessler

2.1.1

0691961

Bug fixes, GA improvements, data sorting options, optimizations

Updated GA-based parameter limiting for the PyGenetics 0.6.0 update
Fixed bug with MultilayerPerceptron returning "NaN" values
Changed default parameter bounds for ABC tuning
Added "sort_string" argument for Server.import_data
Added "Getting Started" tutorial for new users

28 Jan 02:54

tjkessler

2.1.0

7360c08

Tool integrations, database additions

Integrated various tools:
- Database creation tool (wrappers for Open Babel, PaDEL-Descriptor)
- Project usage tool (supply a text file of molecules and an ECNet .prj file)
- Get SMILES from molecule name (PubChemPy)
- Convert SMILES to MDL/SDF
- Convert MDL/SDF to QSPR descriptors
Added unit tests for database creation tool, using project tool
Removed command line tools (integrated, above)
Added various databases:
- Cloud point
- Pour point
- Yield sooting index
