- vector_model:
  - Add option to provide sample weights when training a VectorModel, adjusting all subclasses accordingly. Models that do not support weighting will log a warning if weights are specified.
  - Remove unnecessary intermediate base class VectorModelFittableBase
  - Add helper function get_predicted_var_name
- Add extra xgboost on PyPI. sensAI supports a wide range of XGBoost versions (dating back to 2020), but with the extra, we opted to use 1.7 as a lower bound, as compatibility with this version is well-tested.
- util:
  - util.version: Add methods Version.is_at_most and Version.is_equal
  - util.logging:
    - add_memory_logger now returns the logger instance, which can be queried to retrieve the log (see breaking change below)
    - Add class MemoryLoggerContext, which can be used in conjunction with Python's with statement to record logs
    - Allow control of 'append' mode in add_file_logger and FileLoggerContext
  - util.pickle:
    - Add class PersistableObject as a marker for classes that can be persisted via pickle. This is useful for classes which initially have no state but may have state in the future. Note that when an instance of a stateless class is unpickled, __setstate__ is not called, which makes it impossible to add required state if the class has since been refactored to have state.
    - setstate: Allow renamed_properties parameter to alternatively accept a tuple providing the new name and a function computing the new value
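The pickling behaviour mentioned in the PersistableObject entry can be observed with the standard library alone. The following is a minimal sketch, not sensAI code: the classes Stateless and Stateful and the helper recorded_state are hypothetical, and the example inspects the state that pickle would record (via __reduce_ex__) rather than using PersistableObject itself.

```python
import pickle


class Stateless:
    """Hypothetical class without any instance attributes."""


class Stateful:
    """Hypothetical class with one instance attribute."""

    def __init__(self):
        self.value = 1


def recorded_state(obj):
    """Return the state entry of the reduce tuple that pickle would store for obj."""
    reduced = obj.__reduce_ex__(pickle.HIGHEST_PROTOCOL)
    # The reduce tuple has the form (callable, args, state, ...);
    # a state of None means there is nothing to restore.
    return reduced[2] if len(reduced) > 2 else None


stateless_state = recorded_state(Stateless())
stateful_state = recorded_state(Stateful())
```

Since pickle only invokes __setstate__ when a state was recorded, an object pickled in its stateless form gives a later, stateful version of the class no hook for initialising the new attributes.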
  - util.cache: Add class LRUCache as a simple least-recently-used (LRU) cache implementation implementing the KeyValueCache interface
  - util.io:
    - Add util functions for path creation: create_path, create_dir_path, create_file_path
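For readers unfamiliar with the eviction policy behind the LRUCache entry under util.cache, here is a minimal sketch of a least-recently-used cache. It is an illustration only and not sensAI's implementation: the class name SimpleLRUCache, the capacity parameter and the get/set method names are assumptions made for this example.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class SimpleLRUCache:
    """Hypothetical stand-in for illustration; not sensAI's LRUCache."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key not in self._items:
            return None
        # Accessing an entry makes it the most recently used one
        self._items.move_to_end(key)
        return self._items[key]

    def set(self, key: Hashable, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Evict the least recently used entry (the oldest in the ordering)
            self._items.popitem(last=False)


cache = SimpleLRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" becomes the most recently used entry
cache.set("c", 3)  # exceeds the capacity, so "b" is evicted
```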
  - util.pandas:
    - Add SeriesInterpolation abstraction for the interpolation of pd.Series objects
      - Method interpolate_all_with_combined_index brings multiple series onto a common index, filling in missing values in each series via interpolation
      - Implementation SeriesInterpolationRepeatPreceding (to fill gaps by repeating the last value)
      - Implementation SeriesInterpolationLinearIndex (to interpolate linearly based on an index)
    - Add function average_series to compute the average of multiple series based on interpolation
    - Add function query_data_frame to support SQL-like queries via duckdb (see changes pertaining to ResultSet)
  - util.plot:
    - Add AverageSeriesLinePlot
    - ScatterPlot: add option add_diagonal
  - util.helper:
    - Add function contains_any
- evaluation:
  - Introduce ResultSet to support interactive querying and analysis of prediction results
    - Specialised for regression via RegressionResultSet; can be created from a VectorRegressionModelEvaluationData object via new method create_result_set
    - Supports filtering based on duckdb using SQL queries (optional dependency; tested with v0.10.1)
  - Support weighted data points ...
    - in RegressionEvalStats (including the heat map plot generation)
    - in all applicable RegressionMetric subclasses (to support this, implementations were partly switched to sklearn-based implementations which already support weighting).
    - in RegressionEvalStatsPlotHeatmapGroundTruthPredictions
    - but NOT yet for classification evaluation.
  - EvaluationResultCollector: Add method is_plot_creation_enabled
  - VectorRegressionModelEvaluationData: Add methods create_result_set and to_data_frame
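To make concrete what weighting data points means for a regression metric, the following sketch computes a weighted mean absolute error in plain Python. It is not taken from sensAI (whose metrics, according to the entry above, partly rely on sklearn-based implementations); the function name and signature are assumptions for illustration.

```python
from typing import Optional, Sequence


def weighted_mean_absolute_error(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Mean absolute error in which each data point contributes in proportion to its weight."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if weights is None:
        weights = [1.0] * len(y_true)  # unweighted case: all points count equally
    if len(weights) != len(y_true):
        raise ValueError("weights must have the same length as y_true")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("the sum of the weights must be positive")
    weighted_errors = sum(w * abs(t - p) for t, p, w in zip(y_true, y_pred, weights))
    return weighted_errors / total_weight


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
unweighted = weighted_mean_absolute_error(y_true, y_pred)
# Doubling the weight of the only erroneous point increases its share of the metric
weighted = weighted_mean_absolute_error(y_true, y_pred, weights=[1.0, 1.0, 2.0])
```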
- data:
  - InputOutputData:
    - Add method to_data_frame and alias to_df
  - Add module data.dataset containing sample datasets (mainly for demonstration purposes)
  - Add abstraction DataPointWeighting, reifying the data point weighting process (which is now supported in VectorModel; see above)
    - Add specialisation DataPointWeightingRegressionTargetIntervalTotalWeight (which allows applying a total weight to intervals in the regression target's range, distributing the weight of data points in respective intervals accordingly)
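One possible reading of the interval-based weighting described above is sketched below: each interval of the target range carries a total weight, which is split equally among the data points whose target falls into that interval. This is an interpretation for illustration, not sensAI's implementation; the function name, the tuple-based interval representation and the half-open interval convention are all assumptions.

```python
from typing import List, Sequence, Tuple

# (lower bound inclusive, upper bound exclusive, total weight); hypothetical representation
Interval = Tuple[float, float, float]


def interval_total_weights(targets: Sequence[float], intervals: Sequence[Interval]) -> List[float]:
    """Give each target value an equal share of the total weight of the interval containing it."""

    def find_interval(y: float) -> int:
        for i, (lower, upper, _) in enumerate(intervals):
            if lower <= y < upper:
                return i
        raise ValueError(f"target value {y} is not covered by any interval")

    memberships = [find_interval(y) for y in targets]
    counts = [memberships.count(i) for i in range(len(intervals))]
    return [intervals[i][2] / counts[i] for i in memberships]


weights = interval_total_weights(
    targets=[0.5, 1.5, 2.5, 12.0],
    intervals=[(0.0, 10.0, 1.0), (10.0, 20.0, 1.0)],
)
```

In this example, the three points in the densely populated lower interval share a total weight of 1, while the single point in the upper interval receives the full weight of 1, so both regions of the target range contribute equally during training.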
- tracking:
  - mlflow_tracking: Option add_log_to_all_contexts now stores only the logs of each model's training process (instead of the entire process beginning with the instantiation of the experiment)
- util.logging: Change add_memory_logger to no longer define a global logger, but return the handler (an instance of MemoryStreamHandler) instead. Consequently, the method get_memory_log was removed, as it is no longer needed (use the handler's method get_log instead).
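The handler-based pattern described in this entry (obtain a handler, then ask it for the recorded log) can be sketched with the standard logging module. The class InMemoryLogHandler below is a hypothetical stand-in written for this example; it is not sensAI's handler class, and only the method name get_log is taken from the entry above.

```python
import io
import logging


class InMemoryLogHandler(logging.StreamHandler):
    """Hypothetical handler that writes log records to a string buffer."""

    def __init__(self):
        self._buffer = io.StringIO()
        super().__init__(self._buffer)

    def get_log(self) -> str:
        self.flush()
        return self._buffer.getvalue()


logger = logging.getLogger("memory_log_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo records away from the root logger's handlers

handler = InMemoryLogHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)
try:
    logger.info("training started")
    logger.info("training finished")
finally:
    # Detach the handler so that later log records are no longer captured
    logger.removeHandler(handler)

log_text = handler.get_log()
```

Keeping the log on the handler instead of in global state allows several independent recordings to coexist, which is presumably the motivation for the change.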
- evaluation:
  - ModelEvaluation (and subclasses): Fix plots being shown if no ResultWriter is used even though show_plots=False
- vector_model:
  - VectorModel: Fix data frame transformers not appearing in string representations
- data_transformation:
  - DFTOneHotEncoder: Fix fitting failure in the presence of missing values
- util:
  - Minimise required dependencies for all modules in this package in preparation for the release of sensAI-utils
  - util.logging:
    - Fix type annotations of run_main and run_cli
  - util.cache:
    - Add new base class KeyValueCache alongside PersistentKeyValueCache
    - Add InMemoryKeyValueCache
    - PickleCached:
      - Rename to pickle_cached, keeping old name as alias
      - Change implementation to use nested functions instead of a class to improve IDE support
      - Auto-create the storage directory if it does not yet exist
    - Support cloudpickle as a backend
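The pickle_cached items above (a decorator built from nested functions, with on-demand creation of the storage directory) follow a common pattern, sketched here with the standard library. This is not sensAI's implementation: the name pickle_cached_sketch, its single cache_dir parameter and the restriction to no-argument functions are simplifying assumptions.

```python
import os
import pickle
import tempfile
from functools import wraps
from typing import Callable


def pickle_cached_sketch(cache_dir: str) -> Callable:
    """Hypothetical decorator: caches the result of a no-argument function in a pickle file."""

    def decorator(fn: Callable) -> Callable:
        cache_path = os.path.join(cache_dir, f"{fn.__name__}.pickle")

        @wraps(fn)  # the wrapper keeps the name and docstring of fn, which helps IDEs and debugging
        def wrapped():
            if os.path.exists(cache_path):
                with open(cache_path, "rb") as f:
                    return pickle.load(f)
            result = fn()
            # Create the storage directory on demand
            os.makedirs(cache_dir, exist_ok=True)
            with open(cache_path, "wb") as f:
                pickle.dump(result, f)
            return result

        return wrapped

    return decorator


calls = []  # records how often the decorated function body actually runs
storage_dir = os.path.join(tempfile.mkdtemp(), "cache")  # does not exist yet


@pickle_cached_sketch(storage_dir)
def expensive_computation():
    calls.append(1)
    return {"answer": 42}


first = expensive_computation()   # computed and written to disk
second = expensive_computation()  # loaded from the pickle file
```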
- columngen:
  - ColumnGenerator: add method to_feature_generator
- evaluation:
  - MultiDataEvaluation: Add option to supply test data (without using splitting)
  - VectorRegressionModelEvaluator: Handle output column mismatch between model output and ground truth for the case where there is only a single column, avoiding the exception and issuing a warning instead
- dft:
  - DFTNormalisation.RuleTemplate: Add attributes fit and array_valued
- util.deprecation: Apply functools.wraps to retain meta-data of wrapped function
- util.logging:
  - Support multiple configuration callbacks in set_configure_callback
  - Add line number to default format (LOG_DEFAULT_FORMAT)
  - Add function is_enabled to check whether a log handler is registered
  - Add context manager LoggingDisabledContext to temporarily disable logging
  - Add FallbackHandler to support logging to a fallback destination (if no other handlers are defined)
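Regarding the util.deprecation entry above: the effect of retaining a wrapped function's meta-data can be illustrated as follows. The decorator deprecated shown here is a hypothetical example based on the standard warnings module; it is not sensAI's decorator, whose name and reporting mechanism may differ.

```python
import functools
import warnings


def deprecated(message: str):
    """Hypothetical decorator that marks a function as deprecated."""

    def decorator(fn):
        @functools.wraps(fn)  # copies __name__, __doc__ and other meta-data from fn to the wrapper
        def wrapper(*args, **kwargs):
            warnings.warn(f"{fn.__name__} is deprecated: {message}", DeprecationWarning, stacklevel=2)
            return fn(*args, **kwargs)

        return wrapper

    return decorator


@deprecated("use new_sum instead")
def old_sum(a, b):
    """Return the sum of a and b."""
    return a + b
```

Without functools.wraps, old_sum.__name__ would be "wrapper" and the docstring would be lost, which degrades generated documentation and IDE tooltips.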
- util.io:
  - ResultWriter:
    - Allow to disable an instance such that no results are written (constructor parameter enabled)
    - Add default configuration for closing figures after writing them (constructor parameter close_figures)
    - write_image: Improve layout in written images by setting bbox_inches='tight'
- vectoriser:
  - SequenceVectoriser:
    - Allow to inject a sequence item identifier provider (instance of new class ItemIdentifierProvider) in order to determine the set of relevant unique items when using fitting mode UNIQUE
    - Allow sharing of vectorisers between instances such that a previously fitted vectoriser can be reused in its fitted state, which can be particularly useful for encoder-decoder settings where the decoding stage uses some of the same features (vectorisers) as the encoding stage.
  - Make Vectorisers aware of their 'fitted' status.
- torch:
  - TorchVectorRegressionModel: Add support for auto-regressive predictions by adding class TorchAutoregressiveResultHandler and method with_autogressive_result_handler
  - LSTNetwork:
    - Add new mode 'encoder', where the output of the complex path prior to the dense layer is returned
    - Changed constructor interface to comply with PEP-8
  - Add package seq for encoder-decoder-style sequence models, adding the highly flexible vector model implementation EncoderDecoderVectorRegressionModel and a multitude of low-level encoder and decoder modules
- data:
  - Add DataFrameSplitterColumnEquivalenceClass, which splits a data frame based on equivalence classes of a given column
- evaluation:
  - ModelEvaluation (and derived classes): Support direct specification of the test data (previously only indirect specification via a splitter was supported)
- GridSearch: Change return value to a result object for convenient retrieval
- TagBuilder: Fix return value of with_component
- ModelEvaluation: create_plots did not track plots with given tracking context if show_plots=False and result_writer=None.
- ParametersMetricsCollection: csv_path could not be None
- LSTNetworkVectorClassificationModel is now functional in v1, improving the representation (no more dictionaries). This breaks compatibility with sensAI v0.x representations of this class.
- tracking:
  - Improve (under-the-hood) tracking interfaces, introducing the concept of a tracking context (class TrackingContext, which is typically model-specific) in addition to the more high-level 'experiment' concept
  - Full support for cross-validation
  - Adapt & improve MLflow tracking implementation
- util.datastruct:
  - SortedKeysAndValues, SortedKeyValuePairs: Add __len__
- featuregen:
  - FeatureCollector: Add factory methods for the generation of DFTNormalisation and DFTOneHotEncoder instances (for convenience)
  - FeatureGeneratorRegistry:
    - Improve type annotation of singleton dictionary
    - Add convenience method collect_features, which creates a FeatureCollector
- util.io:
  - write_data_frame_csv_file: Add options index and header
- util.pickle:
  - dump_pickle, load_pickle, PickleLoadSaveMixin: Support passing Path objects
- vector_model:
  - Pre-processors are now included in models' string representations by default
- torch:
  - TorchVector*Model: Improve type hints for with* methods
- evaluation:
  - MultiDataModelEvaluation (previously MultiDataEvaluationUtil):
    - Add model description/string representation to result object
  - Add class CrossValidationSplitterNested (for nested cross-validation)
  - ModelComparisonData.Result: Add method iter_evaluation_data
- feature_selection:
  - Add RecursiveFeatureElimination (to complement existing CV-based implementation)
- util.string:
  - Add class TagBuilder (for generation of dataset/experiment tags/identifiers)
- util.logging:
  - Add in-memory logging (add_memory_logger, get_memory_log)
  - Reuse configured log format (if any) for both file & in-memory loggers
  - Add functions run_main and run_cli for convenient setup
  - Add set_configure_callback for third-party usage of configure, allowing users to add additional configuration via a callback
  - Add remove_log_handler
  - Add FileLoggerContext for file-based logging within a with-block
- Refactoring:
  - Module featuregen is now a package with modules
    - feature_generator (all feature generators)
    - feature_generator_registry (registry and feature collector)
- Testing:
  - Add test for typical usage of FeatureCollector in conjunction with FeatureGeneratorRegistry
- Changed all camel case interfaces (methods and parameters) as well as local variables to use snake case in order to align more closely with PEP 8.
  This breaks source-level compatibility with earlier v0 releases. However, persisted objects from earlier versions should still be loadable, as attribute names in classes that may have been persisted remain in camel case. Strictly speaking, PEP 8 makes no statement about the format of attribute names, so there is not really a violation anyway.
- Removed some deprecated interfaces (particularly support for the kwargs/dict interface in parallel to parameter objects in evaluators)
- TorchVector*Model: Changed construction of contained TorchModel to a no-args factory (i.e. support for modelArgs and modelKwArgs dropped). The new mechanism is simpler and does not encourage usage patterns where correct construction cannot be statically checked (in contrast to the old mechanism). The new mechanism encourages the implementation of dedicated factory methods (but could be abused with functools.partial, of course).
- FeatureGeneratorRegistry: Removed support for discouraged mechanism of setting/getting feature generator factories via __setattr__/__getattr__
- NNOptimiserParams: Do not use kwargs for parameters to be passed on to the underlying optimiser, use dict optimiser_args instead
- MultiDataModelEvaluation (previously MultiDataEvaluationUtil):
  - Moved evaluator and cross-validator params to constructor
  - Removed deprecated method compare_models_cross_validation
- RegressionEvalStats: Rename methods using inappropriate prefix get (now compute)
- Renamed high-level evaluation classes:
  - RegressionEvalUtil renamed to RegressionModelEvaluation
  - ClassificationEvalUtil renamed to ClassificationModelEvaluation
  - MultiDataEvaluationUtil renamed to MultiDataModelEvaluation
  - Vector*ModelEvaluatorParams -> *EvaluatorParams
- Changed default parameters of SkLearnDecisionTreeVectorClassificationModel and SkLearnRandomForestVectorClassificationModel to align with sklearn defaults
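The difference between a dedicated no-args factory and functools.partial, as discussed in the TorchVector*Model entry above, is sketched below. All names (TinyModel, ModelWrapper, create_small_model) are hypothetical stand-ins chosen for this example; they do not reflect sensAI's actual class or constructor signatures.

```python
import functools
from typing import Callable


class TinyModel:
    """Hypothetical stand-in for a TorchModel; it only stores its configuration."""

    def __init__(self, hidden_dim: int, dropout: float):
        self.hidden_dim = hidden_dim
        self.dropout = dropout


class ModelWrapper:
    """Hypothetical stand-in for a model class that receives a no-args factory."""

    def __init__(self, model_factory: Callable[[], TinyModel]):
        self._model_factory = model_factory

    def create_model(self) -> TinyModel:
        return self._model_factory()


def create_small_model() -> TinyModel:
    # Dedicated factory: the constructor call is spelled out and visible to static checkers
    return TinyModel(hidden_dim=16, dropout=0.1)


via_factory = ModelWrapper(create_small_model).create_model()

# functools.partial also yields a no-args callable, but the arguments are bound dynamically,
# so mistakes in them may only surface when the factory is eventually called
via_partial = ModelWrapper(functools.partial(TinyModel, hidden_dim=32, dropout=0.0)).create_model()
```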
- ToStringMixin: Prevent infinite recursion for case where ToStringMixin references a bound method of itself
- TorchVectorModels: Dropped support for model kwargs in constructor
- MultiDataModelEvaluation (previously MultiDataEvaluationUtil):
  - dataset key column was not removed prior to mean computation (would fail if value is non-numeric)
  - Combined eval stats were not logged
- EvalStatsClassification: Do not attempt to create precision/recall plots if class probabilities are unavailable
Final pre-release (primarily for internal use at jambit GmbH and appliedAI Initiative GmbH)
- v0.1.9 (2022-07-20)
- v0.1.8 (2022-07-01)
- v0.1.7 (2022-02-22)
- v0.1.6 (2021-07-16)
- v0.1.5 (2021-06-22)
- v0.1.4 (2021-06-21)
- v0.1.1 (2021-06-01)
- v0.1.0 (2021-05-25)
- v0.0.8 (2021-02-18)
- v0.0.4 (2020-10-16)
- v0.0.1 (2020-02-20)