v4.3.1

Tribuo v4.3.1 Release Notes

Small patch release to bump some dependencies and pull in minor fixes. The most
notable fix allows CART trees to generate pure nodes, which they were previously
prevented from doing. This will likely improve classification performance both
for single trees and for ensembles such as random forests.

- FeatureHasher now has an option to disable hashing of the feature values, and TokenPipeline defaults to not hashing them ([#309](#309)).
- Improved the documentation for loading multi-label data with CSVLoader ([#306](#306)).
- Allows Example.densify to add arbitrary features ([#304](#304)).
- Adds accessors to ClassifierChainModel and IndependentMultiLabelModel so the individual models can be accessed ([#302](#302)).
- Allows CART trees to create pure leaves ([#303](#303)).
- Bumping jackson-core to 2.14.1, jackson-databind to 2.14.1, OpenCSV to 5.7.1 (pulling in the fixed commons-text 1.10.0).

Contributors

- Adam Pocock ([@Craigacp](https://github.com/Craigacp))
- Jeffrey Alexander ([@jhalexand](https://github.com/jhalexand))
- Jack Sullivan ([@JackSullivan](https://github.com/JackSullivan))
- Philip Ogren ([@pogren](https://github.com/pogren))

v4.2.2

Tribuo v4.2.2 Release Notes

Small patch release to bump some dependencies and pull in minor fixes:

- Validate hash salt during object creation ([#237](#237)).
- Fix XGBoost parameter overriding ([#239](#239)).
- Add some necessary accessors to TransformedModel ([#244](#244)).
- Bumping TF-Java to v0.4.2 ([#281](#281)).
- Fixes for test failures when running from a path containing spaces ([#287](#287)).
- Fix documentation links to the OCA.
- Bumping jackson-core to 2.13.4, jackson-databind to 2.13.4.2, protobuf-java to 3.19.6, OpenCSV to 5.7.1 (pulling in the fixed commons-text 1.10.0).

Contributors

- Adam Pocock ([@Craigacp](https://github.com/Craigacp))
- Geoff Stewart ([@geoffreydstewart](https://github.com/geoffreydstewart))
- Jeffrey Alexander ([@jhalexand](https://github.com/jhalexand))
- Jack Sullivan ([@JackSullivan](https://github.com/JackSullivan))
- Philip Ogren ([@pogren](https://github.com/pogren))

v4.3.0

Tribuo 4.3.0

Tribuo v4.3 adds feature selection for classification problems, support for
guided generation of model cards, and protobuf serialization for all
serializable classes. In addition, there is a new interface for distance-based
computations which can use either a kd-tree or brute-force comparisons, the
sparse linear model package has been rewritten to use Tribuo's linear algebra
system (improving speed and reducing memory consumption), and we've added some
more tutorials.

Note this is likely the last feature release of Tribuo to support Java 8. The
next major version of Tribuo will require Java 17. In addition, support for
using `java.io.Serializable` for serialization will be removed in the next
major release, and Tribuo will exclusively use protobuf based serialization.

Feature Selection

In this release we've added support for feature selection algorithms to the
dataset and provenance systems, along with implementations of four
information-theoretic feature selection algorithms for use in classification
problems. The algorithms (MIM, CMIM, mRMR and JMI) are described in this
[review paper](https://jmlr.org/papers/v13/brown12a.html). Continuous inputs
are discretised into a fixed number of equal-width bins before the mutual
information is computed. These algorithms are a useful feature selection
baseline, and we welcome contributions to extend the set of supported
algorithms.

- Feature selection algorithms [#254](#254).
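
The selection criteria in these algorithms are built from mutual information estimates over the binned features. As a plain-Java illustration of the scoring idea (this is a sketch of the technique, not the Tribuo feature selection API), the example below bins a continuous feature into equal-width bins and computes its mutual information with the labels, which is the per-feature score that MIM ranks by:

```java
import java.util.Arrays;

public final class MimExample {
    /** Discretise a continuous feature into numBins equal-width bins. */
    static int[] equalWidthBins(double[] feature, int numBins) {
        double min = Arrays.stream(feature).min().orElse(0);
        double max = Arrays.stream(feature).max().orElse(0);
        double width = (max - min) / numBins;
        int[] binned = new int[feature.length];
        for (int i = 0; i < feature.length; i++) {
            int bin = width == 0 ? 0 : (int) ((feature[i] - min) / width);
            binned[i] = Math.min(bin, numBins - 1); // clamp the maximum value into the top bin
        }
        return binned;
    }

    /** Mutual information I(X;Y) between a binned feature and integer labels, in nats. */
    static double mutualInformation(int[] x, int[] y, int numBins, int numLabels) {
        int n = x.length;
        double[][] joint = new double[numBins][numLabels];
        double[] px = new double[numBins];
        double[] py = new double[numLabels];
        for (int i = 0; i < n; i++) {
            joint[x[i]][y[i]]++;
            px[x[i]]++;
            py[y[i]]++;
        }
        double mi = 0.0;
        for (int a = 0; a < numBins; a++) {
            for (int b = 0; b < numLabels; b++) {
                if (joint[a][b] > 0) {
                    double pxy = joint[a][b] / n;
                    mi += pxy * Math.log(pxy / ((px[a] / n) * (py[b] / n)));
                }
            }
        }
        return mi;
    }

    public static void main(String[] args) {
        double[] feature = {0.1, 0.3, 0.2, 2.5, 2.7, 2.6};
        int[] labels     = {0, 0, 0, 1, 1, 1};
        int[] binned = equalWidthBins(feature, 4);
        // MIM ranks each feature by this score; CMIM, mRMR and JMI add
        // redundancy/conditioning terms between features on top of it.
        System.out.println("I(X;Y) = " + mutualInformation(binned, labels, 4, 2));
    }
}
```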

Model Card Support

[Model Cards](https://dl.acm.org/doi/10.1145/3287560.3287596) are a popular way
of describing a model, its training data, expected applications and any use
cases that should be avoided. In this release we've added guided generation of
model cards, where many fields are automatically generated from the provenance
information inside each Tribuo model. Fields which require user input (such as
the expected use cases for a model, or its license) can be added via a CLI
program, and the resulting model card can be saved in JSON format.

At the moment, the automatic data extraction fails on some kinds of nested
ensemble models which are generated without using a Tribuo `Trainer` class;
in the future we'll look at improving the data extraction for this case.

- Model card infrastructure ([#243](#243), [#250](#250), [#253](#253)).

Protobuf Serialization

In this release we've added [protocol
buffer](https://developers.google.com/protocol-buffers) definitions for
serializing all of Tribuo's serializable types, along with the necessary code
to interact with those definitions. This effort has improved the validation of
serialized data, and will allow Tribuo models to be upwards compatible across
major versions of Tribuo. Any serialized model or dataset from Tribuo v4.2 or
earlier can be loaded in and saved out into the new format which will ensure
compatibility with the next major version of Tribuo.

- Protobuf support for core types ([#226](#226), [#255](#255), [#262](#262), [#264](#264)).
- Protobuf support for models (Multinomial Naive Bayes [#267](#267), Sparse linear models [#269](#269), XGBoost [#270](#270), OCI, ONNX and TF [#271](#271), LibSVM [#272](#272), LibLinear [#273](#273), SGD [#275](#275), Clustering models [#276](#276), Baseline models and ensembles [#277](#277), Trees [#278](#278)).
- Docs and supporting programs ([#279](#279)).
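
As a rough sketch of that migration path (assuming the `serializeToFile` / `deserializeFromFile` helpers added to `Model` in 4.3; check the javadoc for the exact signatures in your version), an old Java-serialized model can be rewritten in the protobuf format like so:

```java
import java.io.ObjectInputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.tribuo.Model;

public final class ConvertToProto {
    public static void main(String[] args) throws Exception {
        Path oldFile = Paths.get("model-4.2.ser");    // model written with java.io.Serializable
        Path newFile = Paths.get("model-4.3.tribuo"); // destination for the protobuf form

        // Load the old Java-serialized model.
        Model<?> model;
        try (ObjectInputStream ois = new ObjectInputStream(Files.newInputStream(oldFile))) {
            model = (Model<?>) ois.readObject();
        }

        // Write it back out using the new protobuf serialization, then check it round-trips.
        model.serializeToFile(newFile);
        Model<?> reloaded = Model.deserializeFromFile(newFile);
        System.out.println(reloaded.getProvenance());
    }
}
```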

Smaller improvements

We added an interface for querying the nearest neighbours of a vector, and
updated HDBSCAN, K-Means and K-NN to use the new interface. The old
implementation has been renamed the "brute force" search operator, and a new
implementation which uses a kd-tree has been added.

- Distance refactor ([#213](#213), [#216](#216), [#221](#221), [#231](#231), [#285](#285)).

We migrated off Apache Commons Math, which necessitated adding several methods
to Tribuo's math library. In the process we refactored the sparse linear model
code, removing redundant matrix operations and greatly improving the speed of
LASSO.

- Refactor sparse linear models and remove Apache Commons Math ([#241](#241)).

The ONNX export support has been refactored to allow the use of different ONNX
opsets, and custom ONNX operations. This allows users of Tribuo's ONNX export
support to supply their own operations, and increases the flexibility of the
ONNX support on the JVM.

- ONNX operator refactor ([#245](#245)).

ONNX Runtime has been upgraded to v1.12.1, which includes Linux ARM64 and macOS
ARM64 binaries. As a result we've removed the ONNX tests from the `arm` Maven
profile, so those tests now execute on Linux and macOS ARM64 platforms.

- ONNX Runtime upgrade ([#256](#256)).

Small improvements

- Improved the assignment to the noise cluster in HDBSCAN ([#222](#222)).
- Upgrade liblinear-java to v2.44 ([#228](#228)).
- Added accessors for the HDBSCAN cluster exemplars ([#229](#229)).
- Improve validation of salts when hashing feature names ([#237](#237)).
- Added accessors to TransformedModel for the wrapped model ([#244](#244)).
- Added a regex text preprocessor ([#247](#247)).
- Upgrade OpenCSV to v5.6 ([#259](#259)).
- Added a builder to RowProcessor to make it less confusing ([#263](#263)).
- Upgrade TF-Java to v0.4.2 ([#281](#281)).
- Upgrade OCI Java SDK to v2.46.0, protobuf-java to 3.19.6, XGBoost to 1.6.2, jackson to 2.14.0-rc1 ([#288](#288)).

Bug Fixes

- Fix for HDBSCAN small cluster generation ([#236](#236)).
- XGBoost provenance capture ([#239](#239)).

Contributors

- Adam Pocock ([@Craigacp](https://github.com/Craigacp))
- Jack Sullivan ([@JackSullivan](https://github.com/JackSullivan))
- Romina Mahinpei ([@rmahinpei](https://github.com/rmahinpei))
- Philip Ogren ([@pogren](https://github.com/pogren))
- Katie Younglove ([@katieyounglove](https://github.com/katieyounglove))
- Jeffrey Alexander ([@jhalexand](https://github.com/jhalexand))
- Geoff Stewart ([@geoffreydstewart](https://github.com/geoffreydstewart))

v4.2.1

Tribuo v4.2.1

Small patch release for three issues:

- Ensure K-Means thread pools shut down when training completes ([#224](#224))
- Fix issues where ONNX export of ensembles, K-Means initialization and several tests relied upon HashSet iteration order ([#220](https://github.com/oracle/tribuo/pull/220),[#225](https://github.com/oracle/tribuo/pull/225))
- Upgrade to TF-Java 0.4.1, which includes an upgrade to TF 2.7.1, bringing in several fixes for native crashes when operating on malformed or malicious models ([#227](#227))

OLCUT is updated to 5.2.1 to pull in updated versions of jackson & protobuf ([#234](#234)). Also includes some docs and a small update for K-Means' `toString` ([#209](#209), [#211](#211), [#212](#212)).

Contributors

- Adam Pocock ([@Craigacp](https://github.com/Craigacp))
- Geoff Stewart ([@geoffreydstewart](https://github.com/geoffreydstewart))
- Yaliang Wu ([@ylwu-amzn](https://github.com/ylwu-amzn))
- Kaiyao Ke ([@kaiyaok2](https://github.com/kaiyaok2))

v4.2.0

Tribuo v4.2.0

Tribuo 4.2 adds new models, ONNX export for several types of models, a
reproducibility framework for recreating Tribuo models, easy deployment of
Tribuo models on Oracle Cloud, along with several smaller improvements and bug
fixes. We've added more tutorials covering the new features along with
multi-label classification, and further expanded the javadoc to cover all
public methods.

In Tribuo 4.1.0 and earlier there is a severe bug in multi-dimensional
regression models (i.e., regression tasks with multiple output dimensions).
Models other than `LinearSGDModel` and `SparseLinearModel` (apart from when
using the `ElasticNetCDTrainer`) have a bug in how the output dimension indices
are constructed, and may produce incorrect outputs for all dimensions (as the
output will be for a different dimension than the one named in the `Regressor`
object). This has been fixed, and loading in models trained in earlier versions
of Tribuo will patch the model to rearrange the dimensions appropriately.
Unfortunately this fix cannot be applied to tree based models, and so all
multi-output regression tree based models should be retrained using Tribuo 4.2
as they are irretrievably corrupt. Additionally, multi-output regression LibSVM
models trained with standardization store every dimension after the first
incorrectly, and those models will also need to be retrained with Tribuo 4.2.
See [#177](#177) for more details.

Note the KMeans implementation had several internal changes to support running
with a `java.lang.SecurityManager` which will break any subclasses of `KMeansTrainer`.
In most cases, changing the signature of any overridden `mStep` method to match
the new signature and allowing the `fjp` argument to be null in single-threaded
execution will fix the subclass.

New models

In this release we've added [Factorization
Machines](https://www.computer.org/csdl/proceedings-article/icdm/2010/4256a995/12OmNwMFMfl),
[Classifier
Chains](https://link.springer.com/content/pdf/10.1007/s10994-011-5256-5.pdf)
and
[HDBSCAN\*](https://link.springer.com/chapter/10.1007/978-3-642-37456-2_14).
Factorization machines are a powerful non-linear predictor which uses a
factorized approximation to learn a per output feature-feature interaction term
in addition to a linear model. We've added Factorization Machines for
multi-class classification, multi-label classification and regression.
Classifier chains are an ensemble approach to multi-label classification which,
given a specific ordering of the labels, learns a chain of classifiers where
each classifier receives the features along with the labels predicted earlier
in the chain. We also added ensembles of randomly ordered classifier chains,
which work well in situations where the ground truth label ordering is unknown
(i.e., most of the time).  HDBSCAN is a hierarchical density based clustering
algorithm which chooses the number of clusters based on properties of the data
rather than as a hyperparameter. The Tribuo implementation can cluster a
dataset, and then at prediction time it provides the cluster the given
datapoint would be in without modifying the cluster structure.

- Classifier Chains ([#149](#149)), which also adds the Jaccard score as a
  multi-label evaluation metric and a multi-label voting combiner for use in
  multi-label ensembles.
- Factorization machines ([#179](#179)).
- HDBSCAN ([#196](#196)).
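
To make the classifier chain prediction scheme concrete, here is a schematic plain-Java sketch (illustrating the technique, not the Tribuo API): prediction walks the fixed label ordering, and each link sees the original features augmented with the labels predicted earlier in the chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public final class ChainSketch {
    /** A single link in the chain: a binary classifier over an augmented feature vector. */
    interface BinaryClassifier extends Function<double[], Boolean> {}

    /** Predict all labels in the chain's fixed order. */
    static boolean[] predictChain(List<BinaryClassifier> chain, double[] features) {
        boolean[] predictions = new boolean[chain.size()];
        for (int i = 0; i < chain.size(); i++) {
            // Augment the original features with the labels predicted so far.
            double[] augmented = new double[features.length + i];
            System.arraycopy(features, 0, augmented, 0, features.length);
            for (int j = 0; j < i; j++) {
                augmented[features.length + j] = predictions[j] ? 1.0 : 0.0;
            }
            predictions[i] = chain.get(i).apply(augmented);
        }
        return predictions;
    }

    public static void main(String[] args) {
        // Two toy "classifiers": the first thresholds the first feature,
        // the second fires only if the previous label was predicted true.
        List<BinaryClassifier> chain = new ArrayList<>();
        chain.add(x -> x[0] > 0.5);
        chain.add(x -> x[x.length - 1] > 0.5);
        boolean[] preds = predictChain(chain, new double[]{0.9, 0.1});
        System.out.println(preds[0] + ", " + preds[1]);
    }
}
```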

ONNX Export

The [ONNX](https://onnx.ai) format is a cross-platform and cross-library model
exchange format. Tribuo can already serve ONNX models via its [ONNX
Runtime](https://onnxruntime.ai) interface, and now has the ability to export
models in ONNX format for serving on edge devices, in cloud services, or in
other languages like Python or C#.

In this release Tribuo supports exporting linear models (multi-class
classification, multi-label classification and regression), sparse linear
regression models, factorization machines (multi-class classification,
multi-label classification and regression), LibLinear models (multi-class
classification and regression), LibSVM models (multi-class classification and
regression), along with ensembles of those models, including arbitrary levels
of ensemble nesting. We plan to expand this coverage to more models over time,
however for TensorFlow we recommend users export those models as a Saved Model
and use the Python tf2onnx converter.

Tribuo models exported in ONNX format preserve their provenance information in
a metadata field which is accessible when the ONNX model is loaded back into
Tribuo. The provenance is stored as a protobuf so could be read from other
libraries or platforms if necessary.

The ONNX export support is in a separate module with no dependencies, and could
be used elsewhere on the JVM to support generating ONNX graphs. We welcome
contributions to build out the ONNX support in that module.

- ONNX export for LinearSGDModels ([#154](#154)), which also adds a
  multi-label output transformer for scoring multi-label ONNX models.
- ONNX export for SparseLinearModel ([#163](#163)).
- Add provenance to ONNX exported models ([#182](#182)).
- Refactor ONNX tensor creation ([#187](#187)).
- ONNX ensemble export support ([#186](#186)).
- ONNX export for LibSVM and LibLinear ([#191](#191)).
- Refactor ONNX support to improve type safety ([#199](#199)).
- Extract ONNX support into separate module ([#TBD](https://github.com/oracle/tribuo/pull/)).
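
For the supported model classes, export is a short call through the `ONNXExportable` interface. A minimal sketch is below, assuming the `saveONNXModel(String domain, long modelVersion, Path path)` signature; the domain string and version are user-chosen metadata, so double-check the javadoc for your Tribuo version.

```java
import java.io.IOException;
import java.nio.file.Paths;
import org.tribuo.Model;
import org.tribuo.ONNXExportable;
import org.tribuo.classification.Label;

public final class OnnxExportSketch {
    /** Export a trained model to ONNX if its class supports it. */
    static void exportIfPossible(Model<Label> model) throws IOException {
        if (model instanceof ONNXExportable) {
            // Writes the ONNX protobuf; the Tribuo provenance is embedded in a metadata field.
            ((ONNXExportable) model).saveONNXModel("org.example.demo", 1L, Paths.get("model.onnx"));
        }
    }
}
```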

Reproducibility Framework

Tribuo has strong model metadata support via its provenance system which
records how models, datasets and evaluations are created. In this release we
enhance this support by adding a push-button reproduction framework which
accepts either a model provenance or a model object and rebuilds the complete
training pipeline, ensuring consistent usage of RNGs and other mutable state.

This allows Tribuo to easily rebuild models to see if updated datasets could
change performance, or even if the model is actually reproducible (which may be
required for regulatory reasons).  Over time we hope to expand this support
into a full experimental framework, allowing models to be rebuilt with
hyperparameter or data changes as part of the data science process or for
debugging models in production.

This framework was written by Joseph Wonsil and Prof. Margo Seltzer at the
University of British Columbia as part of a collaboration between Prof. Seltzer
and Oracle Labs. We're excited to continue working with Joe, Margo and the rest
of the lab at UBC, as this is excellent work.

Note the reproducibility framework module requires Java 16 or greater, and is
thus not included in the `tribuo-all` meta-module.

- Reproducibility framework ([#185](#185), with minor changes in [#189](#189) and [#190](#190)).
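
A minimal usage sketch follows, assuming the `ReproUtil` class and `reproduceFromProvenance()` method described in the reproducibility tutorial (the exact generics and method names may differ between versions, so treat this as an outline rather than a definitive API reference):

```java
import org.tribuo.Model;
import org.tribuo.classification.Label;
import org.tribuo.reproducibility.ReproUtil;

public final class ReproSketch {
    /** Rebuild a model from its own provenance and return the reproduction. */
    static Model<Label> rebuild(Model<Label> original) throws Exception {
        ReproUtil<Label> repro = new ReproUtil<>(original);
        // Re-runs the recorded training pipeline (data source, transformations, trainer, RNG seeds).
        return repro.reproduceFromProvenance();
    }
}
```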

OCI Data Science Integration

[Oracle Cloud Data
Science](https://www.oracle.com/data-science/cloud-infrastructure-data-science.html)
is a platform for building and deploying models in Oracle Cloud. The model
deployment functionality wraps models in a Python runtime and deploys them
behind an auto-scaler as a REST endpoint. In this release we've added support
for deploying ONNX exportable Tribuo models directly to OCI DS, allowing
scale-out deployments of models from the JVM. We also added an `OCIModel`
wrapper which scores Tribuo `Example` objects using a deployed model's REST
endpoint, allowing easy use of cloud resources for ML on the JVM.

- Oracle Cloud Data Science integration ([#200](#200)).

Small improvements

- Date field processor and locale support in metadata extractors ([#148](#148))
- Multi-output response processor allowing loading different formats of multi-label and multi-dimensional regression datasets ([#150](#150))
- ARM dev profile for compiling Tribuo on ARM platforms ([#152](#152))
- Refactor CSVLoader so it uses CSVDataSource and parses CSV files using RowProcessor, allowing an easy transition to more complex columnar extraction ([#153](#153))
- Configurable anomaly demo data source ([#160](#160))
- Configurable clustering demo data source ([#161](#161))
- Configurable classification demo data source ([#162](#162))
- Multi-label tutorial and configurable multi-label demo data source ([#166](#166)), plus a follow-up fix in [#168](#168) after [#167](#167)
- Add javadoc for all public methods and fields ([#175](#175)) (also fixes a bug in Util.vectorNorm)
- Add hooks for model equality checks to trees and LibSVM models ([#183](#183)) (also fixes a bug in liblinear get top features)
- XGBoost 1.5.0 ([#192](#192))
- TensorFlow Java 0.4.0 ([#195](#195)) (note this changes Tribuo's TF API slightly as TF-Java 0.4.0 has a different method of initializing the session)
- KMeans now uses dense vectors when appropriate, speeding up training ([#201](#201))
- Documentation updates, ONNX and reproducibility tutorials ([#205](#205))

Bug fixes

- NPE fix for LIME explanations using models which don't support per class weights ([#157](#157))
- Fixing a bug in multi-label evaluation which swapped FP for FN ([#167](#167))
- Persist CSVDataSource headers in the provenance ([#171](#171))
- Fixing LibSVM and LibLinear so they have reproducible behaviour ([#172](#172))
- Provenance fix for TransformTrainer and an extra factory for XGBoostExternalModel so you can make them from an in memory booster ([#176](#176))
- Fix multidimensional regression ([#177](#177)) (fixes regression ids, fixes libsvm so it emits correct standardized models, adds support for per dimension feature weights in XGBoostRegressionModel)
- Fix provenance generation for FieldResponseProcessor and BinaryResponseProcessor ([#178](#178))
- Normalize LibSVMDataSource paths consistently in the provenance ([#181](#181))
- KMeans and KNN now run correctly when using OpenSearch's SecurityManager ([#197](#197))

Contributors

- Adam Pocock ([@Craigacp](https://github.com/Craigacp))
- Jack Sullivan ([@JackSullivan](https://github.com/JackSullivan))
- Joseph Wonsil ([@jwons](https://github.com/jwons))
- Philip Ogren ([@pogren](https://github.com/pogren))
- Jeffrey Alexander ([@jhalexand](https://github.com/jhalexand))
- Geoff Stewart ([@geoffreydstewart](https://github.com/geoffreydstewart))

v4.1.1

Tribuo v4.1.1

This is the first patch release for Tribuo v4.1. The main fixes in this release
are to the multi-dimensional output regression support, and to support the use
of KMeans and KNN models when running under a restrictive `SecurityManager`.
Additionally this release pulls in TensorFlow-Java 0.4.0 which upgrades the
TensorFlow native library to 2.7.0 fixing several CVEs. Note those CVEs may not
be applicable to TensorFlow-Java, as many of them relate to Python codepaths
which are not included in TensorFlow-Java. Note the TensorFlow upgrade is a
breaking API change as graph initialization is handled differently in this
release, which causes unavoidable changes in Tribuo's TF API.

Multi-dimensional Regression fix

In Tribuo 4.1.0 and earlier there is a severe bug in multi-dimensional
regression models (i.e., regression tasks with multiple output dimensions).
Models other than `LinearSGDModel` and `SparseLinearModel` (apart from when
using the `ElasticNetCDTrainer`) have a bug in how the output dimension indices
are constructed, and may produce incorrect outputs for all dimensions (as the
output will be for a different dimension than the one named in the `Regressor`
object). This has been fixed, and loading in models trained in earlier versions
of Tribuo will patch the model to rearrange the dimensions appropriately.
Unfortunately this fix cannot be applied to tree based models, and so all
multi-output regression tree based models should be retrained using Tribuo 4.2
as they are irretrievably corrupt. Additionally, multi-output regression LibSVM
models trained with standardization store every dimension after the first
incorrectly, and those models will also need to be retrained with Tribuo 4.2.
See [#177](#177) for more details.

Bug fixes

- NPE fix for LIME explanations using models which don't support per class weights ([#157](#157)).
- Fixing a bug in multi-label evaluation which swapped FP for FN ([#167](#167)).
- Fixing LibSVM and LibLinear so they have reproducible behaviour ([#172](#172)).
- Provenance fix for TransformTrainer and an extra factory for XGBoostExternalModel so you can make them from an in memory booster ([#176](#176))
- Fix multidimensional regression ([#177](#177)) (fixes regression ids, fixes libsvm so it emits correct standardized models, adds support for per dimension feature weights in XGBoostRegressionModel).
- Normalize LibSVMDataSource paths consistently in the provenance ([#181](#181)).
- KMeans and KNN now run correctly when using OpenSearch's SecurityManager ([#197](#197)).
- TensorFlow-Java 0.4.0 ([#195](#195)).

Contributors

- Adam Pocock ([@Craigacp](https://github.com/Craigacp))
- Jack Sullivan ([@JackSullivan](https://github.com/JackSullivan))
- Philip Ogren ([@pogren](https://github.com/pogren))
- Jeffrey Alexander ([@jhalexand](https://github.com/jhalexand))

v4.1.0

Tribuo 4.1 is the first feature release after the initial open source release.

We've added new models, new parameters for some models, improvements to data
loading, documentation, transformations and the speed of our CRF and linear
models, along with a large update to the TensorFlow interface. We've also
revised the tutorials and added two new ones covering TensorFlow and document
classification.

Migrated to TensorFlow Java 0.3.1 which allows specification and training of
models in Java ([#134](#134)).  The
TensorFlow models can be saved in two formats, either using TensorFlow's
checkpoint format or Tribuo's native model serialization. They can also be
exported as TensorFlow Saved Models for interop with other TensorFlow
platforms. Tribuo can now load TF v2 Saved Models and serve them alongside TF
v1 frozen graphs with its external model loader.

We also added a TensorFlow tutorial which walks through the creation of a
simple regression MLP, a classification MLP and a classification CNN, before
exporting the model as a TensorFlow Saved Model and importing it back into
Tribuo.

- Added extremely randomized trees, i.e., ExtraTrees ([#51](#51)).
- Added an SGD based linear model for multi-label classification ([#106](#106)).
- Added liblinear's linear SVM anomaly detector ([#114](#114)).
- Added arbitrary ensemble creation from existing models ([#129](#129)).

- Added K-Means++ ([#34](#34)).
- Added XGBoost feature importance metrics ([#52](#52)).
- Added OffsetDateTimeExtractor to the columnar data package ([#66](#66)).
- Added an empty response processor for use with clustering datasets ([#99](#99)).
- Added IDFTransformation for generating TF-IDF features ([#104](#104)).
- Exposed more parameters for XGBoost models ([#107](#107)).
- Added a Wordpiece tokenizer ([#111](#111)).
- Added optional output standardisation to LibSVM regressors ([#113](#113)).
- Added a BERT feature extractor for text data ([#116](#116)). This can load
  ONNX format BERT (and BERT-style) models from HuggingFace Transformers and
  use them as part of Tribuo's text feature extraction package.
- Added a configurable version of AggregateDataSource, and added iteration order parameters to both forms of AggregateDataSource ([#125](#125)).
- Added an option to RowProcessor which passes through newlines ([#137](#137)).

- Removed redundant computation in tree construction ([#63](#63)).
- Added better accessors for the centroids of a K-Means model ([#98](#98)).
- Improved the speed of the feature transformation infrastructure ([#104](#104)).
- Refactored the SGD models to reduce redundant code and allow models to share upcoming improvements ([#106](#106), [#134](#134)).
- Added many performance optimisations to the linear SGD and CRF models, allowing the automatic use of dense feature spaces ([#112](#112)). This also adds specialisations to the math library for dense vectors and matrices, improving the performance of the CRF model even when operating on sparse feature sets.
- Added provenance tracking of the Java version, OS and CPU architecture ([#115](#115)).
- Changed the behaviour of sparse features under transformations to expose additional behaviour ([#122](#122)).
- Improved `MultiLabelEvaluation.toString()` ([#136](#136)).
- Added a document classification tutorial which shows the various text feature extraction techniques available in Tribuo.
- Expanded javadoc coverage.
- Upgraded ONNX Runtime to 1.7.0, XGBoost to 1.4.1, TensorFlow to 0.3.1, liblinear-java to 2.43, OLCUT to 5.1.6, OpenCSV to 5.4.
- Miscellaneous small bug fixes.

Contributors

- Adam Pocock ([@Craigacp](https://github.com/Craigacp))
- Philip Ogren ([@pogren](https://github.com/pogren))
- Jeffrey Alexander ([@jhalexand](https://github.com/jhalexand))
- Jack Sullivan ([@JackSullivan](https://github.com/JackSullivan))
- Samantha Campo ([@samanthacampo](https://github.com/samanthacampo))
- Luke Nezda ([@nezda](https://github.com/nezda))
- Mani Sarkar ([@neomatrix369](https://github.com/neomatrix369))
- Stephen Green ([@eelstretching](https://github.com/eelstretching))
- Kate Silverstein ([@k8si](https://github.com/k8si))

v4.0.2

Tribuo v4.0.2

Bugs fixed:
- Fixed a locale issue in the evaluation tests.
- Fixed issues with RowProcessor (expand regexes not being called,
  improper provenance capture).
- IDXDataSource now throws FileNotFoundException rather than a mysterious NPE when it can't find the file.
- Fixed issues in JsonDataSource (consistent exceptions thrown, proper
  termination of reading in several cases).
- Fixed an issue where regression models couldn't be serialized due to a
  non-serializable lambda.
- Fixed UTF-8 BOM issues in CSV loading.
- Fixed an issue where LibSVMTrainer didn't track state between repeated
  calls to train.
- Fixed issues in the evaluators to ensure consistent exception throwing
  when discovering unlabelled or unknown ground truth outputs.
- Fixed a bug in ONNX LabelTransformer where it wouldn't read
  PyTorch outputs properly.
- Bumped to OLCUT 5.1.5 to fix a provenance -> configuration conversion
  issue.

New additions:
- Added a method which converts a Jackson ObjectNode into a Map suitable
  for the RowProcessor.
- Added missing serialization tests to all the models.
- Added a getInnerModels method to LibSVMModel, LibLinearModel and
  XGBoostModel to allow users to access a copy of the internal models.
- More documentation.
- Columnar data loading tutorial.
- External model (XGBoost & ONNX) tutorial.

Dependency updates:
- OLCUT 5.1.5 (brings in jline 3.16.0 and jackson 2.11.3).

v4.0.1

Tribuo v4.0.1

- Fixes an issue where CSVReader would fail to read CSV files with
  extraneous newlines at the end of the file.
- Adds an IDXDataSource which reads IDX (i.e., MNIST) formatted data.

v4.0.0

Tribuo v4.0.0

- Initial public release

Tribuo is a machine learning library in Java that provides multi-class
classification, regression, clustering, anomaly detection and
multi-label classification. Tribuo provides implementations of popular
ML algorithms and also wraps other libraries to provide a unified
interface. Tribuo contains all the code necessary to load, featurise and
transform data. Additionally, it includes the evaluation classes for all
supported prediction types.