Tags: oracle/tribuo
Tribuo v4.3.1 Release Notes

Small patch release to bump some dependencies and pull in minor fixes. The most notable fix allows CART trees to generate pure nodes, which previously they had been prevented from doing. This will likely improve the classification tree performance both for single trees and when used in an ensemble like RandomForests.

- FeatureHasher should have an option to not hash the values, and TokenPipeline should default to not hashing the values ([#309](#309)).
- Improving the documentation for loading multi-label data with CSVLoader ([#306](#306)).
- Allows Example.densify to add arbitrary features ([#304](#304)).
- Adds accessors to ClassifierChainModel and IndependentMultiLabelModel so the individual models can be accessed ([#302](#302)).
- Allows CART trees to create pure leaves ([#303](#303)).
- Bumping jackson-core to 2.14.1, jackson-databind to 2.14.1, OpenCSV to 5.7.1 (pulling in the fixed commons-text 1.10.0).

Contributors

- Adam Pocock ([@Craigacp](https://github.com/Craigacp))
- Jeffrey Alexander ([@jhalexand](https://github.com/jhalexand))
- Jack Sullivan ([@JackSullivan](https://github.com/JackSullivan))
- Philip Ogren ([@pogren](https://github.com/pogren))
Tribuo v4.2.2 Release Notes

Small patch release to bump some dependencies and pull in minor fixes:

- Validate hash salt during object creation ([#237](#237)).
- Fix XGBoost parameter overriding ([#239](#239)).
- Add some necessary accessors to TransformedModel ([#244](#244)).
- Bumping TF-Java to v0.4.2 ([#281](#281)).
- Fixes for test failures when running in a path with spaces in ([#287](#287)).
- Fix documentation links to the OCA.
- Bumping jackson-core to 2.13.4, jackson-databind to 2.13.4.2, protobuf-java to 3.19.6, OpenCSV to 5.7.1 (pulling in the fixed commons-text 1.10.0).

Contributors

- Adam Pocock ([@Craigacp](https://github.com/Craigacp))
- Geoff Stewart ([@geoffreydstewart](https://github.com/geoffreydstewart))
- Jeffrey Alexander ([@jhalexand](https://github.com/jhalexand))
- Jack Sullivan ([@JackSullivan](https://github.com/JackSullivan))
- Philip Ogren ([@pogren](https://github.com/pogren))
Tribuo 4.3.0

Tribuo v4.3 adds feature selection for classification problems, support for guided generation of model cards, and protobuf serialization for all serializable classes. In addition there is a new interface for distance based computations which can now use a kd-tree or brute force comparisons, the sparse linear model package has been rewritten to use Tribuo's linear algebra system improving the speed and reducing memory consumption, and we've added some more tutorials.

Note this is likely the last feature release of Tribuo to support Java 8. The next major version of Tribuo will require Java 17. In addition, support for using `java.io.Serializable` for serialization will be removed in the next major release, and Tribuo will exclusively use protobuf based serialization.

Feature Selection

In this release we've added support for feature selection algorithms to the dataset and provenance systems, along with implementations of 4 information theoretic feature selection algorithms for use in classification problems. The algorithms (MIM, CMIM, mRMR and JMI) are described in this [review paper](https://jmlr.org/papers/v13/brown12a.html). Continuous inputs are discretised into a fixed number of equal width bins before the mutual information is computed. These algorithms are a useful feature selection baseline, and we welcome contributions to extend the set of supported algorithms.

- Feature selection algorithms [#254](#254).

Model Card Support

[Model Cards](https://dl.acm.org/doi/10.1145/3287560.3287596) are a popular way of describing a model, its training data, expected applications and any use cases that should be avoided. In this release we've added guided generation of model cards, where many fields are automatically generated from the provenance information inside each Tribuo model.
Fields which require user input (such as the expected use cases for a model, or its license) can be added via a CLI program, and the resulting model card can be saved in JSON format. At the moment, the automatic data extraction fails on some kinds of nested ensemble models which are generated without using a Tribuo `Trainer` class; in the future we'll look at improving the data extraction for this case.

- Model card infrastructure ([#243](#243), [#250](#250), [#253](#253)).

Protobuf Serialization

In this release we've added [protocol buffer](https://developers.google.com/protocol-buffers) definitions for serializing all of Tribuo's serializable types, along with the necessary code to interact with those definitions. This effort has improved the validation of serialized data, and will allow Tribuo models to be upwards compatible across major versions of Tribuo. Any serialized model or dataset from Tribuo v4.2 or earlier can be loaded in and saved out into the new format which will ensure compatibility with the next major version of Tribuo.

- Protobuf support for core types ([#226](#226), [#255](#255), [#262](#262), [#264](#264)).
- Protobuf support for models (Multinomial Naive Bayes [#267](#267), Sparse linear models [#269](#269), XGBoost [#270](#270), OCI, ONNX and TF [#271](#271), LibSVM [#272](#272), LibLinear [#273](#273), SGD [#275](#275), Clustering models [#276](#276), Baseline models and ensembles [#277](#277), Trees [#278](#278)).
- Docs and supporting programs ([#279](#279)).

Smaller improvements

We added an interface for querying the nearest neighbours of a vector, and updated HDBSCAN, K-Means and K-NN to use the new interface. The old implementation has been renamed the "brute force" search operator, and a new implementation which uses a kd-tree has been added.

- Distance refactor ([#213](#213), [#216](#216), [#221](#221), [#231](#231), [#285](#285)).
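To make the "brute force" neighbour query above concrete, here is a minimal sketch of the idea in plain Java: compute the distance from the query vector to every stored point and keep the k closest. The class and method names (`BruteForceNeighbours`, `query`) are hypothetical and chosen for illustration only; this is not Tribuo's neighbour query interface, and it assumes dense vectors with squared Euclidean distance.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a brute force k-nearest-neighbour query; not Tribuo's API. */
final class BruteForceNeighbours {
    private final double[][] points;

    BruteForceNeighbours(double[][] points) {
        this.points = points;
    }

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns the indices of the k closest stored points, nearest first. */
    List<Integer> query(double[] target, int k) {
        // Max-heap on distance, so the farthest of the current best k is evicted first.
        // Each heap entry is {point index, squared distance}.
        PriorityQueue<double[]> heap = new PriorityQueue<>(
            Comparator.comparingDouble((double[] e) -> e[1]).reversed());
        for (int i = 0; i < points.length; i++) {
            heap.add(new double[]{i, squaredDistance(points[i], target)});
            if (heap.size() > k) {
                heap.poll();
            }
        }
        // The heap drains farthest first, so insert at the front to get nearest first.
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(0, (int) heap.poll()[0]);
        }
        return result;
    }
}
```

Every query in this sketch scans all stored points. A kd-tree instead partitions the points along coordinate axes so that a query can skip regions which cannot contain a closer neighbour, which tends to help most in low dimensional spaces.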
We migrated off Apache Commons Math, which necessitated adding several methods to Tribuo's math library. In the process we refactored the sparse linear model code, removing redundant matrix operations and greatly improving the speed of LASSO.

- Refactor sparse linear models and remove Apache Commons Math ([#241](#241)).

The ONNX export support has been refactored to allow the use of different ONNX opsets, and custom ONNX operations. This allows users of Tribuo's ONNX export support to supply their own operations, and increases the flexibility of the ONNX support on the JVM.

- ONNX operator refactor ([#245](#245)).

ONNX Runtime has been upgraded to v1.12.1, which includes Linux ARM64 and macOS ARM64 binaries. As a result we've removed the ONNX tests from the arm Maven profile, and so those tests will execute on Linux & macOS ARM64 platforms.

- ONNX Runtime upgrade ([#256](#256)).

Small improvements

- Improved the assignment to the noise cluster in HDBSCAN ([#222](#222)).
- Upgrade liblinear-java to v2.44 ([#228](#228)).
- Added accessors for the HDBSCAN cluster exemplars ([#229](#229)).
- Improve validation of salts when hashing feature names ([#237](#237)).
- Added accessors to TransformedModel for the wrapped model ([#244](#244)).
- Added a regex text preprocessor ([#247](#247)).
- Upgrade OpenCSV to v5.6 ([#259](#259)).
- Added a builder to RowProcessor to make it less confusing ([#263](#263)).
- Upgrade TF-Java to v0.4.2 ([#281](#281)).
- Upgrade OCI Java SDK to v2.46.0, protobuf-java to 3.19.6, XGBoost to 1.6.2, jackson to 2.14.0-rc1 ([#288](#288)).

Bug Fixes

- Fix for HDBSCAN small cluster generation ([#236](#236)).
- XGBoost provenance capture ([#239](#239)).
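As an illustration of the Feature Selection section of this release, the following is a minimal sketch of the simplest of the four criteria, MIM, which ranks each feature by its mutual information with the label after discretising the feature into equal width bins. All names here (`MimSketch`, `equalWidthBins`, `rank`) are hypothetical, the data layout (one array per feature column) is an assumption made for brevity, and the code is not Tribuo's implementation.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of the MIM criterion; not Tribuo's implementation. */
final class MimSketch {

    /** Maps each value to one of numBins equal width bins spanning [min, max]. */
    static int[] equalWidthBins(double[] values, int numBins) {
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / numBins;
        int[] bins = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant feature has zero width, so every value goes in bin 0.
            int bin = width == 0.0 ? 0 : (int) ((values[i] - min) / width);
            // Clamp so the maximum value lands in the last bin.
            bins[i] = Math.min(bin, numBins - 1);
        }
        return bins;
    }

    /** Empirical mutual information I(X;Y) in nats between two discrete variables. */
    static double mutualInformation(int[] x, int numX, int[] y, int numY) {
        double[][] joint = new double[numX][numY];
        double[] px = new double[numX];
        double[] py = new double[numY];
        double weight = 1.0 / x.length;
        for (int i = 0; i < x.length; i++) {
            joint[x[i]][y[i]] += weight;
            px[x[i]] += weight;
            py[y[i]] += weight;
        }
        double mi = 0.0;
        for (int a = 0; a < numX; a++) {
            for (int b = 0; b < numY; b++) {
                if (joint[a][b] > 0.0) {
                    mi += joint[a][b] * Math.log(joint[a][b] / (px[a] * py[b]));
                }
            }
        }
        return mi;
    }

    /** Ranks feature indices by decreasing mutual information with the label. */
    static Integer[] rank(double[][] featureColumns, int[] labels, int numLabels, int numBins) {
        double[] scores = new double[featureColumns.length];
        for (int f = 0; f < featureColumns.length; f++) {
            int[] binned = equalWidthBins(featureColumns[f], numBins);
            scores[f] = mutualInformation(binned, numBins, labels, numLabels);
        }
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort on the negated score so the most informative feature comes first.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -scores[i]));
        return order;
    }
}
```

MIM scores each feature independently, so it can select redundant features. The other criteria named in the release notes (CMIM, mRMR and JMI) add terms that account for the features already selected; see the linked review paper for their definitions.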
Contributors

- Adam Pocock ([@Craigacp](https://github.com/Craigacp))
- Jack Sullivan ([@JackSullivan](https://github.com/JackSullivan))
- Romina Mahinpei ([@rmahinpei](https://github.com/rmahinpei))
- Philip Ogren ([@pogren](https://github.com/pogren))
- Katie Younglove ([@katieyounglove](https://github.com/katieyounglove))
- Jeffrey Alexander ([@jhalexand](https://github.com/jhalexand))
- Geoff Stewart ([@geoffreydstewart](https://github.com/geoffreydstewart))
Tribuo v4.2.1

Small patch release for three issues:

- Ensure K-Means thread pools shut down when training completes ([#224](#224))
- Fix issues where ONNX export of ensembles, K-Means initialization and several tests relied upon HashSet iteration order ([#220](https://github.com/oracle/tribuo/pull/220), [#225](https://github.com/oracle/tribuo/pull/225))
- Upgrade to TF-Java 0.4.1 which includes an upgrade to TF 2.7.1 which brings in several fixes for native crashes operating on malformed or malicious models ([#227](#227))

OLCUT is updated to 5.2.1 to pull in updated versions of jackson & protobuf ([#234](#234)). Also includes some docs and a small update for K-Means' `toString` ([#209](#209), [#211](#211), [#212](#212)).

Contributors

- Adam Pocock ([@Craigacp](https://github.com/Craigacp))
- Geoff Stewart ([@geoffreydstewart](https://github.com/geoffreydstewart))
- Yaliang Wu ([@ylwu-amzn](https://github.com/ylwu-amzn))
- Kaiyao Ke ([@kaiyaok2](https://github.com/kaiyaok2))
Tribuo v4.2.0

Tribuo 4.2 adds new models, ONNX export for several types of models, a reproducibility framework for recreating Tribuo models, easy deployment of Tribuo models on Oracle Cloud, along with several smaller improvements and bug fixes. We've added more tutorials covering the new features along with multi-label classification, and further expanded the javadoc to cover all public methods.

In Tribuo 4.1.0 and earlier there is a severe bug in multi-dimensional regression models (i.e., regression tasks with multiple output dimensions). Models other than `LinearSGDModel` and `SparseLinearModel` (apart from when using the `ElasticNetCDTrainer`) have a bug in how the output dimension indices are constructed, and may produce incorrect outputs for all dimensions (as the output will be for a different dimension than the one named in the `Regressor` object). This has been fixed, and loading in models trained in earlier versions of Tribuo will patch the model to rearrange the dimensions appropriately. Unfortunately this fix cannot be applied to tree based models, and so all multi-output regression tree based models should be retrained using Tribuo 4.2 as they are irretrievably corrupt. Additionally when using standardization in multi-output regression LibSVM models dimensions past the first dimension have the model improperly stored and will also need to be retrained with Tribuo 4.2. See [#177](#177) for more details.

Note the KMeans implementation had several internal changes to support running with a `java.lang.SecurityManager` which will break any subclasses of `KMeansTrainer`. In most cases changing the signature of any overridden `mStep` method to match the new signature, and allowing the `fjp` argument to be null in single threaded execution will fix the subclass.
New models

In this release we've added [Factorization Machines](https://www.computer.org/csdl/proceedings-article/icdm/2010/4256a995/12OmNwMFMfl), [Classifier Chains](https://link.springer.com/content/pdf/10.1007/s10994-011-5256-5.pdf) and [HDBSCAN\*](https://link.springer.com/chapter/10.1007/978-3-642-37456-2_14).

Factorization machines are a powerful non-linear predictor which uses a factorized approximation to learn a per output feature-feature interaction term in addition to a linear model. We've added Factorization Machines for multi-class classification, multi-label classification and regression.

Classifier chains are an ensemble approach to multi-label classification which given a specific ordering of the labels learns a chain of classifiers where each classifier gets the features along with the predicted labels from earlier in the chain. We also added ensembles of randomly ordered classifier chains which work well in situations when the ground truth label ordering is unknown (i.e., most of the time).

HDBSCAN is a hierarchical density based clustering algorithm which chooses the number of clusters based on properties of the data rather than as a hyperparameter. The Tribuo implementation can cluster a dataset, and then at prediction time it provides the cluster the given datapoint would be in without modifying the cluster structure.

- Classifier Chains ([#149](#149)), which also adds the jaccard score as a multi-label evaluation metric, and a multi-label voting combiner for use in multi-label ensembles.
- Factorization machines ([#179](#179)).
- HDBSCAN ([#196](#196)).

ONNX Export

The [ONNX](https://onnx.ai) format is a cross-platform and cross-library model exchange format. Tribuo can already serve ONNX models via its [ONNX Runtime](https://onnxruntime.ai) interface, and now has the ability to export models in ONNX format for serving on edge devices, in cloud services, or in other languages like Python or C#.
In this release Tribuo supports exporting linear models (multi-class classification, multi-label classification and regression), sparse linear regression models, factorization machines (multi-class classification, multi-label classification and regression), LibLinear models (multi-class classification and regression), LibSVM models (multi-class classification and regression), along with ensembles of those models, including arbitrary levels of ensemble nesting. We plan to expand this coverage to more models over time, however for TensorFlow we recommend users export those models as a Saved Model and use the Python tf2onnx converter.

Tribuo models exported in ONNX format preserve their provenance information in a metadata field which is accessible when the ONNX model is loaded back into Tribuo. The provenance is stored as a protobuf so could be read from other libraries or platforms if necessary.

The ONNX export support is in a separate module with no dependencies, and could be used elsewhere on the JVM to support generating ONNX graphs. We welcome contributions to build out the ONNX support in that module.

- ONNX export for LinearSGDModels ([#154](#154)), which also adds a multi-label output transformer for scoring multi-label ONNX models.
- ONNX export for SparseLinearModel ([#163](#163)).
- Add provenance to ONNX exported models ([#182](#182)).
- Refactor ONNX tensor creation ([#187](#187)).
- ONNX ensemble export support ([#186](#186)).
- ONNX export for LibSVM and LibLinear ([#191](#191)).
- Refactor ONNX support to improve type safety ([#199](#199)).
- Extract ONNX support into separate module ([#TBD](https://github.com/oracle/tribuo/pull/)).

Reproducibility Framework

Tribuo has strong model metadata support via its provenance system which records how models, datasets and evaluations are created.
In this release we enhance this support by adding a push-button reproduction framework which accepts either a model provenance or a model object and rebuilds the complete training pipeline, ensuring consistent usage of RNGs and other mutable state. This allows Tribuo to easily rebuild models to see if updated datasets could change performance, or even if the model is actually reproducible (which may be required for regulatory reasons). Over time we hope to expand this support into a full experimental framework, allowing models to be rebuilt with hyperparameter or data changes as part of the data science process or for debugging models in production.

This framework was written by Joseph Wonsil and Prof. Margo Seltzer at the University of British Columbia as part of a collaboration between Prof. Seltzer and Oracle Labs. We're excited to continue working with Joe, Margo and the rest of the lab at UBC, as this is excellent work.

Note the reproducibility framework module requires Java 16 or greater, and is thus not included in the `tribuo-all` meta-module.

- Reproducibility framework ([#185](#185), with minor changes in [#189](#189) and [#190](#190)).

OCI Data Science Integration

[Oracle Cloud Data Science](https://www.oracle.com/data-science/cloud-infrastructure-data-science.html) is a platform for building and deploying models in Oracle Cloud. The model deployment functionality wraps a Python runtime and deploys models with an auto-scaler at a REST endpoint. In this release we've added support for deploying Tribuo models which are ONNX exportable directly to OCI DS, allowing scale-out deployments of models from the JVM. We also added an `OCIModel` wrapper which scores Tribuo `Example` objects using a deployed model's REST endpoint, allowing easy use of cloud resources for ML on the JVM.

- Oracle Cloud Data Science integration ([#200](#200)).
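Returning to the jaccard score that arrived with classifier chains under New models, the metric can be sketched in a few lines of plain Java. This is a generic, example-averaged definition (size of the intersection of the true and predicted label sets divided by the size of their union); the class name `JaccardSketch` is hypothetical, the handling of two empty label sets is an assumption, and the code does not claim to match Tribuo's evaluator in every detail.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of an example-averaged Jaccard score; not Tribuo's evaluator. */
final class JaccardSketch {

    /** Size of the intersection divided by the size of the union, for one example. */
    static double jaccard(Set<String> truth, Set<String> predicted) {
        if (truth.isEmpty() && predicted.isEmpty()) {
            // Assumption: two empty label sets count as a perfect match.
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(truth);
        intersection.retainAll(predicted);
        Set<String> union = new HashSet<>(truth);
        union.addAll(predicted);
        return ((double) intersection.size()) / union.size();
    }

    /** Mean Jaccard score over paired lists of true and predicted label sets. */
    static double meanJaccard(List<Set<String>> truths, List<Set<String>> predictions) {
        if (truths.size() != predictions.size()) {
            throw new IllegalArgumentException("Mismatched list sizes");
        }
        double sum = 0.0;
        for (int i = 0; i < truths.size(); i++) {
            sum += jaccard(truths.get(i), predictions.get(i));
        }
        return sum / truths.size();
    }
}
```

Unlike exact-match accuracy, this score gives partial credit when a prediction gets some but not all of an example's labels right, which is why it is a common choice for multi-label evaluation.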
Small improvements

- Date field processor and locale support in metadata extractors ([#148](#148))
- Multi-output response processor allowing loading different formats of multi-label and multi-dimensional regression datasets ([#150](#150))
- ARM dev profile for compiling Tribuo on ARM platforms ([#152](#152))
- Refactor CSVLoader so it uses CSVDataSource and parses CSV files using RowProcessor, allowing an easy transition to more complex columnar extraction ([#153](#153))
- Configurable anomaly demo data source ([#160](#160))
- Configurable clustering demo data source ([#161](#161))
- Configurable classification demo data source ([#162](#162))
- Multi-Label tutorial and configurable multi-label demo data source ([#166](#166)) (also adds a multi-label tutorial) plus fix in [#168](#168) after #167
- Add javadoc for all public methods and fields ([#175](#175)) (also fixes a bug in Util.vectorNorm)
- Add hooks for model equality checks to trees and LibSVM models ([#183](#183)) (also fixes a bug in liblinear get top features)
- XGBoost 1.5.0 ([#192](#192))
- TensorFlow Java 0.4.0 ([#195](#195)) (note this changes Tribuo's TF API slightly as TF-Java 0.4.0 has a different method of initializing the session)
- KMeans now uses dense vectors when appropriate, speeding up training ([#201](#201))
- Documentation updates, ONNX and reproducibility tutorials ([#205](#205))

Bug fixes

- NPE fix for LIME explanations using models which don't support per class weights ([#157](#157))
- Fixing a bug in multi-label evaluation which swapped FP for FN ([#167](#167))
- Persist CSVDataSource headers in the provenance ([#171](#171))
- Fixing LibSVM and LibLinear so they have reproducible behaviour ([#172](#172))
- Provenance fix for TransformTrainer and an extra factory for XGBoostExternalModel so you can make them from an in memory booster ([#176](#176))
- Fix multidimensional regression ([#177](#177)) (fixes regression ids, fixes libsvm so it emits correct standardized models, adds support for per dimension feature weights in XGBoostRegressionModel)
- Fix provenance generation for FieldResponseProcessor and BinaryResponseProcessor ([#178](#178))
- Normalize LibSVMDataSource paths consistently in the provenance ([#181](#181))
- KMeans and KNN now run correctly when using OpenSearch's SecurityManager ([#197](#197))

Contributors

- Adam Pocock ([@Craigacp](https://github.com/Craigacp))
- Jack Sullivan ([@JackSullivan](https://github.com/JackSullivan))
- Joseph Wonsil ([@jwons](https://github.com/jwons))
- Philip Ogren ([@pogren](https://github.com/pogren))
- Jeffrey Alexander ([@jhalexand](https://github.com/jhalexand))
- Geoff Stewart ([@geoffreydstewart](https://github.com/geoffreydstewart))
Tribuo v4.1.1

This is the first patch release for Tribuo v4.1. The main fixes in this release are to the multi-dimensional output regression support, and to support the use of KMeans and KNN models when running under a restrictive `SecurityManager`. Additionally this release pulls in TensorFlow-Java 0.4.0 which upgrades the TensorFlow native library to 2.7.0 fixing several CVEs. Note those CVEs may not be applicable to TensorFlow-Java, as many of them relate to Python codepaths which are not included in TensorFlow-Java. Note the TensorFlow upgrade is a breaking API change as graph initialization is handled differently in this release, which causes unavoidable changes in Tribuo's TF API.

Multi-dimensional Regression fix

In Tribuo 4.1.0 and earlier there is a severe bug in multi-dimensional regression models (i.e., regression tasks with multiple output dimensions). Models other than `LinearSGDModel` and `SparseLinearModel` (apart from when using the `ElasticNetCDTrainer`) have a bug in how the output dimension indices are constructed, and may produce incorrect outputs for all dimensions (as the output will be for a different dimension than the one named in the `Regressor` object). This has been fixed, and loading in models trained in earlier versions of Tribuo will patch the model to rearrange the dimensions appropriately. Unfortunately this fix cannot be applied to tree based models, and so all multi-output regression tree based models should be retrained using Tribuo 4.2 as they are irretrievably corrupt. Additionally when using standardization in multi-output regression LibSVM models dimensions past the first dimension have the model improperly stored and will also need to be retrained with Tribuo 4.2. See [#177](#177) for more details.

Bug fixes

- NPE fix for LIME explanations using models which don't support per class weights ([#157](#157)).
- Fixing a bug in multi-label evaluation which swapped FP for FN ([#167](#167)).
- Fixing LibSVM and LibLinear so they have reproducible behaviour ([#172](#172)).
- Provenance fix for TransformTrainer and an extra factory for XGBoostExternalModel so you can make them from an in memory booster ([#176](#176)).
- Fix multidimensional regression ([#177](#177)) (fixes regression ids, fixes libsvm so it emits correct standardized models, adds support for per dimension feature weights in XGBoostRegressionModel).
- Normalize LibSVMDataSource paths consistently in the provenance ([#181](#181)).
- KMeans and KNN now run correctly when using OpenSearch's SecurityManager ([#197](#197)).
- TensorFlow-Java 0.4.0 ([#195](#195)).

Contributors

- Adam Pocock ([@Craigacp](https://github.com/Craigacp))
- Jack Sullivan ([@JackSullivan](https://github.com/JackSullivan))
- Philip Ogren ([@pogren](https://github.com/pogren))
- Jeffrey Alexander ([@jhalexand](https://github.com/jhalexand))
Tribuo 4.1 is the first feature release after the initial open source release. We've added new models, new parameters for some models, improvements to data loading, documentation, transformations and the speed of our CRF and linear models, along with a large update to the TensorFlow interface. We've also revised the tutorials and added two new ones covering TensorFlow and document classification.

Migrated to TensorFlow Java 0.3.1 which allows specification and training of models in Java ([#134](#134)). The TensorFlow models can be saved in two formats, either using TensorFlow's checkpoint format or Tribuo's native model serialization. They can also be exported as TensorFlow Saved Models for interop with other TensorFlow platforms. Tribuo can now load TF v2 Saved Models and serve them alongside TF v1 frozen graphs with its external model loader. We also added a TensorFlow tutorial which walks through the creation of a simple regression MLP, a classification MLP and a classification CNN, before exporting the model as a TensorFlow Saved Model and importing it back into Tribuo.

- Added extremely randomized trees, i.e., ExtraTrees ([#51](#51)).
- Added an SGD based linear model for multi-label classification ([#106](#106)).
- Added liblinear's linear SVM anomaly detector ([#114](#114)).
- Added arbitrary ensemble creation from existing models ([#129](#129)).
- Added K-Means++ ([#34](#34)).
- Added XGBoost feature importance metrics ([#52](#52)).
- Added OffsetDateTimeExtractor to the columnar data package ([#66](#66)).
- Added an empty response processor for use with clustering datasets ([#99](#99)).
- Added IDFTransformation for generating TF-IDF features ([#104](#104)).
- Exposed more parameters for XGBoost models ([#107](#107)).
- Added a Wordpiece tokenizer ([#111](#111)).
- Added optional output standardisation to LibSVM regressors ([#113](#113)).
- Added a BERT feature extractor for text data ([#116](#116)).
  This can load in ONNX format BERT (and BERT style) models from HuggingFace Transformers, and use them as part of Tribuo's text feature extraction package.
- Added a configurable version of AggregateDataSource, and added iteration order parameters to both forms of AggregateDataSource ([#125](#125)).
- Added an option to RowProcessor which passes through newlines ([#137](#137)).
- Removed redundant computation in tree construction ([#63](#63)).
- Added better accessors for the centroids of a K-Means model ([#98](#98)).
- Improved the speed of the feature transformation infrastructure ([#104](#104)).
- Refactored the SGD models to reduce redundant code and allow models to share upcoming improvements ([#106](#106), [#134](#134)).
- Added many performance optimisations to the linear SGD and CRF models, allowing the automatic use of dense feature spaces ([#112](#112)). This also adds specialisations to the math library for dense vectors and matrices, improving the performance of the CRF model even when operating on sparse feature sets.
- Added provenance tracking of the Java version, OS and CPU architecture ([#115](#115)).
- Changed the behaviour of sparse features under transformations to expose additional behaviour ([#122](#122)).
- Improved `MultiLabelEvaluation.toString()` ([#136](#136)).
- Added a document classification tutorial which shows the various text feature extraction techniques available in Tribuo.
- Expanded javadoc coverage.
- Upgraded ONNX Runtime to 1.7.0, XGBoost to 1.4.1, TensorFlow to 0.3.1, liblinear-java to 2.43, OLCUT to 5.1.6, OpenCSV to 5.4.
- Miscellaneous small bug fixes.
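For readers unfamiliar with the K-Means++ initialisation listed among the additions above, here is a minimal sketch of the seeding step in plain Java: the first centroid is chosen uniformly at random, and each later centroid is sampled with probability proportional to its squared distance from the nearest centroid chosen so far. The names (`KMeansPlusPlusSketch`, `chooseCentroids`) are hypothetical and the code is a generic illustration of the published algorithm, not Tribuo's `KMeansTrainer`.

```java
import java.util.Random;

/** Hypothetical sketch of K-Means++ seeding; not Tribuo's KMeansTrainer. */
final class KMeansPlusPlusSketch {

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Picks k initial centroids (as indices into data) using squared distance weighted sampling. */
    static int[] chooseCentroids(double[][] data, int k, Random rng) {
        int[] chosen = new int[k];
        // The first centroid is drawn uniformly at random.
        chosen[0] = rng.nextInt(data.length);
        // minDist[i] is the squared distance from point i to its nearest chosen centroid.
        double[] minDist = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            minDist[i] = squaredDistance(data[i], data[chosen[0]]);
        }
        for (int c = 1; c < k; c++) {
            double total = 0.0;
            for (double d : minDist) {
                total += d;
            }
            // Sample a point with probability proportional to its squared distance.
            double threshold = rng.nextDouble() * total;
            int next = data.length - 1; // fallback if every remaining distance is zero
            double cumulative = 0.0;
            for (int i = 0; i < data.length; i++) {
                cumulative += minDist[i];
                if (cumulative > threshold) {
                    next = i;
                    break;
                }
            }
            chosen[c] = next;
            // Update the nearest-centroid distances with the new centroid.
            for (int i = 0; i < data.length; i++) {
                minDist[i] = Math.min(minDist[i], squaredDistance(data[i], data[next]));
            }
        }
        return chosen;
    }
}
```

Because points that coincide with an existing centroid have zero weight, the sampling spreads the initial centroids across the data, which usually gives the subsequent K-Means iterations a better starting point than purely uniform seeding.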
Contributors

- Adam Pocock ([@Craigacp](https://github.com/Craigacp))
- Philip Ogren ([@pogren](https://github.com/pogren))
- Jeffrey Alexander ([@jhalexand](https://github.com/jhalexand))
- Jack Sullivan ([@JackSullivan](https://github.com/JackSullivan))
- Samantha Campo ([@samanthacampo](https://github.com/samanthacampo))
- Luke Nezda ([@nezda](https://github.com/nezda))
- Mani Sarkar ([@neomatrix369](https://github.com/neomatrix369))
- Stephen Green ([@eelstretching](https://github.com/eelstretching))
- Kate Silverstein ([@k8si](https://github.com/k8si))
Tribuo v4.0.2

Bugs fixed:

- Fixed a locale issue in the evaluation tests.
- Fixed issues with RowProcessor (expand regexes not being called, improper provenance capture).
- IDXDataSource now throws FileNotFoundException rather than a mysterious NPE when it can't find the file.
- Fixed issues in JsonDataSource (consistent exceptions thrown, proper termination of reading in several cases).
- Fixed an issue where regression models couldn't be serialized due to a non-serializable lambda.
- Fixed UTF-8 BOM issues in CSV loading.
- Fixed an issue where LibSVMTrainer didn't track state between repeated calls to train.
- Fixed issues in the evaluators to ensure consistent exception throwing when discovering unlabelled or unknown ground truth outputs.
- Fixed a bug in ONNX LabelTransformer where it wouldn't read pytorch outputs properly.
- Bumped to OLCUT 5.1.5 to fix a provenance -> configuration conversion issue.

New additions:

- Added a method which converts a Jackson ObjectNode into a Map suitable for the RowProcessor.
- Added missing serialization tests to all the models.
- Added a getInnerModels method to LibSVMModel, LibLinearModel and XGBoostModel to allow users to access a copy of the internal models.
- More documentation.
- Columnar data loading tutorial.
- External model (XGBoost & ONNX) tutorial.

Dependency updates:

- OLCUT 5.1.5 (brings in jline 3.16.0 and jackson 2.11.3).
Tribuo v4.0.0 - Initial public release Tribuo is a machine learning library in Java that provides multi-class classification, regression, clustering, anomaly detection and multi-label classification. Tribuo provides implementations of popular ML algorithms and also wraps other libraries to provide a unified interface. Tribuo contains all the code necessary to load, featurise and transform data. Additionally, it includes the evaluation classes for all supported prediction types.