Add dense feature vector optimisations to CRF and LinearSGD models #112
Conversation
…ctor to SGDVector in prep for using DenseVectors for word embeddings.
- Adding a layer of indirection to all the operations in DenseVector which read from the elements array, to make AdaGradRDA and ShrinkingVector simpler (simplification to be done later).
- Adding sparse.outer(dense) to SparseVector (see the sketch after this commit list).
- Relaxing the type in LinearParameters.gradients from SparseVector to SGDVector.
…ainer} - CRFModel, AbstractLinearSGDModel and AbstractLinearSGDTrainer now use DenseVector for the feature vector if the feature vector is the same size as the number of features. This should optimise operations on dense embedding vectors. In the future this may become a user-controlled parameter which uses a DenseVector if at least some fraction of the features are present.
…ue to a missing constructor.
…ns for each sequence element using a standard Model.
- MutableDataset now tracks if it contains dense or sparse examples.
- MutableSequenceDataset now tracks if it contains dense or sparse examples.
- MutableSequenceDataset has densify and clear methods.
- DatasetProvenance now checks to see if the SequenceDataset is dense or not.
- Example.densify(FeatureMap) is now public (as it already was for ArrayExample).
- Added tests for density detection and densify calls.
…method throw if there is no feature overlap between the example and the feature map.
…nside that class.
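As context for the sparse.outer(dense) commit above, here is a minimal sketch of a sparse-dense outer product over plain arrays. It illustrates the shape of the operation only; the array-based representation and the method itself are assumptions for illustration, not Tribuo's actual `SparseVector.outer` implementation.

```java
// Sketch only: a sparse vector as parallel index/value arrays, producing
// its outer product with a dense vector as a row-major matrix.
final class SparseDenseOuter {
    static double[][] outer(int[] indices, double[] values, int dim,
                            double[] dense) {
        double[][] result = new double[dim][dense.length];
        // Only rows with a non-zero sparse entry need filling; the rest
        // of the matrix stays zero.
        for (int i = 0; i < indices.length; i++) {
            double v = values[i];
            int row = indices[i];
            for (int j = 0; j < dense.length; j++) {
                result[row][j] = v * dense[j];
            }
        }
        return result;
    }
}
```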
Looks good to me. More of a comment than a question: we're clearly planning to densify or sparsify based on characteristics of the examples vs. the feature map. I believe that we should also have a way to let the users pick what they want to do, even if they're going to pick a dumb way.
This will be more important when we get to the case where dense is not just "uses all the features".
I agree there should be a user parameter. I'm thinking of a flag on the trainer and model with an associated threshold float between zero and one, as sketched below. It won't be useful on most of the models as the backends are also sparse, but LinearSGDModel and CRFModel are likely targets.
But that can be something for the next release.
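To make the proposal concrete, here is a hypothetical sketch of the threshold check; no such flag exists in the codebase yet, and the names are invented for illustration.

```java
// Hypothetical heuristic: use a dense representation when the example
// contains at least `threshold` of the feature space. A threshold of
// 1.0 recovers the current "truly dense" behaviour.
final class DensityHeuristic {
    static boolean useDense(int numActiveFeatures, int numTotalFeatures,
                            double threshold) {
        return numActiveFeatures >= threshold * numTotalFeatures;
    }
}
```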
* Adding DenseVector support to KMeansTrainer.
* Adding docs to KMeansTrainer.mStep.
* Letting KMeansModel pick dynamically between dense and sparse vectors at predict time.

Description
Relaxed the method signatures in `KMeansTrainer` to allow the use of `DenseVector` or `SparseVector`, and changed the train & predict methods to dynamically pick between sparse and dense vectors. This is similar to the change for #112. This speeds up k-means in dense spaces, as dense vectors are faster than a sparse vector operating in a dense space. This changes the signature of the protected `mStep` method, relaxing one of the argument types from `SparseVector[]` to `SGDVector[]`. As a result the train method will no longer dispatch to subclass methods written against the old `mStep` signature, so this is a breaking change for subclasses of `KMeansTrainer` (sketched below). If users have tagged their override with `@Override` then the compiler will warn them, and it should be a one-line change.

Motivation
Improves the speed of K-means when working in dense spaces.
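To illustrate why the relaxation breaks subclasses and why the fix is one line, here is a sketch using simplified stand-in classes rather than Tribuo's real `KMeansTrainer`, whose full `mStep` parameter list is not shown above.

```java
// Stand-in types mirroring the shape of the signature relaxation.
interface SGDVec { }
class SparseVec implements SGDVec { }

class BaseTrainer {
    // Signature relaxed from SparseVec[] to SGDVec[], as in the PR.
    protected void mStep(SGDVec[] data) {
        System.out.println("base mStep over " + data.length + " vectors");
    }
}

class CustomTrainer extends BaseTrainer {
    // A pre-existing override written against SparseVec[] would now be
    // an overload that train() never dispatches to. With @Override the
    // compiler rejects it, and widening the parameter type is the fix:
    @Override
    protected void mStep(SGDVec[] data) { // was: SparseVec[] data
        System.out.println("custom mStep");
    }
}
```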
Description
This PR adds support for dense feature vectors to the CRF trainer/model and the linear SGD trainer/model. The CRF & linear model only use dense feature vectors if the example is truly dense (i.e. it contains a feature for each entry in the model or dataset's feature map). At some point in the future we could make this a model/trainer parameter, as the dense representation is probably faster, at the cost of additional memory, in spaces which are moderately sparse. We should do some benchmarking to find a reasonable default threshold if we make this change.
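A sketch of the selection rule just described, written against the factory methods in Tribuo's `org.tribuo.math.la` package; treat the exact signatures as assumptions based on this PR rather than settled API.

```java
import org.tribuo.Example;
import org.tribuo.ImmutableFeatureMap;
import org.tribuo.Output;
import org.tribuo.math.la.DenseVector;
import org.tribuo.math.la.SGDVector;
import org.tribuo.math.la.SparseVector;

final class FeatureVectorChoice {
    // Use the dense representation only when the example is truly dense,
    // i.e. it has a value for every feature in the model's feature map.
    static <T extends Output<T>> SGDVector convert(Example<T> example,
                                                   ImmutableFeatureMap fMap,
                                                   boolean addBias) {
        if (example.size() == fMap.size()) {
            return DenseVector.createDenseVector(example, fMap, addBias);
        }
        return SparseVector.createSparseVector(example, fMap, addBias);
    }
}
```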
To support the CRF and linear model change there were additional methods implemented in the `org.tribuo.math.la` package, along with extra cases inside those methods. Additionally the `ModelProvenance` now correctly tracks whether the dataset was dense, and the dataset better understands whether it is dense (rather than only assuming it is dense if it had `densify` called on it). Only `MutableDataset` and `MutableSequenceDataset` know if they are dense or not; we should probably move this check onto `Dataset` and `SequenceDataset` so immutable views can be considered dense, but the nature of the views means this is a little more complicated to check.

Several classes were added to various sequence packages to build them out for further testing. The main ones are an `IndependentSequenceTrainer` and an `IndependentSequenceModel`, which wrap a regular `Trainer`/`Model` and use it to make independent predictions for each element of the sequence (sketched below). There is also a more useful sequence train/test harness for comparing different sequence models (and a small fix for `ViterbiTrainer`, which was not compatible with the configuration system).
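For readers unfamiliar with the pattern, a minimal sketch of the independent-prediction wrapper using simplified stand-in interfaces; Tribuo's actual `SequenceModel` API differs.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a per-example model; not Tribuo's Model class.
interface SimpleModel<I, O> {
    O predict(I input);
}

// Wraps a per-example model and applies it to each sequence element in
// isolation, ignoring any dependence between neighbouring elements.
final class IndependentSequencePredictor<I, O> {
    private final SimpleModel<I, O> inner;

    IndependentSequencePredictor(SimpleModel<I, O> inner) {
        this.inner = inner;
    }

    List<O> predict(List<I> sequence) {
        List<O> outputs = new ArrayList<>(sequence.size());
        for (I element : sequence) {
            outputs.add(inner.predict(element));
        }
        return outputs;
    }
}
```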
Motivation
Working with a dense feature space (e.g. an embedding vector space) is many times faster than treating the dense space as if it is sparse with the associated indirection on all the linear algebra operations. This greatly improves the speed of the CRF operating over embedding vectors, and of the linear model when operating on dense spaces. The output of the two systems is identical.
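To make the cost concrete, compare the inner loops of a dense and a sparse dot product; this is a generic sketch, not Tribuo's code. In a fully dense space the sparse form does the same number of multiplies but adds an index load on every step, which also defeats straightforward autovectorisation.

```java
final class DotProducts {
    // Dense: direct array access, friendly to the JIT's autovectoriser.
    static double denseDot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Sparse-over-dense: every step goes through the index array, an
    // indirection that costs time even when all indices are present.
    static double sparseDot(int[] indices, double[] values, double[] dense) {
        double sum = 0.0;
        for (int i = 0; i < indices.length; i++) {
            sum += values[i] * dense[indices[i]];
        }
        return sum;
    }
}
```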
This density optimisation opens the path to further speedups, either by making Tribuo's `org.tribuo.math.la` package more amenable to autovectorisation, or by explicitly vectorising it using the Vector API incubating in Java 16.