
Add dense feature vector optimisations to CRF and LinearSGD models #112

Merged
merged 11 commits into main from dense-crf on Mar 3, 2021

Conversation

@Craigacp Craigacp (Member) commented Feb 6, 2021

Description

This PR adds support for dense feature vectors to the CRF trainer/model and the linear SGD trainer/model. The CRF & linear models only use dense feature vectors if the example is truly dense (i.e. it contains a feature for each entry in the model's or dataset's feature map). At some point in the future we could make this a model/trainer parameter, as the dense representation is probably faster, at the cost of additional memory, in feature spaces which are only moderately sparse. We should do some benchmarking to find a reasonable default threshold if we make that change.
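
A minimal sketch of that density check, assuming `DenseVector` gained a factory method mirroring `SparseVector.createSparseVector`; the `convert` helper name here is illustrative, not the PR's actual method:

```java
import org.tribuo.Example;
import org.tribuo.ImmutableFeatureMap;
import org.tribuo.Output;
import org.tribuo.math.la.DenseVector;
import org.tribuo.math.la.SGDVector;
import org.tribuo.math.la.SparseVector;

public final class FeatureConversion {
    private FeatureConversion() {}

    /**
     * Uses a dense vector only when the example is truly dense,
     * i.e. it has one feature per entry in the feature map.
     */
    public static <T extends Output<T>> SGDVector convert(Example<T> example, ImmutableFeatureMap fMap) {
        if (example.size() == fMap.size()) {
            // Every feature in the map is present, so the dense form is exact.
            return DenseVector.createDenseVector(example, fMap, false);
        } else {
            return SparseVector.createSparseVector(example, fMap, false);
        }
    }
}
```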

To support the CRF and linear model changes, additional methods were implemented in the org.tribuo.math.la package, along with extra cases inside existing methods. Additionally the ModelProvenance now correctly tracks whether the dataset was dense, and the dataset better understands when it is dense (rather than only assuming it is dense after densify has been called on it). Only MutableDataset and MutableSequenceDataset know if they are dense or not; we should probably move this check onto Dataset and SequenceDataset so immutable views can be considered dense, but the nature of the views means this is a little more complicated to check.

Several classes were added to various sequence packages to build them out for further testing. The main ones are an IndependentSequenceTrainer and an IndependentSequenceModel, which wrap a regular Trainer/Model and use it to make independent predictions for each element of the sequence. There is also a more useful sequence train/test harness for comparing different sequence models (and a small fix for ViterbiTrainer, which was not compatible with the configuration system).
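
The independent-prediction idea reduces to a loop over the sequence; here is a hedged sketch (not Tribuo's actual IndependentSequenceModel implementation), assuming `SequenceExample` iterates over its `Example` elements:

```java
import java.util.ArrayList;
import java.util.List;
import org.tribuo.Example;
import org.tribuo.Model;
import org.tribuo.Output;
import org.tribuo.Prediction;
import org.tribuo.sequence.SequenceExample;

public final class IndependentPrediction {
    private IndependentPrediction() {}

    public static <T extends Output<T>> List<Prediction<T>> predictEach(Model<T> inner,
                                                                        SequenceExample<T> sequence) {
        List<Prediction<T>> predictions = new ArrayList<>();
        for (Example<T> element : sequence) {
            // Each element is predicted in isolation; no state flows between
            // steps, unlike the CRF which models transitions between labels.
            predictions.add(inner.predict(element));
        }
        return predictions;
    }
}
```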

Motivation

Working with a dense feature space (e.g. an embedding vector space) is many times faster than treating the dense space as if it were sparse, with the associated indirection on all the linear algebra operations. This greatly improves the speed of the CRF operating over embedding vectors, and of the linear model when operating on dense spaces. The outputs of the two code paths are identical.

This density optimisation opens the path for further speedups either by making Tribuo's la package more amenable to autovectorisation, or by explicitly vectorising it using the Vector API in Java 16.
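
As a hedged illustration of the explicit-vectorisation path, here is a dense dot product written against the incubating Vector API (module `jdk.incubator.vector`, compile with `--add-modules jdk.incubator.vector` on Java 16); this is not code from Tribuo's la package:

```java
import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public final class VectorisedDot {
    private static final VectorSpecies<Double> SPECIES = DoubleVector.SPECIES_PREFERRED;

    public static double dot(double[] a, double[] b) {
        double sum = 0.0;
        int i = 0;
        int bound = SPECIES.loopBound(a.length);
        // Main loop processes SPECIES.length() lanes per iteration.
        for (; i < bound; i += SPECIES.length()) {
            DoubleVector va = DoubleVector.fromArray(SPECIES, a, i);
            DoubleVector vb = DoubleVector.fromArray(SPECIES, b, i);
            sum += va.mul(vb).reduceLanes(VectorOperators.ADD);
        }
        // Scalar tail for the remaining elements.
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```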

…ctor to SGDVector in prep for using DenseVectors for word embeddings.
- Adding a layer of indirection to all the operations in DenseVector which read from the elements array, to make AdaGradRDA and ShrinkingVector simpler (simplification to be done later; see the sketch after this list).
- Adding sparse.outer(dense) to SparseVector.
- Relaxing the type in LinearParameters.gradients from SparseVector to SGDVector.
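
A minimal sketch of that read-indirection pattern; the class and method names here are illustrative, not Tribuo's actual ones. Every operation reads through a single overridable accessor instead of touching the backing array directly, so a subclass (standing in for something like ShrinkingVector's lazily applied multiplier) only has to override one method:

```java
// Illustrative sketch only; not Tribuo's DenseVector.
public class SimpleDenseVector {
    protected final double[] elements;

    public SimpleDenseVector(double[] elements) {
        this.elements = elements;
    }

    // Single point of indirection for reads; subclasses override this.
    public double get(int index) {
        return elements[index];
    }

    public double sum() {
        double total = 0.0;
        for (int i = 0; i < elements.length; i++) {
            total += get(i); // reads go through the overridable accessor
        }
        return total;
    }
}

// A subclass that rescales every read without copying the array.
class ScaledVector extends SimpleDenseVector {
    private final double scale;

    ScaledVector(double[] elements, double scale) {
        super(elements);
        this.scale = scale;
    }

    @Override
    public double get(int index) {
        return scale * super.get(index);
    }
}
```
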
…ainer}

CRFModel, AbstractLinearSGDModel and AbstractLinearSGDTrainer now use DenseVector for the feature vector if the feature vector is the same size as the number of features. This should optimise things which operate on dense embedding vectors. In the future this may be converted into a user-controlled parameter which uses a DenseVector if at least some fraction of the features are present.
…ns for each sequence element using a standard Model.
- MutableDataset now tracks if it contains dense or sparse examples.
- MutableSequenceDataset now tracks if it contains dense or sparse examples.
- MutableSequenceDataset has densify and clear methods (a short usage sketch follows this list).
- DatasetProvenance now checks whether the SequenceDataset is dense or not.
- Example.densify(FeatureMap) is now public (as it already was for ArrayExample).
- Added tests for density detection and densify calls.
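
A short usage sketch of the density tracking plus a densify call, assuming a classification dataset. Note that the `isDense()` accessor name is an assumption based on the description above, not a confirmed method name:

```java
import org.tribuo.MutableDataset;
import org.tribuo.classification.Label;

public final class DensifyUsage {
    private DensifyUsage() {}

    public static void ensureDense(MutableDataset<Label> dataset) {
        // isDense() is an assumed accessor for the new density flag.
        if (!dataset.isDense()) {
            // densify() pads each example with explicit entries for the
            // features it is missing, making the whole dataset dense.
            dataset.densify();
        }
    }
}
```
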
…method throw if there is no feature overlap between the example and the feature map.
@eelstretching eelstretching (Member) left a comment

Looks good to me. More of a comment than a question: we're clearly planning to densify or sparsify based on characteristics of the examples vs. the feature map. I believe that we should also have a way to let the users pick what they want to do, even if they're going to pick a dumb way.

This will be more important when we get to the case where dense is not just "uses all the features".

@Craigacp Craigacp (Member, Author) commented Mar 3, 2021

I agree there should be a user parameter. I'm thinking of a flag on the trainer and model with an associated threshold float between zero and one. It won't be useful on most of the models as the backends are also sparse, but LinearSGDModel and CRFModel are likely targets.
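
A hedged design sketch of that threshold idea, not code from this PR: treat an example as dense when at least the given fraction of the feature space is present.

```java
// Design sketch only; names are illustrative.
public final class DensityPolicy {
    private final float densityThreshold;

    public DensityPolicy(float densityThreshold) {
        if (densityThreshold < 0.0f || densityThreshold > 1.0f) {
            throw new IllegalArgumentException("Threshold must be in [0,1], found " + densityThreshold);
        }
        this.densityThreshold = densityThreshold;
    }

    // True when the example's active feature count covers enough of the feature map.
    public boolean useDense(int activeFeatures, int featureMapSize) {
        return activeFeatures >= densityThreshold * featureMapSize;
    }
}
```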

@Craigacp Craigacp (Member, Author) commented Mar 3, 2021

But that can be something for the next release.

@Craigacp Craigacp merged commit 4001680 into main Mar 3, 2021
@Craigacp Craigacp deleted the dense-crf branch March 3, 2021 23:47
Craigacp added a commit that referenced this pull request Dec 17, 2021
* Adding DenseVector support to KMeansTrainer.

* Adding docs to KMeansTrainer.mStep.

* Letting KMeansModel pick dynamically between dense and sparse vectors at predict time.

Description
Relaxed the method signatures in `KMeansTrainer` to allow the use of `DenseVector` or `SparseVector`, and changed the train & predict methods to dynamically pick between sparse and dense vectors. This is similar to the change for #112.

This speeds up k-means in dense spaces, as dense vectors are faster than sparse vectors operating in a dense space.

This changes the signature of the protected `mStep` method, relaxing one of the argument types from `SparseVector[]` to `SGDVector[]`. As a result, the train method will no longer dispatch to subclass overrides written against the old signature, so it's a breaking change for subclasses of `KMeansTrainer`. If users have tagged their override with `@Override` then the compiler will flag the mismatch, and it should be a one line change.
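
A minimal sketch of what that one-line change looks like for a subclass. The parameter list here is abbreviated and illustrative, so check the actual `KMeansTrainer.mStep` signature when updating:

```java
import org.tribuo.math.la.SGDVector;

// Illustrative stand-in for a user's KMeansTrainer subclass; only the
// relevant parameter is shown.
class CustomKMeans {
    // Before: protected void mStep(..., SparseVector[] data, ...)
    // After: the data parameter widens to the SGDVector supertype, so the
    // override must change to match or it will silently stop overriding.
    protected void mStep(SGDVector[] data) {
        // custom centroid update logic, now handling dense or sparse vectors
    }
}
```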

Motivation
Improves the speed of K-means when working in dense spaces.