Add dense feature vector optimisations to CRF and LinearSGD models #112
Conversation
…ctor to SGDVector in prep for using DenseVectors for word embeddings.
- Adding a layer of indirection to all the operations in DenseVector which read from the elements array, to make AdaGradRDA and ShrinkingVector simpler (simplification to be done later).
- Adding sparse.outer(dense) to SparseVector (see the sketch after this commit list).
- Relaxing the type in LinearParameters.gradients from SparseVector to SGDVector.
…ainer} - CRFModel, AbstractLinearSGDModel and AbstractLinearSGDTrainer now use DenseVector for the feature vector if the feature vector is the same size as the number of features. This should optimise operations on dense embedding vectors. In the future this may become a user-controlled parameter which uses a DenseVector if at least some fraction of the features are present.
…ue to a missing constructor.
…ns for each sequence element using a standard Model.
- MutableDataset now tracks if it contains dense or sparse examples.
- MutableSequenceDataset now tracks if it contains dense or sparse examples.
- MutableSequenceDataset has densify and clear methods.
- DatasetProvenance now checks to see if the SequenceDataset is dense or not.
- Example.densify(FeatureMap) is now public (as it already was for ArrayExample).
- Added tests for density detection and densify calls.
…method throw if there is no feature overlap between the example and the feature map.
…nside that class.
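As context for the sparse.outer(dense) commit above, here is a minimal sketch of a sparse-dense outer product over plain arrays. It illustrates the shape of the operation only; the array-based representation and the method itself are assumptions for illustration, not Tribuo's actual `SparseVector.outer` implementation.

```java
// Sketch only: a sparse vector as parallel index/value arrays, producing
// its outer product with a dense vector as a row-major matrix.
final class SparseDenseOuter {
    static double[][] outer(int[] indices, double[] values, int dim,
                            double[] dense) {
        double[][] result = new double[dim][dense.length];
        // Only rows with a non-zero sparse entry need filling; the rest
        // of the matrix stays zero.
        for (int i = 0; i < indices.length; i++) {
            double v = values[i];
            int row = indices[i];
            for (int j = 0; j < dense.length; j++) {
                result[row][j] = v * dense[j];
            }
        }
        return result;
    }
}
```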
Looks good to me. More of a comment than a question: we're clearly planning to densify or sparsify based on characteristics of the examples vs. the feature map. I believe that we should also have a way to let the users pick what they want to do, even if they're going to pick a dumb way.
This will be more important when we get to the case where dense is not just "uses all the features".
I agree there should be a user parameter. I'm thinking of a flag on the trainer and model with an associated threshold float between zero and one, as sketched below. It won't be useful on most of the models as the backends are also sparse, but LinearSGDModel and CRFModel are likely targets.
But that can be something for the next release.
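To make the proposal concrete, here is a hypothetical sketch of the threshold check; no such flag exists in the codebase yet, and the names are invented for illustration.

```java
// Hypothetical heuristic: use a dense representation when the example
// contains at least `threshold` of the feature space. A threshold of
// 1.0 recovers the current "truly dense" behaviour.
final class DensityHeuristic {
    static boolean useDense(int numActiveFeatures, int numTotalFeatures,
                            double threshold) {
        return numActiveFeatures >= threshold * numTotalFeatures;
    }
}
```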
* Adding DenseVector support to KMeansTrainer.
* Adding docs to KMeansTrainer.mStep.
* Letting KMeansModel pick dynamically between dense and sparse vectors at predict time.

Description
Relaxed the method signatures in `KMeansTrainer` to allow the use of `DenseVector` or `SparseVector`, and changed the train & predict methods to dynamically pick between sparse and dense vectors. This is similar to the change for #112. This speeds up k-means in dense spaces, as dense vectors are faster than a sparse vector operating in a dense space. This changes the signature of the protected `mStep` method, relaxing one of the argument types from `SparseVector[]` to `SGDVector[]`. As a result the train method will no longer dispatch to subclass methods written against the old `mStep` signature, so this is a breaking change for subclasses of `KMeansTrainer` (sketched below). If users have tagged their override with `@Override` then the compiler will warn them, and it should be a one-line change.

Motivation
Improves the speed of K-means when working in dense spaces.
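To illustrate why the relaxation breaks subclasses and why the fix is one line, here is a sketch using simplified stand-in classes rather than Tribuo's real `KMeansTrainer`, whose full `mStep` parameter list is not shown above.

```java
// Stand-in types mirroring the shape of the signature relaxation.
interface SGDVec { }
class SparseVec implements SGDVec { }

class BaseTrainer {
    // Signature relaxed from SparseVec[] to SGDVec[], as in the PR.
    protected void mStep(SGDVec[] data) {
        System.out.println("base mStep over " + data.length + " vectors");
    }
}

class CustomTrainer extends BaseTrainer {
    // A pre-existing override written against SparseVec[] would now be
    // an overload that train() never dispatches to. With @Override the
    // compiler rejects it, and widening the parameter type is the fix:
    @Override
    protected void mStep(SGDVec[] data) { // was: SparseVec[] data
        System.out.println("custom mStep");
    }
}
```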
Description
This PR adds support for dense feature vectors to the CRF trainer/model and the linear SGD trainer/model. The CRF & linear model only use dense feature vectors if the example is truly dense (i.e. it contains a feature for each entry in the model or dataset's feature map). At some point in the future we could make this a model/trainer parameter, as the dense representation is probably faster, at the cost of additional memory, in spaces which are moderately sparse. We should do some benchmarking to find a reasonable default threshold if we make this change.
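A sketch of the selection rule just described, written against the factory methods in Tribuo's `org.tribuo.math.la` package; treat the exact signatures as assumptions based on this PR rather than settled API.

```java
import org.tribuo.Example;
import org.tribuo.ImmutableFeatureMap;
import org.tribuo.Output;
import org.tribuo.math.la.DenseVector;
import org.tribuo.math.la.SGDVector;
import org.tribuo.math.la.SparseVector;

final class FeatureVectorChoice {
    // Use the dense representation only when the example is truly dense,
    // i.e. it has a value for every feature in the model's feature map.
    static <T extends Output<T>> SGDVector convert(Example<T> example,
                                                   ImmutableFeatureMap fMap,
                                                   boolean addBias) {
        if (example.size() == fMap.size()) {
            return DenseVector.createDenseVector(example, fMap, addBias);
        }
        return SparseVector.createSparseVector(example, fMap, addBias);
    }
}
```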
To support the CRF and linear model change there were additional methods implemented in the `org.tribuo.math.la` package, along with extra cases inside those methods. Additionally the `ModelProvenance` now correctly tracks whether the dataset was dense, and the dataset better understands whether it is dense (rather than only assuming it is dense if it had `densify` called on it). Only `MutableDataset` and `MutableSequenceDataset` know if they are dense or not; we should probably move this check onto `Dataset` and `SequenceDataset` so immutable views can be considered dense, but the nature of the views means this is a little more complicated to check.

Several classes were added to various sequence packages to build them out for further testing. The main ones are an `IndependentSequenceTrainer` and an `IndependentSequenceModel`, which wrap a regular `Trainer`/`Model` and use it to make independent predictions for each element of the sequence (sketched below). There is also a more useful sequence train/test harness for comparing different sequence models (and a small fix for `ViterbiTrainer`, which was not compatible with the configuration system).
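For readers unfamiliar with the pattern, a minimal sketch of the independent-prediction wrapper using simplified stand-in interfaces; Tribuo's actual `SequenceModel` API differs.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a per-example model; not Tribuo's Model class.
interface SimpleModel<I, O> {
    O predict(I input);
}

// Wraps a per-example model and applies it to each sequence element in
// isolation, ignoring any dependence between neighbouring elements.
final class IndependentSequencePredictor<I, O> {
    private final SimpleModel<I, O> inner;

    IndependentSequencePredictor(SimpleModel<I, O> inner) {
        this.inner = inner;
    }

    List<O> predict(List<I> sequence) {
        List<O> outputs = new ArrayList<>(sequence.size());
        for (I element : sequence) {
            outputs.add(inner.predict(element));
        }
        return outputs;
    }
}
```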
Motivation
Working with a dense feature space (e.g. an embedding vector space) is many times faster than treating the dense space as if it is sparse with the associated indirection on all the linear algebra operations. This greatly improves the speed of the CRF operating over embedding vectors, and of the linear model when operating on dense spaces. The output of the two systems is identical.
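To make the cost concrete, compare the inner loops of a dense and a sparse dot product; this is a generic sketch, not Tribuo's code. In a fully dense space the sparse form does the same number of multiplies but adds an index load on every step, which also defeats straightforward autovectorisation.

```java
final class DotProducts {
    // Dense: direct array access, friendly to the JIT's autovectoriser.
    static double denseDot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Sparse-over-dense: every step goes through the index array, an
    // indirection that costs time even when all indices are present.
    static double sparseDot(int[] indices, double[] values, double[] dense) {
        double sum = 0.0;
        for (int i = 0; i < indices.length; i++) {
            sum += values[i] * dense[indices[i]];
        }
        return sum;
    }
}
```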
This density optimisation opens the path to further speedups, either by making Tribuo's `org.tribuo.math.la` package more amenable to autovectorisation, or by explicitly vectorising it using the Vector API incubating in Java 16.