Add support for multiple data types (will break up into smaller pull requests) #196
Closed
hcho3 force-pushed the multi_type_support2 branch 3 times, most recently from 4458ec7 to 322acc8 (August 29, 2020 08:06)
hcho3 force-pushed the multi_type_support2 branch from 322acc8 to 2e823eb (August 29, 2020 22:24)
…ss with all possible template args
hcho3 force-pushed the multi_type_support2 branch from ad3314a to c0d1abd (August 31, 2020 09:52)
hcho3 force-pushed the multi_type_support2 branch from 5ec551a to acc20be (August 31, 2020 18:58)
hcho3 force-pushed the multi_type_support2 branch from 3f3ec0d to 77477e7 (August 31, 2020 21:51)
hcho3 force-pushed the multi_type_support2 branch from ab6277d to b846f6a (August 31, 2020 23:30)
hcho3 force-pushed the multi_type_support2 branch from a658f01 to 4515d9d (September 2, 2020 04:43)
hcho3 force-pushed the multi_type_support2 branch from 4515d9d to 914469b (September 3, 2020 02:37)
This was referenced Sep 10, 2020: Create template classes for model representations to support multiple data types (Part of #196) #198 (Merged)
) (dmlc#201)
* Upgrade C++ standard to C++14.
* Split struct Model into class Model and class ModelImpl. The ModelImpl class will soon become a template class in order to hold Tree objects with uint32, float32, or float64 type. The Model class will become an abstract class so as to avoid exposing ModelImpl to the external interface. (It's very hard to pass template classes through an FFI boundary.)
* Change the signature of methods that return Model, since Model is now an abstract class. These functions now return std::unique_ptr<Model>.
* Move bodies of tiny methods from tree_impl.h to tree.h. This will reduce verbosity once ModelImpl becomes a template class.
…o multi_type_support2
… data types (Part of dmlc#196) (dmlc#198)
* Create template classes for model representations to support multiple data types
* Update include/treelite/tree.h
  Co-authored-by: William Hicks <wphicks@users.noreply.github.com>
* Address review comments from @canonizer
* Address more comments from @canonizer
Co-authored-by: William Hicks <wphicks@users.noreply.github.com>
Co-authored-by: Andy Adinets <aadinets@nvidia.com>
…o multi_type_support2
hcho3 force-pushed the multi_type_support2 branch from 65d83ee to d9173df (September 25, 2020 08:47)
hcho3 added a commit that referenced this pull request (Oct 9, 2020):
…196) (#199)
* The runtime now queries the data type for the model it loads, via QueryThresholdType() and QueryLeafOutputType(). Every compiled model now embeds the type information.
* Implement a full-fledged data matrix class DMatrix in the runtime, to replace DenseBatch and SparseBatch. The former *Batch classes assumed float32 data, whereas the new DMatrix class can handle both float32 and float64.
* Some API functions, such as PredictBatch(), now take void* pointers to accommodate multiple data types.
Co-authored-by: Yuta Hinokuma <higumachan@users.noreply.github.com>
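A rough sketch of how a single entry point can accept either float32 or float64 input through a void* pointer plus a run-time type tag. The DataType enum, the DenseMatrix class, and the PredictRowSums function are hypothetical stand-ins, not the real DMatrix or PredictBatch() signatures, and the row sums take the place of actual tree traversal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical element-type tag (not Treelite's actual enum).
enum class DataType : std::uint8_t { kFloat32, kFloat64 };

// Hypothetical dense row-major matrix that stores its element type as a
// run-time tag, so one C-style API can take float32 or float64 data as void*.
class DenseMatrix {
 public:
  DenseMatrix(const void* data, DataType type, std::size_t num_row, std::size_t num_col)
      : data_(data), type_(type), num_row_(num_row), num_col_(num_col) {}

  // Reads one element and widens it to double (float -> double is exact).
  double Get(std::size_t row, std::size_t col) const {
    const std::size_t idx = row * num_col_ + col;
    switch (type_) {
      case DataType::kFloat32: return static_cast<const float*>(data_)[idx];
      case DataType::kFloat64: return static_cast<const double*>(data_)[idx];
    }
    throw std::logic_error("unknown data type");
  }
  std::size_t NumRow() const { return num_row_; }
  std::size_t NumCol() const { return num_col_; }

 private:
  const void* data_;
  DataType type_;
  std::size_t num_row_;
  std::size_t num_col_;
};

// Hypothetical PredictBatch-style function: sums each row as a placeholder
// for evaluating the tree ensemble on that row.
std::vector<double> PredictRowSums(const void* data, DataType type,
                                   std::size_t num_row, std::size_t num_col) {
  const DenseMatrix dmat(data, type, num_row, num_col);
  std::vector<double> out(dmat.NumRow(), 0.0);
  for (std::size_t i = 0; i < dmat.NumRow(); ++i) {
    for (std::size_t j = 0; j < dmat.NumCol(); ++j) {
      out[i] += dmat.Get(i, j);
    }
  }
  return out;
}
```

The caller passes `f.data()` from a `std::vector<float>` together with `DataType::kFloat32`, or a `double` buffer with `DataType::kFloat64`; the function signature stays the same in both cases.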
hcho3 added a commit that referenced this pull request (Oct 9, 2020):
…#201)
* Upgrade C++ standard to C++14.
* Split struct Model into class Model and class ModelImpl. The ModelImpl class will soon become a template class in order to hold Tree objects with uint32, float32, or float64 type. The Model class will become an abstract class so as to avoid exposing ModelImpl to the external interface. (It's very hard to pass template classes through an FFI boundary.)
* Change the signature of methods that return Model, since Model is now an abstract class. These functions now return std::unique_ptr<Model>.
* Move bodies of tiny methods from tree_impl.h to tree.h. This will reduce verbosity once ModelImpl becomes a template class.
Co-authored-by: Andy Adinets <aadinets@nvidia.com>
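The Model/ModelImpl split from this commit can be sketched roughly as follows. This is a minimal illustration under assumptions, not Treelite's code: the simplified Tree struct, the TypeName helper, the methods GetNumTree and GetThresholdTypeName, and the factory MakeEmptyModel are all hypothetical. It shows how an abstract, non-template Model returned through std::unique_ptr<Model> keeps template parameters out of the public interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal tree: only the element types matter for this sketch.
template <typename ThresholdType, typename LeafOutputType>
struct Tree {
  std::vector<ThresholdType> thresholds;
  std::vector<LeafOutputType> leaf_outputs;
};

// Hypothetical helper mapping a C++ type to a display name.
template <typename T> std::string TypeName();
template <> std::string TypeName<std::uint32_t>() { return "uint32"; }
template <> std::string TypeName<float>() { return "float32"; }
template <> std::string TypeName<double>() { return "float64"; }

// Abstract interface with no template parameters, so it can sit behind an FFI handle.
class Model {
 public:
  virtual ~Model() = default;
  virtual std::size_t GetNumTree() const = 0;
  virtual std::string GetThresholdTypeName() const = 0;
};

// Concrete model parameterized by the threshold and leaf output types.
template <typename ThresholdType, typename LeafOutputType>
class ModelImpl : public Model {
 public:
  std::vector<Tree<ThresholdType, LeafOutputType>> trees;
  std::size_t GetNumTree() const override { return trees.size(); }
  std::string GetThresholdTypeName() const override { return TypeName<ThresholdType>(); }
};

// Hypothetical factory: Model is abstract, so it is returned as std::unique_ptr<Model>.
std::unique_ptr<Model> MakeEmptyModel(bool use_float64, std::size_t num_tree) {
  if (use_float64) {
    auto model = std::make_unique<ModelImpl<double, double>>();
    model->trees.resize(num_tree);
    return std::unique_ptr<Model>(std::move(model));
  }
  auto model = std::make_unique<ModelImpl<float, float>>();
  model->trees.resize(num_tree);
  return std::unique_ptr<Model>(std::move(model));
}
```

Callers only ever hold a `std::unique_ptr<Model>`; which `ModelImpl<...>` instantiation sits behind it is decided inside the factory.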
hcho3 added a commit that referenced this pull request (Oct 9, 2020):
… data types (Part of #196) (#198)
* The ModelImpl class (created in #201) becomes the template class ModelImpl<ThresholdType, LeafOutputType>.
* Implement template classes ModelImpl<ThresholdType, LeafOutputType> and TreeImpl<ThresholdType, LeafOutputType> that contain the details of the tree ensemble model. The template classes are parameterized by the types of the thresholds and leaf outputs. Currently the following combinations are allowed:

| Threshold type | Leaf output type |
|----------------|------------------|
| float32        | float32          |
| float32        | uint32           |
| float64        | float64          |
| float64        | uint32           |

* Revise the zero-copy serialization protocol to prepend the type information (threshold_type, leaf_output_type) to the serialized types, so that the recipient will choose the correct ModelImpl<ThresholdType, LeafOutputType> to deserialize to.
* Implement a run-time type dispatching system using the enum type TypeInfo. Users are able to dispatch the correct version of ModelImpl<ThresholdType, LeafOutputType> by specifying a pair of TypeInfo values. We also implement a set of convenience functions, such as InferTypeInfoOf<T>, which converts the template argument T into a TypeInfo enum value.
Co-authored-by: William Hicks <wphicks@users.noreply.github.com>
Co-authored-by: Andy Adinets <aadinets@nvidia.com>
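The sketch below assumes a TypeInfo enum and an InferTypeInfoOf<T> helper in the spirit of the ones described in this commit. The enumerator names and numeric values are guesses. It shows a hypothetical two-byte header that carries (threshold_type, leaf_output_type) ahead of the payload, and a recipient that checks the pair against the allowed combinations before choosing an instantiation. SerializeWithHeader and DispatchOnHeader are illustrative names, and the real serialization layout may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed tag values; the real TypeInfo enum may use different names and numbers.
enum class TypeInfo : std::uint8_t { kInvalid = 0, kUInt32 = 1, kFloat32 = 2, kFloat64 = 3 };

// Compile-time mapping from a C++ type to its TypeInfo tag.
template <typename T> constexpr TypeInfo InferTypeInfoOf();
template <> constexpr TypeInfo InferTypeInfoOf<std::uint32_t>() { return TypeInfo::kUInt32; }
template <> constexpr TypeInfo InferTypeInfoOf<float>() { return TypeInfo::kFloat32; }
template <> constexpr TypeInfo InferTypeInfoOf<double>() { return TypeInfo::kFloat64; }

std::string TypeInfoToString(TypeInfo t) {
  switch (t) {
    case TypeInfo::kUInt32: return "uint32";
    case TypeInfo::kFloat32: return "float32";
    case TypeInfo::kFloat64: return "float64";
    default: return "invalid";
  }
}

// Allowed (threshold, leaf output) pairs, per the combinations table in the commit message.
bool IsValidTypeCombo(TypeInfo threshold, TypeInfo leaf) {
  if (threshold == TypeInfo::kFloat32 || threshold == TypeInfo::kFloat64) {
    return leaf == threshold || leaf == TypeInfo::kUInt32;
  }
  return false;
}

// Hypothetical layout: [threshold tag][leaf output tag][payload bytes...]
template <typename ThresholdType, typename LeafOutputType>
std::vector<std::uint8_t> SerializeWithHeader(const std::vector<std::uint8_t>& payload) {
  std::vector<std::uint8_t> out;
  out.reserve(payload.size() + 2);
  out.push_back(static_cast<std::uint8_t>(InferTypeInfoOf<ThresholdType>()));
  out.push_back(static_cast<std::uint8_t>(InferTypeInfoOf<LeafOutputType>()));
  out.insert(out.end(), payload.begin(), payload.end());
  return out;
}

// Reads the header and names the ModelImpl instantiation to deserialize into.
std::string DispatchOnHeader(const std::vector<std::uint8_t>& bytes) {
  if (bytes.size() < 2) throw std::runtime_error("truncated header");
  const auto threshold = static_cast<TypeInfo>(bytes[0]);
  const auto leaf = static_cast<TypeInfo>(bytes[1]);
  if (!IsValidTypeCombo(threshold, leaf)) {
    throw std::runtime_error("unsupported type combination");
  }
  return "ModelImpl<" + TypeInfoToString(threshold) + ", " + TypeInfoToString(leaf) + ">";
}
```

In a real deserializer the final step would construct the matching `ModelImpl<ThresholdType, LeafOutputType>` rather than return its name. The string result here just makes the dispatch decision visible.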
hcho3 added a commit that referenced this pull request (Oct 9, 2020):
Addresses #95 and #111. Follow-up to #198, #199, #201.
Trying again, since #130 failed. This time, I made the Model class polymorphic. This way, the amount of pointer indirection is minimized.
Summary: Model is an opaque container that wraps the polymorphic handle ModelImpl<ThresholdType, LeafOutputType>. The handle in turn stores the list of trees Tree<ThresholdType, LeafOutputType>. To unbox the Model container and obtain ModelImpl<ThresholdType, LeafOutputType>, use Model::Dispatch(<lambda expression>).
Also, upgrade to C++14 to access the generic lambda feature, which proved to be very useful in the dispatching logic for the polymorphic Model class.
* Turn the Model and Tree classes into template classes.
* Revise the string templates so that the correct data types are used in the generated C code.
* Rewrite the model builder class.
* Revise the zero-copy serializer.
* Create an abstract matrix class that supports multiple data types (float32, float64 for now).
* Move the DMatrix class to the runtime.
* Extend the DMatrix class so that it can hold float32 and float64.
* Redesign the C runtime API using the DMatrix class.
* Ensure accuracy of scikit-learn models. To achieve the best results, use float32 for the input matrix and float64 for the split thresholds and leaf outputs.
* Revise the JVM runtime.
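To see why the precision of the split thresholds matters, consider a single split test. This sketch assumes the "go left if x <= threshold" convention and uses a made-up trained threshold of 0.1. The function names are hypothetical. With the same float32 input, rounding the threshold to float32 and keeping it in float64 can give different decisions.

```cpp
#include <cassert>

// Threshold stored as float32: the trained (double) threshold is rounded first.
bool GoLeftFloat32Threshold(float x, double trained_threshold) {
  const float t = static_cast<float>(trained_threshold);
  return x <= t;
}

// Threshold stored as float64: the float32 input is widened exactly to double.
bool GoLeftFloat64Threshold(float x, double trained_threshold) {
  return static_cast<double>(x) <= trained_threshold;
}
```

With x = 0.1f and a threshold of 0.1, the first function goes left and the second does not. The reason is that 0.1f is slightly larger than the double 0.1 but equal to 0.1 rounded to float32. If the training library compared float32 inputs against float64 thresholds, only the float64 variant reproduces its decisions.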
Addresses #95 and #111.
Trying again, since #130 failed. This time, I made the Model class polymorphic. This way, the amount of pointer indirection is minimized.
Summary: Model is an opaque container that wraps the polymorphic handle ModelImpl<ThresholdType, LeafOutputType>. The handle in turn stores the list of trees Tree<ThresholdType, LeafOutputType>. To unbox the Model container and obtain ModelImpl<ThresholdType, LeafOutputType>, use Model::Dispatch(<lambda expression>).
Also, upgrade to C++14 to access the generic lambda feature, which proved to be very useful in the dispatching logic for the polymorphic Model class.
EDIT: I will break up this PR into smaller PRs once I get the whole system working together correctly.
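The Model::Dispatch idea can be sketched in C++14 roughly as follows. This is an illustration built on assumptions, not the PR's implementation. The TypeInfo enum, the type-tag members stored in Model, the InferTypeInfoOf helper, and the use of static_cast after checking the tags are all choices made for this example. The generic lambda passed to Dispatch is instantiated once per allowed (ThresholdType, LeafOutputType) pair, so its body can use the concrete types without the caller naming them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Assumed type tags for this sketch.
enum class TypeInfo : std::uint8_t { kUInt32, kFloat32, kFloat64 };

template <typename T> TypeInfo InferTypeInfoOf();
template <> TypeInfo InferTypeInfoOf<std::uint32_t>() { return TypeInfo::kUInt32; }
template <> TypeInfo InferTypeInfoOf<float>() { return TypeInfo::kFloat32; }
template <> TypeInfo InferTypeInfoOf<double>() { return TypeInfo::kFloat64; }

// Opaque container: remembers which ModelImpl it really is via two tags.
class Model {
 public:
  Model(TypeInfo threshold_type, TypeInfo leaf_output_type)
      : threshold_type_(threshold_type), leaf_output_type_(leaf_output_type) {}
  virtual ~Model() = default;

  // Calls func with the concrete ModelImpl<...>&; func is typically a generic lambda.
  template <typename Func>
  auto Dispatch(Func func);

 private:
  TypeInfo threshold_type_;
  TypeInfo leaf_output_type_;
};

template <typename ThresholdType, typename LeafOutputType>
class ModelImpl : public Model {
 public:
  ModelImpl() : Model(InferTypeInfoOf<ThresholdType>(), InferTypeInfoOf<LeafOutputType>()) {}
  std::vector<ThresholdType> thresholds;
  std::vector<LeafOutputType> leaf_outputs;
};

// The tags guarantee the downcast matches the object's real type.
template <typename Func>
auto Model::Dispatch(Func func) {
  using T = TypeInfo;
  if (threshold_type_ == T::kFloat32 && leaf_output_type_ == T::kFloat32)
    return func(static_cast<ModelImpl<float, float>&>(*this));
  if (threshold_type_ == T::kFloat32 && leaf_output_type_ == T::kUInt32)
    return func(static_cast<ModelImpl<float, std::uint32_t>&>(*this));
  if (threshold_type_ == T::kFloat64 && leaf_output_type_ == T::kFloat64)
    return func(static_cast<ModelImpl<double, double>&>(*this));
  if (threshold_type_ == T::kFloat64 && leaf_output_type_ == T::kUInt32)
    return func(static_cast<ModelImpl<double, std::uint32_t>&>(*this));
  throw std::runtime_error("unsupported type combination");
}
```

A call such as `model->Dispatch([](auto& impl) { return impl.thresholds.size(); })` works for every allowed pair. Because Dispatch uses a deduced return type, every instantiation of the lambda must return the same type (here `std::size_t`).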
TODOs