-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor struct Model -> class Model + class ModelImpl (Part of #196) #201
Conversation
253568f
to
a56e2d9
Compare
Codecov Report
@@ Coverage Diff @@
## multi_type_refactor_breakup #201 +/- ##
=================================================================
+ Coverage 82.28% 82.79% +0.50%
- Complexity 54 56 +2
=================================================================
Files 85 85
Lines 5657 5079 -578
Branches 48 48
=================================================================
- Hits 4655 4205 -450
+ Misses 974 845 -129
- Partials 28 29 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A list of changes in the pull request description would be useful. Otherwise looks good.
…#201) * Upgrade C++ standard to C++14. * Split struct Model into class Model and class ModelImpl. The ModelImpl class will soon become a template class in order to hold Tree objects with uint32, float32, or float64 type. The Model class will become an abstract class so as to avoid exposing ModelImpl to external interface. (It's very hard to pass template classes through a FFI boundary.) * Change signature of methods that return Model, since Model is now an abstract class. These functions now return std::unique_ptr<Model>. * Move bodies of tiny methods from tree_impl.h to tree.h. This will reduce verbosity once ModelImpl becomes a template class. Co-authored-by: Andy Adinets <aadinets@nvidia.com>
… data types (Part of #196) (#198) * The ModelImpl class (created in #201) becomes the template class ModelImpl<ThresholdType, LeafOutputType>. * Implement template classes ModelImpl<ThresholdType, LeafOutputType> and TreeImpl<ThresholdType, LeafOutputType> that contain the details of the tree ensemble model. The template classes are parameterized by the types of the thresholds and leaf outputs. Currently the following combinations are allowed: | Threshold type | Leaf output type | |----------------|------------------| | float32 | float32 | | float32 | uint32 | | float64 | float64 | | float64 | uint32 | |----------------|------------------| * Revise the zero-copy serialization protocol, to prepend the type information (threshold_type, leaf_output_type) to the serialized types so that the recipient will choose the correct ModelImpl<ThresholdType, LeafOutputType> to deserialize to. * A run-time type dispatching system using the enum type TypeInfo. Users are able to dispatch a correct version of ModelImpl<ThresholdType, LeafOutputType> by specifying a pair of TypeInfo values. We also implement a set of convenient functions, such as InferTypeInfoOf<T> that converts the template arg T into TypeInfo enum. Co-authored-by: William Hicks <wphicks@users.noreply.github.com> Co-authored-by: Andy Adinets <aadinets@nvidia.com>
Addresses #95 and #111. Follow-up to #198, #199, #201 Trying again, since #130 failed. This time, I made the Model class to be polymorphic. This way, the amount of pointer indirection is minimized. Summary: Model is an opaque container that wraps the polymorphic handle ModelImpl<ThresholdType, LeafOutputType>. The handle in turn stores the list of trees Tree<ThresholdType, LeafOutputType>. To unbox the Model container and obtain ModelImpl<ThresholdType, LeafOutputType>, use Model::Dispatch(<lambda expression>). Also, upgrade to C++14 to access the generic lambda feature, which proved to be very useful in the dispatching logic for the polymorphic Model class. * Turn the Model and Tree classes into template classes * Revise the string templates so that correct data types are used in the generated C code * Rewrite the model builder class * Revise the zero-copy serializer * Create an abstract matrix class that supports multiple data types (float32, float64 for now). * Move the DMatrix class to the runtime. * Extend the DMatrix class so that it can hold float32 and float64. * Redesign the C runtime API using the DMatrix class. * Ensure accuracy of scikit-learn models. To achieve the best results, use float32 for the input matrix and float64 for the split thresholds and leaf outputs. * Revise the JVM runtime.
Extracted from #198, which was extracted from #196
Per #198 (comment), I created a separate pull request that just consists of the refactoring of
struct Model
intoclass Model
andclass ModelImpl
. I also moved small functions' definitions back totree.h
header.List of changes:
struct Model
intoclass Model
andclass ModelImpl
. TheModelImpl
class will soon become a template class in order to holdTree
objects with uint32, float32, or float64 type. TheModel
class will become an abstract class so as to avoid exposingModelImpl
to external interface. (It's very hard to pass template classes through a FFI boundary.)Model
, sinceModel
is now an abstract class. These functions now returnstd::unique_ptr<Model>
.tree_impl.h
totree.h
. This will reduce verbosity onceModelImpl
becomes a template class.@canonizer