Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor struct Model -> class Model + class ModelImpl (Part of #196) #201

Merged
merged 8 commits into from
Sep 17, 2020

Conversation

hcho3
Copy link
Collaborator

@hcho3 hcho3 commented Sep 14, 2020

Extracted from #198, which was extracted from #196

Per #198 (comment), I created a separate pull request that just consists of the refactoring of struct Model into class Model and class ModelImpl. I also moved small functions' definitions back to tree.h header.

List of changes:

  • Upgrade C++ standard to C++14.
  • Split struct Model into class Model and class ModelImpl. The ModelImpl class will soon become a template class in order to hold Tree objects with uint32, float32, or float64 type. The Model class will become an abstract class so as to avoid exposing ModelImpl to external interface. (It's very hard to pass template classes through a FFI boundary.)
  • Change signature of methods that return Model, since Model is now an abstract class. These functions now return std::unique_ptr<Model>.
  • Move bodies of tiny methods from tree_impl.h to tree.h. This will reduce verbosity once ModelImpl becomes a template class.

@canonizer

@hcho3 hcho3 changed the title Refactor struct Model -> class Model + class ModelImpl Refactor struct Model -> class Model + class ModelImpl (Part of #196) Sep 14, 2020
@codecov-commenter
Copy link

codecov-commenter commented Sep 14, 2020

Codecov Report

Merging #201 into multi_type_refactor_breakup will increase coverage by 0.50%.
The diff coverage is 94.63%.

Impacted file tree graph

@@                        Coverage Diff                        @@
##             multi_type_refactor_breakup     #201      +/-   ##
=================================================================
+ Coverage                          82.28%   82.79%   +0.50%     
- Complexity                            54       56       +2     
=================================================================
  Files                                 85       85              
  Lines                               5657     5079     -578     
  Branches                              48       48              
=================================================================
- Hits                                4655     4205     -450     
+ Misses                               974      845     -129     
- Partials                              28       29       +1     
Impacted Files Coverage Δ Complexity Δ
include/treelite/compiler.h 100.00% <ø> (ø) 0.00 <0.00> (ø)
include/treelite/frontend.h 66.66% <ø> (+16.66%) 0.00 <0.00> (ø)
src/compiler/ast/builder.h 100.00% <ø> (ø) 0.00 <0.00> (ø)
include/treelite/tree.h 90.54% <89.85%> (-9.46%) 0.00 <0.00> (ø)
include/treelite/tree_impl.h 89.56% <92.30%> (+0.62%) 0.00 <0.00> (ø)
src/annotator.cc 95.29% <100.00%> (+1.41%) 0.00 <0.00> (ø)
src/c_api/c_api.cc 76.25% <100.00%> (+0.29%) 0.00 <0.00> (ø)
src/compiler/ast/build.cc 95.65% <100.00%> (+2.43%) 0.00 <0.00> (ø)
src/compiler/ast_native.cc 90.27% <100.00%> (+1.00%) 0.00 <0.00> (ø)
src/compiler/failsafe.cc 94.90% <100.00%> (+4.79%) 0.00 <0.00> (ø)
... and 54 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d0bf6e3...a56e2d9. Read the comment docs.

Copy link
Contributor

@canonizer canonizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A list of changes in the pull request description would be useful. Otherwise looks good.

@hcho3 hcho3 merged commit dc18811 into dmlc:multi_type_refactor_breakup Sep 17, 2020
@hcho3 hcho3 deleted the new_model_rep2 branch September 17, 2020 02:36
hcho3 added a commit that referenced this pull request Oct 9, 2020
…#201)

* Upgrade C++ standard to C++14.
* Split struct Model into class Model and class ModelImpl. The ModelImpl class will soon become a template class in order to hold Tree objects with uint32, float32, or float64 type. The Model class will become an abstract class so as to avoid exposing ModelImpl to external interface. (It's very hard to pass template classes through a FFI boundary.)
* Change signature of methods that return Model, since Model is now an abstract class. These functions now return std::unique_ptr<Model>.
* Move bodies of tiny methods from tree_impl.h to tree.h. This will reduce verbosity once ModelImpl becomes a template class.

Co-authored-by: Andy Adinets <aadinets@nvidia.com>
hcho3 added a commit that referenced this pull request Oct 9, 2020
… data types (Part of #196) (#198)

* The ModelImpl class (created in #201) becomes the template class ModelImpl<ThresholdType, LeafOutputType>.
* Implement template classes ModelImpl<ThresholdType, LeafOutputType> and TreeImpl<ThresholdType, LeafOutputType> that contain the details of the tree ensemble model. The template classes are parameterized by the types of the thresholds and leaf outputs. Currently the following combinations are allowed:

| Threshold type | Leaf output type |
|----------------|------------------|
| float32        | float32          |
| float32        | uint32           |
| float64        | float64          |
| float64        | uint32           |
|----------------|------------------|

* Revise the zero-copy serialization protocol, to prepend the type information (threshold_type, leaf_output_type) to the serialized types so that the recipient will choose the correct ModelImpl<ThresholdType, LeafOutputType> to deserialize to.
* A run-time type dispatching system using the enum type TypeInfo. Users are able to dispatch a correct version of ModelImpl<ThresholdType, LeafOutputType> by specifying a pair of TypeInfo values. We also implement a set of convenient functions, such as InferTypeInfoOf<T> that converts the template arg T into TypeInfo enum.

Co-authored-by: William Hicks <wphicks@users.noreply.github.com>
Co-authored-by: Andy Adinets <aadinets@nvidia.com>
hcho3 added a commit that referenced this pull request Oct 9, 2020
Addresses #95 and #111.
Follow-up to #198, #199, #201

Trying again, since #130 failed. This time, I made the Model class to be polymorphic. This way, the amount of pointer indirection is minimized.

Summary: Model is an opaque container that wraps the polymorphic handle ModelImpl<ThresholdType, LeafOutputType>. The handle in turn stores the list of trees Tree<ThresholdType, LeafOutputType>. To unbox the Model container and obtain ModelImpl<ThresholdType, LeafOutputType>, use Model::Dispatch(<lambda expression>).

Also, upgrade to C++14 to access the generic lambda feature, which proved to be very useful in the dispatching logic for the polymorphic Model class.

* Turn the Model and Tree classes into template classes
* Revise the string templates so that correct data types are used in the generated C code
* Rewrite the model builder class
* Revise the zero-copy serializer
* Create an abstract matrix class that supports multiple data types (float32, float64 for now).
* Move the DMatrix class to the runtime.
* Extend the DMatrix class so that it can hold float32 and float64.
* Redesign the C runtime API using the DMatrix class.
* Ensure accuracy of scikit-learn models. To achieve the best results, use float32 for the input matrix and float64 for the split thresholds and leaf outputs.
* Revise the JVM runtime.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants