Combine TreeModel and RegTree #3995
Codecov Report
@@ Coverage Diff @@
## master #3995 +/- ##
============================================
+ Coverage 56.66% 57.24% +0.57%
- Complexity 205 210 +5
============================================
Files 187 190 +3
Lines 14823 15045 +222
Branches 498 527 +29
============================================
+ Hits 8399 8612 +213
+ Misses 6185 6176 -9
- Partials 239 257 +18
I had a brief look, and I like the part that moves code into the .cc file. One minor issue: can you take care of the indentation of function parameters? The next time I work on this code, my editor might try to correct that indentation, which would cause some headaches.
@trivialfis I'm not sure what you mean. Can you show an example, please?
src/tree/tree_model.cc
}

void RegTree::CalculateContributionsApprox(const RegTree::FVec& feat, unsigned root_id,
bst_float *out_contribs) const {
This line is incorrectly indented. I don't know how it looks in your IDE, but please view it on GitHub.
src/tree/tree_model.cc
// extend our decision path with a fraction of one and zero extensions
void ExtendPath(PathElement *unique_path, unsigned unique_depth,
bst_float zero_fraction, bst_float one_fraction, int feature_index) {
So is this line.
src/tree/tree_model.cc
// determine what the total permutation weight would be if
// we unwound a previous extension in the decision path
bst_float UnwoundPathSum(const PathElement *unique_path, unsigned unique_depth,
unsigned path_index) {
And this.
src/tree/tree_model.cc
// recursive computation of SHAP values for a decision tree
void RegTree::TreeShap(const RegTree::FVec& feat, bst_float *phi,
unsigned node_index, unsigned unique_depth,
And this, and many others in tree_model.h.
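For reference, here is a minimal sketch of the wrapping convention being asked for, assuming the project follows the Google C++ Style Guide (the style cpplint checks): when a parameter list does not fit on one line, each continuation line is aligned with the first parameter after the opening parenthesis. The function WeightedSum is hypothetical and exists only to show the layout.

```cpp
#include <cassert>

// Hypothetical helper, used only to illustrate line wrapping.
// The wrapped parameter "int count" lines up with "const float* values".
float WeightedSum(const float* values, const float* weights,
                  int count) {
  assert(count >= 0);
  float sum = 0.0f;
  for (int i = 0; i < count; ++i) {
    sum += values[i] * weights[i];
  }
  return sum;
}
```

An editor or clang-format configured for this style should leave such lines alone, which avoids the spurious re-indentation diffs mentioned above.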
include/xgboost/tree_model.h
* \brief set the left child
* \param nid node id of the left child
*/
inline void SetLeftChild(int nid) {
Remove the inline keyword.
include/xgboost/tree_model.h
@@ -143,7 +160,7 @@ class TreeModel {
* \param split_cond split condition
* \param default_left the default direction when feature is unknown
*/
- inline void SetSplit(unsigned split_index, TSplitCond split_cond,
+ inline void SetSplit(unsigned split_index, SplitCondT split_cond,
The inline keyword is not useful here. Any method defined inside the class body is implicitly inline.
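To illustrate the point, here is a sketch with a simplified, hypothetical Node class (not the real RegTree::Node). The C++ standard makes a function defined inside a class definition an inline function, so writing the keyword there adds nothing. Note that inline in this sense concerns linkage (the same definition may appear in every translation unit that includes the header); it does not guarantee that the optimizer inlines the call. Only a member defined outside the class body, in a header, needs the keyword.

```cpp
#include <cassert>

// Hypothetical, simplified node used only for this illustration.
class Node {
 public:
  // Defined in the class body: implicitly inline, no keyword needed.
  void SetLeftChild(int nid) { cleft_ = nid; }
  int LeftChild() const { return cleft_; }

  // Only declared here; defined outside the class below.
  void SetRightChild(int nid);
  int RightChild() const { return cright_; }

 private:
  int cleft_{-1};
  int cright_{-1};
};

// An out-of-class definition placed in a header is NOT implicitly inline,
// so here the keyword is required to avoid multiple-definition link errors
// when several .cc files include that header.
inline void Node::SetRightChild(int nid) { cright_ = nid; }
```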
int depth = 0;
while (!nodes_[nid].IsRoot()) {
if (!pass_rchild || nodes_[nid].IsLeftChild()) ++depth;
I don't quite understand the logic here. Is it not used anymore?
pass_rchild is never passed by any caller in the code base, so it always takes its default value of false and this condition always evaluates to true. I have therefore removed both the pass_rchild argument and the if condition (the ++depth is now unconditional). The behaviour is the same, but the function is simpler.
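The equivalence can be checked with a small sketch. The Node struct and the free functions GetDepthOld and GetDepth are simplified, hypothetical stand-ins for the RegTree members; the loop bodies mirror the snippet above. With the default pass_rchild = false, both versions return the same depth for every node. Passing true, which no caller did, would count only the left-child steps on the path to the root.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified node: only what GetDepth needs.
struct Node {
  int parent;    // -1 marks the root
  bool is_left;  // true if this node is its parent's left child
  bool IsRoot() const { return parent == -1; }
  bool IsLeftChild() const { return is_left; }
  int Parent() const { return parent; }
};

// Old version: with pass_rchild == true, right-child steps are skipped.
int GetDepthOld(const std::vector<Node>& nodes, int nid,
                bool pass_rchild = false) {
  int depth = 0;
  while (!nodes[nid].IsRoot()) {
    if (!pass_rchild || nodes[nid].IsLeftChild()) ++depth;
    nid = nodes[nid].Parent();
  }
  return depth;
}

// New version: every step towards the root is counted.
int GetDepth(const std::vector<Node>& nodes, int nid) {
  assert(nid >= 0 && nid < static_cast<int>(nodes.size()));
  int depth = 0;
  while (!nodes[nid].IsRoot()) {
    ++depth;
    nid = nodes[nid].Parent();
  }
  return depth;
}
```

For example, in a tree where node 2 is the root's right child and node 4 is node 2's right child, both GetDepth(nodes, 4) and GetDepthOld(nodes, 4) return 2, while GetDepthOld(nodes, 4, true) returns 0.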
This PR contains a subset of the less controversial changes from #3983, with more concrete reasoning for each change. The high-level goal is to reduce the size and complexity of the class and to improve its usability.
The changes are:
Merge TreeModel and RegTree
RegTree is a subclass of TreeModel, and TreeModel has no other subclasses. It looks as if this was designed many years ago to allow for other types of trees besides regression trees. Today XGBoost only supports regression trees, and it is hard to see what other kind of tree we might support, whether this interface would even allow it, and how it would fit into the current code base.
I believe this classifies as speculative generality (https://refactoring.guru/smells/speculative-generality). Removal will make the code base easier to understand, more flexible and easier to maintain. In the event that we do want to add additional types of tree in future we can reintroduce the base class, but I think it is better to do this with specific proposals in mind as opposed to keeping the current design that may or may not meet these needs.
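A simplified sketch of the before and after shapes follows. It is not the actual XGBoost code: the Node layout and the RTreeNodeStat fields are reduced to a hypothetical minimum. The point is that the class template has a single instantiation, used only as RegTree's base, so its template parameters can become ordinary types (such as the SplitCondT alias seen in the diff) inside one concrete class.

```cpp
#include <cassert>
#include <vector>

namespace before {
// Simplified node statistics; the real struct has more fields.
struct RTreeNodeStat {
  float loss_chg{0.0f};
  float sum_hess{0.0f};
};

// Generic base, parameterised on split condition and node statistics.
template <typename TSplitCond, typename TNodeStat>
class TreeModel {
 public:
  struct Node {
    TSplitCond split_cond{};
    int cleft{-1};
    int cright{-1};
  };
  int NumNodes() const { return static_cast<int>(nodes_.size()); }

 protected:
  std::vector<Node> nodes_;
  std::vector<TNodeStat> stats_;
};

// The only subclass and the only instantiation of the template.
class RegTree : public TreeModel<float, RTreeNodeStat> {
 public:
  RegTree() {
    nodes_.resize(1);
    stats_.resize(1);
  }
};
}  // namespace before

namespace after {
struct RTreeNodeStat {
  float loss_chg{0.0f};
  float sum_hess{0.0f};
};

// One concrete class: the former template parameters are plain types.
class RegTree {
 public:
  using SplitCondT = float;
  struct Node {
    SplitCondT split_cond{};
    int cleft{-1};
    int cright{-1};
  };
  RegTree() : nodes_(1), stats_(1) {}
  int NumNodes() const { return static_cast<int>(nodes_.size()); }

 private:
  std::vector<Node> nodes_;
  std::vector<RTreeNodeStat> stats_;
};
}  // namespace after
```

Callers see the same interface either way; the "after" version simply drops one layer of indirection that nothing else used.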
Remove TreeModel::AddRightChild method
This method is unused and contributes to the bloat of the class.
Remove second argument of TreeModel::GetDepth(int nid, bool pass_rchild = false)
The argument is not used by any caller and has no clear purpose.
Move TreeModel::InitModel() to constructor
There is no need for a delayed Init method here. It is safer to perform initialisation in the constructor so that the object is always ready for use. The previous design is less safe: if the user forgets to call InitModel, the result can be a segfault.
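A minimal sketch of the difference, using hypothetical, simplified classes rather than the real RegTree. In the two-phase version, reading the root before InitModel() has run indexes an empty std::vector, which is undefined behaviour and in practice often a segfault. In the constructor version, the root node exists as soon as the object does.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified node.
struct Node {
  int parent{-1};
  int cleft{-1};
  int cright{-1};
  bool IsLeaf() const { return cleft == -1; }
};

// Two-phase initialisation: unusable until InitModel() is called.
// Calling tree[0] before InitModel() reads past the end of an empty vector.
class TwoPhaseTree {
 public:
  void InitModel() { nodes_.resize(1); }
  const Node& operator[](int nid) const { return nodes_[nid]; }
  int NumNodes() const { return static_cast<int>(nodes_.size()); }

 private:
  std::vector<Node> nodes_;
};

// Constructor initialisation: the root node always exists.
class Tree {
 public:
  Tree() : nodes_(1) {}
  const Node& operator[](int nid) const { return nodes_[nid]; }
  int NumNodes() const { return static_cast<int>(nodes_.size()); }

 private:
  std::vector<Node> nodes_;
};
```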
Remove TreeModel::Predict method
This method is unused and contributes to the bloat of the class. If it is needed in the future, it can be trivially reimplemented.
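As a rough illustration of how little code a prediction needs, here is a root-to-leaf traversal. The node layout, the go-left-if-less-than rule and the NaN-as-missing convention are assumptions made for this sketch; they are not the signature or exact semantics of the removed method.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical node layout for illustration only.
struct Node {
  int cleft{-1};             // -1 marks a leaf
  int cright{-1};
  unsigned split_index{0};   // feature tested at this node
  float split_cond{0.0f};    // go left if the feature value < split_cond
  bool default_left{true};   // direction taken when the value is NaN
  float leaf_value{0.0f};
  bool IsLeaf() const { return cleft == -1; }
};

// Walk from the root (node 0) down to a leaf and return its value.
float Predict(const std::vector<Node>& nodes, const std::vector<float>& feat) {
  int nid = 0;
  while (!nodes[nid].IsLeaf()) {
    const Node& n = nodes[nid];
    float fvalue = feat[n.split_index];
    bool go_left = std::isnan(fvalue) ? n.default_left
                                      : fvalue < n.split_cond;
    nid = go_left ? n.cleft : n.cright;
  }
  return nodes[nid].leaf_value;
}
```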
Move large functions to .cc file
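These functions are not called frequently (e.g. from a hot loop), so they do not need to be inlined. Moving them out of the header also improves compilation time, since they are compiled once in tree_model.cc instead of in every translation unit that includes tree_model.h.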
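The split looks roughly like the following sketch. The class and its members (AddNode, MaxDepth and the parent array) are hypothetical. Everything is kept in one file, with comments marking which part would stay in the header and which would move to the .cc file. Because the out-of-line definitions end up in exactly one translation unit, they need no inline keyword; if they were left in the header, they would.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// ---- would stay in the header (e.g. tree_model.h) ----
class RegTree {
 public:
  RegTree() : parent_(1, -1) {}  // node 0 is the root and has no parent
  // Small accessor: fine to keep defined in the class body.
  int NumNodes() const { return static_cast<int>(parent_.size()); }
  // Larger, rarely called functions: declared only.
  int AddNode(int parent);
  int MaxDepth() const;

 private:
  int GetDepth(int nid) const;
  std::vector<int> parent_;
};

// ---- would move to the source file (e.g. tree_model.cc) ----
int RegTree::AddNode(int parent) {
  assert(parent >= 0 && parent < NumNodes());
  parent_.push_back(parent);
  return NumNodes() - 1;
}

int RegTree::GetDepth(int nid) const {
  int depth = 0;
  while (parent_[nid] != -1) {
    ++depth;
    nid = parent_[nid];
  }
  return depth;
}

int RegTree::MaxDepth() const {
  int best = 0;
  for (int nid = 0; nid < NumNodes(); ++nid) {
    best = std::max(best, GetDepth(nid));
  }
  return best;
}
```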