Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine TreeModel and RegTree #3995

Merged
merged 3 commits into from
Dec 18, 2018
Merged

Conversation

RAMitchell
Copy link
Member

This PR contains a subset of less controversial changes from #3983 with more concrete reasoning for the changes. The high level goal is to reduce the complexity and size of the class and improve usability.

The changes are:

Merge TreeModel and RegTree
RegTree is a subclass of TreeModel, which has no other subclasses. It looks as if this was designed many years ago with the possibility of having other types of trees apart from regression trees. Todays xgboost only supports regression trees and it is difficult to see what other kind of tree we might support, whether this interface would even allow it and how this would fit into the current code base.

I believe this classifies as speculative generality (https://refactoring.guru/smells/speculative-generality). Removal will make the code base easier to understand, more flexible and easier to maintain. In the event that we do want to add additional types of tree in future we can reintroduce the base class, but I think it is better to do this with specific proposals in mind as opposed to keeping the current design that may or may not meet these needs.

Remove TreeModel::AddRightChild method
This method is unused and contributes to the bloat of the class.

Remove second argument of TreeModel::GetDepth(int nid, bool pass_rchild = false)
Argument is not used anywhere and has no clear purpose

Move TreeModel::InitModel() to constructor
There is no need for a delayed Init method here. It is safer to perform initialisation in the constructor so the object is always prepared for use. The previous design is less safe as if the user forgets to call InitModel it can result in a segfault.

Remove TreeModel::Predict method
This method is unused and contributes to the bloat of the class. If this is needed in future it can be trivially implemented.

Move large functions to .cc file
These functions are not frequently called (e.g. in a hot loop) and do not need to be inlined. This also improves compilation time.

@RAMitchell RAMitchell requested a review from tqchen December 12, 2018 23:21
@codecov-io
Copy link

codecov-io commented Dec 13, 2018

Codecov Report

Merging #3995 into master will increase coverage by 0.57%.
The diff coverage is 51.77%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #3995      +/-   ##
============================================
+ Coverage     56.66%   57.24%   +0.57%     
- Complexity      205      210       +5     
============================================
  Files           187      190       +3     
  Lines         14823    15045     +222     
  Branches        498      527      +29     
============================================
+ Hits           8399     8612     +213     
+ Misses         6185     6176       -9     
- Partials        239      257      +18
Impacted Files Coverage Δ Complexity Δ
src/gbm/gbtree.cc 17.1% <ø> (-1.57%) 0 <0> (ø)
tests/cpp/tree/test_refresh.cc 100% <ø> (ø) 0 <0> (ø) ⬇️
tests/cpp/predictor/test_cpu_predictor.cc 100% <ø> (ø) 0 <0> (ø) ⬇️
tests/cpp/tree/test_quantile_hist.cc 100% <ø> (ø) 0 <0> (ø) ⬇️
tests/cpp/tree/test_prune.cc 100% <ø> (ø) 0 <0> (ø) ⬇️
src/tree/updater_prune.cc 97.67% <100%> (+6.76%) 0 <0> (ø) ⬇️
src/tree/tree_model.cc 18.14% <30.3%> (+15.28%) 0 <0> (ø) ⬇️
include/xgboost/tree_model.h 93.41% <95.31%> (+31.04%) 0 <0> (ø) ⬇️
...xgboost4j/scala/spark/XGBoostTrainingSummary.scala 35.71% <0%> (-27.93%) 2% <0%> (ø)
src/gbm/gblinear.cc 12.34% <0%> (-2.12%) 0% <0%> (ø)
... and 26 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3d81c48...7c76ed9. Read the comment docs.

@trivialfis
Copy link
Member

Looked briefly, I like the moving to cc file part. One minor issue is can you take care the indent of function parameters? Next time I step on the code my editor might try to correct these indent which brings some headaches.

include/xgboost/tree_model.h Outdated Show resolved Hide resolved
@RAMitchell
Copy link
Member Author

@trivialfis I'm not sure what you mean can you show an example please.

}

void RegTree::CalculateContributionsApprox(const RegTree::FVec& feat, unsigned root_id,
bst_float *out_contribs) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is incorrectly indented. I don't know how it looks like on your IDE, but please view it on Github.


// extend our decision path with a fraction of one and zero extensions
void ExtendPath(PathElement *unique_path, unsigned unique_depth,
bst_float zero_fraction, bst_float one_fraction, int feature_index) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is this line.

// determine what the total permuation weight would be if
// we unwound a previous extension in the decision path
bst_float UnwoundPathSum(const PathElement *unique_path, unsigned unique_depth,
unsigned path_index) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this.


// recursive computation of SHAP values for a decision tree
void RegTree::TreeShap(const RegTree::FVec& feat, bst_float *phi,
unsigned node_index, unsigned unique_depth,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this, any many others in tree_model.h.

* \brief set the left child
* \param nid node id to right child
*/
inline void SetLeftChild(int nid) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove inline keyword.

@@ -143,7 +160,7 @@ class TreeModel {
* \param split_cond split condition
* \param default_left the default direction when feature is unknown
*/
inline void SetSplit(unsigned split_index, TSplitCond split_cond,
inline void SetSplit(unsigned split_index, SplitCondT split_cond,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline is not useful here. Any methods defined inside class is automatically inlined.

int depth = 0;
while (!nodes_[nid].IsRoot()) {
if (!pass_rchild || nodes_[nid].IsLeftChild()) ++depth;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand the logic of this, so it's not used anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass_rchild is never used in the code, therefore it is always false and this condition always evaluates to true. So I have removed both the function argument pass_rchild and the if statement so the behaviour is the same but the function is simpler.

@RAMitchell RAMitchell merged commit 84c99f8 into dmlc:master Dec 18, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Mar 19, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants