Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require leaf statistics when expanding tree #4015

Merged
merged 2 commits into from
Jan 18, 2019

Conversation

RAMitchell
Copy link
Member

This PR extends the RegTree::ExpandNode() function to require summary statistics for the node being expanded as well as leaf weights for the left and right child. Calculating the leaf weights in advance for the child nodes required cacheing the left and right gradient sums from split calculation, so I have extended the SplitEntry class with this information.

There are several impacts to this change:

  • Expanding a tree node is now a safe operation. After you call ExpandNode() the tree is complete and may be used immediately with all the required statistics and leaf nodes configured.
  • As a consequence of the above, all code in tree updater algorithms that manually accesses RegTree internals such as tree nodes, statistics or leaf nodes becomes unnecessary (e.g. this). I have not remove these in this PR as it would make it too complicated, but I have verified that it is possible to remove them safely.
  • Because the gradient sums in the newly created node children are cached, we no longer need to recalculate this. This means we can remove loops such as this and this for free. Again I will leave this for a future PR.

So in summary this will collapse a lot of the complexity around configuring a decision tree and will also improve performance. This is the way I have been doing things already in the GPU algorithms so it also helps me unify the code base.

@codecov-io
Copy link

Codecov Report

Merging #4015 into master will decrease coverage by 0.12%.
The diff coverage is 25.31%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #4015      +/-   ##
============================================
- Coverage     57.31%   57.19%   -0.13%     
  Complexity      210      210              
============================================
  Files           190      190              
  Lines         15035    15065      +30     
  Branches        527      527              
============================================
- Hits           8617     8616       -1     
- Misses         6161     6192      +31     
  Partials        257      257
Impacted Files Coverage Δ Complexity Δ
src/tree/updater_histmaker.cc 2.87% <0%> (-0.04%) 0 <0> (ø)
src/tree/updater_colmaker.cc 1.55% <0%> (-0.05%) 0 <0> (ø)
src/tree/updater_skmaker.cc 2.17% <0%> (-0.08%) 0 <0> (ø)
tests/cpp/tree/test_prune.cc 100% <100%> (ø) 0 <0> (ø) ⬇️
tests/cpp/tree/test_refresh.cc 100% <100%> (ø) 0 <0> (ø) ⬇️
include/xgboost/tree_model.h 93.71% <100%> (+0.1%) 0 <0> (ø) ⬇️
tests/cpp/tree/test_param.cc 100% <100%> (ø) 0 <0> (ø) ⬇️
src/tree/updater_quantile_hist.cc 34.64% <33.33%> (-0.1%) 0 <0> (ø)
src/tree/param.h 86.43% <66.66%> (-0.08%) 0 <0> (ø)
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f75a21a...2345918. Read the comment docs.

@RAMitchell
Copy link
Member Author

Ping @tqchen @hcho3

Copy link
Collaborator

@hcho3 hcho3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left a comment. Can you look at it? @RAMitchell

@hcho3 hcho3 merged commit 1fc37e4 into dmlc:master Jan 18, 2019
@hcho3 hcho3 mentioned this pull request Mar 11, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Apr 18, 2019
@RAMitchell RAMitchell deleted the tree-simplify2 branch April 19, 2022 12:28
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants