Optimization of EvaluateSplit function #5138
This looks much better than before, and the PR is about the right size. I will leave it to others to comment on algorithm correctness.
Looks like the errors in CI are not related to my changes:
Please rebase onto the master branch or merge the latest commits.
a7e49bd to ec65580
@hcho3, here is a breakdown for the Airline data set. Performance of EvaluateSplit is now similar to what it was before the revert:
src/tree/updater_quantile_hist.cc
    this->EvaluateSplit(0, gmat, hist_, *p_fmat, *p_tree);
    qexpand_loss_guided_->push(ExpandEntry(0, p_tree->GetDepth(0),
                                           snode_[0].best.loss_chg, timestamp++));
    ExpandEntry node(0, p_tree->GetDepth(0), snode_[0].best.loss_chg, timestamp++);
I believe the depth of the root node is already determined. Please help remove it, as legacy support for root_index is now removed.
Changed 0 to the ExpandEntry::kRootNid constant, if I understand the comment correctly.
Will run some benchmarks on my own today. I will leave the other reviews to @hcho3.
Thanks for your valuable contribution. In particular, I appreciate the introduction of the new abstractions BlockedSpace2d and ParallelFor2d. They improved code legibility tremendously. I have some minor comments, but otherwise LGTM.
@trivialfis Can we merge this now?
Will try merging it today.
Running benchmarks.
@SmirnovEgorRu Thanks for the optimization!
EvaluateSplit was optimized by processing many tree nodes at the same time.
For details, see issue #5104.
To implement this, ParallelFor2d was introduced to support nested parallelism. This function will also be used to optimize other functions: BuildHist and ApplySplit.
In the previous implementation, the main source of code complexity was the blocking for nested parallelism. The common code for this has now been aggregated and moved into the common threading utils, so further PRs should be more lightweight.
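To illustrate the idea of blocking a two-level loop (nodes, then work items inside each node) into one flat parallel loop, here is a minimal sketch. It is not the code from this PR: the names BlockedSpace, ParallelForBlocks, Range and BestGainPerNode, and all their signatures, are hypothetical stand-ins chosen for the example, and the "gain" values are plain numbers rather than real split statistics.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical, simplified stand-ins for the abstractions named above.
// Names and signatures are illustrative only, not XGBoost's actual API.
struct Range {
  std::size_t begin;
  std::size_t end;  // exclusive
};

// Flattens a 2D iteration space (node x items-in-node) into blocks of at most
// `grain` items, so blocks from different nodes can share one parallel loop.
class BlockedSpace {
 public:
  BlockedSpace(std::size_t n_nodes,
               const std::function<std::size_t(std::size_t)>& size_of_node,
               std::size_t grain) {
    grain = std::max<std::size_t>(grain, 1);  // guard against a zero grain
    for (std::size_t nid = 0; nid < n_nodes; ++nid) {
      const std::size_t size = size_of_node(nid);
      for (std::size_t begin = 0; begin < size; begin += grain) {
        node_.push_back(nid);
        range_.push_back({begin, std::min(begin + grain, size)});
      }
    }
  }
  std::size_t Size() const { return range_.size(); }
  std::size_t Node(std::size_t block) const { return node_[block]; }
  Range GetRange(std::size_t block) const { return range_[block]; }

 private:
  std::vector<std::size_t> node_;  // node id owning each block
  std::vector<Range> range_;       // item range covered by each block
};

// Calls func(block, nid, range) once per block. With OpenMP enabled the blocks
// are distributed across threads; without it the pragma is ignored and the
// loop runs serially with the same result.
template <typename Func>
void ParallelForBlocks(const BlockedSpace& space, Func func) {
  const std::ptrdiff_t n_blocks = static_cast<std::ptrdiff_t>(space.Size());
#pragma omp parallel for schedule(dynamic)
  for (std::ptrdiff_t i = 0; i < n_blocks; ++i) {
    const std::size_t block = static_cast<std::size_t>(i);
    func(block, space.Node(block), space.GetRange(block));
  }
}

// Toy analogue of evaluating several nodes at once: gains[nid][f] is an
// assumed per-feature gain, and the result is the best gain for each node.
std::vector<double> BestGainPerNode(
    const std::vector<std::vector<double>>& gains, std::size_t grain) {
  const double lowest = std::numeric_limits<double>::lowest();
  BlockedSpace space(
      gains.size(), [&](std::size_t nid) { return gains[nid].size(); }, grain);

  // Each block writes only to its own slot, so no synchronization is needed.
  std::vector<double> partial(space.Size(), lowest);
  ParallelForBlocks(space, [&](std::size_t block, std::size_t nid, Range r) {
    for (std::size_t f = r.begin; f < r.end; ++f) {
      partial[block] = std::max(partial[block], gains[nid][f]);
    }
  });

  // Serial reduction of the per-block results into one value per node.
  std::vector<double> best(gains.size(), lowest);
  for (std::size_t block = 0; block < space.Size(); ++block) {
    const std::size_t nid = space.Node(block);
    best[nid] = std::max(best[nid], partial[block]);
  }
  return best;
}
```

The point of the flattening is load balance: a node with many items contributes many blocks and a small node contributes few, so threads stay busy even when node sizes differ widely. Writing per-block partial results and reducing them afterwards is one simple way to avoid data races in such a sketch; it is a design choice of this example, not a statement about how the PR stores intermediate results.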
I will provide performance numbers a bit later.
CC: @hcho3, @RAMitchell, @trivialfis