Optimization/buildhist/colwisebuildhist #8233
Conversation
Merge the last changes …dhist/colwisebuildhist
Hi,
for the performance measurements I used the master branch of the dmlc/xgboost repo, so you can easily reproduce the calculations with it.
If you need any help, you are welcome to ask!
In reply to wenyi Liu's question on this PR:
Hi,
your work is very interesting. I'd like to test the effect of your optimization. Could you please tell me the source version of XGBoost? Much appreciated!
Thank you for the work on optimization. I have left some questions in the review comments. Can we omit the optimization for column sampling for now and just build the histogram for all columns?
src/common/column_matrix.cc
Outdated
@@ -23,10 +23,12 @@ void ColumnMatrix::InitStorage(GHistIndexMatrix const& gmat, double sparse_thres
gmat.GetFeatureCounts(feature_counts.data());

// classify features
any_sparse_column_ = false;
any_sparse_column_ = !all_dense_column
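A minimal sketch of how this simplification could look; the classification loop below is illustrative only (feature_counts and sparse_threshold follow the diff context above, the rest is assumed), not the PR's actual InitStorage code:

// Illustrative only: track a single all_dense_column flag while
// classifying features, then derive any_sparse_column_ from it.
#include <cstddef>
#include <vector>

class ColumnMatrixSketch {
 public:
  void ClassifyFeatures(const std::vector<std::size_t>& feature_counts,
                        std::size_t n_rows, double sparse_threshold) {
    bool all_dense_column = true;
    for (std::size_t count : feature_counts) {
      // Treat a feature as sparse when it has too few non-missing entries.
      if (static_cast<double>(count) < sparse_threshold * n_rows) {
        all_dense_column = false;
      }
    }
    // The reviewer's suggestion: one derived assignment instead of a
    // separately maintained flag.
    any_sparse_column_ = !all_dense_column;
  }

 private:
  bool any_sparse_column_{false};
};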
src/tree/hist/histogram.h
Outdated
@@ -16,6 +19,19 @@

namespace xgboost {
namespace tree {

struct ColSample {
I think using TrainParam directly seems to be simpler.
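A hedged sketch of the suggestion; the ColSample fields are guessed from the diff and TrainParam is reduced to a stand-in here, so none of this is XGBoost's actual API:

// Option in the diff: a dedicated struct that copies the sampling ratios.
struct ColSample {
  float bytree{1.0f};
  float bylevel{1.0f};
  float bynode{1.0f};
};

// Suggested alternative: pass the existing TrainParam (stand-in shown
// here) and read the colsample_* ratios directly, avoiding the copy.
struct TrainParam {
  float colsample_bytree{1.0f};
  float colsample_bylevel{1.0f};
  float colsample_bynode{1.0f};
};

inline bool UsesColumnSampling(const TrainParam& param) {
  return param.colsample_bytree < 1.0f || param.colsample_bylevel < 1.0f ||
         param.colsample_bynode < 1.0f;
}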
src/tree/hist/histogram.h
Outdated
size_t n_batches_{0};
// Whether XGBoost is running in distributed environment.
bool is_distributed_{false};
// Addhoch colsample threshold level
adhoc.
src/tree/hist/histogram.h
Outdated
const size_t n_nodes = nodes_for_explicit_hist_build.size();
CHECK_GT(n_nodes, 0);

const common::ColumnMatrix& column_matrix = gidx.Transpose();
const bool column_sampling =
Could you please explain the logic here? There are missing values but no sparse column in the column matrix, colsample_bytree/bylevel is less than the threshold, and there is no colsample_bynode? Is this necessary?
Also, this special case seems difficult to test.
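For reference, a reconstruction of the condition as the question reads it; the names and the exact predicate are assumptions for illustration, not the PR's actual code:

// Assumed shape of the dispatch predicate under discussion: the
// column-sampling path is chosen only when all of these hold.
bool UseColumnSamplingPath(bool any_missing, bool any_sparse_column,
                           float colsample_bytree, float colsample_bylevel,
                           float colsample_bynode, float threshold) {
  return any_missing &&                    // missing values are present
         !any_sparse_column &&             // yet every column is stored dense
         colsample_bytree < threshold &&   // bytree below the ad hoc level
         colsample_bylevel < threshold &&  // bylevel below the ad hoc level
         colsample_bynode == 1.0f;         // and no per-node sampling
}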
Hi, I have removed the column sampling optimization for now. I hope it will help with the review. Thanks for the suggestion!
Could you please try to enable some tests for each of the cases? I don't want to break anything here by accident. ;-)
…n wise buildhist.
Since column-wise building is just another way to compute the same result as row-wise building, I suggest reusing the existing tests for histogram building.
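As a sketch of that equivalence check, the toy kernels below are stand-ins written for this example (not the PR's BuildHistKernel); both traversal orders must accumulate the same per-bin gradient sums:

#include <gtest/gtest.h>
#include <cstddef>
#include <vector>

// gidx[row][col] holds a bin index; gpair[row] is that row's gradient.
static std::vector<double> HistRowWise(const std::vector<std::vector<int>>& gidx,
                                       const std::vector<double>& gpair,
                                       int n_bins) {
  std::vector<double> hist(n_bins, 0.0);
  for (std::size_t r = 0; r < gidx.size(); ++r) {  // rows outermost
    for (int bin : gidx[r]) hist[bin] += gpair[r];
  }
  return hist;
}

static std::vector<double> HistColWise(const std::vector<std::vector<int>>& gidx,
                                       const std::vector<double>& gpair,
                                       int n_bins) {
  std::vector<double> hist(n_bins, 0.0);
  const std::size_t n_cols = gidx.empty() ? 0 : gidx[0].size();
  for (std::size_t c = 0; c < n_cols; ++c) {       // columns outermost
    for (std::size_t r = 0; r < gidx.size(); ++r) hist[gidx[r][c]] += gpair[r];
  }
  return hist;
}

TEST(BuildHist, ColumnWiseMatchesRowWise) {
  const std::vector<std::vector<int>> gidx{{0, 2}, {1, 2}, {0, 3}};
  const std::vector<double> gpair{0.5, -1.0, 2.0};
  EXPECT_EQ(HistRowWise(gidx, gpair, 4), HistColWise(gidx, gpair, 4));
}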
Hi,
this PR is part of #7192. It continues the optimization of BuildHistKernel that was started in #8218.
Here I introduce column-wise building of the histogram and a dispatcher that chooses between the row-wise and the column-wise kernels. This optimization improves runtime speed by up to 2x for several datasets. For the performance measurements I booked a c6i.12xlarge instance (24 cores, hyperthreading off) on Amazon Web Services and used the benchmarks from here.
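To make the dispatcher idea concrete, here is a conceptual sketch with invented names and a made-up heuristic; the PR's real selection logic lives in src/tree/hist/histogram.h and may use different criteria:

// Illustrative dispatcher: one entry point picks a traversal order.
#include <cstddef>

enum class HistKernel { kRowWise, kColWise };

HistKernel ChooseKernel(std::size_t n_rows, std::size_t n_cols,
                        bool any_sparse_column) {
  // Column-wise traversal reads each column contiguously, which can pay
  // off on wide, fully dense matrices; otherwise the existing row-wise
  // kernel is kept. The ratio below is a placeholder, not the PR's rule.
  if (!any_sparse_column && n_cols * 16 > n_rows) return HistKernel::kColWise;
  return HistKernel::kRowWise;
}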
The benchmarking methodology is the same as in #8218. The only two datasets from the list affected by this optimization are santander and epsilon, so only they are shown in the following table:
I am looking forward to your review and comments!
upd:
I have removed the optimization related to the santander dataset to simplify the review.