In general, I would expect usage of colsample_by* parameters to improve training speed (per tree), since we do not need to consider all features when evaluating splits. For my use case, however, I do not observe this.
Using tree_method=hist and grow_policy=depthwise, I see that most of the time is spent building histograms (I profiled QuantileHistMaker::Builder::ExpandWithDepthWise and saw that almost all of the time is spent in BuildLocalHistograms), which is actually done before sampling the feature sets for each node (in xgboost/src/tree/updater_quantile_hist.cc, line 996 in 522b897).
Could performance be improved by sampling the features prior to building the histograms instead, since we then do not need to compute histograms for the unused features? If so, can we please include this as a feature request?
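To make the request concrete, here is a toy pure-Python sketch of the idea. It is not XGBoost's implementation: `build_histograms`, the data layout, and all sizes are hypothetical and chosen only for illustration. It shows that if the feature subset is known before the histograms are built, the accumulation loop only needs to touch the sampled features, and the result matches the corresponding slice of the full histogram.

```python
import random


def build_histograms(binned, grad, hess, features, n_bins):
    """Accumulate per-bin gradient/hessian sums for the given features only.

    Hypothetical helper for illustration; work is proportional to
    len(binned) * len(features).
    """
    hist = {f: [[0.0, 0.0] for _ in range(n_bins)] for f in features}
    for row, g, h in zip(binned, grad, hess):
        for f in features:
            cell = hist[f][row[f]]  # row[f] is the bin index of feature f
            cell[0] += g
            cell[1] += h
    return hist


# Synthetic, pre-binned data (assumed sizes, not taken from the issue).
rng = random.Random(0)
n_rows, n_features, n_bins = 200, 10, 8
binned = [[rng.randrange(n_bins) for _ in range(n_features)] for _ in range(n_rows)]
grad = [rng.uniform(-1, 1) for _ in range(n_rows)]
hess = [1.0] * n_rows

all_features = list(range(n_features))
# Sample the features first, as a colsample_by* parameter would.
sampled = sorted(rng.sample(all_features, 3))

# Current order (as described above): histograms for every feature.
full_hist = build_histograms(binned, grad, hess, all_features, n_bins)
# Proposed order: histograms for the sampled features only.
sampled_hist = build_histograms(binned, grad, hess, sampled, n_bins)
```

In this sketch the second call does roughly 3/10 of the accumulation work of the first. The simplest case is per-tree sampling (colsample_bytree), where the feature set is fixed for the whole tree; per-node sampling would need a feature set for each node before its histogram is built.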