Colsample performance when using tree_method=hist #7002

Open
karunrao97 opened this issue May 26, 2021 · 1 comment

Comments

karunrao97 commented May 26, 2021

In general, I would expect usage of colsample_by* parameters to improve training speed (per tree), since we do not need to consider all features when evaluating splits. For my use case, however, I do not observe this.

Using tree_method=hist and grow_policy=depthwise, I see that most of the time is spent building histograms: I profiled QuantileHistMaker::Builder::ExpandWithDepthWise and saw that almost all of the time goes to BuildLocalHistograms. That step actually runs before the feature sets are sampled for each node, which happens in this line:

features_sets[nid_in_set] = column_sampler_.GetFeatureSet(tree.GetDepth(nid));

Could performance be improved by sampling the features prior to building the histograms instead, since we then do not need to compute histograms for the unused features? If so, can we please include this as a feature request?
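
To make the proposal concrete, here is a toy sketch in plain Python of the two orderings. It is not XGBoost's implementation: `build_histograms`, `sample_features`, and the random data are hypothetical names and values made up for illustration, and the sketch ignores everything else the real hist builder does. It only shows that building histograms for the sampled features alone gives the same histograms for those features while touching fewer (row, feature) pairs.

```python
import random


def build_histograms(bins, grads, rows, features, n_bins):
    """Accumulate per-feature gradient histograms for the given rows.

    bins[r][f] is the pre-quantized bin index of feature f in row r.
    Only the features listed in `features` are visited.
    """
    hist = {f: [0.0] * n_bins for f in features}
    for r in rows:
        row_bins = bins[r]
        g = grads[r]
        for f in features:
            hist[f][row_bins[f]] += g
    return hist


def sample_features(n_features, colsample, rng):
    """Pick a random subset of feature indices (hypothetical helper)."""
    k = max(1, int(n_features * colsample))
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)
n_rows, n_features, n_bins = 1000, 20, 16
bins = [[rng.randrange(n_bins) for _ in range(n_features)]
        for _ in range(n_rows)]
grads = [rng.uniform(-1.0, 1.0) for _ in range(n_rows)]
rows = list(range(n_rows))

# Order described in the issue: histograms for all features, then sampling.
full = build_histograms(bins, grads, rows, range(n_features), n_bins)
subset = sample_features(n_features, 0.5, rng)

# Proposed order: sample first, then build histograms for that subset only.
sampled = build_histograms(bins, grads, rows, subset, n_bins)

# The sampled features get identical histograms either way.
assert all(sampled[f] == full[f] for f in subset)
print(f"features visited per row: {n_features} (current) vs {len(subset)} (proposed)")
```

In this toy setup the inner loop visits half as many features per row when colsample is 0.5. Whether the real builder would see a similar saving is exactly the question above.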

@Denisevi4

I'm surprised this hasn't gotten more attention. It seems like an easy win, since people use colsample a lot.
