Column sampling not working with approximate tree building #7244

Closed
mathijs02 opened this issue Sep 17, 2021 · 0 comments · Fixed by #7214

mathijs02 commented Sep 17, 2021

Hi,

I ran into a potential bug when using column sampling with approximate tree building.

What is the problem?

When I set tree_method to approx and enable column sampling by setting either colsample_bylevel or colsample_bynode to < 1.0, no column sampling seems to take place. Sampling with colsample_bytree and subsample does work with approximate tree building, and all sampling methods work when tree_method is set to hist or exact.

What did I try?

To reproduce the issue, I trained two xgboost models with the same parameters but different seeds on a dataset of 100 rows and 5 columns with random floats. When sampling is applied by setting any of the sampling rates to < 1.0, I expected the predictions from these two models to be different. I indeed received two different sets of predictions for all combinations of sampling and tree_method, with the exception of colsample_bylevel or colsample_bynode when tree_method='approx'.
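
A rough sketch of the check described above, not the exact script that was used. The sampling rate of 0.5, the number of boosting rounds, the regression objective, and the helper names (build_configs, seeds_change_predictions, run_grid) are assumptions made for illustration; only the 100 x 5 random dataset, the two seeds, and the parameter names come from the description.

```python
from itertools import product

TREE_METHODS = ("exact", "approx", "hist")
SAMPLING_PARAMS = (
    "subsample",
    "colsample_bytree",
    "colsample_bylevel",
    "colsample_bynode",
)


def build_configs(rate=0.5):
    """Return one parameter dict per (tree_method, sampling parameter) pair."""
    return [
        {"tree_method": method, name: rate}
        for method, name in product(TREE_METHODS, SAMPLING_PARAMS)
    ]


def seeds_change_predictions(config, n_rows=100, n_cols=5, num_boost_round=10):
    """Train twice with different seeds; return True if the predictions differ."""
    # Imported here so build_configs() can be used without xgboost installed.
    import numpy as np
    import xgboost as xgb

    # The data is fixed; only the model seed changes between the two runs.
    rng = np.random.default_rng(0)
    X = rng.random((n_rows, n_cols))
    y = rng.random(n_rows)
    dtrain = xgb.DMatrix(X, label=y)

    preds = []
    for seed in (1, 2):
        # Assumed objective; the issue does not say which one was used.
        params = {"objective": "reg:squarederror", "seed": seed, **config}
        booster = xgb.train(params, dtrain, num_boost_round=num_boost_round)
        preds.append(booster.predict(dtrain))
    return not np.array_equal(preds[0], preds[1])


def run_grid():
    """Print, for every combination, whether the seed affected the predictions."""
    for config in build_configs():
        differs = seeds_change_predictions(config)
        print(config, "->", "differs" if differs else "IDENTICAL")
```

Calling run_grid() prints one line per combination. If sampling is active, the two seeds should normally give different predictions; a combination reported as IDENTICAL suggests the seed had no effect, i.e. that no sampling took place.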

This unexpected behaviour / bug was found through the H2O implementation of xgboost, but was reproduced on different machines with native xgboost:
https://h2oai.atlassian.net/browse/PUBDEV-8266

Environment

I used native xgboost 1.4.2 with both Python 3.7.7 and 3.8.5, on both macOS and linux.
