I ran into a potential bug when using column sampling with approximate tree building.
What is the problem?
When I set tree_method to approx and enable column sampling by setting either colsample_bylevel or colsample_bynode to a value below 1.0, no column sampling seems to take place. Sampling with colsample_bytree and subsample does work with approximate tree building, and all sampling methods work when tree_method is set to hist or exact.
What did I try?
To reproduce the issue, I trained two xgboost models with the same parameters but different seeds on a dataset of 100 rows and 5 columns of random floats. When sampling is applied by setting any of the sampling rates below 1.0, I expected the predictions from these two models to differ. I did get two different sets of predictions for every combination of sampling parameter and tree_method, except for colsample_bylevel or colsample_bynode with tree_method='approx'.
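A minimal sketch of the comparison described above, not the exact script used for this report. The helper names (build_configs, predictions_differ, run_all), the sampling rate of 0.5, the objective, the number of boosting rounds, and the data seed are all assumptions chosen for illustration.

```python
import itertools

SAMPLING_PARAMS = ["subsample", "colsample_bytree", "colsample_bylevel", "colsample_bynode"]
TREE_METHODS = ["exact", "approx", "hist"]


def build_configs(rate=0.5):
    """Return one parameter dict per (tree_method, sampling parameter) pair.

    The rate and the objective are illustrative assumptions.
    """
    configs = []
    for tree_method, sampling_param in itertools.product(TREE_METHODS, SAMPLING_PARAMS):
        configs.append({
            "tree_method": tree_method,
            sampling_param: rate,
            "objective": "reg:squarederror",
        })
    return configs


def predictions_differ(params, X, y, seeds=(1, 2), num_boost_round=10):
    """Train one model per seed and report whether their predictions differ."""
    # Imported here so the configuration grid can be inspected without xgboost installed.
    import numpy as np
    import xgboost as xgb

    dtrain = xgb.DMatrix(X, label=y)
    preds = [
        xgb.train({**params, "seed": seed}, dtrain, num_boost_round=num_boost_round).predict(dtrain)
        for seed in seeds
    ]
    return not np.allclose(preds[0], preds[1])


def run_all(n_rows=100, n_cols=5):
    """Run every configuration on random data of the shape described in the report."""
    import numpy as np

    rng = np.random.default_rng(0)  # fixed data seed (assumption), so only the model seed varies
    X = rng.random((n_rows, n_cols))
    y = rng.random(n_rows)

    results = {}
    for params in build_configs():
        sampling_param = next(k for k in params if k in SAMPLING_PARAMS)
        results[(params["tree_method"], sampling_param)] = predictions_differ(params, X, y)
    return results
```

Calling run_all() returns a dict keyed by (tree_method, sampling parameter). A value of False means the two seeds produced identical predictions, which is the symptom reported here for colsample_bylevel and colsample_bynode with approx.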
This unexpected behaviour / bug was found through the H2O implementation of xgboost, but was reproduced on different machines with native xgboost: https://h2oai.atlassian.net/browse/PUBDEV-8266
Environment
I used native xgboost 1.4.2 with both Python 3.7.7 and 3.8.5, on both macOS and Linux.