Column sampling not working with approximate tree building #7244

mathijs02 · 2021-09-17T15:05:09Z

Hi,

I ran into a potential bug when using column sampling with approximate tree building.

What is the problem?

When I set tree_method to approx and enable column sampling by setting colsample_bylevel or colsample_bynode below 1.0, no column sampling appears to take place. Sampling with colsample_bytree and subsample does work with approximate tree building, and all sampling methods work when tree_method is set to hist or exact.

What did I try?

To reproduce the issue, I trained two xgboost models with the same parameters but different seeds on a dataset of 100 rows and 5 columns of random floats. When sampling is applied by setting any of the sampling rates below 1.0, I expected the predictions from these two models to differ. I did get two different sets of predictions for every combination of sampling parameter and tree_method, except for colsample_bylevel and colsample_bynode with tree_method='approx'.

This unexpected behaviour was first found through the H2O implementation of xgboost, but it was reproduced with native xgboost on different machines:
https://h2oai.atlassian.net/browse/PUBDEV-8266

Environment

I used native xgboost 1.4.2 with both Python 3.7.7 and 3.8.5, on both macOS and linux.


trivialfis mentioned this issue in Rewrite approx #7214 (Merged) on Sep 17, 2021

trivialfis added the feature-request label Sep 22, 2021

trivialfis closed this as completed in #7214 Jan 10, 2022

This was referenced May 11, 2023:

XGBoost colsample_bylevel (col_sample_rate) not working with tree_method="approx" h2oai/h2o-3#7290 (Closed)

Check sampling aliases are set correctly in H2O XGBoost h2oai/h2o-3#8458 (Closed)
