When `col_sample_rate` is set to a value lower than 1.0 for an H2O XGBoost model, no sampling appears to happen. Two observations confirm this:
- When `col_sample_rate` is the only parameter that triggers a stochastic process and the seed is fixed, models trained with different values of `col_sample_rate` (e.g. 0.3, 0.5, 0.7) are exactly the same.
- When `col_sample_rate` is the only parameter that triggers a stochastic process (e.g. with a value of 0.5), models trained with different seeds are exactly the same.
The value of `col_sample_rate` is logged correctly in Flow (also in the set of native XGBoost parameters), and it shows correctly in the MOJO file (see attachments). I did not notice this bug in earlier versions of H2O, but I have no way to verify this at the moment. I have verified that the problem shows up with different datasets. I have also verified that the parameter `col_sample_rate_per_tree` does seem to work as intended.
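The two checks described above can be sketched as a small harness. This is only an illustration under stated assumptions: `sampling_has_effect`, `toy_trainer`, and `broken_trainer` are hypothetical names, and the trainers are stand-ins that return sampled column indices instead of real model predictions. To run the check against H2O, one would presumably pass a callable that trains an XGBoost model with the given `col_sample_rate` and seed and returns its predictions.

```python
import random
from typing import Callable, Sequence

# Hypothetical trainer signature: (col_sample_rate, seed) -> sequence of predictions.
Trainer = Callable[[float, int], Sequence[float]]


def sampling_has_effect(train: Trainer,
                        rates=(0.3, 0.5, 0.7),
                        seeds=(1, 2, 3)) -> dict:
    """Check whether varying the rate (fixed seed) or the seed (fixed rate) changes the output."""
    # Check 1: fixed seed, different sampling rates.
    by_rate = {tuple(train(rate, seeds[0])) for rate in rates}
    # Check 2: fixed sampling rate, different seeds.
    by_seed = {tuple(train(rates[1], seed)) for seed in seeds}
    return {
        "rate_changes_model": len(by_rate) > 1,
        "seed_changes_model": len(by_seed) > 1,
    }


def toy_trainer(col_sample_rate: float, seed: int, n_cols: int = 20) -> list:
    """Stand-in for a working model: the 'predictions' are the sampled column indices."""
    rng = random.Random(seed)
    k = max(1, round(col_sample_rate * n_cols))
    return sorted(rng.sample(range(n_cols), k))


def broken_trainer(col_sample_rate: float, seed: int, n_cols: int = 20) -> list:
    """Stand-in that mimics the reported behaviour: rate and seed are accepted but ignored."""
    return list(range(n_cols))


if __name__ == "__main__":
    print("working:", sampling_has_effect(toy_trainer))
    print("broken: ", sampling_has_effect(broken_trainer))
```

If both flags come back `False` while `col_sample_rate` is the only stochastic parameter, the result matches the symptom reported here.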
Veronika Maurerová commented: I created this new Jira ticket based on https://h2oai.atlassian.net/browse/PUBDEV-8266, where we solved a different problem than the one [~accountid:5d9dc9eb87dd6f0dcb4d4d98] reported.
In this ticket I will add an error that is raised when someone sets tree_method="approx" together with column sampling parameters.
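A parameter check of the kind described in this comment might look like the sketch below. It is not the actual H2O implementation: `validate_xgboost_params` is a hypothetical helper, and the set of parameters treated as "column sampling parameters" (here only `col_sample_rate`) is an assumption, since the comment does not list them.

```python
def validate_xgboost_params(params: dict,
                            sampling_params=("col_sample_rate",)) -> None:
    """Raise ValueError if tree_method="approx" is combined with column sampling.

    `sampling_params` is an assumed list of parameter names; a value below 1.0
    is taken to mean that column sampling was requested.
    """
    if params.get("tree_method") != "approx":
        return  # other tree methods are not checked in this sketch

    offending = [
        name for name in sampling_params
        if float(params.get(name, 1.0)) < 1.0
    ]
    if offending:
        raise ValueError(
            'tree_method="approx" cannot be combined with column sampling '
            "parameters: " + ", ".join(offending)
        )


if __name__ == "__main__":
    validate_xgboost_params({"tree_method": "hist", "col_sample_rate": 0.5})  # passes
    try:
        validate_xgboost_params({"tree_method": "approx", "col_sample_rate": 0.5})
    except ValueError as err:
        print("rejected:", err)
```

Failing fast at parameter validation makes the limitation visible to the user, instead of silently training a model in which the sampling rate has no effect.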
Jira Issue: PUBDEV-8368
Assignee: Veronika Maurerová
Reporter: Mathijs de Jong
State: Resolved
Fix Version: 3.34.0.4
Attachments: N/A
Development PRs: Available
After investigation, the bug was found on the XGBoost side: https://github.com/dmlc/xgboost/issues/7244