XGBoost colsample_bylevel (col_sample_rate) not working with tree_method="approx" #7290

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 3 comments

Comments

@exalate-issue-sync

When setting col_sample_rate to a value lower than 1.0 for an H2O XGBoost model, no sampling appears to happen. This is confirmed by:

  • When col_sample_rate is the only parameter that triggers a stochastic process and the seed is fixed, models with different values for col_sample_rate (e.g. 0.3, 0.5, 0.7) are exactly the same
  • When col_sample_rate (e.g. set to 0.5) is the only parameter that triggers a stochastic process, models with different seeds are exactly the same
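
The two checks above can be sketched as a small, library-agnostic harness. This is only an illustration, not the reporter's actual script: the helper names (`rate_changes_model`, `seed_changes_model`, `make_h2o_trainer`) are hypothetical, and the H2O wrapper assumes a running H2O cluster, an already loaded `H2OFrame`, and arbitrary values for `ntrees`.

```python
from typing import Callable, List, Sequence

# A trainer takes (col_sample_rate, seed) and returns the model's predictions.
Trainer = Callable[[float, int], List]


def _any_run_differs(runs: Sequence[List]) -> bool:
    """True if at least one run's predictions differ from the first run's."""
    return any(run != runs[0] for run in runs[1:])


def rate_changes_model(train: Trainer, rates: Sequence[float], seed: int) -> bool:
    """Check 1: fixed seed, varying col_sample_rate."""
    return _any_run_differs([train(rate, seed) for rate in rates])


def seed_changes_model(train: Trainer, rate: float, seeds: Sequence[int]) -> bool:
    """Check 2: fixed col_sample_rate below 1.0, varying seed."""
    return _any_run_differs([train(rate, seed) for seed in seeds])


def make_h2o_trainer(frame, x, y) -> Trainer:
    """Hypothetical wrapper around H2O; needs h2o installed and a live cluster."""
    from h2o.estimators import H2OXGBoostEstimator  # imported lazily on purpose

    def train(rate: float, seed: int) -> List:
        # col_sample_rate is the only stochastic parameter set here.
        model = H2OXGBoostEstimator(
            ntrees=20,  # assumed value, not from the report
            tree_method="approx",
            col_sample_rate=rate,
            seed=seed,
        )
        model.train(x=x, y=y, training_frame=frame)
        return model.predict(frame).as_data_frame()["predict"].tolist()

    return train
```

If column sampling works, both functions should return `True` for a trainer built with `make_h2o_trainer`; the behavior described in this report corresponds to both returning `False`.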

The value for col_sample_rate is logged correctly in Flow (also in the set of native XGBoost parameters), and it shows correctly in the MOJO file (see attachments). I did not notice this bug in earlier versions of H2O, but I have no way to verify this at the moment. I have verified that this problem shows up with different datasets. I verified that the parameter col_sample_rate_per_tree does seem to work as intended.

After investigation, the bug was found on the XGBoost side: https://github.com/dmlc/xgboost/issues/7244

@exalate-issue-sync
Author

Veronika Maurerová commented: I created this new Jira ticket based on https://h2oai.atlassian.net/browse/PUBDEV-8266, where we solved a different problem from the one [~accountid:5d9dc9eb87dd6f0dcb4d4d98] reported.

In this ticket I will add an error that is raised when tree_method="approx" is set together with some of the column sampling parameters.
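
A guard of this kind might look roughly like the sketch below. It is not the code from the linked PR: the function name `validate_xgboost_params`, the error message, and the choice of which parameters to reject are assumptions made for illustration. Only `col_sample_rate` is listed, because the report above found that `col_sample_rate_per_tree` works as intended.

```python
# Assumption: the set of parameters to reject; the real fix may cover others.
AFFECTED_PARAMS = ("col_sample_rate",)


def validate_xgboost_params(params: dict) -> None:
    """Raise ValueError if column sampling is combined with tree_method="approx"."""
    if params.get("tree_method") != "approx":
        return
    # A rate of 1.0 (or an unset parameter) means no sampling, so it is allowed.
    offending = [name for name in AFFECTED_PARAMS if params.get(name, 1.0) < 1.0]
    if offending:
        raise ValueError(
            'tree_method="approx" cannot be combined with column sampling: '
            + ", ".join(offending)
        )
```

Failing fast like this turns a silently ignored setting into an explicit error, so users do not tune a parameter that has no effect.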

@h2o-ops-ro
Collaborator

JIRA Issue Details

Jira Issue: PUBDEV-8368
Assignee: Veronika Maurerová
Reporter: Mathijs de Jong
State: Resolved
Fix Version: 3.34.0.4
Attachments: N/A
Development PRs: Available

@h2o-ops-ro
Collaborator

Linked PRs from JIRA

#5811
