Testing hist_util #5251
Conversation
(force-pushed from b3ddadc to 02e68d1)
I added tests for the eps accuracy of the sketch across combinations of bin counts and input sizes. I also fixed an off-by-one error where, for categorical data, the number of histogram bins was always one less than the number of unique inputs, so certain categorical values were never used. It still seems possible for summary.SetPrune() to unnecessarily remove cuts in some cases, but this is not a major issue and I'm not confident enough to change that function, so I will leave it for another day. I also experimented with specialised logic for categorical splits; I don't think it has any effect on accuracy. Maybe someone else can provide an example of why it is necessary?
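For reference, this is roughly the property such an eps-accuracy test checks. A minimal sketch, not the PR's actual test code: it assumes the i-th cut targets the evenly spaced rank (i + 1) * n / num_cuts, whereas the real target ranks depend on how the sketch defines its cuts.

    #include <algorithm>
    #include <cassert>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Hypothetical checker (not xgboost test code). A sketch built with
    // accuracy eps should place each cut within eps * n of its target rank.
    void CheckEpsAccuracy(std::vector<double> data,
                          const std::vector<double>& cuts, double eps) {
      std::sort(data.begin(), data.end());
      const double n = static_cast<double>(data.size());
      for (std::size_t i = 0; i < cuts.size(); ++i) {
        // true rank: number of inputs strictly less than the cut value
        const double rank = static_cast<double>(
            std::lower_bound(data.begin(), data.end(), cuts[i]) - data.begin());
        // assumed target: evenly spaced ranks across the input
        const double target =
            n * static_cast<double>(i + 1) / static_cast<double>(cuts.size());
        assert(std::abs(rank - target) <= eps * n);  // the eps guarantee
      }
    }

For a sketch built with eps on the order of 1 / num_bins, every cut should pass a check of this shape.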
(force-pushed from 1c18d6b to 9f3fca7)
src/common/hist_util.h (outdated):

    if (i == 2 || cpt > p_cuts_->cut_values_.back()) {
      p_cuts_->cut_values_.push_back(cpt);
    }
    for (size_t i = 1; i < summary.size; ++i) {
@hcho3 I vaguely remember that you have some use cases for this specialization.
I'm still working through a bunch of corner cases for this.
@RAMitchell My preference is that we either don't support it, or provide a full-blown implementation.
The specialisation is unnecessary; we can use the same code to get correct results for both categorical and continuous features. After this PR, if the number of unique feature values is less than or equal to max_bin, then each value gets its own bin and can be used for splitting. Before this PR, if the number of unique values was greater than 16 but less than max_bin, categorical features would get lost.
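A small self-contained illustration of the behaviour described above (hypothetical values, not xgboost code):

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // When the number of unique feature values is <= max_bin, every unique
    // value can be given its own bin, so a categorical feature encoded as
    // {0, 1, 5, 7, 20} keeps all five categories available for splitting.
    int main() {
      std::vector<double> feature = {0, 1, 5, 7, 20, 5, 1, 0};
      const std::size_t max_bin = 256;

      std::sort(feature.begin(), feature.end());
      feature.erase(std::unique(feature.begin(), feature.end()), feature.end());

      if (feature.size() <= max_bin) {
        // one bin per unique value: no category is merged away
        std::printf("bins = %zu (one per unique value)\n", feature.size());
      }
      return 0;
    }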
(force-pushed from 9061c84 to 9621278)
So WXQSummary.SetPrune() seems to be quite broken. Sometimes it unnecessarily removes elements from a set before it reaches capacity, and my tests show incorrect distributions of elements in its quantile ranges. I replaced it with the simpler WQSummary.SetPrune() for the hist and gpu_hist algorithms, and everything now works as expected. I believe this function is only called once, towards the end of quantile calculation, so I don't expect this to affect speed, but I will check with some benchmarks shortly.
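For context, this is the contract a prune step is expected to honour, as a simplified sketch: never shrink a summary that is under capacity, keep the extremes, and space the kept entries evenly by rank. This is not xgboost's actual WQSummary::SetPrune (the real summary also tracks rank bounds and per-entry weights); it only illustrates the property the tests exercise.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Entry {
      double value;
      double rank;  // cumulative weight up to and including this entry
    };

    // Simplified prune: assumes src is sorted by value/rank and maxsize >= 2.
    std::vector<Entry> Prune(const std::vector<Entry>& src, std::size_t maxsize) {
      if (src.size() <= maxsize) {
        return src;  // under capacity: nothing should be removed
      }
      std::vector<Entry> out;
      out.push_back(src.front());  // always keep the minimum
      const double total = src.back().rank;
      std::size_t j = 1;
      for (std::size_t i = 1; i + 1 < maxsize; ++i) {
        // target rank for the i-th kept entry, evenly spaced over [0, total]
        const double target = total * static_cast<double>(i) /
                              static_cast<double>(maxsize - 1);
        // advance to the interior source entry closest to the target rank
        while (j + 1 < src.size() - 1 &&
               std::abs(src[j + 1].rank - target) <
                   std::abs(src[j].rank - target)) {
          ++j;
        }
        out.push_back(src[j]);
      }
      out.push_back(src.back());  // always keep the maximum
      return out;
    }

The early return is exactly the behaviour the comment above says WXQSummary.SetPrune() violates: it can remove elements before the summary reaches capacity.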
@rongou I noticed your sampling tests were quite volatile. The reason is that quantiles calculated on external memory end up slightly different from those calculated in memory. The fix is to set max_bin equal to the number of training rows so that the quantiles are the same for both external memory and in memory (each row gets its own bin). Your tests comparing predictions from external memory and in memory now pass to a much higher accuracy.
@RAMitchell that's great! Thanks!
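A hedged sketch of the workaround described above, via xgboost's C API (the file name, cache suffix, iteration count, and row count are placeholders; the actual tests live in the test suites):

    #include <xgboost/c_api.h>

    int main() {
      DMatrixHandle dtrain;
      // the "#train.cache" suffix asks xgboost to stream the data from disk
      // (external memory); plain "train.libsvm" would load it in memory
      XGDMatrixCreateFromFile("train.libsvm#train.cache", 1, &dtrain);

      BoosterHandle booster;
      XGBoosterCreate(&dtrain, 1, &booster);
      XGBoosterSetParam(booster, "tree_method", "hist");
      // max_bin == number of training rows => each row gets its own quantile
      // bin, so external-memory and in-memory sketches yield identical cuts
      XGBoosterSetParam(booster, "max_bin", "1000");  // assumes 1000 rows

      for (int iter = 0; iter < 10; ++iter) {
        XGBoosterUpdateOneIter(booster, iter, dtrain);
      }
      XGBoosterFree(booster);
      XGDMatrixFree(dtrain);
      return 0;
    }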
(force-pushed from 58675d0 to cb1b2c8)
(force-pushed from 4b58811 to 648641c)
(force-pushed from 648641c to 375d37a)
This PR is for adding more tests for quantile generation in hist/gpu_hist. Part of the motivation is to create some absolute tests of accuracy for GPU sketching, instead of comparing against existing CPU algorithms.
So far I have tested the following expected behaviours with respect to a single feature:

- min_value should be less than all inputs

I also expected that the number of cuts (excluding the minimum value) should equal the number of bins requested when the number of unique inputs is greater than num_bins. This holds for categorical features (those with <= num_bins unique values) but not for continuous features: currently, if you ask for 256 bins you get 254 actual bins, which seems like a bug. Two of the samples get lost here:

- xgboost/src/common/hist_util.h, line 208 at 472ded5
- xgboost/src/common/hist_util.cc, line 319 at 472ded5
After resolving this, I also want to add tests comparing the rank of the output quantiles against the correct rank obtained by sorting the input.
I would also like to clarify the need for the specialised cut points for categorical variables implemented here: xgboost/src/common/hist_util.h, line 200 at 472ded5.
@hcho3 any comments would be much appreciated.