[Breaking] Let's make histogram method the default #7049
Conversation
Restarted the CI. There are some issues with hist, like its incomplete support for external memory.
@trivialfis, do you prefer to keep

```cpp
} else if (!fmat->SingleColBlock()) {
  LOG(INFO) << "Tree method is automatically set to 'hist' "
               "since external-memory data matrix is used.";
  tparam_.tree_method = TreeMethod::kHist;
}
```
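The fallback in that snippet can be sketched in Python; `resolve_tree_method` and its arguments are hypothetical names for illustration only, not XGBoost's actual dispatch code.

```python
def resolve_tree_method(requested: str, single_col_block: bool) -> str:
    """Simplified sketch of the 'auto' fallback shown above.

    Hypothetical helper for illustration only; not XGBoost's
    actual dispatch code.
    """
    if requested != "auto":
        # An explicit user choice always wins.
        return requested
    if not single_col_block:
        # The data does not fit in a single in-memory column block,
        # i.e. an external-memory DMatrix is in use: fall back to 'hist'.
        return "hist"
    # Historical default for large in-memory data.
    return "approx"
```

Under this sketch, resolving `"auto"` with `single_col_block=False` yields `"hist"`, matching the log message in the snippet.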
I made a similar attempt before and will look into this again. Thanks for running the comprehensive benchmark.
Hi, could you please take a look at the failing JVM tests?
It seems like the problem still appears even if we use […]. Most of the tests (5/7) are failing in a place like this: […] Here we perform training with […]
Sorry for the late reply. Hey, @hcho3 @CodingCat @RAMitchell, could you please join the discussion?
I am in favor of making `hist` the default.
I don't have a strong feeling. But I'm migrating the approx tree method to hist's code base (for example #7079) to have uniform categorical data support. After the migration I expect the approx tree method to become much faster.
@trivialfis Do you have a tracking issue for merging `approx` into `hist`?
@hcho3 I don't. It's a mix of different refactors. I need to: […]

I can merge these items into the categorical data support tracker.
@trivialfis So is it fair to say that there are some useful utility functions in […]?
There are 2 features in […]

That's the reason I need to migrate approx to hist's code base and make things as reusable as possible. Whatever improvement goes into hist will go into approx for free.
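The "for free" claim can be sketched with a toy example (`build_histogram`, `hist_update`, and `approx_update` below are hypothetical stand-ins, not XGBoost internals): once both updaters share one histogram kernel, any optimization of that kernel speeds up both methods.

```python
from collections import Counter

def build_histogram(bin_ids):
    """Toy shared kernel: count how many rows fall into each feature bin."""
    return Counter(bin_ids)

def hist_update(bin_ids):
    # 'hist': global bin boundaries, reused across iterations.
    return build_histogram(bin_ids)

def approx_update(bin_ids):
    # 'approx' after the migration: bins may be re-sketched per
    # iteration, but the same shared kernel runs underneath.
    return build_histogram(bin_ids)
```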
Done.
@trivialfis, @hcho3, […]

Another point: alignment. […]

Based on the above, I prefer to make `hist` the default. Unification of code for […]
Thanks for the detailed explanation! First, I also prefer changing the tree method, but I think we need more work than setting the parameter alone. A few reasons I haven't merged this PR yet are: […]
But we know these aren't true in practice from the accuracy results here. For the different outputs, my guess is on the difference of parameters. For […]
My personal preference is to remove the name `approx`. Having said that, I'm looking forward to the change of the default tree method to `hist`. These are my personal preferences. I'm looking forward to replies. ;-)
Please ignore the R failure for now. It's caused by a stale R cache on GitHub Actions.
Codecov Report

```text
@@           Coverage Diff           @@
##           master    #7049   +/-   ##
=======================================
  Coverage   81.60%   81.60%
=======================================
  Files          13       13
  Lines        3903     3903
=======================================
  Hits         3185     3185
  Misses        718      718
```

Continue to review the full report at Codecov.
Running some tests with the latest implementation of external memory. Hopefully we can narrow down the failures.
* Changing hist border to 2^18
Let's try to take another look here. As we know, […]

We chose the best threshold based on the training time and two testing metrics in each case. It was grid-searched as a power of 2, starting from 256. We used […]

Before the start of training, all the datasets were randomly shuffled. Next, the first N lines from the training datasets were selected for training. Full testing datasets were used for testing. The procedure was repeated for […]

Classification task: […]

Regression task: […]

HW: […]

The full table with all numbers can be found here.
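The threshold search described above can be sketched as follows (a minimal illustration under stated assumptions: `make_subset` is a hypothetical helper, and no real datasets are loaded).

```python
import random

# Candidate switch thresholds: powers of 2 starting from 256,
# up to 2^18 (the proposed 'hist' border).
thresholds = [2 ** k for k in range(8, 19)]

def make_subset(rows, n, seed=0):
    """Shuffle a dataset once, then keep its first n rows
    (placeholder for the real dataset handling)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return rows[:n]
```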
If we can proceed with this change, let's remove the selection altogether: just use one algorithm (hist) as the default instead of "auto".
We suggest changing the default tree method for large datasets from `approx` to `hist`, as it's much faster. This way, XGBoost will perform better for those users who don't choose the tree method themselves. An attempt was already made in #5178, but the PR wasn't merged.
Here are the perf measurements:

HW: Intel(R) Xeon(R) Platinum 8280L CPU, 2 sockets, 28 cores per socket, HT: on

[…]

The geometric mean of the training time speedup is 5.667. We still have a slowdown on `epsilon`, and we're working on this case right now. Metrics are equal to or better than with `approx` in all cases.
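For reference, a geometric mean like the one reported can be computed from per-dataset speedup ratios as below (the `speedups` values are placeholders for illustration, not the measured numbers).

```python
from math import prod

def geometric_mean(values):
    """Geometric mean of positive ratios such as training-time speedups."""
    return prod(values) ** (1.0 / len(values))

# Placeholder speedups for illustration only (not the measured numbers).
speedups = [2.0, 8.0]
print(geometric_mean(speedups))  # 4.0
```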