-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support hist
in the partition builder under column split
#9120
Conversation
@@ -218,7 +232,27 @@ class PartitionBuilder { | |||
} | |||
} | |||
} else { | |||
LOG(FATAL) << "Column data split is only supported for the `approx` tree method"; | |||
auto pred_hist = [&](auto ridx, auto bin_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logically, is there any difference from
xgboost/src/common/partition_builder.h
Line 126 in 07b2d5a
auto pred_hist = [&](auto ridx, auto bin_id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes they are the same, but since it's a closure, can't be easily reused. Maybe we should extract into a standalone function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The partitioner is becoming less and less manageable now. I don't have any good suggestions on how to refactor it at the moment and can approve the PR as-is. But would be great if there's a way to simplify/modularize it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be helpful to separate out the column-split logic into its own class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe next time. Currently, we have total 8 combinations for tree_method, categorical, column-split
, along with these, we also need to take multi-output into account. After these combinations, we have type dispatching for histogram bins and dense/sparse/dense-with-missing. So ... how many cases there are?
I'm trying to think of a way to make the code easier to understand and write a test suite specifically for partition builder.
@trivialfis looks like the CI failure is unrelated. Can this be merged now? Thanks! |
I have restarted all the tests. |
First step in supporting column split for
hist
.