Support CPU input for device QuantileDMatrix #8136
7794124 to 3bf9af7
src/data/ellpack_page.cu (outdated)
auto r_end = d_row_ptr[ridx + 1];
size_t rsize = r_end - r_begin;

if (ifeature >= rsize) {
Can't you set the null values here?
Thanks for the suggestion. Let me try to merge the kernels.
Merged.
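For illustration, the merged kernel's per-thread logic can be emulated on the CPU: each `(ridx, ifeature)` pair below stands in for one GPU thread, and slots past the end of a CSR row receive the null value in the same pass instead of in a separate kernel. This is a minimal sketch, not the code in `ellpack_page.cu`; `FillEllpack`, the flat `bins` array, and the uncompressed `std::uint32_t` output are hypothetical simplifications that ignore any symbol compression.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU emulation of a merged fill kernel. `tid` plays the role of the
// global GPU thread index; every slot of the row_stride-wide Ellpack is written
// exactly once, either with a bin from the CSR row or with null_value as padding.
std::vector<std::uint32_t> FillEllpack(const std::vector<std::size_t>& row_ptr,
                                       const std::vector<std::uint32_t>& bins,
                                       std::size_t row_stride,
                                       std::uint32_t null_value) {
  assert(!row_ptr.empty() && row_ptr.back() <= bins.size());
  const std::size_t n_rows = row_ptr.size() - 1;
  std::vector<std::uint32_t> ellpack(n_rows * row_stride);
  for (std::size_t tid = 0; tid < ellpack.size(); ++tid) {
    const std::size_t ridx = tid / row_stride;
    const std::size_t ifeature = tid % row_stride;
    const std::size_t r_begin = row_ptr[ridx];
    const std::size_t r_end = row_ptr[ridx + 1];
    const std::size_t rsize = r_end - r_begin;
    if (ifeature >= rsize) {
      ellpack[tid] = null_value;  // padding written in the same pass
    } else {
      ellpack[tid] = bins[r_begin + ifeature];
    }
  }
  return ellpack;
}
```

With `row_ptr = {0, 2, 3}`, `bins = {1, 4, 6}`, `row_stride = 2`, and null value 7, the result is `{1, 4, 6, 7}`: the second row has one real entry and one padded slot.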
src/data/iterative_dmatrix.h (outdated)
* - The CPU format and the GPU format are different: the former uses CSR + CSC for the
*   histogram index, while the latter uses only Ellpack. As a result, we can obtain the
*   GPU format from the CPU one but not the other way around, since we can't recover
*   the CSC from Ellpack. More concretely, if users want to construct a CPU
What is the problem with obtaining CSC from Ellpack?
How would we get the feature index for each element from Ellpack?
It's encoded in the bin numbers, which are the values stored in Ellpack. If the dataset is sparse, you would have to look it up with a binary search, I guess.
I didn't think of that. Yup, we can recover the feature index by binary searching the cut values. I'll update the comment and leave copying data from GPU to CPU for a different PR.
Updated the note. Will work on the other direction of conversion.
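A CPU sketch of the GPU-to-CPU direction discussed above, recovering a CSC layout from Ellpack. It assumes each stored value is a global bin index, that `cut_ptrs[f]` is the first bin of feature `f` with `cut_ptrs.back()` equal to the total number of bins, and that padding uses a dedicated null value. Note that it binary searches these per-feature bin offsets rather than the cut values themselves. `FeatureOfBin`, `EllpackToCsc`, and `Csc` are hypothetical names, not XGBoost APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CSC container: col_ptr holds n_features + 1 offsets.
struct Csc {
  std::vector<std::size_t> col_ptr;
  std::vector<std::size_t> row_idx;  // row of each stored entry
  std::vector<std::uint32_t> bins;   // global bin of each stored entry
};

// Assumed layout: cut_ptrs[f] is the first global bin of feature f and
// cut_ptrs.back() is the total number of bins.
std::size_t FeatureOfBin(const std::vector<std::uint32_t>& cut_ptrs, std::uint32_t bin) {
  assert(cut_ptrs.size() >= 2 && bin < cut_ptrs.back());
  // First offset strictly greater than `bin`; the owning feature is the one before it.
  auto it = std::upper_bound(cut_ptrs.begin(), cut_ptrs.end(), bin);
  return static_cast<std::size_t>(it - cut_ptrs.begin()) - 1;
}

Csc EllpackToCsc(const std::vector<std::uint32_t>& ellpack, std::size_t row_stride,
                 std::uint32_t null_value, const std::vector<std::uint32_t>& cut_ptrs) {
  assert(cut_ptrs.size() >= 2 && row_stride > 0 && ellpack.size() % row_stride == 0);
  const std::size_t n_features = cut_ptrs.size() - 1;
  const std::size_t n_rows = ellpack.size() / row_stride;
  Csc csc;
  csc.col_ptr.assign(n_features + 1, 0);
  // Pass 1: count the non-null entries of each feature.
  for (std::uint32_t bin : ellpack) {
    if (bin != null_value) ++csc.col_ptr[FeatureOfBin(cut_ptrs, bin) + 1];
  }
  // Prefix sum turns the counts into column offsets.
  for (std::size_t f = 0; f < n_features; ++f) csc.col_ptr[f + 1] += csc.col_ptr[f];
  csc.row_idx.resize(csc.col_ptr.back());
  csc.bins.resize(csc.col_ptr.back());
  // Pass 2: scatter entries; rows are visited in order, so each column stays row-sorted.
  std::vector<std::size_t> cursor(csc.col_ptr.begin(), csc.col_ptr.end() - 1);
  for (std::size_t r = 0; r < n_rows; ++r) {
    for (std::size_t k = 0; k < row_stride; ++k) {
      const std::uint32_t bin = ellpack[r * row_stride + k];
      if (bin == null_value) continue;
      const std::size_t pos = cursor[FeatureOfBin(cut_ptrs, bin)]++;
      csc.row_idx[pos] = r;
      csc.bins[pos] = bin;
    }
  }
  return csc;
}
```

The two-pass count-then-scatter layout mirrors how a CSR-to-CSC transpose is usually done; on a GPU the per-element `FeatureOfBin` search is what makes the sparse case more expensive than the dense one.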
- Copy `GHistIndexMatrix` to `Ellpack` when needed.
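Before a `GHistIndexMatrix` can be copied into Ellpack, the target buffer has to be sized. A minimal sketch, assuming the row stride is the longest CSR row and that symbols are packed with just enough bits to encode every bin plus one null symbol; `ShapeFromCsr` and `EllpackShape` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of the Ellpack buffer to allocate.
struct EllpackShape {
  std::size_t row_stride;    // widest row, in entries
  std::uint32_t null_value;  // one past the last real bin
  int bits_per_symbol;       // bits needed to encode bins [0, null_value]
};

EllpackShape ShapeFromCsr(const std::vector<std::size_t>& row_ptr, std::uint32_t n_total_bins) {
  assert(!row_ptr.empty());
  // Row stride: length of the longest CSR row.
  std::size_t row_stride = 0;
  for (std::size_t i = 0; i + 1 < row_ptr.size(); ++i) {
    row_stride = std::max(row_stride, row_ptr[i + 1] - row_ptr[i]);
  }
  // Real bins plus one extra symbol reserved for padding.
  const std::uint64_t n_symbols = std::uint64_t{n_total_bins} + 1;
  int bits = 1;
  while ((std::uint64_t{1} << bits) < n_symbols) ++bits;
  return {row_stride, n_total_bins, bits};
}
```

For example, `row_ptr = {0, 2, 3, 6}` with 9 bins gives a row stride of 3, null value 9, and 4 bits per symbol (10 symbols to encode).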
afd281f to 43ae4ad