Run prediction on histogram index. #5319
1c2408c to 983bd79
@RAMitchell Your last PR made the
Changes to gpu_predictor look great.
@@ -493,13 +505,33 @@ class DMatrix {
  virtual BatchSet<CSCPage> GetColumnBatches() = 0;
  virtual BatchSet<SortedCSCPage> GetSortedColumnBatches() = 0;
  virtual BatchSet<EllpackPage> GetEllpackBatches(const BatchParam& param) = 0;
  virtual BatchSet<GradientIndexPage> GetGradientIndexBatches(const BatchParam& param) = 0;

  virtual bool EllpackExists() const = 0;
Is there a way to avoid adding these virtual private methods and make PageExists() virtual instead?
@RAMitchell Not entirely sure how to do that right now. Maybe add an enum for each type of page?
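A rough sketch of what the enum idea could look like, in case it helps the discussion. Everything here is hypothetical (`PageType`, `PageTag`, `PageExistsImpl`, `DMatrixSketch`, `InMemoryDMatrixSketch` and the empty page structs are made-up names, not the code in this PR): a non-virtual `PageExists<Page>()` template maps the page class to an enum tag and forwards to one private virtual method, so no per-page `EllpackExists()`-style virtual is needed.

```cpp
#include <set>

// Hypothetical tags, one per page type. Not the real XGBoost definitions.
enum class PageType { kSparse, kEllpack, kGradientIndex };

// Empty stand-ins for the real page classes.
struct SparsePage {};
struct EllpackPage {};
struct GradientIndexPage {};

// Compile-time mapping from a page class to its enum tag.
template <typename Page> struct PageTag;
template <> struct PageTag<SparsePage> {
  static constexpr PageType kValue = PageType::kSparse;
};
template <> struct PageTag<EllpackPage> {
  static constexpr PageType kValue = PageType::kEllpack;
};
template <> struct PageTag<GradientIndexPage> {
  static constexpr PageType kValue = PageType::kGradientIndex;
};

class DMatrixSketch {
 public:
  virtual ~DMatrixSketch() = default;

  // Non-virtual front end; call sites keep the dmat->PageExists<T>() shape.
  template <typename Page>
  bool PageExists() const {
    return this->PageExistsImpl(PageTag<Page>::kValue);
  }

 private:
  // The single virtual hook that replaces one virtual method per page type.
  virtual bool PageExistsImpl(PageType type) const = 0;
};

// Toy implementation that only records which pages have been built.
class InMemoryDMatrixSketch : public DMatrixSketch {
 public:
  void MarkBuilt(PageType type) { built_.insert(type); }

 private:
  bool PageExistsImpl(PageType type) const override {
    return built_.count(type) != 0;
  }
  std::set<PageType> built_;
};
```

Adding a new page type then means one enum value and one `PageTag` specialization. The cost is that a missing specialization only shows up as a compile error at the first call site that uses it.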
src/predictor/gpu_predictor.cu
Outdated
if (precise && dmat->PageExists<SparsePage>()) {
  do_predict_sparse_page();
} else if (!dmat->PageExists<EllpackPage>()) {
We need to make sure this EllpackPage exists at the time of the first prediction call, otherwise the SparsePage will be created on device in the first iteration.
That's the part where I need to know the plan for DMatrix initialization.
This is quite subtle. Here is an explanation I found from simple examples:
Here is a dataset with normal distribution I generated, with shape of
The cuts for first column are (max_bin=2):
The cut values combined with
and |
eae2ed6 to 89041a4
@trivialfis I don't follow. If we have a split condition
In the above example, 0.76103773 should go left, but because it is assigned to the bin of 1.532, it goes right.
I see. Any element assigned to bin 0 must have a value < 1.532 based on the way the bins are constructed, so if we return floating-point value of
Maybe you can produce a few test cases verifying these scenarios. Fundamentally I still think there should be no difference between quantile prediction and normal prediction, assuming the tree has been constructed on the same quantile cuts.
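To make the scenario concrete, here is a minimal sketch of the kind of test case that could verify it. It rests on assumptions: cuts are treated as sorted upper bounds, the bin is found with `std::upper_bound`, every value is below the last cut, and the second cut `4.0` is invented for illustration. Only 0.76103773 and 1.532 come from the example above. The helper names (`SearchBin`, `GoesLeftFloat`, `GoesLeftBin`, `GoesLeftUpperCut`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed binning rule: bin i holds values v with cuts[i-1] <= v < cuts[i].
// Values are assumed to be strictly below the last cut.
inline std::size_t SearchBin(const std::vector<float>& cuts, float value) {
  assert(!cuts.empty());
  auto it = std::upper_bound(cuts.begin(), cuts.end(), value);
  return static_cast<std::size_t>(it - cuts.begin());
}

// Reference decision on the raw feature value.
inline bool GoesLeftFloat(float value, float split_value) {
  return value < split_value;
}

// Decision on the bin index alone. For a split at cuts[split_bin],
// value < cuts[split_bin] holds exactly when bin <= split_bin.
inline bool GoesLeftBin(std::size_t bin, std::size_t split_bin) {
  return bin <= split_bin;
}

// Lossy variant: substitutes the bin's upper cut for the value before
// comparing. This reproduces the mismatch described in the comment above.
inline bool GoesLeftUpperCut(const std::vector<float>& cuts, std::size_t bin,
                             float split_value) {
  assert(bin < cuts.size());
  return cuts[bin] < split_value;
}
```

With cuts `{1.532f, 4.0f}` and a split at `cuts[0]`, the value 0.76103773 lands in bin 0: the float comparison and the bin comparison both send it left, while the upper-cut substitution compares 1.532 with 1.532 and sends it right. Under these assumptions the two prediction paths agree as long as the comparison is done on bin indices, or on a representative value strictly inside the bin.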
* Move CPU histogram index into DMatrix (external memory is still not supported).
* Add predict on CPU gradient index.
* Check is missing when getting global index.
89041a4 to a2b4328
The feature is basically done. Waiting for #5351 to be resolved.
@SmirnovEgorRu @ShvetsKS Hi, this PR is used to move the histogram index into
* ellpack matrix in GPU Predictor. (merged)
* PageExists to DMatrix for testing whether gradient index is available. (merged)