Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run prediction on histogram index. #5319

Closed
wants to merge 1 commit into from

Conversation

trivialfis
Copy link
Member

@trivialfis trivialfis commented Feb 17, 2020

  • Unify GPU Predict tree node. (merged)
  • Add wrapper for ellpack matrix in GPU Predictor. (merged)
  • Add PageExists to DMatrix for testing whether gradient index is available. (merged)
  • Move CPU histogram index into DMatrix (external memory is still not supported).
  • Add predict on CPU gradient index.

@trivialfis
Copy link
Member Author

@RAMitchell Your last PR made the SparsePage a value instead of a pointer, this way it will always exist. This PR runs predict on Ellpack only when SparsePage is not created to avoid performance drop using normal DMatrix.

Copy link
Member

@RAMitchell RAMitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to gpu_predictor look great.

@@ -493,13 +505,33 @@ class DMatrix {
virtual BatchSet<CSCPage> GetColumnBatches() = 0;
virtual BatchSet<SortedCSCPage> GetSortedColumnBatches() = 0;
virtual BatchSet<EllpackPage> GetEllpackBatches(const BatchParam& param) = 0;
virtual BatchSet<GradientIndexPage> GetGradientIndexBatches(const BatchParam& param) = 0;

virtual bool EllpackExists() const = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to avoid adding these virtual private methods and make PageExists() virtual instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RAMitchell Not entirely sure how to do that right now. Maybe add a enum for each type of page?

include/xgboost/data.h Show resolved Hide resolved
src/common/hist_util.h Outdated Show resolved Hide resolved
src/common/quantile.h Outdated Show resolved Hide resolved
src/data/gradient_index_source.h Show resolved Hide resolved
src/data/simple_dmatrix.cc Outdated Show resolved Hide resolved
src/predictor/cpu_predictor.cc Outdated Show resolved Hide resolved

if (precise && dmat->PageExists<SparsePage>()) {
do_predict_sparse_page();
} else if (!dmat->PageExists<EllpackPage>()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make sure this EllpackPage exists at the time of the first prediction call, otherwise the SparsePage will be created on device in the first iteration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the part where I need to know the plan for DMatrix initialization.

@trivialfis
Copy link
Member Author

trivialfis commented Feb 19, 2020

@RAMitchell

Why would predictions from the quantised matrix be different from the SparsePage?

This is quite subtle. Here is an explanation I found from simple examples:

  • The input value would have exact same values as the tree node split condition. When traversing the tree we use fvalue < split_cond, which is false because they are exactly the same value. Hence it will prefer right child.

  • Shifting value by an Eps helps, but doesn't solve the problem.

Here is a dataset with normal distribution I generated, with shape of 8 * 4:

[[ 1.76405235  0.40015721  0.97873798  2.2408932 ]
 [ 1.86755799 -0.97727788  0.95008842 -0.15135721]
 [-0.10321885  0.4105985   0.14404357  1.45427351]
 [ 0.76103773  0.12167502  0.44386323  0.33367433]
 [ 1.49407907 -0.20515826  0.3130677  -0.85409574]
 [-2.55298982  0.6536186   0.8644362  -0.74216502]
 [ 2.26975462 -1.45436567  0.04575852 -0.18718385]
 [ 1.53277921  1.46935877  0.15494743  0.37816252]]

The cuts for first column are (max_bin=2):

1.53277921676635742, 4.53951930999755859, 

The cut values combined with min_values cover the whole range of this column, but min_value will not be used in prediction (should we use it?). The last row will be assigned to second bin as the sorted version of first column is:

[-2.55298982,
 -0.10321885,
 0.76103773,
 1.49407907,
 1.53277921,
 1.76405235,
 1.86755799,
 2.26975462]

and 1.53277921 is at second half. So it will get 4.53925 as cut value and hence feature value. But if you are using original data, it's exactly the same as cut point value 1.53... If we shift the value by Eps or change < to <=, we will again obtain different result.

@RAMitchell
Copy link
Member

@trivialfis I don't follow. If we have a split condition f0 < 1.53277921 the row containing 1.53277921 will go right no matter if predicting from quantised data or floating point data.

@trivialfis
Copy link
Member Author

trivialfis commented Feb 19, 2020

In the above example, 0.76103773 should go left, but as it's assigned to the bin of 1.532 so it goes right

@RAMitchell
Copy link
Member

I see. Any element assigned to bin 0 must have value < 1.532 based on the way the bins are constructed, so if we return floating point value of 1.532 for bin 0 this means we would need to use the <= condition for splits, but that is not consistent with prediction directly from floating point data. I think the solution is to return the cut point less than 1.532, in this case the min value.

Maybe you can produce a few test cases verifying these scenarios. Fundamentally I still think there should be no difference between quantile prediction and normal prediction, assuming the tree has been constructed on the same quantile cuts.

* Move CPU histogram index into DMatrix (external memory is still not supported).
* Add predict on CPU gradient index.
* Check is missing when getting global index.
@trivialfis
Copy link
Member Author

The feature is basically done. Waiting for #5351 to be resolved.

@trivialfis
Copy link
Member Author

trivialfis commented Mar 13, 2020

@SmirnovEgorRu @ShvetsKS Hi, this PR is used to move the histogram index into DMatrix object, and prepare for in-place sketching so that we can prevent copying data into CSR for hist and gpu_hist algorithm. The resulting memory usage reduction is expected to be significant. The GPU version is already merged, where histogram index is stored in an Ellpack matrix. Also part of in-place sketching on GPU is merged in #5365 . Now we are working on user interface. There's an issue #5351 waiting to be resolved for merging this one. Feel free to take a look into this PR when time allows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants