Obtain CSR matrix from DMatrix. #8269
* Obtain CSR matrix from DMatrix.
* Obtain gradient index from Quantile DMatrix.

This is mostly for testing at a higher level. Right now we rely on training a booster to infer that the DMatrix is correctly constructed. With this PR, we can ease the testing process. Also, this has been requested before.

The return value from Quantile DMatrix is the histogram index instead of the cut values, as we shift the cut values to include min_values, which is not very useful for language bindings.

Unlike most XGBoost C functions, the caller of the C API is required to allocate the memory itself instead of using thread-local memory from XGBoost. This avoids allocating a huge memory buffer that cannot be freed until the thread exits.

External memory is not supported.
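The caller-allocated convention described above can be illustrated with a small, self-contained sketch. It is not the XGBoost API: `count_non_missing` and `fill_csr` are hypothetical stand-ins for "ask for the size, then pass in buffers to be filled", and NaN is assumed to be the missing-value marker.

```python
import math
from array import array


def count_non_missing(rows):
    """Hypothetical size query: number of non-missing (non-NaN) entries."""
    return sum(1 for row in rows for v in row if not math.isnan(v))


def fill_csr(rows, out_indptr, out_indices, out_data):
    """Hypothetical filler: writes CSR arrays into buffers owned by the caller."""
    nnz = 0
    out_indptr[0] = 0
    for r, row in enumerate(rows):
        for col, v in enumerate(row):
            if not math.isnan(v):
                out_indices[nnz] = col
                out_data[nnz] = v
                nnz += 1
        # Row r occupies out_data[out_indptr[r]:out_indptr[r + 1]].
        out_indptr[r + 1] = nnz


nan = float("nan")
rows = [[1.0, nan, 3.0], [nan, nan, 5.0]]

# The caller sizes and allocates every buffer before the call.
nnz = count_non_missing(rows)
indptr = array("Q", [0]) * (len(rows) + 1)
indices = array("I", [0]) * nnz
data = array("f", [0.0]) * nnz

fill_csr(rows, indptr, indices, data)
```

Because the buffers belong to the caller, they are released whenever the caller drops them, rather than living in a thread-local cache for the lifetime of the thread.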
include/xgboost/c_api.h
Outdated
XGB_DLL int XGDMatrixNumNonMissing(DMatrixHandle handle, bst_ulong *out);

/*!
 * \brief Get the predictors from DMatrix as CSR matrix. If this is a quantized DMatrix,
Could it be useful to return quantised float values instead of integers?
We could return values such that bst.predict(quantile_dmat) == bst.predict(xgb.DMatrix(quantile_dmat.get_data()))
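A rough sketch of why the equality suggested above can hold: if every split threshold is one of the cut points, replacing a raw value with a representative value of its bin does not change which side of any split it falls on. The binning rule (`bisect_right`), the choice of the bin's lower edge as the representative, and all names below are assumptions made for illustration; this thread does not spell out XGBoost's exact convention.

```python
from bisect import bisect_right


def to_bin(value, cuts):
    """Histogram index under the assumed rule: number of cut points <= value."""
    return bisect_right(cuts, value)


def restore(bin_idx, cuts, min_value):
    """Quantised float for a bin: its lower edge (assumed representative).

    The first bin has no lower cut, so an artificial min_value stands in.
    """
    return min_value if bin_idx == 0 else cuts[bin_idx - 1]


cuts = [0.5, 1.5, 4.0]   # illustrative cut points for one feature
min_value = -1.0         # artificial lower bound, must be below cuts[0]
raw = [0.1, 0.5, 1.2, 3.9, 4.0, 7.5]

bins = [to_bin(v, cuts) for v in raw]
restored = [restore(b, cuts, min_value) for b in bins]

# A split of the form "value < threshold", with the threshold taken from
# cuts, gives the same answer for the raw and the restored value.
same_decisions = all(
    (v < t) == (q < t) for v, q in zip(raw, restored) for t in cuts
)
```

The restored values also map back to the same bins, so quantising them a second time is a no-op under these assumptions.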
Yes, we can. That's a good argument. I skipped restoring the values because they contain artificially created min and max values, which might be confusing.
Changed the return value to be cut values along with the addition of suggested tests.
@RAMitchell Please take another look when you are available.
Close #4759.