Implement apply() in FIL #5358
Only reviewed the P/Cython code: added a few suggestions, but overall LGTM.
fm = ForestInference.load(
    model_path, output_class=True, model_type="xgboost"
)
Does it matter in which using_device_type context this is instantiated? If not, should we maybe test the expected behavior when this is done wrongly?
It should not matter, and yes, we should definitely test that.
    model_path, output_class=True, model_type="xgboost"
)

with using_device_type(infer_device):
Is there a way to ensure/test that the inference is actually performed on the correct device?
with using_device_type(infer_device):
    pred_leaf = fm.apply(X).astype(np.int32)
    expected_pred_leaf = bst.predict(xgb.DMatrix(X), pred_leaf=True)
Unless this is affected by using_device_type(), I'd suggest moving it outside of the context. That goes for all other code where the same principle applies.
    classification=True,
)

model_path = os.path.join(tmp_path, "xgb_class.model")
The tmp_path fixture should be a pathlib.Path object, so this should be equivalent:
- model_path = os.path.join(tmp_path, "xgb_class.model")
+ model_path = tmp_path / "xgb_class.model"
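The equivalence claimed in this suggestion can be checked with a minimal standard-library sketch. It assumes tmp_path is a pathlib.Path, as pytest's fixture provides; the temporary directory and the model_paths helper below are stand-ins for illustration and are not part of the PR.

```python
import os
import tempfile
from pathlib import Path


def model_paths(tmp_path: Path, name: str = "xgb_class.model"):
    """Build the model path both ways (hypothetical helper for illustration)."""
    joined = os.path.join(tmp_path, name)  # original form; returns a str
    slashed = tmp_path / name              # suggested form; returns a Path
    return joined, slashed


# A temporary directory stands in for pytest's tmp_path fixture.
with tempfile.TemporaryDirectory() as d:
    joined, slashed = model_paths(Path(d))
    # Both spellings name the same file location.
    same_location = Path(joined) == slashed
```

The one practical difference is the return type: the suggested form yields a Path rather than a str, so any callee that insists on a string would need str(model_path) or os.fspath(model_path).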
preds
    If non-None, outputs will be written in-place to this array.
    Therefore, if given, this should be a C-major array of shape
    n_rows * n_trees.
You use an X symbol in row 1254; maybe use the same here to be consistent?
- n_rows * n_trees.
+ n_rows X n_trees.
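To make the "C-major array of shape n_rows X n_trees" wording concrete, here is a rough sketch of how such a row-major buffer is indexed. It uses a flat Python list as a stand-in for the real output array; make_output_buffer and flat_index are hypothetical helpers for illustration only and are not part of FIL.

```python
def make_output_buffer(n_rows: int, n_trees: int) -> list:
    """Allocate a flat buffer with one slot per (row, tree) pair."""
    return [0] * (n_rows * n_trees)


def flat_index(row: int, tree: int, n_trees: int) -> int:
    """Row-major (C-major) offset: all trees for a row are contiguous."""
    return row * n_trees + tree


n_rows, n_trees = 3, 4
buf = make_output_buffer(n_rows, n_trees)

# Write a dummy value for row 1, tree 2.
buf[flat_index(1, 2, n_trees)] = 7

# The outputs for row 1 occupy one contiguous slice of the buffer.
row_1 = buf[1 * n_trees:(1 + 1) * n_trees]
```

With this layout, the values for a single input row sit next to each other in memory, which is what a caller passing a preallocated preds array would need to match.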
Looking pretty good. After we changed our approach a bit for predict_per_tree, though, I'm wondering if we can't simplify the logic a bit for apply as well.
cpp/include/cuml/experimental/fil/detail/decision_forest_builder.hpp
This is a really nice refactor of the previous implementation! I think we can simplify the logic for output indexing just a little further, but otherwise (assuming perf testing shakes out), this looks perfect.
    categorical_data,
    infer_type);
if (infer_type == infer_kind::leaf_id) {
  infer_kernel_cpu<has_categorical_nodes, true>(
I'm only adding the predict_leaf template parameter to the CPU kernel. Adding it to the GPU kernel adds too much boilerplate.
/merge
Replaces #5307
Depends on #5365