Dev/unified dispatching prototype #7724
Conversation
…GPUCoordDescent LogitSerializationTest.GpuHist LogitSerializationTest.GPUCoordDescent MultiClassesSerializationTest.GpuHist MultiClassesSerializationTest.GPUCoordDescent
Thank you for working on this! I also wrote a higher-level RFC #7308 for future device dispatching, which should be complementary to this PR. I will look into this in more detail later.
@@ -31,6 +32,22 @@ struct GenericParameter : public XGBoostParameter<GenericParameter> {
  bool fail_on_invalid_gpu_id {false};
  bool validate_parameters {false};

  /* Device dispatcher object.
Nice!
}

void DeviceSelector::Init(const std::string& user_input_device_selector) {
  int fit_position = user_input_device_selector.find(fit_.Prefix());
Do you think it's appropriate that we don't distinguish between predict and fit? Whatever device the user has specified, we will use it everywhere.
Currently, a user can configure prediction on the CPU and fitting on the GPU by specifying 'predictor=cpu_predictor', right? The idea here is to provide the user with a unified way of selecting devices for both fitting and prediction.
We will remove gpu_predictor and gpu_hist (hopefully in this release) as documented in #7308. The expected result is that we will have only one (global) parameter, device, to control the dispatch:
with xgb.config_context(device="sycl:0"):
booster.predict(X)
Oh, I see.
If you don't plan to support different devices for fitting and prediction, this feature is unnecessary. Fortunately, it can easily be reduced to a uniform device descriptor for both stages.
That avoids some internal conflicts; it's difficult to configure the states with the current design. We have been working on using https://github.com/dmlc/xgboost/blob/master/include/xgboost/generic_parameters.h as the context object for XGBoost. Maybe we can integrate the device selector in this PR with it?
I will make some progress on setting up the interface and keep you posted. Thank you for working on it.
Hi @trivialfis,
is there any progress in this direction? Maybe some help from our side would be useful?
@razdoburdin I have run some experiments on this recently; the problem is distributed environments and multi-threaded environments (like Python async). We need to share the device index between all workers and all threads, which requires some synchronization strategy.
We don't need any synchronization if the device id is limited to the booster as a local variable. But if we were to extend it to DMatrix as well (for constructing a DMatrix from various data sources), then the issue becomes a headache.
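The per-thread and per-task scoping issue described above can be sketched with Python's contextvars module, which gives each thread and each asyncio task its own view of the active device. This is an illustrative sketch of the general technique, not xgboost's actual implementation; the names device_context and current_device are hypothetical.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical per-context device setting. Each thread and each async
# task sees its own value, so no explicit locking is needed.
_device = contextvars.ContextVar("device", default="cpu")

@contextmanager
def device_context(device):
    """Temporarily set the active device for the current context."""
    token = _device.set(device)
    try:
        yield
    finally:
        # Restore whatever was active before entering the block.
        _device.reset(token)

def current_device():
    """Return the device active in the current thread/task context."""
    return _device.get()
```

With this shape, a `with device_context("sycl:0"): ...` block in one thread does not affect the device seen by other threads, which is the property a global `device` parameter would otherwise break.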
Hi @trivialfis, is there a chance to implement this or a similar concept in xgboost 2.0? Maybe you need some help with it?
Yes, I'm still planning it as a major breaking change for 2.0. I got distracted during 1.7 by the new PySpark interface. Expect some progress next month. Sorry for the slow update.
This is mostly complete now. #7308 (comment)
In continuation of #5659 and #6212.
Here I present a way of dispatching across the various devices (CPU / CUDA device / oneAPI device).
This request contains only the changes related to all devices. The code for oneAPI device support is planned to be added later.
The main idea of the dispatching was discussed in #6212. A new global parameter called device_selector is added. This parameter determines the device where the calculations will be performed, as well as the specific kernel that will be executed. So if the user configures XGBoost with the following parameters:
clf = xgboost.XGBClassifier(... , objective='multi:softmax', tree_method='hist')
the CPU version of the library will be executed. But if the user adds device_selector='oneapi:gpu':
clf = xgboost.XGBClassifier(... , device_selector='oneapi:gpu', objective='multi:softmax', tree_method='hist')
the code specific to the oneAPI GPU will be used.
For CUDA, the corresponding logic is not implemented yet, so in that case device_selector is just an alternative way of setting gpu_id. To preserve backward compatibility with existing user code, gpu_id is given higher priority.
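The backward-compatibility rule above can be sketched as a small resolution function: a legacy gpu_id, when set, wins over device_selector. This is an illustrative sketch; resolve_device and the exact defaults are hypothetical, not code from this PR.

```python
def resolve_device(params):
    """Pick the effective device from a parameter dict, giving the
    legacy gpu_id priority over device_selector (illustrative)."""
    gpu_id = params.get("gpu_id", -1)
    if gpu_id >= 0:
        # Legacy path: an explicit gpu_id wins and selects a CUDA device.
        return f"cuda:{gpu_id}"
    # Otherwise fall back to the new unified selector, defaulting to CPU.
    return params.get("device_selector", "cpu")
```

For example, passing both gpu_id=0 and device_selector='oneapi:gpu' would resolve to the CUDA device under this rule, keeping old configurations working unchanged.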
An additional feature added by this request is the independent specification of devices for fitting and prediction. If the user specifies
device_selector='fit:oneapi:gpu; predict:cpu'
then the oneAPI GPU will be used for fitting and the CPU for prediction.
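The stage-prefixed selector string might be parsed roughly as follows. This is a sketch of the parsing idea only (the PR's DeviceSelector::Init is C++ and searches for stage prefixes in the input string); parse_device_selector is a hypothetical name, and unlisted stages simply inherit the bare spec here.

```python
def parse_device_selector(spec):
    """Split a selector like 'fit:oneapi:gpu; predict:cpu' into
    per-stage devices; a bare spec applies to both stages (sketch)."""
    # Default: the whole string is the device for both stages.
    devices = {"fit": spec.strip(), "predict": spec.strip()}
    if "fit:" in spec or "predict:" in spec:
        for part in spec.split(";"):
            # Each part looks like '<stage>:<device>'; partition keeps
            # any further colons (e.g. 'oneapi:gpu') in the device part.
            stage, _, device = part.strip().partition(":")
            if stage in devices:
                devices[stage] = device
    return devices
```

So 'fit:oneapi:gpu; predict:cpu' yields a per-stage mapping, while a plain 'oneapi:gpu' selects the same device for both fitting and prediction, matching the unified behaviour discussed earlier in the thread.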