Fix dart inplace prediction with GPU input. #6777
Conversation
trivialfis commented on Mar 24, 2021 (edited)
- On the C API there is a check that the output prediction buffer is device-readable. With dart this doesn't hold.
- Added an optimization to avoid memory copying whenever possible. Will post the result later.

* Fix dart inplace predict with data on GPU, which might trigger a fatal check for device access rights.
* Avoid copying data whenever possible.
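The failure mode and the fix can be sketched in plain C++. `MockHDV`, its `DeviceCanRead`/`CopyToDevice` members, and the `DeviceSpan` helper below are hypothetical stand-ins modeled on the description above, not XGBoost's actual implementation; the real check lives in the C API's prediction path.

```cpp
#include <cassert>
#include <vector>

// Host-only mock of a HostDeviceVector: tracks where the data currently
// "lives" so we can model the device-access check described above.
struct MockHDV {
  std::vector<float> data;
  bool on_device = false;  // true once data has been "copied" to the GPU
  bool DeviceCanRead() const { return on_device; }
  void CopyToDevice() { on_device = true; }  // stands in for a real H2D copy
};

// Before the fix: a fatal check fires when the dart prediction buffer is not
// device-readable. After the fix: make it readable instead of aborting.
float* DeviceSpan(MockHDV* v, bool fixed) {
  if (!v->DeviceCanRead()) {
    if (!fixed) { assert(false && "prediction must be device-readable"); }
    v->CopyToDevice();  // fixed path: fall back to a copy
  }
  return v->data.data();
}
```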
Force-pushed from fb0f2f6 to 29a78c5.
-    const auto& host_data = page.data.ConstHostVector();
-    dh::device_vector<Entry> sorted_entries(host_data.begin() + begin,
-                                            host_data.begin() + end);
+    dh::device_vector<Entry> sorted_entries;
This is to avoid copying data when input is already on GPU.
@@ -92,7 +92,10 @@ class HostDeviceVectorImpl {
   } else {
     gpu_access_ = GPUAccess::kWrite;
     SetDevice();
-    thrust::fill(data_d_->begin(), data_d_->end(), v);
+    auto s_data = dh::ToSpan(*data_d_);
Avoid synchronization.
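The point of the change: `thrust::fill` with the default execution policy can block the host until the device is done, whereas `dh::LaunchN` just enqueues a kernel over a span. The host-only sketch below mirrors that pattern; `Span` and `LaunchN` here are simplified stand-ins for XGBoost's `dh::ToSpan`/`dh::LaunchN` helpers (the real `LaunchN` launches a CUDA kernel and returns without synchronizing).

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for common::Span: a pointer plus a length.
template <typename T>
struct Span {
  T* ptr;
  std::size_t n;
  T& operator[](std::size_t i) const { return ptr[i]; }
};

// Simplified stand-in for dh::LaunchN: apply a functor once per index.
template <typename Fn>
void LaunchN(std::size_t n, Fn fn) {
  for (std::size_t i = 0; i < n; ++i) fn(i);
}

// Host-side analogue of the patched Fill path: capture the span by value
// and write each element, mirroring the LaunchN lambda in the diff above.
void Fill(std::vector<float>* data_d, float v) {
  Span<float> s_data{data_d->data(), data_d->size()};
  LaunchN(s_data.n, [=](std::size_t i) { s_data[i] = v; });
}
```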
-    dh::safe_cuda(cudaMemcpyAsync(entries_d.data().get(),
-                                  data_vec.data() + ent_cnt_begin,
-                                  n_entries * sizeof(Entry), cudaMemcpyDefault));
+    if (row_batch.data.DeviceCanRead()) {
Avoid copying data when it's already on GPU.
100 rounds on higgs — before/after benchmark results (figures not recoverable here).
Trying to understand what this PR does. Is it correct to summarize it as follows:
- Current behavior: copy the predicted scores predts.predictions to the host before summing them into h_out_predts.
- New behavior: sum the predicted scores predts.predictions into predts.predictions.DeviceSpan(), keeping all the data on the device.
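The before/after the reviewer describes can be sketched host-only. `SumOnHost`/`SumOnDevice` and the vectors standing in for `h_out_predts` and `predts.predictions.DeviceSpan()` are hypothetical names for illustration; no real device memory is involved in this mock.

```cpp
#include <cstddef>
#include <vector>

// Old strategy: copy the scores to the host first, then accumulate there.
void SumOnHost(const std::vector<float>& predictions,
               std::vector<float>* h_out_predts) {
  std::vector<float> host_copy = predictions;  // models the extra D2H copy
  for (std::size_t i = 0; i < host_copy.size(); ++i)
    (*h_out_predts)[i] += host_copy[i];
}

// New strategy: accumulate directly into the device-side buffer (modeled
// here as just another vector), avoiding the host round-trip entirely.
void SumOnDevice(const std::vector<float>& predictions,
                 std::vector<float>* device_span) {
  for (std::size_t i = 0; i < predictions.size(); ++i)
    (*device_span)[i] += predictions[i];
}
```

Both paths compute the same sums; the difference is only where the accumulation buffer lives and how many copies are made along the way.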
-    thrust::fill(data_d_->begin(), data_d_->end(), v);
+    auto s_data = dh::ToSpan(*data_d_);
+    dh::LaunchN(device_, data_d_->size(), [=] XGBOOST_DEVICE(size_t i) {
+      s_data[i] = v;
Should we use the bound-checked interface here, given that the size of data_d_
is already clear?
I can use a pointer.
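The trade-off in this exchange — a span whose indexing checks bounds versus a raw pointer that does not — can be sketched as follows. `CheckedSpan` is a hypothetical simplification; it is not XGBoost's `common::Span`, though the idea (validate the index against a known size before dereferencing) is the same.

```cpp
#include <cassert>
#include <cstddef>

// A minimal bounds-checked view: operator[] validates the index against
// the known size, which a raw pointer dereference would silently skip.
template <typename T>
struct CheckedSpan {
  T* ptr;
  std::size_t n;
  T& operator[](std::size_t i) const {
    assert(i < n);  // the "bound-checked interface" from the question above
    return ptr[i];
  }
};
```

Since the kernel already iterates over exactly `data_d_->size()` indices, the check can never fire there, which is why falling back to a plain pointer is a reasonable answer.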
Your summary is correct.