
Minor refactor of split evaluation in gpu_hist #3889

Merged · 3 commits · Nov 13, 2018

Conversation

@RAMitchell (Member) commented Nov 10, 2018

Move the split evaluation logic into each device instead of having this logic run "globally". This is in preparation for a future PR in which each device will propose splits based on its subset of the data and vote; the algorithm will then globally calculate the optimal split for the selected feature.
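To make the voting step concrete, here is a minimal host-side sketch of the idea, in the spirit of PV-Tree's majority vote; the function name, the top-k parameter, and the types are illustrative assumptions, not this PR's actual code.

    // Hedged sketch of PV-Tree-style feature voting (illustrative only).
    #include <algorithm>
    #include <cstddef>
    #include <map>
    #include <utility>
    #include <vector>

    // Each device proposes its locally best k features; the globally
    // selected features are those receiving the most votes across devices.
    std::vector<int> MajorityVote(
        const std::vector<std::vector<int>>& per_device_top_k,
        std::size_t k) {
      std::map<int, int> votes;
      for (const auto& proposals : per_device_top_k) {
        for (int fidx : proposals) ++votes[fidx];
      }
      // Rank features by vote count, descending.
      std::vector<std::pair<int, int>> ranked(votes.begin(), votes.end());
      std::sort(ranked.begin(), ranked.end(),
                [](const std::pair<int, int>& a,
                   const std::pair<int, int>& b) {
                  return a.second > b.second;
                });
      std::vector<int> selected;
      for (std::size_t i = 0; i < ranked.size() && i < k; ++i) {
        selected.push_back(ranked[i].first);
      }
      return selected;
    }

The global split would then be computed only for the selected features, which is what keeps the communication cost low.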

Use span class in split evaluation for improved memory safety.
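As background on this point, a minimal sketch of what a bounds-checked span buys over a raw pointer; the simplified Span below is modeled loosely on xgboost's common::Span and is not the real implementation.

    // Simplified bounds-checked span (illustrative; not xgboost's actual
    // common::Span). Out-of-range access fails loudly instead of silently
    // reading or corrupting adjacent memory, which is the safety win over
    // passing raw pointer/size pairs into device code.
    #include <cassert>
    #include <cstddef>

    #define SPAN_CHECK(cond) assert(cond)  // device code would trap instead

    template <typename T>
    class Span {
     public:
      Span(T* data, std::size_t size) : data_(data), size_(size) {}
      T& operator[](std::size_t i) const {
        SPAN_CHECK(i < size_);  // the check raw pointers never give you
        return data_[i];
      }
      std::size_t size() const { return size_; }
      T* data() const { return data_; }

     private:
      T* data_;
      std::size_t size_;
    };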

@RAMitchell (Member, Author)

@hcho3 I am having issues with the Jenkins machines, do you know what's going on?

@hcho3 (Collaborator) commented Nov 11, 2018

@RAMitchell Somehow auto scaling isn't working. For now, I manually provisioned workers.

@hcho3 (Collaborator) commented Nov 11, 2018

@RAMitchell Can you do me a favor and write 1-2 sentence summary of what's being done here?

@RAMitchell (Member, Author)

Thanks, I've added a description above. I will try to be a little more verbose in the future.

  }

-  void InitRoot(RegTree* p_tree) {
+  void InitRoot(DMatrix* p_fmat, RegTree* p_tree) {
Reviewer (Member):

Currently p_fmat is not used in InitRoot. If it will be used in the future then it's fine.

 */
void BuildHistWithSubtractionTrick(int nidx_parent, int nidx_left,
                                   int nidx_right) {
  auto smallest_nidx =
Reviewer (Member):

Nice, now we have better naming.
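For context, the subtraction trick builds only the smaller child's histogram directly and derives its sibling's as parent minus smaller, roughly halving the histogram work. A minimal host-side sketch with hypothetical types follows; the real gpu_hist version operates on device histograms.

    // Hypothetical host-side illustration of the subtraction trick.
    #include <cstddef>
    #include <vector>

    struct GradientPairSum { double grad = 0.0; double hess = 0.0; };
    using Histogram = std::vector<GradientPairSum>;

    // Build only the smaller child's histogram directly, then derive the
    // larger child's from hist(parent) == hist(left) + hist(right).
    void SubtractionTrick(const Histogram& parent, const Histogram& smaller,
                          Histogram* larger) {
      larger->resize(parent.size());
      for (std::size_t i = 0; i < parent.size(); ++i) {
        (*larger)[i].grad = parent[i].grad - smaller[i].grad;
        (*larger)[i].hess = parent[i].hess - smaller[i].hess;
      }
    }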

dh::safe_cuda(cudaSetDevice(device_id_));
auto d_split_candidates =
    temp_memory.GetSpan<DeviceSplitCandidate>(feature_set.Size());
DeviceNodeStats node(node_sum_gradients[nidx], nidx, param);
feature_set.Reshard(GPUSet::Range(device_id_, 1));
Reviewer (Member):

I assume that in the future there will be one feature_set for each device and the correspondence won't change; otherwise Reshard will get us into trouble.

@RAMitchell (Member, Author):

Yes, we will have to deal with this in the next PR. I am wondering whether we can reshard it such that a copy exists on each device.
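To illustrate the "copy on each device" idea, here is a hedged sketch in plain CUDA; xgboost's HostDeviceVector and Reshard machinery are more involved, and CopyToEachDevice below is purely hypothetical.

    // Hypothetical sketch: replicate a feature set onto every device rather
    // than sharding it across devices. Error checking omitted for brevity;
    // xgboost would wrap these calls in dh::safe_cuda.
    #include <cuda_runtime.h>
    #include <vector>

    std::vector<int*> CopyToEachDevice(const std::vector<int>& features,
                                       const std::vector<int>& device_ids) {
      std::vector<int*> per_device;
      for (int d : device_ids) {
        cudaSetDevice(d);
        int* ptr = nullptr;
        cudaMalloc(&ptr, features.size() * sizeof(int));
        // Every device holds a full copy, so no device/feature-subset
        // correspondence needs to be maintained.
        cudaMemcpy(ptr, features.data(), features.size() * sizeof(int),
                   cudaMemcpyHostToDevice);
        per_device.push_back(ptr);
      }
      return per_device;
    }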

                               TempStorageT* temp_storage) {
  __shared__ cub::Uninitialized<GradientPairSumT> uninitialized_sum;
  GradientPairSumT& shared_sum = uninitialized_sum.Alias();

  GradientPairSumT local_sum = GradientPairSumT();
  // For loop sums features into one block size
  auto begin = feature_histogram.data();
Reviewer (Member):

Actually, we might be able to use non-pointer iteration if this line:

    SPAN_CHECK(index_ != span_->size());

is changed to use <, and the check is removed from this function:

    XGBOOST_DEVICE SpanIterator& operator+=(difference_type n) {

The error report would be delayed until the * and -> operators are applied, but it would give us some flexibility. WDYT?
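A minimal sketch of that suggestion, assuming a heavily simplified SpanIterator; the real one in xgboost's common/span.h carries more machinery.

    // Heavily simplified version of the suggested change (illustrative).
    #include <cassert>
    #include <cstddef>

    #define SPAN_CHECK(cond) assert(cond)  // device code would trap instead

    template <typename SpanT, typename T>
    class SpanIterator {
     public:
      SpanIterator(const SpanT* span, std::size_t index)
          : span_(span), index_(index) {}

      // Check removed here: the iterator may now step past end() freely,
      // which gives algorithms more flexibility.
      SpanIterator& operator+=(std::ptrdiff_t n) {
        index_ += n;
        return *this;
      }

      // The check moves here and uses < rather than !=: an iterator may
      // sit at (or past) end(), but reading through it must stay in bounds.
      T& operator*() const {
        SPAN_CHECK(index_ < span_->size());
        return span_->data()[index_];
      }

     private:
      const SpanT* span_;
      std::size_t index_;
    };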

@thvasilo (Contributor)

Hello @RAMitchell, just curious: will the future PR implement the PV-Tree described here?

@RAMitchell (Member, Author)

@thvasilo Yes, something like that, though perhaps not exactly.

@thvasilo (Contributor)

@RAMitchell That's cool, I'm interested in seeing how it turns out. Unfortunately, PV-Tree had issues with scaling, and its accuracy experiments were limited. Doing away completely with histogram communication still seems like a big challenge.

@RAMitchell (Member, Author)

@thvasilo I would be interested if you have done experiments suggesting PV-Tree is not accurate enough. My intuition is that occasionally choosing slightly suboptimal features will not affect convergence much.

@RAMitchell deleted the multi-gpu branch on November 13, 2018.
@thvasilo (Contributor)

@RAMitchell I'm mostly referring to the quality of the evaluation in the paper. Only two closed-source datasets were used, and Figures 1-3 show deltas in NDCG and AUC on the order of 10^-2 to 10^-3. Whether a difference of that magnitude is useful is debatable and up to the user.

For scalability, see Section 5.2.1; Figure 2 is hard to interpret, and a strong/weak scaling linear plot would have made more sense. In any case, just food for thought.

The lock bot locked the conversation as resolved and limited it to collaborators on Feb 11, 2019.