XGBoost gpu_hist running slower than hist (on Higgs dataset and benchmark_tree.py) #5888
Comments
9 minutes means something went really wrong; I'm expecting around 9 seconds. Could you provide a complete script for running on Higgs?
I see that you are using the CLI, will try to reproduce.
Just a note that the K80 is an older architecture; support may be removed soon.
If I want to do a fair comparison between gpu_hist and hist, what would be a good way to do it? How many threads should hist be given, and how many GPUs for gpu_hist? Or, if I want to do a single-GPU gpu_hist run, what should the setup be for CPU hist?
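One reasonable setup for such a comparison is to give hist one thread per physical core and gpu_hist a single GPU, keeping all other parameters identical. A minimal sketch using the Python package (the synthetic data shape and the core count are placeholders, not values from this report):

```python
import time
import numpy as np
import xgboost as xgb

# Synthetic placeholder data; substitute the real Higgs DMatrix for the actual test.
X = np.random.rand(1_000_000, 28).astype(np.float32)
y = np.random.randint(2, size=X.shape[0])
dtrain = xgb.DMatrix(X, label=y)

common = {
    "max_depth": 5,
    "learning_rate": 0.1,
    "objective": "binary:logistic",
}

for params in (
    {**common, "tree_method": "hist", "nthread": 16},    # one thread per physical core
    {**common, "tree_method": "gpu_hist", "gpu_id": 0},  # a single GPU
):
    start = time.time()
    xgb.train(params, dtrain, num_boost_round=50)
    print(params["tree_method"], time.time() - start, "seconds")
```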
Is it the case that, since gpu_hist is designed to work on multiple GPUs where each GPU processes a subset of the training instances, on a single GPU it divides the data into chunks and processes them sequentially? And is that why the expected speedup from gpu_hist is not observed on a single GPU?
No, see my perf numbers in #5926; that's what I'm expecting. Not sure about the issue here.
I can reproduce the issue on the CLI. Will investigate.
@trivialfis Did you have a chance to look at this? If not, can you tell me how you reproduced the issue?
No longer able to reproduce with Higgs.
Hi community,
I was running GPU experiments with the XGBoost C++ binary (v1.1.0) on a single NVIDIA Tesla K80 GPU. I performed two experiments:
Using tree method hist (default nthread):
command: xgboost empty.conf tree_method=hist booster=gbtree task=train 'train_path=data/higgs/train.csv?format=csv&label_column=0' num_round=50 max_depth=5 learning_rate=0.1 min_child_weight=1.0 reg_alpha=0 reg_lambda=0 min_split_loss=0 objective=binary:logistic model_out=higgs/cpu_latest.model
Boosting rounds (snippet of 10 rounds):
[21:31:00] [0]
[21:31:00] [1]
[21:31:01] [2]
[21:31:01] [3]
[21:31:02] [4]
[21:31:02] [5]
[21:31:03] [6]
[21:31:03] [7]
[21:31:03] [8]
[21:31:04] [9]
[21:31:04] [10]
Total time for 50 boosting rounds = 21 seconds
Using tree method gpu_hist (default nthread):
command: xgboost empty.conf tree_method=gpu_hist booster=gbtree task=train 'train_path=data/higgs/train.csv?format=csv&label_column=0' num_round=50 max_depth=3 learning_rate=0.1 min_child_weight=1.0 reg_alpha=0 reg_lambda=0 min_split_loss=0 objective=binary:logistic model_out=higgs/gpu_latest.model
Boosting rounds (snippet of 10 rounds):
[21:32:09] [0]
[21:32:23] [1]
[21:32:36] [2]
[21:32:49] [3]
[21:33:03] [4]
[21:33:16] [5]
[21:33:30] [6]
[21:33:43] [7]
[21:33:56] [8]
[21:34:11] [9]
[21:34:24] [10]
Total time for 50 boosting rounds = 9 mins 43 sec
As you can see, hist takes less time per boosting round than gpu_hist.
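For reference, a Python equivalent of the CLI runs above might look like the sketch below. It assumes the same CSV layout with the label in column 0; note also that the two commands above use different max_depth values (5 and 3), so the depths should be matched for a like-for-like timing.

```python
import time
import xgboost as xgb

# Same CSV as the CLI runs: label in the first column.
dtrain = xgb.DMatrix("data/higgs/train.csv?format=csv&label_column=0")

params = {
    "tree_method": "gpu_hist",   # switch to "hist" for the CPU run
    "max_depth": 5,
    "learning_rate": 0.1,
    "min_child_weight": 1.0,
    "reg_alpha": 0,
    "reg_lambda": 0,
    "min_split_loss": 0,
    "objective": "binary:logistic",
}

start = time.time()
booster = xgb.train(params, dtrain, num_boost_round=50)
print("total training time:", time.time() - start, "seconds")
booster.save_model("higgs/gpu_latest.model")
```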
However, when running benchmark_tree.py:
hist: 20 boosting rounds: Train Time: 31.67970037460327 seconds
gpu_hist: 20 boosting rounds: Train Time: 3.3874778747558594 seconds
Here the gpu_hist tree method runs faster.
I wanted to know why the hist method is faster on the Higgs dataset while gpu_hist is faster on the benchmark script. Is the GPU being under-utilised in some way, causing each boosting round to take longer?
I checked the gpu_hist paper; in its comparison on the Higgs dataset, gpu_hist performs faster than hist. But that comparison is between the hist method on 64 CPU cores and gpu_hist on 8 GPUs. So is my experiment slower because it runs on a single GPU?
Also, this issue might be linked to #3315.