
Benchmarks to track performance changes in 'hist' method #5126

Status: Closed · wants to merge 1 commit

Conversation

@SmirnovEgorRu (Contributor)

This is PR #2 from issue #5104.
These benchmarks are needed to measure the impact of the optimizations. I'm planning to use them for all further PRs.
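For context, a rough sketch of what such a benchmark script can look like; the synthetic dataset and parameters below are illustrative assumptions, not the actual loaders or settings from this PR:

```python
import time

import numpy as np
import xgboost as xgb

# Synthetic stand-in for the real datasets (higgs1m, airline-ohe, msrank-10k);
# the actual benchmarks load the real datasets.
rng = np.random.default_rng(seed=0)
X = rng.standard_normal((100_000, 28))
y = (X[:, 0] + rng.standard_normal(100_000) > 0).astype(np.int8)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "tree_method": "hist",  # the method whose kernels are being tracked
    "max_depth": 8,
    "objective": "binary:logistic",
}

start = time.perf_counter()
xgb.train(params, dtrain, num_boost_round=100)
print(f"Total training time: {time.perf_counter() - start:.2f} s")
```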

@trivialfis (Member) commented Dec 16, 2019

@RAMitchell We have staged many benchmarking scripts in external projects, and I also have a collection of them for dask. I'm open to having some of these XGBoost-specific scripts maintained in one place. WDYT?

@SmirnovEgorRu (Contributor, Author)

Per-kernel times after reverting the optimizations, as collected by these benchmarks (all times in seconds):

| Data set    | ApplySplit | EvaluateSplit | BuildHist | SyncHistogram | Prediction | Total |
|-------------|-----------:|--------------:|----------:|--------------:|-----------:|------:|
| higgs1m     | 36         | 62            | 156       | 186           | 3          | 446   |
| airline-ohe | 30         | 46            | 77        | 126           | 2          | 303   |
| msrank-10k  | 162        | 244           | 1366      | 836           | 50         | 2680  |

Per-kernel times before reverting the optimizations (all times in seconds):

| Data set    | ApplySplit | EvaluateSplit | BuildHist | SyncHistogram | Prediction | Total |
|-------------|-----------:|--------------:|----------:|--------------:|-----------:|------:|
| higgs1m     | 3.7        | 3.5           | 6.2       | 0             | 1.6        | 17.7  |
| airline-ohe | 9.0        | 6.1           | 28.8      | 0             | 0.7        | 64    |
| msrank-10k  | 15.5       | 52.6          | 66.6      | 0             | 47.6       | 197   |

Hardware: AWS c5.metal instance.
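For anyone trying to reproduce a per-kernel breakdown like the tables above: a minimal sketch, assuming a recent XGBoost build where the `hist` updater's internal timing monitor logs kernel timings (e.g. `BuildHist`, `EvaluateSplit`) at debug verbosity:

```python
import numpy as np
import xgboost as xgb

# Small synthetic dataset, just to exercise the 'hist' updater.
rng = np.random.default_rng(seed=0)
X = rng.standard_normal((10_000, 20))
y = (X[:, 0] > 0).astype(np.int8)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "tree_method": "hist",
    "objective": "binary:logistic",
    "verbosity": 3,  # debug level; the updater's monitor prints per-kernel timings
}
# The timing lines printed to the console can then be parsed into a
# breakdown like the tables above.
xgb.train(params, dtrain, num_boost_round=50)
```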

@RAMitchell (Member)

As this becomes more sophisticated, it raises the question: should this code live inside the main XGBoost repo? It has no dependency on the XGBoost source code, only on some installed version of XGBoost. We could just as easily run it via our CI from a separate repo.

Also, how is this different from https://github.com/NVIDIA/gbm-bench? Would you get the information you need by running that? Maybe we need a more neutrally hosted version of gbm-bench.

@RAMitchell (Member)

Also, one of the problems with previous optimisations was that they caused a performance regression in the distributed algorithm by increasing the number of rabit calls. To catch this, we could run experiments with dask.
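For illustration, a distributed experiment along these lines could be sketched with the `xgboost.dask` interface that landed shortly after this discussion; the cluster size and parameters here are assumptions, and a real test would use a multi-node cluster:

```python
from dask import array as da
from dask.distributed import Client, LocalCluster
from xgboost import dask as dxgb

# Local stand-in for a real multi-node cluster; regressions caused by extra
# rabit (allreduce) calls should become more visible as workers are added.
client = Client(LocalCluster(n_workers=4, threads_per_worker=2))

# Synthetic data, partitioned across workers.
X = da.random.random((1_000_000, 28), chunks=(100_000, 28))
y = (X[:, 0] > 0.5).astype("int8")

dtrain = dxgb.DaskDMatrix(client, X, y)
output = dxgb.train(
    client,
    {"tree_method": "hist", "objective": "binary:logistic"},
    dtrain,
    num_boost_round=100,
)
booster = output["booster"]
```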

@trivialfis (Member)

@hcho3

@hcho3 (Collaborator) commented Dec 17, 2019

@RAMitchell We can probably combine NVIDIA/gbm-bench and this pull request. For now, let's just benchmark XGBoost and not worry about other libraries (LightGBM, CatBoost, etc.). And as you mentioned, we should definitely test distributed training.

@tqchen Can I have admin rights over https://github.com/dmlc/xgboost-bench? It seems perfect for hosting the benchmark scripts.

@hcho3 (Collaborator) commented Dec 19, 2019

@dmlc/xgboost-committer https://github.com/dmlc/xgboost-bench is now public. All committers of XGBoost should have push access to it.

@hcho3 (Collaborator) commented Dec 19, 2019

Closing this PR now. I will move this PR's code to https://github.com/dmlc/xgboost-bench.

@hcho3 closed this Dec 19, 2019
@hcho3 (Collaborator) commented Dec 19, 2019

@SmirnovEgorRu I moved your benchmark code to xgboost-bench repo: dmlc/xgboost-bench@c787a59

The lock bot locked this as resolved and limited conversation to collaborators on Mar 21, 2020.