Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change type for GradStats to avoid conversion for hist method #5523

Closed
wants to merge 1 commit into from

Conversation

ShvetsKS
Copy link
Contributor

@ShvetsKS ShvetsKS commented Apr 13, 2020

This PR is supposed to change internal type of GradStats to avoid conversion for hist method. Also bug related to numerical instability in NeedReplace method.

Accuracy ~ the same as was with double type, and ex. for mnist data set was even improved:

mnist log-loss
Master 0.07304085
This PR 0.07301824

Similar changes were provided in PR:#4529 but full scope of changes was reverted in #5008.

Performance improvements (1.5x for BuildHist):

santander full train InitData BuildHist SyncHist PredictRaw
Master 179.71 47.24 58.01 14 46.48
This PR 162.94 47.51 38.12 8.08 54.7
santander full train BuildHist SyncHist ApplySplit
Master 78.511 34.82 23.29 1.28
This PR 62.92 22.98 9.89 1.14

@trivialfis
Copy link
Member

trivialfis commented Apr 13, 2020

I don't think we can simply merge this PR. I did some experiments with Exact tree method with float32 on covertype data set (can be obtained from scikit-learn dataset module), the accuracy changed a lot.

One suggestion is to use GradientPair instead of GradStat, where you have the flexibility of defining underlying types.

You can check my implementation here: #5460 updater_exact.cc . The type is used very consistently (guaranteed no-casting by C++ type GradientT).

@ShvetsKS ShvetsKS force-pushed the gradstat_change_type_d branch 5 times, most recently from 0438212 to 36e0034 Compare April 16, 2020 07:46
@@ -407,7 +407,60 @@ class GHistIndexBlockMatrix {
* for that particular bin
* Uses global bin id so as to represent all features simultaneously
*/
using GHistRow = Span<tree::GradStats>;

struct GradStatHist {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will stop you from continuing the process. ;-) Please avoid copying code, use template instead if applicable. Also why not GradientPair?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GradientPair was used. Thanks :)

@ShvetsKS ShvetsKS force-pushed the gradstat_change_type_d branch 2 times, most recently from 69ec691 to db311a5 Compare April 29, 2020 18:01
@ShvetsKS
Copy link
Contributor Author

@trivialfis sorry, not clearly why I have error:
python-package/xgboost/core.py:1591: error (E1121, too-many-function-args, Booster.predict) Too many positional arguments for method call.

Local lint checks for changed files were successful.

@ShvetsKS ShvetsKS force-pushed the gradstat_change_type_d branch 2 times, most recently from d434bb9 to ddb786b Compare April 30, 2020 09:29
@ShvetsKS ShvetsKS changed the title [WIP] Change type for GradStats to avoid conversion for hist method Change type for GradStats to avoid conversion for hist method Apr 30, 2020
@ShvetsKS ShvetsKS force-pushed the gradstat_change_type_d branch from 3ba5362 to 6793b03 Compare April 30, 2020 16:44
@SmirnovEgorRu
Copy link
Contributor

@ShvetsKS, @trivialfis,
My prev experience in Gradient Boosting dev in DAAL - single precision numbers lead to overflow for some corner cases with very large dims. In the same time it improves single-node perf for data sets with large amount of features + halves the communication cost for multi-node version.

Current XGB GPU impl has single_precision_histogram parameter, we can extend this for usage in CPU impl also. It will be consistent approach accross the library and we will eliminate the implementation-specific parameter. Keep it false by default.
What do you think?

@trivialfis
Copy link
Member

Sounds good to me. On GPU the difference is very noticeable.

@SmirnovEgorRu
Copy link
Contributor

@trivialfis, thank you.
@ShvetsKS, could you, please, rework the PR with proposed changes? Also, a update of the documentation is required, I suppose.

@RAMitchell
Copy link
Member

Optional single precision support is a great idea.

@ShvetsKS
Copy link
Contributor Author

ShvetsKS commented May 6, 2020

@SmirnovEgorRu to be honest I expected the request of this changes :)
Work will be continued in #5624.

@ShvetsKS ShvetsKS closed this May 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants