
Change type for GradStats to avoid conversion for hist method #5523

Closed
wants to merge 1 commit

Conversation

ShvetsKS (Contributor) commented Apr 13, 2020

This PR changes the internal type of GradStats to avoid conversion in the hist method. It also fixes a bug related to numerical instability in the NeedReplace method.
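A minimal sketch of the conversion being avoided, assuming the usual layout where per-row gradients are stored as float while GradStats accumulates in double on master (the struct and function names below are illustrative, not the actual patch):

```cpp
// Illustrative sketch (not the actual patch): on master, per-row gradients are
// float while the histogram bins (GradStats) accumulate in double, so every
// addition implicitly widens float -> double.
#include <cstddef>
#include <vector>

struct GradientPair { float grad{0.f}; float hess{0.f}; };       // per-row gradients
struct GradStatsD   { double sum_grad{0}; double sum_hess{0}; };  // master: double bins

void BuildHistSketch(const std::vector<GradientPair>& gpair,
                     const std::vector<std::size_t>& bin_idx,
                     std::vector<GradStatsD>* hist) {
  for (std::size_t i = 0; i < gpair.size(); ++i) {
    auto& bin = (*hist)[bin_idx[i]];
    bin.sum_grad += gpair[i].grad;  // float -> double conversion on every element
    bin.sum_hess += gpair[i].hess;
  }
}
// Storing the bins as float, as this PR proposes, removes the widening and
// halves the histogram memory traffic, at the cost of accumulation precision.
```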

Accuracy is roughly the same as with the double type; for example, on the mnist data set it even improved slightly:

            mnist log-loss
Master      0.07304085
This PR     0.07301824

Similar changes were made in #4529, but the full scope of those changes was reverted in #5008.

Performance improvements (1.5x for BuildHist):

santander     full train   InitData   BuildHist   SyncHist   PredictRaw
Master        179.71       47.24      58.01       14         46.48
This PR       162.94       47.51      38.12       8.08       54.7

santander     full train   BuildHist   SyncHist   ApplySplit
Master        78.511       34.82      23.29       1.28
This PR       62.92        22.98      9.89        1.14

trivialfis (Member) commented Apr 13, 2020

I don't think we can simply merge this PR. I did some experiments with the exact tree method using float32 on the covertype data set (available from the scikit-learn datasets module), and the accuracy changed a lot.

One suggestion is to use GradientPair instead of GradStats, which gives you the flexibility to define the underlying type.

You can check my implementation in #5460 (updater_exact.cc). The type is used very consistently (no casting, guaranteed by the C++ template parameter GradientT).
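A hedged sketch of that suggestion, loosely modelled on XGBoost's GradientPairInternal<T> (the simplified definition below is illustrative, not the library's exact code): the accumulation type is fixed once by a template parameter, so a kernel written against it never mixes precisions.

```cpp
// Simplified, illustrative version of a templated gradient pair.
template <typename T>
class GradientPairT {
  T grad_{0};
  T hess_{0};

 public:
  using ValueT = T;
  GradientPairT() = default;
  GradientPairT(T grad, T hess) : grad_{grad}, hess_{hess} {}
  GradientPairT& operator+=(const GradientPairT& rhs) {
    grad_ += rhs.grad_;
    hess_ += rhs.hess_;
    return *this;
  }
  T GetGrad() const { return grad_; }
  T GetHess() const { return hess_; }
};

using GradientPairF = GradientPairT<float>;   // per-row gradients
using GradientPairD = GradientPairT<double>;  // high-precision sums

// A statistics kernel written against the template parameter uses
// GradientSumT end to end, so there is no implicit widening or narrowing.
template <typename GradientSumT>
void AccumulateBin(GradientPairT<GradientSumT>* bin,
                   const GradientPairT<GradientSumT>& g) {
  *bin += g;
}
```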

ShvetsKS force-pushed the gradstat_change_type_d branch 5 times, most recently from 0438212 to 36e0034 on April 16, 2020 07:46
@@ -407,7 +407,60 @@ class GHistIndexBlockMatrix {
* for that particular bin
* Uses global bin id so as to represent all features simultaneously
*/
using GHistRow = Span<tree::GradStats>;

struct GradStatHist {
A Member commented on the diff:

I will stop you from continuing down this path. ;-) Please avoid copying code; use a template instead if applicable. Also, why not GradientPair?

ShvetsKS (Contributor, Author) replied:

GradientPair was used. Thanks :)

ShvetsKS force-pushed the gradstat_change_type_d branch 2 times, most recently from 69ec691 to db311a5 on April 29, 2020 18:01
ShvetsKS (Contributor, Author) commented:

@trivialfis sorry, it's not clear to me why I get this error:
python-package/xgboost/core.py:1591: error (E1121, too-many-function-args, Booster.predict) Too many positional arguments for method call.

Local lint checks on the changed files were successful.

ShvetsKS force-pushed the gradstat_change_type_d branch 2 times, most recently from d434bb9 to ddb786b on April 30, 2020 09:29
ShvetsKS changed the title from "[WIP] Change type for GradStats to avoid conversion for hist method" to "Change type for GradStats to avoid conversion for hist method" on Apr 30, 2020
SmirnovEgorRu (Contributor) commented:

@ShvetsKS, @trivialfis,
From my previous experience developing gradient boosting in DAAL: single-precision numbers lead to overflow in some corner cases with very large dimensions. At the same time, single precision improves single-node performance on data sets with a large number of features and halves the communication cost of the multi-node version.

The current XGBoost GPU implementation has a single_precision_histogram parameter; we can extend it to the CPU implementation as well. That would be a consistent approach across the library and would avoid an implementation-specific parameter. Keep it false by default.
What do you think?
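A rough sketch of how such a flag could select the accumulation type in a CPU hist builder; the dispatch function and names below are illustrative assumptions rather than the existing implementation, and only the single_precision_histogram parameter name comes from the GPU updater.

```cpp
// Hypothetical dispatch on a single_precision_histogram flag for the CPU
// hist updater; the templated builder accumulates bins in GradientSumT.
template <typename GradientSumT>
void BuildHistImpl(/* data, gradients, output histogram ... */) {
  // Accumulate gradient/hessian sums into GradientSumT-typed bins.
}

void BuildHist(bool single_precision_histogram) {
  if (single_precision_histogram) {
    BuildHistImpl<float>();   // faster, half the memory traffic, may overflow on huge data
  } else {
    BuildHistImpl<double>();  // default: keep double accumulation for safety
  }
}
```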

trivialfis (Member) commented:

Sounds good to me. On GPU the difference is very noticeable.

SmirnovEgorRu (Contributor) commented:

@trivialfis, thank you.
@ShvetsKS, could you please rework the PR with the proposed changes? Also, an update to the documentation will be required, I suppose.

RAMitchell (Member) commented:

Optional single precision support is a great idea.

ShvetsKS (Contributor, Author) commented May 6, 2020

@SmirnovEgorRu to be honest, I expected a request for these changes :)
Work will be continued in #5624.

ShvetsKS closed this on May 18, 2020