Extract interaction constraint from split evaluator. #5034

trivialfis · 2019-11-13T08:56:49Z

Extract interaction constraints from split evaluator.

The main reason is model IO: `num_feature` and `interaction_constraints` are copied into the split evaluator. Also, an interaction constraint is itself a feature selector, acting like the column sampler, so it is inefficient to bury it deep in the evaluator chain. Lastly, removing yet another copied parameter is a win.
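To make the "feature selector" view concrete, here is a minimal sketch of interaction constraints applied as a candidate-feature filter, in the same position a column sampler would occupy. It assumes a simple rule: the root may split on any feature, and below it a feature is allowed only if some constraint set contains it together with every feature already used on the path from the root. The names `InteractionSelector`, `FeatureId` and `FeatureSet` are hypothetical and are not XGBoost's actual classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical names for illustration only.
using FeatureId = std::uint32_t;
using FeatureSet = std::set<FeatureId>;

class InteractionSelector {
 public:
  InteractionSelector(std::vector<FeatureSet> constraints, std::size_t n_features)
      : constraints_(std::move(constraints)), n_features_(n_features) {}

  // Features a node may split on, given the features already used on the
  // path from the root (assumed rule, see the lead-in).
  FeatureSet Allowed(FeatureSet const& used_on_path) const {
    FeatureSet allowed;
    if (used_on_path.empty()) {
      // Assumption: the root may split on any feature.
      for (std::size_t f = 0; f < n_features_; ++f) {
        allowed.insert(static_cast<FeatureId>(f));
      }
      return allowed;
    }
    for (FeatureSet const& set : constraints_) {
      // std::includes needs sorted ranges; std::set iterates in order.
      if (std::includes(set.begin(), set.end(), used_on_path.begin(),
                        used_on_path.end())) {
        allowed.insert(set.begin(), set.end());
      }
    }
    return allowed;
  }

  // Drop forbidden candidates before split evaluation, the same way a
  // column sampler narrows the candidate list.
  std::vector<FeatureId> Filter(std::vector<FeatureId> const& candidates,
                                FeatureSet const& used_on_path) const {
    FeatureSet const allowed = Allowed(used_on_path);
    std::vector<FeatureId> kept;
    for (FeatureId f : candidates) {
      assert(f < n_features_ && "feature index out of range");
      if (allowed.count(f) != 0) kept.push_back(f);
    }
    return kept;
  }

 private:
  std::vector<FeatureSet> constraints_;
  std::size_t n_features_;
};
```

With constraints `{{0, 1}, {2, 3, 4}}`, a node reached by splitting on feature 0 only considers features 0 and 1, and the split evaluator never sees the others. Keeping the filter outside the evaluator means it needs no copy of `num_feature` of its own beyond what the selector is given.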

Enable interaction constraints for the approx tree method.

Now that the implementation is split out of the evaluator class, it is also enabled for the approx method.

Remove obsolete code in colmaker.

This code was never documented nor used in practice, and not a single test covers it.

Unify the types used for row and column indices.

As input datasets grow toward billions of rows, careless use of `int` is prone to overflow, and signed integer overflow is undefined behaviour. This PR starts unifying index types to unsigned integers. Compilers can exploit this undefined behaviour for optimization, but after some testing I did not find those optimizations beneficial for XGBoost.
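A minimal sketch of why an unsigned 64-bit index type sidesteps both problems. `RowIndex` and the helper functions are hypothetical names for illustration, not necessarily the alias or helpers the PR introduced.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical alias for illustration only.
using RowIndex = std::uint64_t;

// Unsigned arithmetic wraps modulo 2^64, so incrementing the maximum value
// is well-defined and yields 0. The same increment on a signed integer at
// its maximum is undefined behaviour.
constexpr RowIndex WrapIncrement(RowIndex x) { return x + 1; }

// Entries in a dense rows x cols matrix. With 32-bit int operands,
// 100000 * 50000 would overflow; in a 64-bit unsigned type it is exact.
constexpr RowIndex DenseEntryCount(RowIndex rows, RowIndex cols) {
  return rows * cols;
}

// True when a row index can be narrowed to int32_t without loss.
constexpr bool FitsInInt32(RowIndex idx) {
  return idx <= static_cast<RowIndex>(std::numeric_limits<std::int32_t>::max());
}

// Narrow for an API that still takes int, checking the range in debug
// builds instead of silently truncating.
inline std::int32_t NarrowToInt32(RowIndex idx) {
  assert(FitsInInt32(idx) && "row index does not fit in int32_t");
  return static_cast<std::int32_t>(idx);
}
```

Because overflow in the unsigned type is defined, a bug shows up as a wrong (wrapped) value rather than as undefined behaviour the optimizer is free to reason around.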

Related to #4732.

* Extract interaction constraints from split evaluator. The primary reason is that the evaluator copies the `num_feature` parameter, which makes serialization and parameter validation difficult. Also, interaction constraints should be used for selecting features, like the column sampler, rather than for computing weights.
* Clean up colmaker. Remove support for `parallel_option` and `cache_opt`, keeping whatever settings were the default before this PR, as these parameters were never documented nor maintained.
* Enable interaction constraints for approx.

RAMitchell

Looks good! This is quite a nice feature upgrade for the 'histmaker' algorithm.

src/tree/updater_quantile_hist.cc

trivialfis · 2019-11-14T04:18:06Z

@hcho3 @RAMitchell I enforced the row index type to be `uint64_t`, since `size_t` differs between 32-bit and 64-bit systems. This might change memory usage on 32-bit systems; does that seem like a reasonable change?
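Whatever in-memory index type is chosen, one way to keep serialized counts independent of the width of `size_t` is to always store them as a fixed-width `uint64_t`. A minimal sketch, with hypothetical function names; this is not claimed to be how XGBoost serializes data.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// std::uint64_t always has exactly 64 value bits; std::size_t has an
// implementation-defined width (commonly 32 bits on 32-bit targets).
static_assert(std::numeric_limits<std::uint64_t>::digits == 64, "uint64_t is 64 bits");

using CountBytes = std::array<std::uint8_t, 8>;

// Encode a count as 8 little-endian bytes, independent of the writer's size_t.
inline CountBytes EncodeCount(std::uint64_t n) {
  CountBytes out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = static_cast<std::uint8_t>((n >> (8 * i)) & 0xFFu);
  }
  return out;
}

// Inverse of EncodeCount.
inline std::uint64_t DecodeCount(CountBytes const& in) {
  std::uint64_t n = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    n |= static_cast<std::uint64_t>(in[i]) << (8 * i);
  }
  return n;
}

// A decoded count only fits the reader's size_t if it is small enough,
// which may not hold on a 32-bit machine.
inline bool FitsInSizeT(std::uint64_t n) {
  return n <= static_cast<std::uint64_t>(std::numeric_limits<std::size_t>::max());
}

inline std::size_t ToSizeT(std::uint64_t n) {
  assert(FitsInSizeT(n) && "count exceeds this platform's size_t");
  return static_cast<std::size_t>(n);
}
```

The cost mentioned above shows up here too: on a 32-bit system every stored count takes 8 bytes instead of 4.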

trivialfis · 2019-11-14T10:28:47Z

Note: on OSX, `uint64_t` and `size_t` are distinct types even though both are 64 bits wide:

typedef unsigned int         uint32_t;
typedef unsigned long long   uint64_t;
typedef unsigned long       __darwin_size_t;
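Since `unsigned long long` and `unsigned long` are distinct types, code that overloads or specializes on one exact spelling can compile on Linux and fail on OSX, or the reverse. A minimal sketch of dispatching on type properties instead of type identity; `FitsIn` and `IndexBits` are hypothetical helpers, not XGBoost functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Neither pattern below is portable:
//
//   void Put(std::uint64_t) {}  void Put(std::size_t) {}  // redefinition on 64-bit Linux
//   template <> struct Codec<std::uint64_t> { ... };       // does not cover size_t on OSX
//
// Range check that compiles whether or not From and To name the same type
// on the current platform.
template <typename To, typename From>
constexpr bool FitsIn(From v) {
  static_assert(std::is_unsigned<To>::value && std::is_unsigned<From>::value,
                "unsigned index types only");
  return static_cast<std::uintmax_t>(v) <=
         static_cast<std::uintmax_t>(std::numeric_limits<To>::max());
}

// Bit width of an unsigned index type, independent of how it is spelled.
template <typename T>
constexpr int IndexBits() {
  static_assert(std::is_unsigned<T>::value, "unsigned index types only");
  return std::numeric_limits<T>::digits;
}
```

Both helpers work the same for `std::size_t`, `std::uint64_t` and `unsigned long long`, so no overload set has to guess which of them coincide on a given platform.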

codecov-io · 2019-11-14T11:35:31Z

Codecov Report

Merging #5034 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #5034   +/-   ##
=======================================
  Coverage   71.52%   71.52%           
=======================================
  Files          11       11           
  Lines        2311     2311           
=======================================
  Hits         1653     1653           
  Misses        658      658

Powered by Codecov. Last update 2abe69d...75c2b14.

trivialfis added 3 commits November 13, 2019 16:48

Remove the implementation in split evaluator.

3ffb893

Mention in doc.

f28efca

trivialfis requested review from hcho3 and RAMitchell November 13, 2019 08:56

trivialfis added 9 commits November 13, 2019 17:14

Remove dead code.

32a5788

Fix compilation.

72c4130

Inline the shortcuts.

c0bd97e

Restore some changes.

a465bf1

Lint.

6c6559c

Convert rest of the bst_uint.

e17d56e

Don't shortcut too much.

c72e986

Some more clean up.

10f5bbb

Amalgamation.

f4fd698

RAMitchell approved these changes Nov 14, 2019

trivialfis commented Nov 14, 2019

src/tree/updater_quantile_hist.cc (resolved)

src/tree/updater_quantile_hist.cc (resolved)

Don't enforce by static_assert.

9f933db

trivialfis added 8 commits November 14, 2019 12:40

Auto deduce the type.

c72a069

Deduce more type.

344a7a6

More restricted types.

7b39d31

Compiles even when changing bst_row_t to nonsense type.

3be4d18

`std::size_t' is evil ...

1e9a406

Indeed, so evil.

b948a32

Typo and warning.

1bbe188

Fix compilation.

7fad200

Keep wrestling with std::size_t.

75c2b14
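Commits such as "Auto deduce the type" and "Compiles even when changing bst_row_t to nonsense type" suggest writing code against the row-index alias, or deducing types from it, rather than hard-coding `int` or `std::size_t`. A minimal sketch of that idea, using a hypothetical `RowT` alias as a stand-in for `bst_row_t`, whose final definition is not shown here, and a hypothetical `Batch` struct.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a project-wide row index alias. Changing this
// one line (for example to std::uint32_t) should need no edits elsewhere.
using RowT = std::uint64_t;

// Row offsets of a CSR-like batch, stored in the alias type.
struct Batch {
  std::vector<RowT> offset;  // offset[i]..offset[i + 1] delimits row i
};

// Count non-empty rows. The loop index takes its type from the container
// via decltype, and the row length is deduced with auto, so no integer type
// is spelled out by hand.
inline RowT CountNonEmptyRows(Batch const& batch) {
  RowT count = 0;
  for (decltype(batch.offset.size()) i = 0; i + 1 < batch.offset.size(); ++i) {
    auto const len = batch.offset[i + 1] - batch.offset[i];
    if (len != 0) ++count;
  }
  return count;
}
```

If `RowT` were swapped for another unsigned type, only the alias line would change; the function body would still compile as written.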

trivialfis merged commit 97abcc7 into dmlc:master Nov 14, 2019

trivialfis deleted the interaction-constraint branch November 14, 2019 12:11

lock bot locked as resolved and limited conversation to collaborators Feb 12, 2020
