Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract interaction constraint from split evaluator. #5034

Merged
merged 22 commits into from
Nov 14, 2019

Conversation

trivialfis
Copy link
Member

@trivialfis trivialfis commented Nov 13, 2019

  • Extract interaction constraints from split evaluator.

The reason for doing so is mostly for model IO, where num_feature and interaction_constraints are copied in split evaluator. Also interaction constraint by itself is a feature selector, acting like column sampler and it's inefficient to bury it deep in the evaluator chain. Lastly removing one another copied parameter is a win.

  • Enable inc for approx tree method.

As now the implementation is spited up from evaluator class, it's also enabled for approx method.

  • Removing obsoleted code in colmaker.

They are never documented nor actually used in real world. Also there isn't a single test for those code blocks.

  • Unifying the types used for row and column.

As the size of input dataset is marching to billion, incorrect use of int is subject to overflow, also singed integer overflow is undefined behaviour. This PR starts the procedure for unifying used index type to unsigned integers. There's optimization that can utilize this undefined behaviour, but after some testings I don't see the optimization is beneficial to XGBoost.

Related to #4732 .

* Extract interaction constraints from split evaluator.

The primary reason for doing so is that it copies the `num_feature` parameter,
which makes serialization and parameter validation difficult.  Also, as it
should be used for selecting feature, like column sampler, instead of computing
weight.

* clean up for colmaker.

Remove support for `parallel_option` and `cache_opt`.  Now we use whatever
settings that are default before this PR.  As these parameters are never
documented nor actually maintained.

* Enable for approx.
Copy link
Member

@RAMitchell RAMitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! This is quite a nice feature upgrade for the 'histmaker' algorithm.

@trivialfis
Copy link
Member Author

trivialfis commented Nov 14, 2019

@hcho3 @RAMitchell I enforced the row index to be uint64_t as size_t is different for 32 bit system and 64 bit system. This might change memory usage on 32 bit system, does that seem to be a reasonable change?

@trivialfis
Copy link
Member Author

Note:
On OSX:

typedef unsigned int         uint32_t;
typedef unsigned long long   uint64_t;
typedef unsigned long       __darwin_size_t;

@codecov-io
Copy link

codecov-io commented Nov 14, 2019

Codecov Report

Merging #5034 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #5034   +/-   ##
=======================================
  Coverage   71.52%   71.52%           
=======================================
  Files          11       11           
  Lines        2311     2311           
=======================================
  Hits         1653     1653           
  Misses        658      658

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2abe69d...75c2b14. Read the comment docs.

@trivialfis trivialfis merged commit 97abcc7 into dmlc:master Nov 14, 2019
@trivialfis trivialfis deleted the interaction-constraint branch November 14, 2019 12:11
@lock lock bot locked as resolved and limited conversation to collaborators Feb 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants