[Feature] Support for monotonic constraints? #14

Closed
alexvorobiev opened this issue Oct 18, 2016 · 21 comments

@alexvorobiev

Are you planning to support monotonic constraints? See, e.g., dmlc/xgboost#1514

@chivee
Collaborator

chivee commented Oct 18, 2016

Here is the pseudo code for the monotonic constraints:

IF (split is on a continuous variable and that variable is monotonic)
  THEN compute the averages of the left and right child nodes that the current split would produce
    IF monotonic increasing THEN CHECK left average <= right average
    IF monotonic decreasing THEN CHECK left average >= right average

@alexvorobiev, do you have any papers I could refer to for this feature?

@alexvorobiev
Author

@chivee I only have the reference to the R GBM package https://cran.r-project.org/package=gbm

@chivee
Collaborator

chivee commented Oct 19, 2016

@alexvorobiev, thanks for sharing. I'm trying to understand the idea behind this method.

guolinke changed the title from "Support for monotonic constraints?" to "[Feature] Support for monotonic constraints?" on Oct 19, 2016

@AbdealiLoKo

Note that the given pseudo code only ensures that each individual split is ordered correctly. It does not make the whole model monotonic, because a later split deeper in the tree can still break monotonicity.
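To make the point concrete, here is a small sketch with made-up numbers. The function names `local_check` and `predict` and the split thresholds are hypothetical, chosen only for illustration. Both splits pass the per-split check for an increasing constraint, yet the resulting tree is not monotone in x.

```python
def local_check(left_avg, right_avg, direction):
    """Per-split check from the pseudo code.

    direction: +1 for increasing, -1 for decreasing, 0 for unconstrained.
    """
    if direction > 0:
        return left_avg <= right_avg
    if direction < 0:
        return left_avg >= right_avg
    return True


def predict(x):
    """Toy tree on one feature (illustrative numbers only).

    Root split x < 5: left average 1.0, right average 2.0.
    The left child is split again at x < 2 into leaves 0.0 and 3.0.
    """
    if x < 5:
        return 0.0 if x < 2 else 3.0
    return 2.0


root_ok = local_check(1.0, 2.0, +1)   # root split passes the check
child_ok = local_check(0.0, 3.0, +1)  # deeper split passes the check too

xs = [0, 1, 2, 3, 4, 5, 6]
preds = [predict(x) for x in xs]
is_monotone = all(a <= b for a, b in zip(preds, preds[1:]))

print(root_ok, child_ok, is_monotone)  # True True False
```

The leaf value 3.0 on 2 <= x < 5 exceeds the value 2.0 on x >= 5, so the per-split check alone is not enough; some information has to be passed down from ancestor splits.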

@aldanor

aldanor commented Jan 20, 2018

Any thoughts on this?

@mayer79
Contributor

mayer79 commented Jan 21, 2018

From a practical perspective (outside the Kaggle world!), this feature would be extremely helpful in the many applications where reasonable model behavior matters.

@aldanor

aldanor commented Jan 22, 2018

@guolinke Would you be able to advise how to approach this and whether it's feasible? That is, where should it live, and would it be sufficient to implement it somewhere in feature_histogram.hpp? I guess FeatureMetainfo could then just hold the -1/0/1 constraint.

Here's the meat of the implementation in XGBoost, for reference: https://github.com/dmlc/xgboost/blob/master/src/tree/param.h#L422 -- all of it pretty much contained in CalcSplitGain(), plus CalcWeight(). Where would stuff like this go in LightGBM?

@guolinke
Collaborator

@aldanor
I don't know the details about the monotonic constraints.
What is the idea? And why is it needed?

The following may be useful:

The split gain calculation: https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L291-L297

The leaf-output calculation:
https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L305-L308

@StrikerRUS
Collaborator

StrikerRUS commented Jan 22, 2018

@guolinke Let me add some links about the implementation in XGBoost:
https://xgboost.readthedocs.io/en/latest//tutorials/monotonic.html
dmlc/xgboost#1514
dmlc/xgboost#1516

@aldanor

aldanor commented Jan 23, 2018

> @aldanor
> I don't know the details about the monotonic constraints.
> What is the idea? And why is it needed?

@guolinke Monotonic constraints can be a very important requirement for the resulting model, for many reasons. For example, as noted above, there may be domain knowledge that must be respected, as in insurance and risk management problems.

How about we all cooperate and make this work?

@guolinke
Collaborator

@aldanor Very cool, I would like to work on it together.

@guolinke
Collaborator

guolinke commented Jan 23, 2018

It seems that monotonic constraints (MC) are preserved under addition: if models A and B both satisfy MC, then A+B satisfies MC as well.
So we only need to enforce MC when learning each individual decision tree.
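A short derivation of why the sum stays monotone, under stated assumptions: each tree T_m is non-decreasing in feature x_j, and the ensemble adds trees with a positive learning rate. The symbols F, T_m, M and η are notation introduced here for illustration, not taken from the thread.

```latex
\text{Let } x \text{ and } x' \text{ agree in every feature except } x_j \le x'_j. \\
T_m(x) \le T_m(x') \quad \text{for all } m = 1, \dots, M \\
\Longrightarrow \quad
F(x) = \sum_{m=1}^{M} \eta \, T_m(x) \;\le\; \sum_{m=1}^{M} \eta \, T_m(x') = F(x'),
\qquad \eta > 0.
```

The decreasing case follows by reversing the inequalities. A constant initial score does not affect the argument.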

Combining @chivee's pseudo code and @AbdealiJK's suggestion, I think the algorithm is:


min_value = node.min_value
max_value = node.max_value

check(min_value <= split.left_output)
check(min_value <= split.right_output)
check(max_value >= split.left_output)
check(max_value >= split.right_output)
mid = (split.left_output + split.right_output) / 2

if (split.feature is monotonic increasing) {
  check(split.left_output <= split.right_output)
  node.left_child.set_max_value(mid)
  node.right_child.set_min_value(mid)
}
if (split.feature is monotonic decreasing ) {
  check(split.left_output >= split.right_output)
  node.left_child.set_min_value(mid)
  node.right_child.set_max_value(mid)
}
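The pseudo code above can be sketched in Python roughly as follows. This is only an illustration, not LightGBM code: `Bounds` and `constrain_split` are hypothetical names, and the sketch assumes that each child inherits its parent's bounds before one of them is tightened to `mid`, which the pseudo code leaves implicit.

```python
import math
from dataclasses import dataclass


@dataclass
class Bounds:
    """Allowed output range for a node (hypothetical helper type)."""
    min_value: float = -math.inf
    max_value: float = math.inf


def constrain_split(node, left_output, right_output, direction):
    """Return (left_bounds, right_bounds), or None if the split is rejected.

    direction: +1 for increasing, -1 for decreasing, 0 for unconstrained.
    """
    # Both child outputs must stay inside the range inherited by this node.
    for out in (left_output, right_output):
        if not (node.min_value <= out <= node.max_value):
            return None

    # Assumption: children start from the parent's bounds.
    left = Bounds(node.min_value, node.max_value)
    right = Bounds(node.min_value, node.max_value)
    mid = (left_output + right_output) / 2

    if direction > 0:
        if left_output > right_output:
            return None
        left.max_value = mid   # left subtree may not exceed mid
        right.min_value = mid  # right subtree may not fall below mid
    elif direction < 0:
        if left_output < right_output:
            return None
        left.min_value = mid
        right.max_value = mid
    return left, right


root = Bounds()
left, right = constrain_split(root, 1.0, 2.0, +1)
print(left, right)
print(constrain_split(left, 0.0, 3.0, +1))  # None: 3.0 exceeds the left bound 1.5
```

In the usage lines, the left child is capped at 1.5, so a later split under it that would output 3.0 is rejected even though its own two outputs are correctly ordered.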

@guolinke
Collaborator

@aldanor Would you like to create a PR first? I can help in the PR.

@aldanor

aldanor commented Jan 23, 2018

@guolinke I will give it a try, yep. Your suggested algorithm in the snippet above looks fine; that's more or less what XGBoost does (in exact mode though, not histogram mode). Do you think binning would cause any complications here?

Where would this code belong then, treelearner/feature_histogram.hpp? (I still have to read through most of the code).

Edit: what do you mean by check(...) here? E.g., if (!(...)) { return; }?

@guolinke
Collaborator

guolinke commented Jan 23, 2018

@aldanor
The check means returning a gain of -inf if the condition is not met, so that the split will never be chosen.
I don't think MC works any differently in the binned algorithm.

We need to update the calculation of gain: https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L354-L357 and https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L415-L418.

We may need to wrap these in a new function and implement both the unconstrained and the MC versions there.
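A minimal sketch of what such a wrapper could look like, written in Python for brevity rather than in LightGBM's C++. All names here (`leaf_output`, `leaf_gain`, `split_gain`) are hypothetical, and the formulas are the usual second-order ones with L2 regularization only, which is an assumption and not a copy of the linked code.

```python
import math


def leaf_output(sum_grad, sum_hess, lambda_l2=0.0):
    """Second-order leaf value, assuming sum_hess + lambda_l2 > 0."""
    return -sum_grad / (sum_hess + lambda_l2)


def leaf_gain(sum_grad, sum_hess, lambda_l2=0.0):
    """Gain contribution of one leaf, L2 regularization only."""
    return sum_grad ** 2 / (sum_hess + lambda_l2)


def split_gain(gl, hl, gr, hr, direction=0,
               min_value=-math.inf, max_value=math.inf, lambda_l2=0.0):
    """Gain of a candidate split; -inf when a monotonic check fails.

    direction: +1 increasing, -1 decreasing, 0 unconstrained. With the
    defaults this reduces to the plain unconstrained gain.
    """
    left_out = leaf_output(gl, hl, lambda_l2)
    right_out = leaf_output(gr, hr, lambda_l2)

    # Ordering check for the split feature.
    if direction > 0 and left_out > right_out:
        return -math.inf
    if direction < 0 and left_out < right_out:
        return -math.inf

    # Range check against the bounds inherited from ancestor splits.
    for out in (left_out, right_out):
        if not (min_value <= out <= max_value):
            return -math.inf

    return leaf_gain(gl, hl, lambda_l2) + leaf_gain(gr, hr, lambda_l2)


# Left output is 1.0, right output is -2.0.
print(split_gain(-2.0, 2.0, 4.0, 2.0, direction=+1))  # -inf
print(split_gain(-2.0, 2.0, 4.0, 2.0, direction=-1))  # 10.0
```

Because a rejected split returns -inf, the existing "pick the best gain" loop would not need to change; it would simply never select such a split.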

@guolinke
Collaborator

@aldanor any updates?

@redditur

redditur commented Mar 6, 2018

@guolinke @chivee

I would also be very interested in seeing this feature implemented in LightGBM. As aldanor stated above, the pseudo code suggested earlier is correct and is how XGBoost implements monotonic constraints.

As such, this feature should be fairly easy to implement for someone with an intimate knowledge of the codebase.

@j-mark-hou
Contributor

j-mark-hou commented Mar 23, 2018

<removed due to irrelevance>

@guolinke
Collaborator

@j-mark-hou
There is one bug in your code; please refer to @AbdealiJK's comment and my algorithm above.

@j-mark-hou
Contributor

j-mark-hou commented Mar 23, 2018

Got it. I'll wait for someone with a better understanding of the codebase to implement this, then.

@guolinke
Collaborator

You can try #1314.

9 participants