-
Notifications
You must be signed in to change notification settings - Fork 726
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve discreteness handling, allow binary outcomes #816
Conversation
…t=False Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Signed-off-by: fverac <fabiovera@microsoft.com>
Signed-off-by: fverac <fabiovera@microsoft.com>
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Signed-off-by: fverac <fabiovera@microsoft.com>
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks great. I've suggested a few minor changes.
Related to this PR, I think lines 52-57 of econml/dml/dml.py should be made more robust - what happens if the target is discrete and there is no predict_proba method, or target is not discrete and there is a predict_proba method? I think in either case, we should at least warn the user that they are passing a classifier where a regressor is expected or vice versa. When there is no predict_proba method but one is expected, there doesn't seem to be much harm in falling back to calling predict instead as long as the user is warned; in the opposite scenario it's less clear to me that calling predict_proba instead of just calling predict as usual is a good idea, but at least if we warn the user they can change the discreteness and get that behavior if they want.
Whatever we decide here, if there's a non-trivial amount of logic we do something similar with other estimators that don't use this first stage wrapper; probably that should happen in a new utility method that this module and others can all use consistently (perhaps something like get_prediction(estimator, expected_discrete)
).
Adding a warning when first stage is discrete target but model does not have predict_proba, |
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
87ccacf
to
0757d39
Compare
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good; I've made a couple of minor suggestion but you can merge as soon as you are comfortable with it.
econml/_ortho_learner.py
Outdated
if len(self.outcome_transformer.classes_) > 2: | ||
raise AttributeError( | ||
"More than 2 outcome classes detected. This method currently only supports binary outcomes") | ||
f"({self.outcome_transformer.classes_} outcome classes detected. \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're including the classes themselves rather than their count here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch
econml/utilities.py
Outdated
@@ -1482,3 +1482,29 @@ def jacify_featurizer(featurizer): | |||
a function for calculating the jacobian | |||
""" | |||
return _TransformerWrapper(featurizer) | |||
|
|||
|
|||
def single_strata_from_discrete_arrays(arrs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] The singular for strata is stratum, so this name seems slightly weird. I think strata_from_discrete_arrays
(since this gets the strata for all of the rows at once) is shorter and just as clear.
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Signed-off-by: Fabio Vera <fabiovera@microsoft.com>
Adds a binary_outcome keyword arg to most estimators, where if True then the outcome nuisance model will be a classifier. Additionally add constraints to ensure nuisance model discreteness is handled appropriately by the user. If a nuisance model has a continuous target but a classifier is passed, then will raise an AttributeError. Conversely, if a nuisance model has a discrete target but a regressor is passed, then a warning is issued.
Adds a binary_outcome keyword arg to most estimators, where if True then the outcome nuisance model will be a classifier.