Vasilis/casestudy #441

Merged 62 commits on May 3, 2021
2d3fa56
started policy module
vasilismsr Jan 13, 2021
148104e
finished cython implementation
vasilismsr Jan 18, 2021
0abe48a
finished integration with cate interpreter. Enabled multiple treatmen…
vasilismsr Jan 18, 2021
58acc42
final touches in interpreter
vasilismsr Jan 18, 2021
ff43a1f
Merge branch 'master' into vasilis/policy
vasilismsr Jan 18, 2021
37a3635
merged with master
vasilismsr Jan 18, 2021
3afc13e
removed use('agg')
vasilismsr Jan 18, 2021
9bd27cf
added doubly robust policy learning methods
vasilismsr Jan 19, 2021
b6e3b60
fixed issues with interpreters
vasilismsr Jan 19, 2021
9500871
linting
vasilismsr Jan 19, 2021
225375a
fixd notedook
vasilismsr Jan 19, 2021
b32e857
fixed interpeer checks
vasilismsr Jan 19, 2021
0b452f5
merged master
vasilismsr Jan 19, 2021
07edcc4
fixed deprecated drlearner imports
vasilismsr Jan 19, 2021
308e315
Merge branch 'master' into vasilis/policy
vsyrgkanis Jan 20, 2021
0b2e149
added base policy
vasilismsr Jan 21, 2021
e63d5f8
Merge branch 'master' into vasilis/policy
vsyrgkanis Jan 29, 2021
709e9bb
merged with master
vasilismsr Mar 16, 2021
fb0a5ce
fixed merge bug
vasilismsr Mar 17, 2021
30a3033
cleaned notebook changes
vasilismsr Mar 17, 2021
4318743
notebook changes cleanup
vasilismsr Mar 17, 2021
9c7255c
bug in case study notebook
vasilismsr Mar 17, 2021
1d3e318
perfected coverage of cate interpreters. fixed small nan bug in corne…
vasilismsr Mar 17, 2021
c933cb9
merged policy tree and grf tree in same base class
vasilismsr Mar 18, 2021
26266a1
added tests for all policy methods for good coverage
vasilismsr Mar 19, 2021
c276bad
added more tests
vasilismsr Mar 19, 2021
c5451c0
added missing docstrings
vasilismsr Mar 19, 2021
08ceaab
fixed docstrings in interpreters
vasilismsr Mar 19, 2021
63325f0
linting
vasilismsr Mar 19, 2021
65f90a7
linting
vasilismsr Mar 19, 2021
458f84d
changed plotting of policy tree. Fxied test to not use graphviz
vasilismsr Mar 19, 2021
511cb50
made policy/forest private
vasilismsr Mar 19, 2021
1012298
fixed imports
vasilismsr Mar 19, 2021
6c2aa7e
Merge branch 'master' into vasilis/policy
vsyrgkanis Mar 19, 2021
356ec7e
removed TODO for inference. Added policy learning module to docs. fix…
vasilismsr Mar 19, 2021
0cda93a
Merge branch 'vasilis/policy' of github.com:microsoft/EconML into vas…
vasilismsr Mar 19, 2021
a190ab6
test suggestion
vasilismsr Mar 20, 2021
5c32f90
fixed NaN bug when tree has single node
vasilismsr Mar 20, 2021
b7f82be
made rscorer test more robust
vasilismsr Mar 20, 2021
b7be4a3
added dosctrings and fixed other dosctrings
vasilismsr Mar 20, 2021
a439c4d
fixed docstring in tree
vasilismsr Mar 20, 2021
4f8b57a
added policy_treatment_names method that uses the parsed treatment n…
vasilismsr Mar 21, 2021
50736f1
fixed randomness in weightedkfold, that was causing tests to fail due…
vasilismsr Mar 21, 2021
b411039
fixed docstrings
vasilismsr Mar 21, 2021
206055a
fxied passing parameter bug
vasilismsr Mar 21, 2021
3ee6439
relaxed rscorer test with an absolute deviation
vasilismsr Mar 21, 2021
1467c2d
added policy learning to multi-investment notebook
vasilismsr Mar 22, 2021
6903592
changed how aggregation happens in policy ensembles. Added policy lea…
vasilismsr Mar 25, 2021
8edbac3
merged with master
vasilismsr Mar 25, 2021
bafa036
Merge branch 'master' into vasilis/casestudy
vsyrgkanis Mar 25, 2021
7ba1b27
added clipping to the denominator in the dr correction to avoid divis…
vasilismsr Mar 29, 2021
593f912
Merge branch 'vasilis/casestudy' of github.com:microsoft/EconML into …
vasilismsr Mar 29, 2021
31da1d3
Merge branch 'master' into vasilis/casestudy
vsyrgkanis Apr 1, 2021
dec216e
Update econml/tests/test_policy_forest.py
vsyrgkanis Apr 13, 2021
0b20ed7
Update econml/policy/_forest/_forest.py
vsyrgkanis Apr 13, 2021
24c246f
Update econml/policy/_forest/_forest.py
vsyrgkanis Apr 13, 2021
eaa5fec
Update econml/policy/_drlearner.py
vsyrgkanis Apr 13, 2021
e914b66
fixed commit bags
vasilismsr Apr 14, 2021
31382f2
linting
vasilismsr Apr 14, 2021
a39fd94
added jupyter test requirement
vsyrgkanis Apr 29, 2021
e3b0a8a
added jupyter test requirement
vsyrgkanis Apr 29, 2021
78d1467
fixed linting
vsyrgkanis Apr 29, 2021
2 changes: 1 addition & 1 deletion azure-pipelines-steps.yml
@@ -54,7 +54,7 @@ jobs:
condition: and(succeeded(), eq(variables['Agent.OS'], 'Linux'))

# Install the package
- - script: 'python -m pip install --upgrade pip && pip install --upgrade setuptools wheel Cython && pip install ${{ parameters.package }}'
+ - script: 'pip install --upgrade setuptools wheel Cython && pip install ${{ parameters.package }}'
displayName: 'Install dependencies'

- ${{ parameters.job.steps }}
2 changes: 1 addition & 1 deletion econml/dml/causal_forest.py
@@ -73,7 +73,7 @@ def fit(self, X, T, T_res, Y_res, sample_weight=None, freq_weight=None, sample_v
"where available.")
residuals = Y_res - np.einsum('ijk,ik->ij', oob_preds, T_res)
propensities = T - T_res
- VarT = propensities * (1 - propensities)
+ VarT = np.clip(propensities * (1 - propensities), 1e-10, np.inf)
drpreds = oob_preds
drpreds += cross_product(residuals, T_res / VarT).reshape((-1, Y_res.shape[1], T_res.shape[1]))
drpreds[np.isnan(oob_preds)] = np.nan
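The effect of the clipping change can be seen in a small standalone sketch. It is not the library's code: the helper name `dr_correction_weights` and the toy arrays are made up for illustration, and only the `1e-10` floor and the `T - T_res` propensity computation come from the diff above.

```python
import numpy as np


def dr_correction_weights(T, T_res, floor=1e-10):
    """Return T_res / Var(T | X), clipping the variance away from zero.

    `floor` mirrors the 1e-10 lower bound in the diff; the function name is
    hypothetical and exists only for this illustration.
    """
    propensities = T - T_res  # E[T | X], because T_res = T - E[T | X]
    var_t = np.clip(propensities * (1 - propensities), floor, np.inf)
    return T_res / var_t


# Third row: the propensity is exactly 1, so the unclipped variance is 0
# and the unclipped ratio would be 0 / 0 = NaN.
T = np.array([[1.0], [0.0], [1.0]])
T_res = np.array([[0.5], [-0.2], [0.0]])
weights = dr_correction_weights(T, T_res)
```

With the floor in place every weight stays finite, including the degenerate third row.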
26 changes: 23 additions & 3 deletions econml/policy/_drlearner.py
@@ -4,7 +4,7 @@
from warnings import warn
import numpy as np
from sklearn.base import clone
- from ..utilities import check_inputs, filter_none_kwargs
+ from ..utilities import check_inputs, filter_none_kwargs, check_input_arrays
from ..dr import DRLearner
from ..dr._drlearner import _ModelFinal
from .._tree_exporter import _SingleTreeExporterMixin
@@ -98,6 +98,24 @@ def predict_value(self, X):
"""
return self.drlearner_.const_marginal_effect(X)

def predict_proba(self, X):
""" Predict the probability of recommending each treatment

Parameters
----------
X : array-like of shape (n_samples, n_features)
The input samples.

Returns
-------
treatment_proba : array-like of shape (n_samples, n_treatments)
The probability of each treatment recommendation
"""
X, = check_input_arrays(X)
if self.drlearner_.featurizer_ is not None:
X = self.drlearner_.featurizer_.fit_transform(X)
return self.policy_model_.predict_proba(X)

def predict(self, X):
""" Get recommended treatment for each sample.

@@ -111,9 +129,11 @@ def predict(self, X):
treatment : array-like of shape (n_samples,)
The index of the recommended treatment in the same order as in categories, or in
lexicographic order if `categories='auto'`. 0 corresponds to the baseline/control treatment.
For ensemble policy models, recommended treatments are aggregated from each model in the ensemble
and the treatment that receives the most votes is returned. Use `predict_proba` to get the fraction
of models in the ensemble that recommend each treatment for each sample.
"""
- values = self.predict_value(X)
- return np.argmax(np.hstack([np.zeros((values.shape[0], 1)), values]), axis=1)
+ return np.argmax(self.predict_proba(X), axis=1)

def policy_feature_names(self, *, feature_names=None):
"""
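The change to `predict` in this file swaps one decision rule for another. A rough sketch of the two rules side by side, using made-up numbers and hypothetical helper names (`recommend_from_values`, `recommend_from_proba`) rather than the actual `DRPolicy` classes:

```python
import numpy as np


def recommend_from_values(values):
    """Previous rule: prepend a zero column for the baseline, then argmax."""
    with_baseline = np.hstack([np.zeros((values.shape[0], 1)), values])
    return np.argmax(with_baseline, axis=1)


def recommend_from_proba(proba):
    """New rule: pick the treatment with the largest recommendation share."""
    return np.argmax(proba, axis=1)


# Estimated effects of treatments 1 and 2 relative to the baseline (0).
values = np.array([[0.3, -0.1], [-0.2, -0.5], [0.1, 0.4]])
# Hypothetical vote shares from an ensemble of policy models.
proba = np.array([[0.2, 0.7, 0.1], [0.6, 0.3, 0.1], [0.5, 0.1, 0.4]])

by_value = recommend_from_values(values)
by_vote = recommend_from_proba(proba)
```

In this toy example the two rules agree on the first two samples and disagree on the third, which is the situation the updated docstring describes for ensemble policy models.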
44 changes: 42 additions & 2 deletions econml/policy/_forest/_forest.py
@@ -455,6 +455,43 @@ def predict_value(self, X):

return y_hat

def predict_proba(self, X):
""" Predict the probability of recommending each treatment

Parameters
----------
X : {array-like} of shape (n_samples, n_features)
The input samples. Internally, it will be converted to
``dtype=np.float64``.
check_input : bool, default=True
Allow to bypass several input checking.
Don't use this parameter unless you know what you do.

Returns
-------
treatment_proba : array-like of shape (n_samples, n_treatments)
The probability of each treatment recommendation
"""
check_is_fitted(self)
# Check data
X = self._validate_X_predict(X)

# Assign chunk of trees to jobs
n_jobs, _, _ = _partition_estimators(self.n_estimators, self.n_jobs)

# avoid storing the output of every estimator by summing them here
y_hat = np.zeros((X.shape[0], self.n_outputs_), dtype=np.float64)

# Parallel loop
lock = threading.Lock()
Parallel(n_jobs=n_jobs, verbose=self.verbose, require="sharedmem")(
delayed(_accumulate_prediction)(e.predict_proba, X, [y_hat], lock)
for e in self.estimators_)

y_hat /= len(self.estimators_)

return y_hat

def predict(self, X):
""" Predict the best treatment for each sample

@@ -467,6 +504,9 @@ def predict(self, X):
Returns
-------
treatment : array-like of shape (n_samples)
- The recommded treatment, i.e. the treatment index with the largest reward for each sample
+ The recommded treatment, i.e. the treatment index most often predicted to have the highest reward
+ for each sample. Recommended treatments are aggregated from each tree in the ensemble and the treatment
+ that receives the most votes is returned. Use `predict_proba` to get the fraction of trees in the ensemble
+ that recommend each treatment for each sample.
"""
- return np.argmax(self.predict_value(X), axis=1)
+ return np.argmax(self.predict_proba(X), axis=1)
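Switching the forest's `predict` from averaged values to averaged votes can change the recommendation. The sketch below illustrates this with three hypothetical trees and a single sample. It assumes the forest value is the mean of the per-tree values; `one_hot_votes` is an illustrative helper, not a function from the library.

```python
import numpy as np


def one_hot_votes(tree_values):
    """Turn per-tree values (n_trees, n_samples, n_treatments) into votes."""
    votes = np.zeros(tree_values.shape)
    best = np.argmax(tree_values, axis=2)  # shape (n_trees, n_samples)
    n_trees, n_samples = best.shape
    # Set a 1 at each tree's preferred treatment for each sample.
    votes[np.arange(n_trees)[:, None], np.arange(n_samples)[None, :], best] = 1
    return votes


# Two trees mildly prefer treatment 0; one tree strongly prefers treatment 1.
tree_values = np.array([[[0.2, 0.1]], [[0.2, 0.1]], [[0.0, 1.0]]])

proba = one_hot_votes(tree_values).mean(axis=0)  # fraction of trees per treatment
mean_value = tree_values.mean(axis=0)            # averaged reward per treatment

by_vote = np.argmax(proba, axis=1)
by_value = np.argmax(mean_value, axis=1)
```

Here two of the three trees vote for treatment 0, so the vote-based rule returns 0, while the averaged reward is larger for treatment 1.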
24 changes: 24 additions & 0 deletions econml/policy/_forest/_tree.py
@@ -261,6 +261,30 @@ def predict(self, X, check_input=True):
pred = self.tree_.predict(X)
return np.argmax(pred, axis=1)

def predict_proba(self, X, check_input=True):
""" Predict the probability of recommending each treatment

Parameters
----------
X : {array-like} of shape (n_samples, n_features)
The input samples. Internally, it will be converted to
``dtype=np.float64``.
check_input : bool, default=True
Allow to bypass several input checking.
Don't use this parameter unless you know what you do.

Returns
-------
treatment_proba : array-like of shape (n_samples, n_treatments)
The probability of each treatment recommendation
"""
check_is_fitted(self)
X = self._validate_X_predict(X, check_input)
pred = self.tree_.predict(X)
proba = np.zeros(pred.shape)
proba[np.arange(X.shape[0]), np.argmax(pred, axis=1)] = 1
return proba

def predict_value(self, X, check_input=True):
""" Predict the expected value of each treatment for each sample

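For a single tree, the new `predict_proba` is a one-hot encoding of the tree's preferred treatment. A minimal standalone version of that indexing step, with an illustrative function name and toy values:

```python
import numpy as np


def one_hot_recommendation(pred):
    """Put probability 1 on the highest-valued treatment of each row."""
    proba = np.zeros(pred.shape)
    proba[np.arange(pred.shape[0]), np.argmax(pred, axis=1)] = 1
    return proba


# Rows are samples, columns are treatments; the last row is a tie.
pred = np.array([[0.1, 0.5], [0.7, 0.2], [0.3, 0.3]])
proba = one_hot_recommendation(pred)
```

Because `np.argmax` returns the first maximal index, a tie goes to the lower-numbered treatment, which is treatment 0 for the tied row in this example.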
14 changes: 14 additions & 0 deletions econml/tests/test_policy_forest.py
@@ -290,6 +290,8 @@ def test_non_standard_input(self,):
forest = PolicyForest(n_estimators=20, n_jobs=1, random_state=123).fit(X, y)
pred = forest.predict(X)
pred_val = forest.predict_value(X)
pred_prob = forest.predict_proba(X)
assert pred_prob.shape == (X.shape[0], 2)
feat_imp = forest.feature_importances()
forest = PolicyForest(n_estimators=20, n_jobs=1, random_state=123).fit(X.astype(np.float32),
np.asfortranarray(y))
@@ -298,12 +300,15 @@ def test_non_standard_input(self,):
forest = PolicyForest(n_estimators=20, n_jobs=1, random_state=123).fit(tuple(X), tuple(y))
np.testing.assert_allclose(pred, forest.predict(tuple(X)))
np.testing.assert_allclose(pred_val, forest.predict_value(tuple(X)))
np.testing.assert_allclose(pred_prob, forest.predict_proba(tuple(X)))
forest = PolicyForest(n_estimators=20, n_jobs=1, random_state=123).fit(list(X), list(y))
np.testing.assert_allclose(pred, forest.predict(list(X)))
np.testing.assert_allclose(pred_val, forest.predict_value(list(X)))
np.testing.assert_allclose(pred_prob, forest.predict_proba(list(X)))
forest = PolicyForest(n_estimators=20, n_jobs=1, random_state=123).fit(pd.DataFrame(X), pd.DataFrame(y))
np.testing.assert_allclose(pred, forest.predict(pd.DataFrame(X)))
np.testing.assert_allclose(pred_val, forest.predict_value(pd.DataFrame(X)))
np.testing.assert_allclose(pred_prob, forest.predict_proba(pd.DataFrame(X)))

groups = np.repeat(np.arange(X.shape[0]), 2)
Xraw = X.copy()
@@ -324,6 +329,15 @@ def test_non_standard_input(self,):
forest.predict_value(Xraw[mask]).flatten(), atol=.08)
np.testing.assert_allclose(feat_imp, forest.feature_importances(), atol=1e-4)
np.testing.assert_allclose(feat_imp, forest.feature_importances_, atol=1e-4)
pred = forest.predict(X)
pred_val = forest.predict_value(X)
pred_prob = forest.predict_proba(X)
np.testing.assert_allclose(pred, forest.predict(tuple(X)))
np.testing.assert_allclose(pred_val, forest.predict_value(tuple(X)))
np.testing.assert_allclose(pred, forest.predict(pd.DataFrame(X)))
np.testing.assert_allclose(pred_val, forest.predict_value(pd.DataFrame(X)))
np.testing.assert_allclose(pred_prob, forest.predict_proba(pd.DataFrame(X)))

return

def test_raise_exceptions(self,):
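The added assertions check the shape of `predict_proba` and its stability across input types. Two further invariants follow from the implementation in this PR, where each tree casts a one-hot vote and the forest averages them: the rows should sum to one and `predict` should equal the row-wise argmax. The checker below is a hypothetical helper run on hand-made arrays, not part of the test suite.

```python
import numpy as np


def check_policy_outputs(pred, proba, n_treatments):
    """Check shape, row sums and argmax consistency of policy outputs."""
    assert proba.shape == (pred.shape[0], n_treatments)
    np.testing.assert_allclose(proba.sum(axis=1), 1.0)
    np.testing.assert_array_equal(pred, np.argmax(proba, axis=1))
    return True


# Hand-made stand-ins for forest.predict_proba(X) and forest.predict(X).
proba = np.array([[0.75, 0.25], [0.4, 0.6]])
pred = np.array([0, 1])
ok = check_policy_outputs(pred, proba, n_treatments=2)
```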
35 changes: 17 additions & 18 deletions notebooks/Policy Learning with Trees and Forests.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions setup.cfg
@@ -53,6 +53,7 @@ tests_require =
seaborn
lightgbm
xgboost
jupyter-client <= 6.1.12

[options.extras_require]
automl =