Commit

Bug Fix: Corrected the definition of natural direct and indirect effect (#211)

* corrected the definition of nde and nie

* undoing a change to retain support for frontdoor
amit-sharma authored Dec 12, 2020
1 parent f20d8d4 commit 014f6ea
Showing 3 changed files with 71 additions and 62 deletions.
60 changes: 31 additions & 29 deletions docs/source/example_notebooks/dowhy_mediation_analysis.ipynb
@@ -15,7 +15,7 @@
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
" \n",
"from dowhy import CausalModel\n",
"import dowhy.datasets\n",
"\n",
@@ -42,12 +42,12 @@
"name": "stdout",
"output_type": "stream",
"text": [
" FD0 W0 v0 y\n",
"0 5.282186 -0.888827 1.082915 3.229273\n",
"1 -3.783050 1.175773 -0.678321 -1.485859\n",
"2 -0.372423 -0.170469 0.023981 -0.561856\n",
"3 -7.833069 -1.658670 -1.691441 -9.154887\n",
"4 -3.602776 -3.500774 -1.070929 -8.273203\n"
" FD0 W0 v0 y\n",
"0 1.661677 0.635219 0.720002 7.712011\n",
"1 0.859131 -0.922282 -1.202562 -1.613252\n",
"2 -3.239856 -0.529720 -0.938528 -11.898849\n",
"3 -1.646743 -1.118923 -1.896495 -9.831622\n",
"4 -1.763967 0.305422 -1.590754 -3.820278\n"
]
}
],
@@ -113,8 +113,9 @@
"## Step 2: Identifying the natural direct and indirect effects\n",
"We use the `estimand_type` argument to specify that the target estimand should be for a **natural direct effect** or the **natural indirect effect**. For definitions, see [Interpretation and Identification of Causal Mediation](https://ftp.cs.ucla.edu/pub/stat_ser/r389-imai-etal-commentary-r421-reprint.pdf) by Judea Pearl.\n",
"\n",
"Natural direct effect: Effect due to the path v0->y\n",
"Natural indirect effect: Effece due to the path v0->FD0->y (mediated by FD0)."
"**Natural direct effect**: Effect due to the path v0->y\n",
"\n",
"**Natural indirect effect**: Effect due to the path v0->FD0->y (mediated by FD0)."
]
},
{
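For reference, a sketch of the counterfactual definitions behind the two estimand types, in the notation of Pearl's work cited in Step 2: $t_0$ and $t_1$ are the control and treatment values of `v0`, $M$ is the mediator `FD0`, and $Y_{t, M_{t'}}$ is the outcome when the treatment is set to $t$ while the mediator takes the value it would have had under $t'$. The third line holds in general; the fourth holds only under the added assumption of a linear model with no treatment-mediator interaction.

```latex
\begin{aligned}
\mathrm{NDE}_{t_0 \to t_1} &= \mathbb{E}\left[Y_{t_1, M_{t_0}}\right] - \mathbb{E}\left[Y_{t_0, M_{t_0}}\right] \\
\mathrm{NIE}_{t_0 \to t_1} &= \mathbb{E}\left[Y_{t_0, M_{t_1}}\right] - \mathbb{E}\left[Y_{t_0, M_{t_0}}\right] \\
\mathrm{TE}_{t_0 \to t_1} &= \mathrm{NDE}_{t_0 \to t_1} - \mathrm{NIE}_{t_1 \to t_0} \\
\mathrm{TE}_{t_0 \to t_1} &= \mathrm{NDE}_{t_0 \to t_1} + \mathrm{NIE}_{t_0 \to t_1} \quad \text{(linear, no interaction)}
\end{aligned}
```

The last identity is the one the two-stage estimator in this commit relies on: it reports the product of the two stage coefficients as the NIE and obtains the NDE as the total effect minus that product.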
@@ -142,7 +143,7 @@
"### Estimand : 1\n",
"Estimand name: mediation\n",
"Estimand expression:\n",
"Expectation(Derivative(y, [FD0])*Derivative([FD0], [v0]))\n",
"Expectation(Derivative(y|FD0, [v0]))\n",
"Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.\n",
"Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)\n",
"Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→{y} then P(y|FD0, v0, U) = P(y|FD0, v0)\n",
@@ -182,7 +183,7 @@
"### Estimand : 1\n",
"Estimand name: mediation\n",
"Estimand expression:\n",
"\n",
"Expectation(Derivative(y, [FD0])*Derivative([FD0], [v0]))\n",
"Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.\n",
"Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)\n",
"Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→{y} then P(y|FD0, v0, U) = P(y|FD0, v0)\n",
@@ -204,11 +205,10 @@
"## Step 3: Estimation of the effect\n",
"Currently only two stage linear regression is supported for estimation. We plan to add a non-parametric Monte Carlo method soon as described in [Imai, Keele and Yamamoto (2010)](https://projecteuclid.org/euclid.ss/1280841733).\n",
"\n",
"#### Natural Indirect Effect\n",
"The estimator converts the mediation effect estimation to a series of backdoor effect estimations. \n",
"1. The first-stage model estimates the effect from treatment (v0) to the mediator (FD0).\n",
"2. The second-stage model estimates the effect from mediator (FD0) to the outcome (Y).\n",
"\n",
"For estimating the natural indirect effect, there is also an additional second-stage model that estimates the effect of treatment on the outcome, conditioned on the mediator. It assumes the same model as given for for the `second_stage_model` parameter."
"2. The second-stage model estimates the effect from mediator (FD0) to the outcome (Y)."
]
},
{
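A minimal numpy sketch of the two-stage linear regression described above, assuming simulated data with made-up coefficients (not dowhy's dataset generator) and plain least squares in place of dowhy's linear regression estimator. It fits the same three formulas that appear in the estimator logs (`FD0~v0+W0`, `y~FD0+v0+W0`, `y~v0+W0`), takes the NIE as the product of the first-stage coefficient on `v0` and the second-stage coefficient on `FD0`, and takes the NDE as the total effect minus the NIE. The function name `two_stage_mediation` is hypothetical.

```python
import numpy as np


def two_stage_mediation(v0, fd0, y, w0):
    """Hypothetical sketch: linear two-stage mediation estimates via least squares.

    Returns (nie, nde, total), where
      nie   = coef of v0 in FD0~v0+W0  *  coef of FD0 in y~FD0+v0+W0
      total = coef of v0 in y~v0+W0
      nde   = total - nie
    """
    ones = np.ones(len(v0))

    def ols(target, *columns):
        # Design matrix with an intercept column first; return slopes only.
        X = np.column_stack([ones, *columns])
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        return coef[1:]

    a = ols(fd0, v0, w0)[0]       # first stage: FD0 ~ v0 + W0, coefficient on v0
    b = ols(y, fd0, v0, w0)[0]    # second stage: y ~ FD0 + v0 + W0, coefficient on FD0
    total = ols(y, v0, w0)[0]     # total effect: y ~ v0 + W0, coefficient on v0
    nie = float(a * b)
    nde = float(total) - nie
    return nie, nde, float(total)


# Simulated data (assumed coefficients): a = 2.0, b = 1.5, direct effect c = 0.7.
rng = np.random.default_rng(0)
n = 100_000
w0 = rng.normal(size=n)                                   # common cause
v0 = 0.5 * w0 + rng.normal(size=n)                        # treatment
fd0 = 2.0 * v0 + 0.3 * w0 + rng.normal(size=n)           # mediator
y = 1.5 * fd0 + 0.7 * v0 + 0.4 * w0 + rng.normal(size=n)  # outcome

nie, nde, total = two_stage_mediation(v0, fd0, y, w0)
print(round(nie, 3), round(nde, 3), round(total, 3))
```

With these simulated coefficients the NIE should land near 2.0 × 1.5 = 3.0 and the NDE near 0.7. Setting the direct coefficient to zero mimics the notebook's dataset, where the direct effect is set to zero and the NDE estimate comes out near zero.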
@@ -223,7 +223,7 @@
"INFO:dowhy.causal_estimator:INFO: Using Two Stage Regression Estimator\n",
"INFO:dowhy.causal_estimator:b: FD0~v0+W0\n",
"INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator\n",
"INFO:dowhy.causal_estimator:b: y~FD0+W0+v0\n",
"INFO:dowhy.causal_estimator:b: y~FD0+v0+W0\n",
"INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator\n"
]
},
@@ -234,7 +234,7 @@
"*** Causal Estimate ***\n",
"\n",
"## Identified estimand\n",
"Estimand type: nonparametric-nde\n",
"Estimand type: nonparametric-nie\n",
"\n",
"### Estimand : 1\n",
"Estimand name: mediation\n",
@@ -245,18 +245,18 @@
"Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→{y} then P(y|FD0, v0, U) = P(y|FD0, v0)\n",
"\n",
"## Realized estimand\n",
"(b: FD0~v0+W0) * (b: y~FD0+W0+v0)\n",
"(b: FD0~v0+W0)*(b: y~FD0+v0+W0)\n",
"Target units: ate\n",
"\n",
"## Estimate\n",
"Mean value: 3.6829216959906774\n",
"Mean value: 1.1930272534225996\n",
"\n"
]
}
],
"source": [
"import dowhy.causal_estimators.linear_regression_estimator\n",
"causal_estimate_nde = model.estimate_effect(identified_estimand_nde,\n",
"causal_estimate_nde = model.estimate_effect(identified_estimand_nie,\n",
" method_name=\"mediation.two_stage_regression\",\n",
" confidence_intervals=False,\n",
" test_significance=False,\n",
@@ -272,7 +272,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the value equals the true value of the natural direct effect (up to random noise). "
"Note that the value equals the true value of the natural indirect effect (up to random noise). "
]
},
{
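The "true value" referred to here is fixed by the data-generating process. As a sketch, assuming a hypothetical linear structural model (not the generator used by `dowhy.datasets`), the true NDE and NIE can be computed straight from the counterfactual definitions by reusing each unit's exogenous noise across the hypothetical worlds:

```python
import numpy as np

# Hypothetical structural coefficients: v0 -> FD0 (a), FD0 -> y (b), v0 -> y (c).
a, b, c = 2.0, 1.5, 0.7

rng = np.random.default_rng(1)
n = 10_000
u_m = rng.normal(size=n)  # exogenous noise of the mediator, fixed per unit
u_y = rng.normal(size=n)  # exogenous noise of the outcome, fixed per unit


def mediator(t):
    # Structural equation for FD0 under treatment value t.
    return a * t + u_m


def outcome(t, m):
    # Structural equation for y under treatment t and mediator value m.
    return c * t + b * m + u_y


t0, t1 = 0.0, 1.0
y_t0_m_t0 = outcome(t0, mediator(t0))                         # Y_{t0, M_{t0}}
nde = np.mean(outcome(t1, mediator(t0)) - y_t0_m_t0)          # switch only the direct path
nie = np.mean(outcome(t0, mediator(t1)) - y_t0_m_t0)          # switch only the mediator
total = np.mean(outcome(t1, mediator(t1)) - y_t0_m_t0)        # switch both
print(nde, nie, total)
```

Because the noise terms cancel within each unit, the simulated NDE equals c and the NIE equals a·b up to floating-point error; with c = 0 the NDE vanishes, which matches the notebook's dataset, where the direct effect is set to zero.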
@@ -284,7 +284,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"3.6829216959906774 3.67848696823473\n"
"1.1930272534225996 1.1789881278328593\n"
]
}
],
@@ -296,8 +296,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The parameter is called ate because in the simulated dataset, the indirect effect is set to be zero. \n",
"Now let us check whether the indirect effect estimator returns the (correct) estimate of zero."
"The parameter is called `ate` because in the simulated dataset, the direct effect is set to be zero. \n",
"\n",
"#### Natural Direct Effect\n",
"Now let us check whether the direct effect estimator returns the (correct) estimate of zero."
]
},
{
@@ -312,7 +314,7 @@
"INFO:dowhy.causal_estimator:INFO: Using Two Stage Regression Estimator\n",
"INFO:dowhy.causal_estimator:b: FD0~v0+W0\n",
"INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator\n",
"INFO:dowhy.causal_estimator:b: y~FD0+W0+v0\n",
"INFO:dowhy.causal_estimator:b: y~FD0+v0+W0\n",
"INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator\n",
"INFO:dowhy.causal_estimator:b: y~v0+W0\n",
"INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator\n"
@@ -325,28 +327,28 @@
"*** Causal Estimate ***\n",
"\n",
"## Identified estimand\n",
"Estimand type: nonparametric-nie\n",
"Estimand type: nonparametric-nde\n",
"\n",
"### Estimand : 1\n",
"Estimand name: mediation\n",
"Estimand expression:\n",
"\n",
"Expectation(Derivative(y|FD0, [v0]))\n",
"Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.\n",
"Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)\n",
"Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→{y} then P(y|FD0, v0, U) = P(y|FD0, v0)\n",
"\n",
"## Realized estimand\n",
"b: y~v0+W0-(b: FD0~v0+W0) * (b: y~FD0+W0+v0)\n",
"(b: y~v0+W0) - ((b: FD0~v0+W0)*(b: y~FD0+v0+W0))\n",
"Target units: ate\n",
"\n",
"## Estimate\n",
"Mean value: -0.0006802376984835767\n",
"Mean value: 9.684579195123888e-05\n",
"\n"
]
}
],
"source": [
"causal_estimate_nie = model.estimate_effect(identified_estimand_nie,\n",
"causal_estimate_nie = model.estimate_effect(identified_estimand_nde,\n",
" method_name=\"mediation.two_stage_regression\",\n",
" confidence_intervals=False,\n",
" test_significance=False,\n",
54 changes: 27 additions & 27 deletions dowhy/causal_estimators/two_stage_regression_estimator.py
@@ -91,16 +91,16 @@ def _estimate_effect(self):
elif self._target_estimand.identifier_method=="mediation":
modified_target_estimand.outcome_variable = parse_state(self._mediators_names)

first_stage_estimate = self.first_stage_model(self._data,
first_stage_estimate = self.first_stage_model(self._data,
modified_target_estimand,
self._treatment_name,
parse_state(modified_target_estimand.outcome_variable),
control_value=self._control_value,
parse_state(modified_target_estimand.outcome_variable),
control_value=self._control_value,
treatment_value=self._treatment_value,
test_significance=self._significance_test,
test_significance=self._significance_test,
evaluate_effect_strength=self._effect_strength_eval,
confidence_intervals = self._confidence_intervals,
target_units=self._target_units,
target_units=self._target_units,
effect_modifiers=self._effect_modifier_names,
params=self.method_params)._estimate_effect()

@@ -113,45 +113,45 @@ def _estimate_effect(self):
elif self._target_estimand.identifier_method=="mediation":
modified_target_estimand.treatment_variable = parse_state(self._mediators_names)

second_stage_estimate = self.second_stage_model(self._data,
second_stage_estimate = self.second_stage_model(self._data,
modified_target_estimand,
parse_state(modified_target_estimand.treatment_variable),
parse_state(modified_target_estimand.treatment_variable),
self._outcome_name,
control_value=self._control_value,
control_value=self._control_value,
treatment_value=self._treatment_value,
test_significance=self._significance_test,
test_significance=self._significance_test,
evaluate_effect_strength=self._effect_strength_eval,
confidence_intervals = self._confidence_intervals,
target_units=self._target_units,
target_units=self._target_units,
effect_modifiers=self._effect_modifier_names,
params=self.method_params)._estimate_effect()
# Combining the two estimates
natural_direct_effect = first_stage_estimate.value * second_stage_estimate.value
estimate_value = natural_direct_effect
natural_indirect_effect = first_stage_estimate.value * second_stage_estimate.value
# This same estimate is valid for frontdoor as well as mediation (NIE)
estimate_value = natural_indirect_effect
self.symbolic_estimator = self.construct_symbolic_estimator(
first_stage_estimate.realized_estimand_expr,
second_stage_estimate.realized_estimand_expr,
estimand_type=CausalIdentifier.NONPARAMETRIC_NDE)

if self._target_estimand.estimand_type == CausalIdentifier.NONPARAMETRIC_NIE:
second_stage_estimate.realized_estimand_expr,
estimand_type=CausalIdentifier.NONPARAMETRIC_NIE)
if self._target_estimand.estimand_type == CausalIdentifier.NONPARAMETRIC_NDE:
# Total effect of treatment
modified_target_estimand = copy.deepcopy(self._target_estimand)
modified_target_estimand.identifier_method="backdoor"

total_effect_estimate = self.second_stage_model(self._data,
total_effect_estimate = self.second_stage_model(self._data,
modified_target_estimand,
self._treatment_name,
self._outcome_name,
control_value=self._control_value,
control_value=self._control_value,
treatment_value=self._treatment_value,
test_significance=self._significance_test,
test_significance=self._significance_test,
evaluate_effect_strength=self._effect_strength_eval,
confidence_intervals = self._confidence_intervals,
target_units=self._target_units,
target_units=self._target_units,
effect_modifiers=self._effect_modifier_names,
params=self.method_params)._estimate_effect()
natural_indirect_effect = total_effect_estimate.value - natural_direct_effect
estimate_value = natural_indirect_effect
natural_direct_effect = total_effect_estimate.value - natural_indirect_effect
estimate_value = natural_direct_effect
self.symbolic_estimator = self.construct_symbolic_estimator(
first_stage_estimate.realized_estimand_expr,
second_stage_estimate.realized_estimand_expr,
@@ -194,9 +194,9 @@ def build_first_stage_features(self):

def construct_symbolic_estimator(self, first_stage_symbolic,
second_stage_symbolic, total_effect_symbolic=None, estimand_type=None):
nde_symbolic = "(" + first_stage_symbolic + ") * (" + second_stage_symbolic + ")"
if estimand_type == CausalIdentifier.NONPARAMETRIC_NDE:
return nde_symbolic
elif estimand_type == CausalIdentifier.NONPARAMETRIC_NIE:
return total_effect_symbolic + "-" + nde_symbolic
nie_symbolic = "(" + first_stage_symbolic + ")*(" + second_stage_symbolic + ")"
if estimand_type == CausalIdentifier.NONPARAMETRIC_NIE:
return nie_symbolic
elif estimand_type == CausalIdentifier.NONPARAMETRIC_NDE:
return "(" + total_effect_symbolic + ") - (" + nie_symbolic + ")"
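The "Realized estimand" strings printed in the notebook follow directly from the new `construct_symbolic_estimator` logic. Below is a dependency-free sketch of that logic under a few assumptions: the function name `realized_estimand` is hypothetical, the strings `"nie"`/`"nde"` stand in for `CausalIdentifier.NONPARAMETRIC_NIE`/`NONPARAMETRIC_NDE`, and the explicit `ValueError` checks are additions (the committed code would raise a `TypeError` on a missing total-effect expression and return `None` for an unknown type).

```python
def realized_estimand(first_stage, second_stage, total_effect=None, estimand_type="nie"):
    """Hypothetical sketch of the string building in construct_symbolic_estimator."""
    # NIE: product of the first-stage and second-stage regression expressions.
    nie_expr = "(" + first_stage + ")*(" + second_stage + ")"
    if estimand_type == "nie":
        return nie_expr
    if estimand_type == "nde":
        # NDE: total-effect regression minus the NIE product.
        if total_effect is None:
            raise ValueError("the NDE expression needs the total-effect regression")
        return "(" + total_effect + ") - (" + nie_expr + ")"
    raise ValueError(f"unknown estimand_type: {estimand_type!r}")


first = "b: FD0~v0+W0"
second = "b: y~FD0+v0+W0"
total = "b: y~v0+W0"
print(realized_estimand(first, second))
print(realized_estimand(first, second, total, "nde"))
```

Fed the regression formulas from the notebook logs, the two calls reproduce the realized estimands shown in the NIE and NDE outputs.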

19 changes: 13 additions & 6 deletions dowhy/causal_identifier.py
@@ -126,7 +126,7 @@ def identify_ate_effect(self):
)
return estimand

def identify_nde_effect(self):
def identify_nie_effect(self):
estimands_dict = {}
### 1. FIRST DOING BACKDOOR IDENTIFICATION
# First, checking if there are any valid backdoor adjustment sets
@@ -178,7 +178,7 @@ def identify_nde_effect(self):
)
return estimand

def identify_nie_effect(self):
def identify_nde_effect(self):
estimands_dict = {}
### 1. FIRST DOING BACKDOOR IDENTIFICATION
# First, checking if there are any valid backdoor adjustment sets
@@ -449,7 +449,6 @@ def construct_backdoor_estimand(self, estimand_type, treatment_name,
sym_mu = sp.Symbol("mu")
sym_sigma = sp.Symbol("sigma", positive=True)
sym_outcome = spstats.Normal(num_expr_str, sym_mu, sym_sigma)
# sym_common_causes = [sp.stats.Normal(common_cause, sym_mu, sym_sigma) for common_cause in common_causes]
sym_treatment_symbols = [sp.Symbol(t) for t in treatment_name]
sym_treatment = sp.Array(sym_treatment_symbols)
sym_conditional_outcome = spstats.Expectation(sym_outcome)
@@ -545,10 +544,18 @@ def construct_mediation_estimand(self, estimand_type, treatment_name,
sym_mediators = sp.Array(sym_mediators_symbols)
sym_outcome_derivative = sp.Derivative(sym_outcome, sym_mediators)
sym_treatment_derivative = sp.Derivative(sym_mediators, sym_treatment)
if estimand_type == CausalIdentifier.NONPARAMETRIC_NDE:
# For direct effect
num_expr_str = outcome_name
if len(mediators_names)>0:
num_expr_str += "|" + ",".join(mediators_names)
sym_mu = sp.Symbol("mu")
sym_sigma = sp.Symbol("sigma", positive=True)
sym_conditional_outcome = spstats.Normal(num_expr_str, sym_mu, sym_sigma)
sym_directeffect_derivative = sp.Derivative(sym_conditional_outcome, sym_treatment)
if estimand_type == CausalIdentifier.NONPARAMETRIC_NIE:
sym_effect = spstats.Expectation(sym_treatment_derivative * sym_outcome_derivative)
elif estimand_type == CausalIdentifier.NONPARAMETRIC_NIE:
sym_effect = ""
elif estimand_type == CausalIdentifier.NONPARAMETRIC_NDE:
sym_effect = spstats.Expectation(sym_directeffect_derivative)
sym_assumptions = {
"Mediation": (
"{2} intercepts (blocks) all directed paths from {0} to {1} except the path {{{0}}}\N{RIGHTWARDS ARROW}{{{1}}}."
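The symbolic estimands built in `construct_mediation_estimand`, `Derivative(y, [FD0])*Derivative([FD0], [v0])` for the NIE and `Derivative(y|FD0, [v0])` for the NDE, can be sanity-checked with sympy on a hypothetical linear model. This sketch uses ordinary symbols and `sp.diff` rather than the `sympy.stats` objects the identifier builds, and the coefficients `a`, `b`, `c` are illustrative:

```python
import sympy as sp

# Hypothetical linear model (a, b, c are illustrative symbols, not dowhy output):
#   E[FD0 | v0]    = a*v0
#   E[y | FD0, v0] = c*v0 + b*FD0
v0, fd0, a, b, c = sp.symbols("v0 FD0 a b c")
mediator_mean = a * v0
outcome_mean = c * v0 + b * fd0

# NIE estimand: Derivative(y, [FD0]) * Derivative([FD0], [v0])
nie = sp.diff(outcome_mean, fd0) * sp.diff(mediator_mean, v0)
# NDE estimand: Derivative(y|FD0, [v0]), i.e. with the mediator held fixed
nde = sp.diff(outcome_mean, v0)
# Total effect: substitute the mediator equation, then differentiate in v0
total = sp.diff(outcome_mean.subs(fd0, mediator_mean), v0)

print(nie, nde, total)
```

The product of derivatives recovers a·b, the derivative with the mediator held fixed recovers c, and the two add up to the total effect, consistent with the estimator computing the NDE as the total effect minus the NIE.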
