LinearDRIV effect vs effect_interval shape inconsistency #687

fverac · 2022-10-10T16:53:58Z

Output array shapes are inconsistent for LinearDRIV.effect() and LinearDRIV.effect_interval()[0]. Should make these consistent.

Seems to occur when one of either Y, T, or Z is an array instead of a vector.

Repro:

import numpy as np
import pandas as pd
from econml.iv.dr import LinearDRIV

df = pd.DataFrame(np.random.normal(size = (100, 6)))

Y = df[[0]]
T = df[[1]]
X = df[[2,3,4]]
Z = df[[5]]

est = LinearDRIV().fit(Y=Y, T=T, Z=Z, X=X)

eff = est.effect(X)
lb, ub = est.effect_interval(X)

print(eff.shape)
print(lb.shape)

The text was updated successfully, but these errors were encountered:

ssemov · 2024-08-18T14:33:09Z

Hi @kbattocchi, I would like to take this issue.

ssemov · 2024-08-27T19:51:42Z

I believe the issue is with the _BaseDRIVModelFinal class. For example, I can replicate the shape inconsistency with SparseLinearDRIV too.

I avoid the shape inconsistency issue, if I modify the predict method in _BaseDRIVModelFinal as below.

This works because it makes eff = self.const_marginal_effect(X) with dimensions (m,1) instead of (m,1,1). Then the effect method returns an output array with dimensions (m,).

@kbattocchi and @fverac does this look like a reasonable approach?

One concern I have is that this conditional check might miss some T and Y dimension configurations of the _BaseDRIV class though I wasn't able to reproduce any.

        X = self._transform_X(X, fitting=False)
        # handle df input dimensions
        if self.d_y == (1,) and self.d_t == (1,):
            return self._model_final.predict(X).reshape((-1,) + self.d_y + self.d_t).reshape(-1,1)
        else:
            return self._model_final.predict(X).reshape((-1,) + self.d_y + self.d_t)

fverac · 2024-08-27T21:17:42Z

I believe the issue is with the _BaseDRIVModelFinal class. For example, I can replicate the shape inconsistency with SparseLinearDRIV too.

I avoid the shape inconsistency issue, if I modify the predict method in _BaseDRIVModelFinal as below.

This works because it makes eff = self.const_marginal_effect(X) with dimensions (m,1) instead of (m,1,1). Then the effect method returns an output array with dimensions (m,).

@kbattocchi and @fverac does this look like a reasonable approach?

One concern I have is that this conditional check might miss some T and Y dimension configurations of the _BaseDRIV class though I wasn't able to reproduce any.
        X = self._transform_X(X, fitting=False)
        # handle df input dimensions
        if self.d_y == (1,) and self.d_t == (1,):
            return self._model_final.predict(X).reshape((-1,) + self.d_y + self.d_t).reshape(-1,1)
        else:
            return self._model_final.predict(X).reshape((-1,) + self.d_y + self.d_t)

@ssemov Thanks for volunteering to look into this. It might help to outline the intended shapes of these methods:

.effect should return a matrix of shape (n_obs, d_y). Each element in the .effect_interval tuple should also be of the same shape.

.constant_marginal_effect should return a matrix of shape (n_obs, d_y, d_f_t) where d_f_t==d_t is there is no treatment featurizer (as in our repro). Each element in the .constant_marginal_effect_interval tuple should also be of the same shape.

.marginal_effect should return a matrix of shape (n_obs, d_y, d_t). Each element in the .marginal_effect_interval tuple should also be of the same shape.

Judging from the above specification, I would say that in the original repro, the shape of eff is correct in that it is (100, 1) i.e. (n_obs, d_y) and that the shape of lb is what needs to be aligned to match eff, rather than the other way around.

I think your fix makes eff become shape (100,) for the original repro code, which is not aligned with our specification. Moving forward, I would recommend looking for a fix that corrects/updates the shape output of LinearDRIV's effect_interval, rather than the shape of .effect. Probably somewhere in the inference object that LinearDRIV uses.

Hope this helps!

P.S. extended the repro code a little to demonstrate more

df = pd.DataFrame(np.random.normal(size = (100, 6)))

Y = df[[0]]
T = df[[1]]
X = df[[2,3,4]]
Z = df[[5]]

est = LinearDRIV().fit(Y=Y, T=T, Z=Z, X=X)

eff = est.effect(X)
cme = est.const_marginal_effect(X)
me = est.marginal_effect(T, X)

lb, ub = est.effect_interval(X)
cme_lb, cme_ub = est.const_marginal_effect_interval(X)
me_lb, me_ub = est.marginal_effect_interval(T, X)


print('effect shapes. all as expected')
print(eff.shape)
print(cme.shape)
print(me.shape)

print('interval lower bound shapes: Note that the effect lower bound shape is the only one inconsistent with the "point estimate" equivalent')
print(lb.shape)
print(cme_lb.shape)
print(me_lb.shape)

ssemov · 2024-09-04T19:37:58Z

Thanks, @fverac! This is super helpful. I looked into your suggestion, and indeed the error is with the inference object LinearDRIV uses.

LinearDRIV uses the StatsModelsInference class, which inherits its effect_interval method from LinearModelFinalInference.

That outputs a vector of dimensions (m,) instead of a matrix (m,dy). .effect has dimensions (m,dy), so we get the inconsistency.

This only happens when Y is an array. When Y is a vector (only 1 outcome), both .effect and .effect_interval have dimensions (m,).

I'll send a fix.

kbattocchi added the up for grabs Issues anyone can take ownership of label Aug 11, 2024

kbattocchi assigned kbattocchi and ssemov and unassigned kbattocchi Aug 19, 2024

ssemov mentioned this issue Sep 10, 2024

"Fixes linear DRIV shape inconsistency" #920

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LinearDRIV effect vs effect_interval shape inconsistency #687

LinearDRIV effect vs effect_interval shape inconsistency #687

fverac commented Oct 10, 2022

ssemov commented Aug 18, 2024

ssemov commented Aug 27, 2024

fverac commented Aug 27, 2024

ssemov commented Sep 4, 2024 •

edited

Loading

LinearDRIV effect vs effect_interval shape inconsistency #687

LinearDRIV effect vs effect_interval shape inconsistency #687

Comments

fverac commented Oct 10, 2022

ssemov commented Aug 18, 2024

ssemov commented Aug 27, 2024

fverac commented Aug 27, 2024

ssemov commented Sep 4, 2024 • edited Loading

ssemov commented Sep 4, 2024 •

edited

Loading