
Add updates to wald_test method and unit testing files #536

Merged · 8 commits · Jul 11, 2024

Conversation

Jayhyung
Member

@Jayhyung Jayhyung commented Jul 3, 2024

Hi @s3alfisc!

More details on this PR will follow soon, but I have one question before I proceed.

If you look at the first commit of this PR, I added # type: ignore to several spots in feols_.py to suppress some error messages. I don't know why, but I encountered incompatible dataframe type errors like the following:

pyfixest/estimation/feols_.py:411: error: Argument "data" to "get_cluster_df" has incompatible type "pandas.core.frame.DataFrame | polars.dataframe.frame.DataFrame"; expected "pandas.core.frame.DataFrame" [arg-type]
pyfixest/estimation/feols_.py:413: error: Argument "data" to "_check_cluster_df" has incompatible type "pandas.core.frame.DataFrame | polars.dataframe.frame.DataFrame"; expected "pandas.core.frame.DataFrame" [arg-type]
Found 2 errors in 1 file (checked 1 source file)

Fixing this issue is not hard: converting the data dataframe into a pandas DataFrame if it is a polars DataFrame fixes the issue. But I was wondering whether this is okay to do in feols_.py. I thought any dataframe that the variable data takes is already processed into a pandas DataFrame by the utility files dev_utils.py and vcov_utils.py.
How should I deal with this?

Thanks!


codecov bot commented Jul 3, 2024

Codecov Report

Attention: Patch coverage is 98.62069% with 2 lines in your changes missing coverage. Please review.

Files                           Coverage            Δ
tests/test_errors.py            98.78% <100.00%>    +1.69% ⬆️
tests/test_wald_test.py         100.00% <100.00%>   +44.00% ⬆️
pyfixest/estimation/feols_.py   91.09% <93.93%>     (ø)

... and 26 files with indirect coverage changes

@s3alfisc
Member

s3alfisc commented Jul 3, 2024

Should I just ignore this error message and keep working on wald_test method? Thanks!

I think the alert might be raised because the vcov method allows a DataFrameType, which can be either a pl.DataFrame or a pd.DataFrame. If the input is a polars DataFrame, it is never converted to pandas, so when it reaches

self._cluster_df = _get_cluster_df(

the type checker complains because it expects a pd.DataFrame.

I think if we apply _polars_to_pandas(data) at the beginning of the vcov method, the error might vanish? Though admittedly I am slightly puzzled why this issue arises only now 🤔

Anyway, if that is not it, I would suggest you continue with # type: ignore and I'll try to debug this in the next few days!

@Jayhyung
Member Author

Jayhyung commented Jul 3, 2024

Yes, this certainly fixes the issue. It also puzzles me, as I didn't encounter this error before ;). Stay tuned for my next update!

@s3alfisc
Member

s3alfisc commented Jul 3, 2024

Stay tuned for my next update!

Looking forward to it! 😄

@Jayhyung
Member Author

Jayhyung commented Jul 5, 2024

A few points are worth mentioning regarding the Wald test.

  1. I tried to make the wald_test method as comprehensive as possible. For example, the F statistic is computed differently depending on whether users set up clusters for computing standard errors, and the test statistic switches to chi-square if the R matrix or q vector are not the identity matrix and zero vector. I think this is more natural, as the F test is a special case of the Wald test when the R matrix and q vector satisfy certain conditions.
  2. I updated the testing scripts to cover 1) the single-equation case (where R.shape[0] = 1) and 2) the multiple-equation case (where R.shape[0] = n), with and without cluster-robust standard errors. All test statistics (F statistic and p-value) are tested against the R package clubSandwich. Also, all error flags are tested in test_errors.py.

One concern I have is that I didn't test the case with q != 0. Only a zero vector for q is allowed in clubSandwich, as far as I understand (they have an arg rhs (right-hand side) to specify q, but setting a different q vector never changes the result in my testing). Their documentation was also not very informative on this (Link), so we may have to compare against a different stats package. Do you have any idea which package handles the Wald test well?

Thanks!
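The F/chi-square switch described in point 1 can be sketched with a small example (the statistic and dof values below are made up for illustration; scipy supplies the distribution functions):

```python
from scipy import stats

# Hypothetical Wald statistic W for q restrictions, with df2 residual dof
W, q, df2 = 7.4, 3, 96

# F form: divide the Wald statistic by the number of restrictions
f_stat = W / q
p_f = stats.f.sf(f_stat, q, df2)

# Chi-square form: use W directly with q degrees of freedom
p_chi2 = stats.chi2.sf(W, q)

print(p_f, p_chi2)
```

The two p-values converge as df2 grows, which is why the F test can be seen as the finite-sample refinement of the asymptotic chi-square Wald test.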

@s3alfisc
Member

s3alfisc commented Jul 7, 2024

Hi @Jayhyung, thanks for the PR! I have started with a first review. Overall, this looks very good! =)

I am really surprised that clubSandwich does not allow users to vary q - I really think it is one of the best packages for Wald testing in R. It adds a lot of very nice convenience features, e.g. it allows users to easily create constraint matrices (as described in the vignette).

One way to test different values of q is to internally test hypotheses that are "identical" but "numerically different".

I.e. we could test that H_{0}: np.eye(3) @ beta = 1 vs H_{1}: np.eye(3) @ beta != 1 produces identical p-values to the test of H_{0}: (np.eye(3) * 2) @ beta = 2 vs H_{1}: (np.eye(3) * 2) @ beta != 2?
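This scaling argument can be checked numerically with a minimal Wald statistic (beta and V below are made-up values, not pyfixest output):

```python
import numpy as np

rng = np.random.default_rng(0)
beta = rng.normal(size=3)   # pretend coefficient estimates
V = np.eye(3) * 0.5         # pretend variance-covariance matrix

def wald_stat(R, q, beta, V):
    # W = (R beta - q)' (R V R')^{-1} (R beta - q)
    diff = R @ beta - q
    return float(diff @ np.linalg.solve(R @ V @ R.T, diff))

w1 = wald_stat(np.eye(3), np.ones(3), beta, V)
w2 = wald_stat(2 * np.eye(3), 2 * np.ones(3), beta, V)  # same hypothesis, rescaled
assert np.isclose(w1, w2)  # identical statistic, hence identical p-value
```

The scale factor cancels because it enters the quadratic form as c² in the numerator and c² in the inverted middle matrix, so any common rescaling of R and q leaves the statistic unchanged.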

@s3alfisc
Member

s3alfisc commented Jul 7, 2024

On the dof correction fixest employs for its wald function:

Here is the code for fixest::wald():

wald = function (x, keep = NULL, drop = NULL, print = TRUE, vcov, se, 
          cluster, ...) 
{
  check_arg(x, "class(fixest)")
  check_arg(keep, drop, "NULL character vector no na")
  if (isTRUE(x$onlyFixef)) 
    return(NA)
  dots = list(...)
  if (!isTRUE(x$summary) || !missing(vcov) || !missing(se) || 
      !missing(cluster) || ...length() > 0) {
    x = summary(x, vcov = vcov, se = se, cluster = cluster, 
                ...)
  }
  if (missing(keep) && missing(drop)) {
    drop = "\\(Intercept\\)"
  }
  coef_name = names(x$coefficients)
  coef_name = keep_apply(coef_name, keep)
  coef_name = drop_apply(coef_name, drop)
  if (length(coef_name) == 0) 
    return(NA)
  qui = names(x$coefficients) %in% coef_name
  my_coef = x$coefficients[qui]
  df1 = length(my_coef)
  df2 = x$nobs - x$nparams
  stat = drop(my_coef %*% solve(x$cov.scaled[qui, qui]) %*% 
                my_coef)/df1
  p = pf(stat, df1, df2, lower.tail = FALSE)
  vcov = attr(x$cov.scaled, "type")
  vec = list(stat = stat, p = p, df1 = df1, df2 = df2, vcov = vcov)
  if (print) {
    cat("Wald test, H0: ", ifsingle(coef_name, "", "joint "), 
        "nullity of ", enumerate_items(coef_name), "\n", 
        sep = "")
    cat(" stat = ", numberFormatNormal(stat), ", p-value ", 
        ifelse(p < 2.2e-16, "< 2.2e-16", paste0("= ", numberFormatNormal(p))), 
        ", on ", numberFormatNormal(df1), " and ", numberFormatNormal(df2), 
        " DoF, ", "VCOV: ", vcov, ".", sep = "")
    return(invisible(vec))
  }
  else {
    return(vec)
  }
}

The relevant part of the code is this one

  df1 = length(my_coef)
  df2 = x$nobs - x$nparams

nparams appears to combine the number of covariates plus the number of swept-out dummies:

library(fixest)

data(mtcars)

fit = feols(mpg ~ cyl + as.factor(hp), data = mtcars)
my_coef = fit$coefficients
df1 = length(my_coef)
df2 = fit$nobs - fit$nparams
df1 # 23
df2 # 9
fit$nparams # 23 -> covariates + swept-out fixed effects


fit = feols(mpg ~ cyl | hp, data = mtcars)
my_coef = fit$coefficients
df1 = length(my_coef)
df2 = fit$nobs - fit$nparams

df1 # 1
df2 # 9
fit$nparams # 23

For pyfixest, this would then translate to

dfn = _k
dfd = _N - _k_fe - _k

which is already what is implemented =)

Interestingly, there is no special logic for clustered errors, i.e. dfd is just _N - _k_fe - _k and not _G - 1. To align with fixest, I think we can then delete the

        if self._is_clustered:
            self._dfd = np.min(np.array(self._G)) - 1

clause?

Additionally, would you mind adding an exact test of fixest::wald() against our implementation @Jayhyung? Note that despite being called wald, it only ever runs an F-Test =)
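The fixest-style dof computation above can be sketched as follows (the numbers mirror the mtcars example from the thread; scipy supplies the F distribution):

```python
from scipy import stats

# mtcars example: 32 obs, 1 covariate (cyl), 22 swept-out hp dummies
N, k, k_fe = 32, 1, 22
df1 = k              # numerator dof: number of tested coefficients
df2 = N - k - k_fe   # denominator dof: residual degrees of freedom
print(df1, df2)      # → 1 9

# p-value for a hypothetical F statistic under these dof
p = stats.f.sf(1.5, df1, df2)
```

This reproduces the df1 = 1, df2 = 9 pair that fixest reports for the `mpg ~ cyl | hp` specification.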

@s3alfisc
Member

s3alfisc commented Jul 7, 2024

Oh, and of course there is another way to test against clubSandwich: by simply transforming the null!

Suppose we want to test the null of

$$ R \hat{\beta} - d = 0 $$

with $d \in \mathbb{R}^{k}$.

Then we can define the augmented matrix

$$ \tilde{R} = (R, -I_{k}) $$

with $I_{k}$ being the identity matrix, and the augmented coefficient vector

$$ \tilde{\beta} = (\hat{\beta}^{T}, d^{T})^{T}, $$

and then we can simply test the null of $\tilde{R} \tilde{\beta} = 0$.
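The transformation can be verified numerically (a minimal sketch with made-up values):

```python
import numpy as np

rng = np.random.default_rng(1)
beta = rng.normal(size=3)           # pretend coefficient estimates
R = np.eye(3)
d = np.array([0.5, -1.0, 2.0])      # non-zero right-hand side

# Augmented form: R_tilde = (R, -I_k), beta_tilde = (beta, d)
R_tilde = np.hstack([R, -np.eye(3)])
beta_tilde = np.concatenate([beta, d])

# R_tilde @ beta_tilde reproduces R @ beta - d, so testing
# R_tilde beta_tilde = 0 is equivalent to testing R beta = d
assert np.allclose(R_tilde @ beta_tilde, R @ beta - d)
```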

@Jayhyung
Member Author

Jayhyung commented Jul 9, 2024

Thanks for the comment!

  1. About testing a different q vector by augmenting R as R = (R, -I_{k}): it still eludes me, even though I read the vignette. First, the vignette only shows cases where a linear combination of specified coefficients equals 0. Second, Wald_test in the clubSandwich package throws an error if the number of columns of the R matrix does not equal the number of estimated coefficients (including the constant term). Hmm, I'm not sure, but we can still test q != 0 cases using wald from fixest. If there is no objection, I will proceed in this direction. Please let me know if anything comes up! (I may reach out to the author of the vignette about this.)
  2. Actually, the denominator degrees of freedom (i.e. dfd) are adjusted in the clubSandwich package according to whether the users specify clusters or not. [Screenshot of the Wald_test source in clubSandwich; line 429 shows how the p-value is computed.]
     The denominator degrees of freedom in the F test are set to J - 1 (if test is set to "Naive-F"), where J equals the number of clusters if clusters are specified in the regression. Also, setting dfd to G - 1 when there are clusters produces a p-value that is much closer to the clubSandwich result. The theoretical foundation can be found in the guide (page 350, "A. Cluster-Robust F-tests" is the relevant section; sorry for just putting the link here, as this is the published version of this famous guide for empirical economists ;)).
     I can just erase the part as you suggested, but then the F statistic and p-value would not be adjusted for clusters. Please let me know how you want me to proceed.
  3. Yes, we only tested the F-test ;). I will add more test cases that cover the Wald test.

Thanks for the comments!
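The clustered denominator-dof convention under discussion can be sketched as follows (hypothetical values; the G - 1 rule mirrors the snippet quoted from pyfixest in this thread):

```python
import numpy as np
from scipy import stats

N, k, k_fe = 1000, 3, 0
G = np.array([40])              # hypothetical cluster counts per cluster variable
is_clustered = True

# Denominator dof: smallest cluster count minus one under clustering,
# residual degrees of freedom otherwise
dfd = int(np.min(G)) - 1 if is_clustered else N - k - k_fe
print(dfd)  # → 39

p = stats.f.sf(2.5, k, dfd)     # p-value for a hypothetical F statistic
```

With few clusters, dfd = G - 1 is much smaller than N - k, which fattens the tails of the reference F distribution and yields the more conservative p-values Cameron & Miller recommend.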

@s3alfisc
Member

The denominator degrees of freedom in the F test are set to J - 1 (if test is set to "Naive-F"), where J equals the number of clusters if clusters are specified in the regression. Also, setting dfd to G - 1 when there are clusters produces a p-value that is much closer to the clubSandwich result. The theoretical foundation can be found in the guide (page 350, "A. Cluster-Robust F-tests" is the relevant section).

Ah, great! Then let's follow what Cameron & Miller recommend. Even better if this aligns closely with the implementation in clubSandwich =)

@s3alfisc
Member

s3alfisc commented Jul 10, 2024

Second, Wald_test in the clubSandwich package throws an error if the number of columns of the R matrix does not equal the number of estimated coefficients (including the constant term).

Hm, in this case, the described method indeed does not work with clubSandwich. And Feols.wald_test() would also throw an error if the length of the R vector does not equal the number of coefficients, right?

Alternatively, we could test against R's car::linearHypothesis function, which is nice because the car package is included in R's base distribution.

linearHypothesis(model, hypothesis.matrix, rhs = NULL,
    test = c("F", "Chisq"), vcov. = NULL,
    white.adjust = c(FALSE, TRUE, "hc3", "hc0", "hc1", "hc2", "hc4"),
    singular.ok = FALSE, ...)

It's only available for objects of type lm, but we can provide a custom vcov matrix (it's iid by default). As this is only about testing that the offset q works correctly, it should be sufficient to test only the iid case (given that we test the hetero and clustered vcovs against clubSandwich).

This is how it would work via rpy2:

import numpy as np
import pyfixest as pf
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri

# rpy2 imports
from rpy2.robjects.packages import importr

pandas2ri.activate()

fixest = importr("fixest")
stats = importr("stats")
broom = importr("broom")
car = importr("car")

data = pf.get_data()
fit_r = stats.lm("Y ~ X1 + X2", data = data)
R = ro.r.matrix(np.array([0, 1, 1]), nrow=1)
car.linearHypothesis(fit_r, R, 1)

@Jayhyung
Member Author

I just added wald test cases to the testing scripts. Let me know if something extra should be done!

Member

@s3alfisc s3alfisc left a comment

Looks phenomenal @Jayhyung. I particularly dig the super thorough unit tests! Thank you 🎉

    ], "distribution must be either 'F' or 'chi2'."
    if self._is_clustered:
        self._dfd = np.min(np.array(self._G)) - 1
    else:
Member

Is it standard to include the fixed effects in the dof computation? I am actually not sure about this and will take a look at the fixest docs!

Member Author

Oh, sorry for the late reply, I didn't notice your comment until now. From my reading of the clubSandwich package, the fixed effects didn't change anything related to the inference part (it was just like demeaning the whole dataset, estimating on the demeaned data, and then doing inference). I will take a close look at this and update if serious corrections are necessary (I will open a PR if they really are).

Member

No worries - this is at least what fixest seems to be doing (see the function I posted somewhere in the PR) =) I think clubSandwich does not support fixed effects / fixest as far as I am aware (due to difficulties computing CRV2 and CRV3 vcovs with fixed effects, I believe) - so I am afraid your chances of finding anything there are probably not super big :/

Member Author

Ah, I actually meant to say fixest. Thanks for letting me know. :)

r_fstat = pd.DataFrame(r_wald).T[1].values[0]
r_pvalue = pd.DataFrame(r_wald).T[5].values[0]

np.testing.assert_allclose(f_stat, r_fstat, rtol=1e-02, atol=1e-02)
Member

Do the tests still pass if we tighten the error tolerance to e.g. 1e-04? At some point, I think not exactly matching the clubSandwich dof defaults might bite us?

Member Author

This is the test case for regression specifications without clusters. I just tested with different error tolerances: it survives 1e-05 for the relative tolerance but fails at 1e-03 for the absolute tolerance. Here is a comparison of the statistics between the two packages:

Python f_stat: 1.3298150228659409, R f_stat: 1.3220891299061177
Python p_stat: 0.2628217235735699, R p_stat: 0.2653441458766282

Reasonably close, but there is a bit of discrepancy. The following is the case when I set the constant term to 0.1:

Python f_stat: 15.748365910640432, R f_stat: 15.708691053689362
Python p_stat: 3.67809116497142e-10, R p_stat: 3.8949330392594636e-10

Do you think this is reasonable enough? I may take another look at this in a different PR if it is not.

Thanks!
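For reference, the relative discrepancy in the F statistics quoted above can be checked directly:

```python
py_f, r_f = 1.3298150228659409, 1.3220891299061177

rel_diff = abs(py_f - r_f) / abs(r_f)
print(f"{rel_diff:.4%}")  # roughly 0.58%

# Passes a 1e-02 relative tolerance but would fail a 1e-03 one
assert rel_diff < 1e-2
assert rel_diff > 1e-3
```

So the two implementations agree to within about half a percent here, consistent with the dof conventions differing slightly between the packages rather than a bug in either.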

@s3alfisc s3alfisc merged commit ca1200e into py-econometrics:master Jul 11, 2024
7 checks passed
@Jayhyung Jayhyung deleted the dev2 branch July 11, 2024 21:53