
Westfall and Young p-values for feols #698

Closed
wants to merge 28 commits

Conversation

@marcandre259 (Contributor) commented Nov 6, 2024

In response to #485, I followed the algorithm given in the appendix of the paper. The only difference is that we work with t-statistics instead of p-values directly.
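For reference, the core of the step-down adjustment on absolute t-statistics looks roughly like the following minimal sketch (illustrative only, not the exact helper in this PR; boot_t_stats has shape (B, S) for B bootstrap draws and S hypotheses):

import numpy as np

def wy_stepdown_tstats(t_stats, boot_t_stats):
    # order hypotheses from most to least significant
    order = np.argsort(-np.abs(t_stats))
    abs_t = np.abs(t_stats)[order]
    abs_boot = np.abs(boot_t_stats)[:, order]
    # successive maxima over the less significant hypotheses,
    # taken from the least significant hypothesis upwards
    q = np.maximum.accumulate(abs_boot[:, ::-1], axis=1)[:, ::-1]
    # share of bootstrap draws whose max statistic exceeds the observed one
    pval_sorted = np.mean(q >= abs_t, axis=0)
    # enforce monotonicity of the adjusted p-values
    pval_sorted = np.maximum.accumulate(pval_sorted)
    # map back to the original hypothesis order
    pval = np.empty_like(pval_sorted)
    pval[order] = pval_sorted
    return pval

The running maximum over the less significant hypotheses is what distinguishes the step-down method from a plain single-step max-t adjustment.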

A lot of the code is copied from the rwolf function. I've removed the randomization option and kept only the wild bootstrap, because I was getting strange results with randomization. I did not look into it deeply, however, so it could be a simple issue.

For testing, I wrote a private "slow" function that follows the algorithm given in the above paper quite literally. It's not ideal, but I do not have access to Stata. I did find @s3alfisc's experimental R package, and that could maybe also be used as a baseline?

Cheers

@marcandre259 (Contributor, Author)

pre-commit.ci autofix

@s3alfisc (Member) commented Nov 6, 2024

Very cool @marcandre259! I'll review tomorrow after work. I will also check if we can test against my R package - I recall not being happy with some simulation results I had seen, and will try to piece together why I never properly released it. Re Stata - the problem is that both the wyoung and rwolf implementations are very slow, so to get results matching even at the first decimal place, we'd have to wait quite a long time. And if the results don't match, we'd have to increase the number of iterations in Stata and wait even longer... Though if we prepare the tests, we could ask @leostimpfle to make use of his license and help us with the Stata results? 😀

@s3alfisc (Member) commented Nov 7, 2024

Hi @marcandre259, the implementation looks good to me, though I am sceptical about the use of t-stats over p-values. I think it is fine as long as the t-stats are rank-preserving? Still, I would prefer to use p-values, as it will help users follow the implementation if we link to the Stata Journal paper.

Beyond that, the implementation looks very good & I think the algorithm is implemented correctly =)

I have two other suggestions:

a) I would like to add support for randomization inference, just as in the rwolf method; and

b) what do you think about economizing a bit between the rwolf and wyoung functions? I think that both functions could share all of their code except for this part:

        t_stats[i] = wildboot_res_df["t value"]
        boot_t_stats[:, i] = bootstrapped_t_stats

    pval = _get_wyoung_pval(t_stats, boot_t_stats)

For example, we could define a function

import warnings
from typing import Union

import numpy as np
import pandas as pd

def _multcomp_resample(
    type: str,
    models: list[Union[Feols, Fepois]],
    param: str,
    reps: int,
    seed: int,
) -> pd.DataFrame:

    models = _post_processing_input_checks(models)
    all_model_stats = pd.DataFrame()
    full_enumeration = False

    S = 0
    for model in models:
        if param not in model._coefnames:
            raise ValueError(
                f"Parameter '{param}' not found in the model {model._fml}."
            )

        if model._is_clustered:
            # model._G is a list of length 3;
            # for oneway clustering, the same value is repeated three times
            G = min(model._G)
            if reps > 2**G:
                warnings.warn(
                    f"""
                    2^(the number of clusters) < the number of boot iterations for at least one model,
                    setting full_enumeration to True and reps = {2**G}.
                    """
                )
                full_enumeration = True

        model_tidy = model.tidy().xs(param)
        all_model_stats = pd.concat([all_model_stats, model_tidy], axis=1)
        S += 1

    if type == "rwolf":
        t_stats = np.zeros(S)
        boot_t_stats = np.zeros((2**G, S)) if full_enumeration else np.zeros((reps, S))
    else:
        p_vals = np.zeros(S)
        boot_p_vals = np.zeros((2**G, S)) if full_enumeration else np.zeros((reps, S))

    for i in range(S):
        model = models[i]

        wildboot_res_df, bootstrapped_t_stats = model.wildboottest(
            param=param,
            reps=reps,
            return_bootstrapped_t_stats=True,
            seed=seed,  # all S iterations require the same bootstrap samples, hence seed needs to be reset
        )

        if type == "rwolf":
            t_stats[i] = wildboot_res_df["t value"]
            boot_t_stats[:, i] = bootstrapped_t_stats
        else:
            p_vals[i] = ...
            boot_p_vals[:, i] = ...

    if type == "rwolf":
        pval = _get_rwolf_pval(t_stats, boot_t_stats)
    else:
        pval = _get_wyoung_pval(p_vals, boot_p_vals)

    all_model_stats.loc["WY Pr(>|t|)"] = pval
    all_model_stats.columns = pd.Index([f"est{i}" for i, _ in enumerate(models)])
    return all_model_stats

and then define

def rwolf(...):
    return _multcomp_resample(type="rwolf", ...)

def wyoung(...):
    return _multcomp_resample(type="wyoung", ...)

?

@marcandre259 (Contributor, Author)

Hi @s3alfisc,

Thank you for the great suggestions.

Indeed, I agree with the refactoring you suggest; I'll implement it.

How hard is it to get the unadjusted p-values out of the developed randomization/bootstrapping functions?

It would be nice if I can keep using the existing tools.

Best,

@marcandre259

@s3alfisc (Member) commented Nov 7, 2024

How hard is it to get the unadjusted p-values out of the developed randomization/bootstrapping functions?

You can take the unadjusted p-values directly from the feols object. I only take the t-stat from the wildboottest output because wildboottest() might use different small sample corrections than pyfixest, so a comparison of the "original" t-stat and the "bootstrapped" t-stats might be incorrect. I.e. we might end up computing something like

$$ \frac{N-1}{N} \times \frac{\beta}{se} > \frac{N-1}{N-k} \times \frac{\beta_{boot}}{se_{boot}} $$

which I thought we should avoid =)

For the p-value, this does not matter, so you can simply call Feols.pvalues().values.
For the bootstrap or randomization-inference p-values, you can take them directly from the wildboottest or ritest output, which has the following form:

import pyfixest as pf 
data = pf.get_data()

fit = pf.feols("Y ~ X1", data=data)
fit.wildboottest(param = "X1", reps = 10).xs("Pr(>|t|)")
fit.ritest(resampvar = "X1", reps = 10).xs("Pr(>|t|)")

@marcandre259 (Contributor, Author) commented Nov 9, 2024

Hi @s3alfisc,

the issue I'm currently facing is that

fit.wildboottest(param = "X1", reps = 10).xs("Pr(>|t|)")

returns a single p-value, computed by the following method of wildboottest:

  def get_pvalue(self, pval_type = "two-tailed"):

    if pval_type == "two-tailed":
      self.pvalue = np.mean(np.abs(self.t_stat) < abs(self.t_boot))
    elif pval_type == "equal-tailed":
      pl = np.mean(self.t_stat < self.t_boot)
      ph = np.mean(self.t_stat > self.t_boot)
      self.pvalue = 2 * min(pl, ph)
    elif pval_type == ">":
      self.pvalue = np.mean(self.t_stat < self.t_boot)
    else:
      self.pvalue = np.mean(self.t_stat > self.t_boot)

In the wyoung algorithm, however, I need to do a comparison against each of the B resampled sets of p-values.

I'm not sure what the best way to go from there is. On the one hand, wildboottest could offer the option to output an array of B p-values. My question is then how to compute them?

On the other hand, keeping things as they are in my initial commit w.r.t. getting the S adjusted p-values is not really different from what is already going on with rwolf? Namely, the same small sample corrections are applied, although they differ from those of fixest.

I'm looking forward to getting your opinion :)

@s3alfisc (Member) commented Nov 10, 2024

I'm not sure what the best way to go from there is. On the one hand, wildboottest could offer the option to output an array of B p-values. My question is then how to compute them?

The wild bootstrap computes a "bootstrapped p-value" by counting how often the bootstrap t-statistics exceed the "sample" t-statistic in absolute value. That's why we only get a single p-value, computed e.g. via np.mean(np.abs(self.t_stat) < abs(self.t_boot)).

If I understand the WY method correctly, we compute a p-value for the test of interest for each of the B bootstrap samples. So given the bootstrapped t-statistics, we could do something like this:

import pyfixest as pf
from scipy.stats import t
import numpy as np

data = pf.get_data()
fit = pf.feols("Y ~ X1", data = data)

_, t_boot = fit.wildboottest(param = "X1", reps = 1000, return_bootstrapped_t_stats = True)
t_boot[0:5]
# array([ 0.75552776, -0.84928264, -1.04821196,  0.84407764,  1.15154982])
df = fit._N - fit._k if fit._vcov_type in ["iid", "hetero"] else fit._G - 1
p_vals_boot = 2 * (1 - t.cdf(np.abs(t_boot), df))
p_vals_boot[0:5]
# array([0.45011102, 0.39592812, 0.29479525, 0.3988287 , 0.24978246])

Here I compute p-values as in the Feols.get_inference() method. Btw, we might want to make df an attribute so we can re-use it, e.g. in the WY method?
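E.g., something like this inside Feols.get_inference() (with _df_t as a placeholder attribute name):

# inside Feols.get_inference(); _df_t is a placeholder name
self._df_t = (
    self._N - self._k
    if self._vcov_type in ["iid", "hetero"]
    else self._G - 1
)

The WY code could then read fit._df_t instead of recomputing it.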

@s3alfisc (Member) commented Nov 10, 2024

On the other hand, keeping things as they are in my initial commit w.r.t. getting the S adjusted p-values is not really different from what is already going on with rwolf? Namely, the same small sample corrections are applied, although they differ from those of fixest.

Yes and no, I think =D

Per se, it does not matter which small sample correction wildboottest applies. When computing p-values, the small sample corrections will cancel out in this step:

wb = fit.wildboottest()  # wb is a wild bootstrap results object
np.mean(np.abs(ssc * wb.t_stat) < abs(ssc * wb.t_boot))

If instead we did something like

np.mean(np.abs(ssc_feols * self.t_stat) < abs(ssc_wb * wb.t_boot))

i.e. compare the t-stat from pf.feols() with bootstrapped t-stats from the wild bootstrap, we would potentially get incorrect results in case the small sample corrections are drastically different (i.e. if we have small N or large k).

So we always need to make sure that in all comparisons, we apply the small sample corrections consistently.
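A toy illustration with made-up numbers (neither factor is pyfixest's actual small sample correction):

import numpy as np

rng = np.random.default_rng(123)
t_stat = 2.0                        # "sample" t-statistic
t_boot = rng.standard_normal(9999)  # stand-in for the bootstrap distribution

ssc_feols, ssc_wb = 1.0, 0.8        # made-up small sample corrections

# consistent scaling: the common factor cancels, the p-value is unchanged
p_consistent = np.mean(np.abs(ssc_wb * t_stat) < np.abs(ssc_wb * t_boot))
# mixed scaling: the sample t-stat is inflated relative to t_boot,
# so fewer bootstrap draws exceed it and the p-value is biased downwards
p_mixed = np.mean(np.abs(ssc_feols * t_stat) < np.abs(ssc_wb * t_boot))

print(p_consistent, p_mixed)  # p_mixed < p_consistent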

@marcandre259 (Contributor, Author)

Hi @s3alfisc,

Thank you for the extensive response, I'm learning a lot!

Based on your feedback, I will include this step prior to running the main WY algorithm:

df = fit._N - fit._k if fit._vcov_type in ["iid", "hetero"] else fit._G - 1
p_vals_boot = 2 * (1 - t.cdf(np.abs(t_boot), df))

That way there will at least be more clarity when following the steps. I am also going to use the t-stats obtained from the wild bootstrap.
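Concretely, for one model this could look like the following sketch (illustrative names; it would fill the "..." placeholders in the _multcomp_resample draft above):

import numpy as np
from scipy.stats import t

def _boot_pvals_for_model(model, param, reps, seed):
    # bootstrap t-statistics via the wild bootstrap
    wildboot_res_df, t_boot = model.wildboottest(
        param=param,
        reps=reps,
        seed=seed,
        return_bootstrapped_t_stats=True,
    )
    # degrees of freedom as in Feols.get_inference()
    df = (
        model._N - model._k
        if model._vcov_type in ["iid", "hetero"]
        else model._G - 1
    )
    # two-sided p-values for the sample and the bootstrap t-statistics
    p_val = 2 * (1 - t.cdf(np.abs(wildboot_res_df["t value"]), df))
    boot_p_vals = 2 * (1 - t.cdf(np.abs(t_boot), df))
    return p_val, boot_p_vals

Inside the loop over models, this becomes p_vals[i], boot_p_vals[:, i] = _boot_pvals_for_model(model, param, reps, seed).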

marcandre259 and others added 6 commits November 13, 2024 20:28

Adding solver argument and integration tests to fepois and feols (py-econometrics#661)

* Adding solver to feols and fepois api's
* Benchmarks with linearmodels absorbingls
* Ran pixi lint
* Default solver to _estimate_all_models is np.linalg.solve
* Replace exact equality with a precision tolerance for test_fepois_args
* [pre-commit.ci] auto fixes from pre-commit.com hooks

* add info on discourse to docs
* test predict with newdata against fixest
* updates
* pass tests
* fix test bug

* Adding literals to feols and fepois api's
* Adding docstring
* Fix lpdid argument vcov
* literal type
* hunting vcov literals
* Changes referring to initial review
* Whac-a-mole based development
* Check length of valid_types instead of looking for generic literal type
* [pre-commit.ci] auto fixes from pre-commit.com hooks

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@s3alfisc (Member)

Sounds like a good plan to me! Let me know when I should take another look =)

@marcandre259 (Contributor, Author)

Going to close this request and refer to it in a new one.
