Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterative rectifier #534

Merged
merged 21 commits into from
Oct 19, 2024
Merged

Conversation

leostimpfle
Copy link
Collaborator

This PR implements the iterative rectifier algorithm by Correia et al. (2021, http://arxiv.org/abs/1903.01633). This is the first step of issue #457 .

Updates

The main updates are:

  • _check_for_separation is now a wrapper calling the individual methods specified via the optional methods arguments. By default all methods are executed _check_for_separation_fe and _check_for_separation_ir
  • _check_for_separation_ir is the newly implemented iterative rectifier check
  • the APIs have now an optional keyword argument separation_methods that allows the user to specify which separation checks to run (and defaults to all methods)

Tests

I have updated test_poisson with an additional very basic example of separation that only works with the iterative rectifier.

The ppmlhdfe package contains many additional test cases here and I have tested my implementation against these cases. All cases work apart

  • from the datasets without regressors (02.csv, 03.csv, 04.csv, 12.csv, 13.csv): ppmlhdfe allows empty regressors while pyfixest does not (i.e., formulae of the form y | fe1 + fe2)
  • 07.csv: running fpeois on this dataset throws ValueError: Demeaning failed after 100_000 iterations. This dataset is also excluded ppmlhdfe's separation tests in validate_tagsep.do and I have therefore not further investigated the cause.

If useful, I can extend test_poisson to include further test cases from palmhdfe?

@leostimpfle
Copy link
Collaborator Author

For reference, here's the code snippet I used to run the additional test cases:

import os
import pandas as pd

import pandas as pd
from pyfixest.estimation.estimation import fepois, feols

folder = r'/Users/leonardstimpfle/code/ppmlhdfe/test/separation_datasets'
fns = os.listdir(folder)
fns.sort()
for fn in fns:
    if fn in ['07.csv']: continue
    data = pd.read_csv(os.path.join(folder, fn))

    fml = "y"
    regressors = data.columns[data.columns.str.startswith('x')]
    if not regressors.empty:
        fml += f" ~ {' + '.join(regressors)}"
    else:
        # pyfixest currently does not allow empty regressors
        continue
    fixed_effects = data.columns[data.columns.str.startswith('id')]
    if not fixed_effects.empty:
        fml += f" | {' + '.join(fixed_effects)}"

    print(fn)
    print(f'Expect {data.separated.sum()} separated observations')
    fitted_ir = fepois(fml, data=data, vcov="hetero", separation_check=["ir"])

@s3alfisc
Copy link
Member

s3alfisc commented Jul 3, 2024

Wow, super cool, thank you Leo (@leostimpfle)! Did not expect this PR to come so soon =) I'll take a first look tonight, but I think reviewing the algos more carefully will take me until this weekend. Thank you! =)

@leostimpfle
Copy link
Collaborator Author

Thanks @s3alfisc ! No rush, happy to have a chat to run through the code if helpful.

In the meantime I will try to review the failed checks. It looks like I did something wrong with the typed arguments.

Copy link

codecov bot commented Jul 14, 2024

Codecov Report

Attention: Patch coverage is 88.57143% with 8 lines in your changes missing coverage. Please review.

Files Coverage Δ
pyfixest/estimation/FixestMulti_.py 82.87% <100.00%> (ø)
pyfixest/estimation/estimation.py 75.71% <100.00%> (ø)
tests/test_poisson.py 100.00% <100.00%> (ø)
pyfixest/estimation/fepois_.py 89.50% <86.20%> (ø)

... and 31 files with indirect coverage changes

@leostimpfle
Copy link
Collaborator Author

My initial implementation didn't account for complex formulae such as "Y ~ X1 + i(f1,X2, ref=4.0)". The issue is that while the iterative rectifier should work on the raw data with the original formula, _check_for_separation was applied to the already modified arrays Y and X in FixestMulti_.

In the latest version, _check_for_separation has the additional arguments fml and data and the iterative rectifier algorithm is applied to these arguments. While_check_for_separation_fe and _check_for_separation_ir have the same call signature, the args actually used don't overlap between the two functions (the former uses Y, X, and fe, the latter fml and data).

@s3alfisc Please let me know any thoughts (appreciate it may not be the most elegant solution).

@s3alfisc
Copy link
Member

This weekend I will review your PR @leostimpfle! Sorry about the long delay :-/

@s3alfisc s3alfisc self-requested a review July 31, 2024 19:36
@s3alfisc
Copy link
Member

Hi @leostimpfle , finally got around to taking a look at the algo in more detail. Looks good to me! I merged the master branch into the PR and did the required updates. I have one minor question:

Currently, fepois only checks for separation when there are fixed effects. There is this line

if self._fe is not None:
which avoids separation checks if there is no fixed effect. Is this something we want to keep? In this case, the test that you've implemented should not work as no fixed effect is included. Strangely, it passed on your CI run 🤔 One reason to keep things as they are is that we could not check for categoricals that are encoded as dummies, as the dummy encoding happens before the separation checks. So maybe better to keep things as they are?

If you have the time, it would also be awesome if we could implement a few more tests against pplmhdfe based on the test data sets you linked; though if you're time constraint, I am also happy to do implement the tests myself =)

Best, Alex

@leostimpfle
Copy link
Collaborator Author

Hi @s3alfisc ! In principle, the iterative rectifier should work without fixed effects but I think it's fine to only run it if fixed effects are included in the specification. It's been a while since I looked at it but I'll have a closer look at it over the next week. Will also implement further tests against ppmlhdfe. Thanks again!

@s3alfisc
Copy link
Member

s3alfisc commented Oct 5, 2024

Very cool, thanks Leo! =)

@leostimpfle
Copy link
Collaborator Author

Hi @s3alfisc , I have updated test_separation to include ppmlhdfe's test cases. Note, however, that most test cases are skipped because they don't have any fixed effects or use a specification with intercept only. As discussed, I have kept the condition that separation checks are performed only if there are fixed effects.

https://github.com/leostimpfle/pyfixest/blob/5d95f3349daba9e9765eb02385b50f8cb04a0edd/pyfixest/estimation/fepois_.py#L135-L138

Let me know any feedback!

@s3alfisc
Copy link
Member

Hi @leostimpfle , this should be good to merge after we fix the merge conflicts as far as I can see, do you agree? If yes I can fix the conflicts and merge later =)

@s3alfisc s3alfisc merged commit bce0209 into py-econometrics:master Oct 19, 2024
1 check was pending
@s3alfisc
Copy link
Member

Sorry, accidentally merged to master. Will open a new PR based on this branch!

@leostimpfle leostimpfle deleted the iterative-rectifier branch October 26, 2024 13:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants