Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for pyfixest #105

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Conversation

s3alfisc
Copy link

@s3alfisc s3alfisc commented May 31, 2024

Hi @toobaz - this is a draft PR to add basic support for the pyfixest project.

It's still a WIP - I would like to add information on the fixed effects employed in a given model & add support for a range of additional model statistics that pyfixest is not yet computing.

This is how it currently looks:

%load_ext autoreload
%autoreload 2

import pyfixest as pf
from stargazer.stargazer import Stargazer

fit1 = pf.feols("Y ~ X1 + X2", data = pf.get_data())
fit2 = pf.feols("Y ~ X1 + X2 | f1", data = pf.get_data())
fit3 = pf.feols("Y2 ~ X1 + X2| f1", data = pf.get_data())

Stargazer([fit1, fit2, fit3])

image

Would you be happy to accept such a PR? Is there anything else I should take a look at?

Pinging @Asquidy as he'd expressed interest in this feature =)

Merging this PR would close py-econometrics/pyfixest#283.

All the best, Alex

@s3alfisc s3alfisc marked this pull request as draft May 31, 2024 11:20
@s3alfisc
Copy link
Author

Just started to play around more with stargazer - what a nice package! I'm thoroughly enjoying it. Thanks! =)

@aeturrell
Copy link

This would be a really useful feature.

@toobaz
Copy link
Collaborator

toobaz commented Jun 4, 2024

Would you be happy to accept such a PR? Is there anything else I should take a look at?

Yes, definitely this is a very welcome contribution. And the PR code looks good to me so far!

@s3alfisc
Copy link
Author

s3alfisc commented Jun 4, 2024

Awesome, then I will take a second look at the PR tomorrow evening to prepare it for a proper review =)

@toobaz, so far, pyfixest does not support a range of default values (e.g. F-statistics). Would it be ok to merge the PR as is and update the pyfixest classes in a second, later PR? Else I would prioritize to add these statistics in the next week.

Best, Alex

@toobaz
Copy link
Collaborator

toobaz commented Jun 4, 2024

@toobaz, so far, pyfixest does not support a range of default values (e.g. F-statistics)

You mean this PR? Or the pyfixest project itself?

In the first case: I would prefer a PR that produces, at least in the most basic/standard case, a complete output just as a pyfixest user would likely expect it.

In the second case: just get as close as possible to the above given the current pyfixest upstream code base, and it will be fine.

@s3alfisc
Copy link
Author

s3alfisc commented Jun 9, 2024

You mean this PR? Or the pyfixest project itself?

I was referring to pyfixest itself - so far, we haven't implemented a range of statistics, e.g. F-tests and adjusted R2.

So I indeed think that the current PR produces all of the supported and expected output of pyfixest =) @aeturrell , as a pyfixest user, do you agree? =)

I'd love to add some other information, e.g. columns for the employed fixed effects, but I think this might be better suited as a utils module within the pyfixest code base?

I.e. smth along these lines:

from stargazer.stargazer import Stargazer
import pyfixest as pf

fit = pf.feols("Y ~ csw(X1, X2) | csw0(f1, f2)", data=pf.get_data())

def deparse_fixef_for_stargazer(fixef_list: list): 

    """
    Deparse Feols._fixef to a dict of lists for easy use with Stargazer 
    to add fixed effects to the regression table. 

    Parameters
    ----------
    fixef_list : list
        List of fixed effects from Feols._fixef.
    
    Returns
    -------
    dict
        Dictionary of lists, where each list contains the fixed effects for a given variable.

    Example
    -------
        # basic example
        fixef_list = ['f1', 'f2', 'f1+f2', 'f1', 'f2', 'f1+f2']
        deparse_fixef_for_stargazer(fixef_list)
        # Output
        {'f1': ['f1', '-', 'f1', 'f1', '-', 'f1'],
         'f2': ['-', 'f2', 'f2', '-', 'f2', 'f2']
        }

        # for usage with Stargazer
        from stargazer.stargazer import Stargazer
        from stargazer.line_location import LineLocation
        import pyfixest as pf
        
        data = pf.get_data()
        fit = pf.feols("Y ~ X1 + X2 | f1 + f2", data=data)
        result_table = Stargazer(fit)

        deparsed_fixef_lists = deparse_fixef_for_stargazer([x._fixef for x in fit.to_list()])
        for key in deparsed_fixef_lists:
            result_table.add_line(key, deparsed_fixef_lists[key])
    """

    def identify_variables(lst):
        variables = set()
        for item in lst:
            if item:
                parts = item.split('+')
                for part in parts:
                    variables.add(part)
        return list(variables)

    # Identify unique variables
    unique_variables = identify_variables(fixef_list)

    # Initialize a dictionary to hold the lists for each variable
    variable_lists = {var: [] for var in unique_variables}

    # Populate the lists based on the presence of each variable
    for item in fixef_list:
        for var in unique_variables:
            if item and var in item:
                variable_lists[var].append('x')
            else:
                variable_lists[var].append('-')

    return variable_lists

which could be used as

deparsed_fixef_lists = deparse_fixef_for_stargazer([x._fixef for x in fit.to_list()])

result_table = Stargazer(fit.to_list())

for key in deparsed_fixef_lists:
    #print(key, deparse_fixef_lists[key])
    result_table.add_line(key, deparsed_fixef_lists[key])

and produce

image

So from my side, I think this PR is ready =) Should I do anything else, e.g. implement some tests @toobaz?

@toobaz
Copy link
Collaborator

toobaz commented Jun 13, 2024

I'd love to add some other information, e.g. columns for the employed fixed effects, but I think this might be better suited as a utils module within the pyfixest code base?

Yes and no: I am already planning, and testing in my own research code, a way to add "smart lines" that associate to each model a with a key and will use that key to retrieve from each included model the content of the related cell (this could be the presence/absence of fixed effects, as in your case, but also more general information, e.g. indication of sample used, variable definition...). This is my current code:

def add_custom_lines(table, models, exclude=[]):
    for k in models:
        if not hasattr(models[k], 'flags'):
            models[k].flags = {}
    # We want to process each line only once, but in the order
    # in which they appear (while making a set of the list would change the order):
    seen = set()
    all_lines = [f for k in models for f in models[k].flags]
    for f in all_lines:
        if f in seen or f in exclude:
            continue
        table.add_line(f, [models[k].flags.get(f, '') for k in models.keys()])
        seen.add(f)

Hence, deparse_fixef_for_stargazer wouldn't have to create a line, it would be enough to have each model state, (with an appropriate API, possibly to be implemented/overridden inside translators - flags is just the first attribute name that came to my mind), how to populate its cell for a given row;

Translators could even indicate which of these lines should be displayed by default.

"When is this gonna happen?", I hear you ask. Well, I guess in the summer I should be able to get this merged. But should you prefer to meanwhile implement something temporary on the pyfixest side, I wouldn't blame you.

My main doubt at the moment is whether it is acceptable in general to ask users to add and populate a flags dict attribute in an instance of a class (e.g. RegressionResults, Feols) that has no such method. Because while I'm happy to have e.g. the pyfixest translator use this mechanism to add FE lines, my priority would be that users could use it freely to add lines of any kind.

@aeturrell
Copy link

Hey both, any update on this? Very keen to be able to point people to this magical integration in my teaching, and update Coding for Economists too. Love the idea of smart lines but even the basic functionality that's already in the PR would be a game-changer!

@toobaz
Copy link
Collaborator

toobaz commented Jul 23, 2024

Hey both, any update on this?

None on my side... I'm still very interested in evaluating the PR, when it will be ready!

@s3alfisc
Copy link
Author

s3alfisc commented Jul 23, 2024

Hi @toobaz, sorry for never really getting back to you about this! I had a thousand things going on plus was on vacation... Sorry! From my perspective, this PR is ready for a review from your side. I have decided to include a customized Stargazer class into pyfixest that directly inherits from Stargazer, but adds information on fixed effects to the output. You can check out how it looks in this vignette =) Once this PR is merged, I will add Stargazer as a soft dependency to pyfixest.

@toobaz
Copy link
Collaborator

toobaz commented Jul 23, 2024

Hi @s3alfisc , thanks for the update! I wasn't reviewing the PR because it is a "draft", I now gave it more attention.

The result looks great, but to be honest I am not enthusiastic about Stargazer having a different interface with pyfixest than it has with other packages.

I understand the proposal I wrote above is way too uncertain, in particular for what concerns the timing, for you to base your solution on it... but would an (at least temporary) option be that I add some hooks to execute the code you need, defining appropriate functions inside the translator module (pyfixest.py, in your case)?

In particular, if I understand correctly, the problem you have with the current state of the stargazer code is that everything that is defined inside the translator module operates model by model, while you need to be able to operate on the entire list at once, correct?

Would it be OK if I allowed the translator to define a function called at the end of __init__, triggered whenever there are models of one of the classes indicated in the translator's classes argument? Basically, in your case it would be invoking add_fixest.

By the way: I see __init__.py inside your code does a check

if any([x._fixef is not None for x in self.models]):
            self.add_fixef()

... did you happen to test mixing pyfixest and other models?

@s3alfisc
Copy link
Author

Hi @s3alfisc , thanks for the update! I wasn't reviewing the PR because it is a "draft"

Ouhh, yes, just noticed this now, sorry about that!

I understand the proposal I wrote #105 (comment) is way too uncertain, in particular for what concerns the timing, for you to base your solution on it...

That's the main reason I added the Stargazer class to pyfixest =) people have been asking about the feature for a while (and I'd also love to use Stargazer with pyfixest myself), so I thought that it would be most convenient for me to simply define a custom class within pyfixest.

The main advantage I see for this approach is that it allows me to built a highly opinionated regression table for pyfixest objects, while users interested in a more customized solution could build around the Stargazer class within stargazer. In that sense I think having pf.Stargazer is quite user friendly, and that's really important to me =)

But I am also happy to build on solution that you've proposed above, it's just a little bit more complicated for me 😄

Out of curiosity, what is the disadvantage that you see with a custom Stargazer class for pyfixest? Is it potentially not having visibility on potentially breaking downstream dependencies with changes to stargazer?

@s3alfisc
Copy link
Author

s3alfisc commented Jul 23, 2024

... did you happen to test mixing pyfixest and other models?

This of course fails... for statsmodels, I get an

AttributeError: 'OLSResults' object has no attribute '_fixef'

error. Thanks for the pointer, will fix this asap =)

@toobaz
Copy link
Collaborator

toobaz commented Jul 23, 2024

Out of curiosity, what is the disadvantage that you see with a custom Stargazer class for pyfixest?

It depends.

  1. If it was the way to build tables for pyfixest, then stargazer would be claiming support for pyfixest, include code for pyfixest, but the stargazer documentation would not apply to pyfixest (or even worse: it would work, but not produce the result users desire), and this would be a big source of confusion.

  2. If there was "the stargazer way" and "the highly opinionated pyfixest way", then this would just introduce confusion. stargazer is designed in a way - and will be even more so in the future - that should make it easy to get very customized tables, without requiring two different code bases.

Whatever the case, I would have to test (well, when stargazer will have a test suite) the "pyfixest approach" at every internal change, even though it is code not under my responsibility, because otherwise any change in my code (and sooner or later there will be big changes - just to give an example, Stargazer will sooner or later have arguments beyond models) would risk breaking the user experience of pyfixest users. Now in principle I can still break the pyfixest workflow despite just providing hooks for it, but having a well defined API reduces the risk.

Not to mention that if a project comes out with similar necessities to pyfixest, we (it/I) will have to reinvent the wheel.

So this said: I would love to support pyfixest in a clean way, but I guess if the (actual) Stargazer interface is not what you, or even pyfixest users, will use, and the only one they will use, then it might be better that all code for pyfixest tables stays in the pyfixest codebase.

Vice-versa, if timing is the problem, it is a problem solved: I can implement the hooks immediately and do a release, then we can polish the interaction between the two code bases when stargazer will have cleaner support for "smart lines".

Note that I have nothing against:

  • pyfixest methods to customize tables in ways that are particularly useful for pyfixest users, as long as they use the standard stargazer API (basically syntactic sugar around it). I would not call either of them pf.Stargazer, however. It is not jealousy, I just think it is a disservice to the users.
  • a pf.Stargazer method that is just the same as stargazer.Stargazer - a shortcut.

@s3alfisc
Copy link
Author

Ok, sold! So the next steps would be that

  • I'll remove thepf.Stargazer class from pyfixest & update the vignette (i.e. prompt users to use from stargazer.stargazer import Stargazer
  • We jointly work on this PR to get merged into stargazer
  • I then come up with a plan how to potentially expose stargazer directly from pyfixest to users (e.g. I could add an option to the internal pf.etable() function to internally run Stargazer) and share some options with you?

@toobaz
Copy link
Collaborator

toobaz commented Jul 24, 2024

I'll remove thepf.Stargazer class from pyfixest & update the vignette (i.e. prompt users to use from stargazer.stargazer import Stargazer

Thanks!

We jointly work on this PR to get merged into stargazer

Sure. If you agree that having a hook called at the end of Stargazer.__init__() is a solution, I will do it immediately (maybe separately from this PR).

For instance, we could have that whenever:

  • at least one of the models is of a class indicated in the translator's classes, and
  • the translator has a function named process_models
    ... then that function is called, with the Stargazer instance as argument, at the end of the instance's __init__.

In your case, I guess process_models would basically be what is now called add_fixef. Which, by the way, only interacts with the Stargazer object via the official API (add_line), which is great.

I then come up with a plan how to potentially expose stargazer directly from pyfixest to users (e.g. I could add an option to the internal pf.etable() function to internally run Stargazer) and share some options with you?

Sure. But just to clarify: I'm not pretending I have the right to dictate how other projects should, or shouldn't provide shortcuts for stargazer usage. pf.Stargazer is fine, an option to pf.etable() is fine, maybe even a method of the fit object would be fine! All I (strongly) suggest is that 1) this "shortcut" relies on the standard Stargazer API (and this is possibly documented), and 2) whatever pyfixest users will presumably want as default can also be done without this shortcut.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stargazer compatibility for nice model output
3 participants