Deprecation of Panel? #13563
Comments
I'm +1 on moving to xarray, but a GitHub search shows the deprecation will not be easy... As far as I know, among popular packages, pydata/data-reader and quantopian/zipline use
No change on this end - we are still using xarray heavily, and it's working beautifully. We've also improved the integration of xarray & pandas, so that should ease the path to deprecation.
I'm +1 on deprecating Panels; @jreback moved mountains to create a consistent internal object model from 1 to N dimensions, but there is still a feeling of second-class citizenry when it comes to working with data over 2 dimensions. I think we would be better served in the long run by really optimizing for the 1- and 2-dimensional use cases (similar to what the R community has done, though the API surface area of dplyr, data.table, and built-in data frames is quite a bit smaller than pandas -- primarily lacking in the level of indexing complexity). I maintain that we should plan for a pandas 0.X.Y long-term support (LTS) release branch that becomes bugfix-only so that we can start investing in renovations. I'm interested in feedback from the other core devs on how realistic you feel this is. I've long worried about the amount of baggage we are carrying forward -- there are many organizations with large codebases that have made their peace with pandas's rough edges (data type issues, view / copying semantics, etc.), and it doesn't make sense to abandon them. On the flip side, it would be a shame to be held back from undertaking a more aggressive cleanup and retool of the internals to improve performance, extensibility, handling of missing data and data types, etc. I regret that 6 months have passed since I brought up this grand scheme and I haven't been able to carve out the time to make a dent, beyond demo'ing a proof-of-concept of integer NAs. Also, I would feel much better about working on this on a long-lived branch (similar to what happened with IPython) under some kind of feature freeze. Anyway, some of these comments are beyond the scope of this issue. I don't think we should deprecate Panel unless we're collectively on board with the idea of cleaning up pandas internals over the next 12-24 months (which is as much of a code organization problem as anything -- particularly quarantining unit tests that we are contemplating "breaking").
There are plenty of examples using Panel on SO: https://stackoverflow.com/questions/tagged/panel+pandas One in particular that I'm not sure how to port, and for which I do not want to depend on xarray, is this one:
I noticed today that none of the docs for the Panel class/methods seem to carry any notice that it's deprecated. There's the 'deprecate panel' entry in the 0.22.0 'what's new', but it seems likely that people will not see that if they're searching for Panel or following direct links to the docs. I can see this example of a deprecation note in a docstring, which subjectively doesn't seem to draw a lot of attention to itself. Is there a convention for these that's a little bit more 'attention-grabbing'? Once I know the best way, I'm happy to submit a PR. Edit: Actually, just found 1d32264 which seems to indicate exactly what to do in this instance.
There is a deprecation notice in the user guide, and a warning when you actually use it, but you are certainly correct that we could add a notice in all docstrings as well to give this more visibility. As usual, a PR is very welcome!
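For readers following along, the two kinds of notice mentioned above (a docstring note and a runtime warning) might look roughly like the sketch below. This is not pandas code: the class name `LegacyPanel` is hypothetical and `X.Y.Z` is a placeholder version. Only the Sphinx `.. deprecated::` directive and the standard-library `warnings.warn` call are real.

```python
import warnings


class LegacyPanel:
    """Hypothetical 3-D container, used only to illustrate the pattern.

    .. deprecated:: X.Y.Z
        Use a MultiIndex DataFrame or xarray instead.
    """

    def __init__(self, data=None):
        # Warn at construction time so users see the notice even if they
        # never open the documentation. stacklevel=2 points the warning
        # at the caller's line rather than at this constructor.
        warnings.warn(
            "LegacyPanel is deprecated and will be removed in a future "
            "version; use a MultiIndex DataFrame or xarray instead.",
            FutureWarning,
            stacklevel=2,
        )
        self.data = data
```

Sphinx renders the directive as a highlighted box in the generated API page, which is probably the closest thing to an 'attention-grabbing' convention for docstrings.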
I'll be the first to protest the deprecation of panels, specifically the need to rewrite legacy code. I have plenty of legacy finance code for which conversion to multi-index is very painful, code which now spews panel warnings despite working flawlessly. Of course, I write any new code using only multi-index dataframes (which have a significantly steeper learning curve, one I am happy to have overcome). Regarding the feeling that "3 or more dimensions feels like second-class usage", I would note that there is a deep asymmetry even between the dimensions of a 2D pandas object - columns and rows are explicitly treated differently in pandas, with rows being second-class to columns in a highly non-intuitive way, disobeying the mathematical symmetries of matrices. Food for thought. Then again, the dimensions of real-life data are often inherently asymmetric, since time is a very special type of dimension.
@joseortiz3 the problem has less to do with whether there are users of the code and more to do with whether there is sufficient bandwidth to maintain the code. If there isn't a motivated developer base to support a component of an open source software project, it doesn't seem reasonable that maintainers of the rest of the project should be burdened by it. The general thinking (and @jreback and others can comment) is that having > 2 dimensional data structures has made many parts of the codebase significantly more difficult to develop and maintain. This has a high long-term cost. Given pandas's funding situation (or lack thereof) I don't see how it is tenable
This is exactly right. Furthermore, pandas has quite a number of pull requests coming in daily and many open issues (2600+). We have a limited number of core devs (12), so there is a natural limit to how large the (already huge) scope of pandas can be. Panel is not nearly as mature as other aspects of pandas and would be better served by separate motivated maintainers. Note that there is already quite an overlap with the
Totally reasonable, of course. Would it be so difficult to write a "panel wrapper" that has a panel-like interface to what is actually a multi-index dataframe? It wouldn't need to implement all of the methods of panel; it would just allow 90% of legacy code to be rewritten via a simple
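To make the wrapper idea concrete, here is a minimal sketch, assuming the 3-D data is stored as a DataFrame whose rows carry an (item, major) MultiIndex and whose columns are the minor axis. The class `PanelLike` and that layout are illustrative assumptions, not an existing pandas API. The method names mirror Panel's `major_xs` / `minor_xs`, and only those three accessors are covered.

```python
import pandas as pd


class PanelLike:
    """Hypothetical panel-style view over a MultiIndex DataFrame.

    Rows are indexed by (item, major); columns are the minor axis.
    """

    def __init__(self, frames):
        # frames: dict mapping item -> DataFrame (major index x minor columns).
        # Concatenating a dict uses its keys as the outer index level.
        df = pd.concat(frames)
        df.index.names = ["item", "major"]
        self._df = df

    def __getitem__(self, item):
        # Like panel[item]: the 2-D frame for one item.
        return self._df.xs(item, level="item")

    def major_xs(self, key):
        # Like Panel.major_xs: rows are minor-axis labels, columns are items.
        return self._df.xs(key, level="major").T

    def minor_xs(self, key):
        # Like Panel.minor_xs: rows are major-axis labels, columns are items.
        return self._df[key].unstack("item")


prices = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r0", "r1"])
volumes = pd.DataFrame({"x": [5, 6], "y": [7, 8]}, index=["r0", "r1"])
panel = PanelLike({"prices": prices, "volumes": volumes})
first_row = panel.major_xs("r0")  # minor labels x items
```

Whether a thin layer like this would cover most legacy code depends on how much of Panel's arithmetic and alignment behaviour that code relies on; the sketch only handles selection.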
This is a topic that has come up recently (#10000, #8906, pandas-dev mailing list discussion), so let's make this an issue to track the discussion about it.
Deprecating Panels would be a rather large change, so:
cc @pydata/pandas @MaximilianR