-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove SparseSeries and SparseDataFrame #28425
Remove SparseSeries and SparseDataFrame #28425
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do a search for @pytest.mark.filterwarnings("ignore:Sparse:FutureWarning")
, I thnk there are a few left (or at least at the moment I was reviewing)
@@ -1,7 +1,7 @@ | |||
""" | |||
Interaction with scipy.sparse matrices. | |||
|
|||
Currently only includes SparseSeries.to_coo helpers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this file to arrays/sparse.py
(or as separate file next to it), since it is only used there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was going to do moves as a followup. I'd also like to move the last bit of pandas/tests/sparse/test_combine_compat.py
, which had a SparseArray test in it.
@@ -53,10 +53,7 @@ | |||
Period, | |||
RangeIndex, | |||
Series, | |||
SparseDataFrame, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can leave this file intact for now?
(isn't it meant to be run with older version of pandas? In case we need to regenerate old pickle files, we might still want to generate sparse ones to check that reading them works / errors with a good message)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, we need to remove it so that the sparse structures are not included, re-run it with an older pandas, and commit the new binary file.
I will update to use SparseArray though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd can you comment here? I think we import this file when running the tests, so we need to be able to import it under both the old and new pandas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this file now be restored now you added the pickle compat? (I also think you now no longer update the old pickle files, right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this file gets imported on master, so I need to remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right the file would be created with an old release but read and tested against master I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if we want to mimic what users can do and what Tom is trying to support (i.e. write a pickle file with SparseDataFrame in an older release and read this file with the new release), this file needs to be able to generate such pickle files with SparseDataFrame?
Now, I don't really care that much about SparseDataFrame, and there are currently already legacy pickle files that contain it, so the need to be able to recreate those files before we will actually remove that support is maybe not that high, and I am maybe splitting hairs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @jorisvandenbossche, I think I've misunderstood you this whole time (and I was confused why you kept pushing on this point 😄). You're correct that if we relied on these legacy files for testing reading files written with SparseSeries & SparseDataFrame, then we'd need to keep the imports here (at conditionally).
But a8b0d65 added tests dedicated for that (we need dedicated tests, since the existing ones silence all warnings).
So from a test coverage POV, we're OK with removing writing SparseSeries & SparseDataFrame here. It's just a matter of code style / prep for running this file when 1.0.0 becomes the "legacy" pickle version we support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So from a test coverage POV, we're OK with removing writing SparseSeries & SparseDataFrame here.
Yep, if we have files now, it seems ok to remove it from this file. It seems rather unlikely that after the next bump in legacy support for pickle files, we still want to keep support for SparseDataFrame, I suppose, as we now warn for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds right to me. Technically, we could chose to bump the next legacy to version 0.25.x, which has SparseDataFrame, but I think we're more likely to choose 1.0.x since it's a round number.
@WillAyd it looks like you fixed up the pickle panel stuff in https://github.com/pandas-dev/pandas/pull/27082/files. Would you recommend removing SparseSeries / SparseDataFrame from the pickle files as a precursor to this PR? |
3cc4765 has the changes to the pickle stuff. I can split that out into a precursor PR if people want. I regenerated the pickle files with python 3.5.6 and pandas 0.20.3. |
Do we still have pickle files with sparse data in them? |
As of 3cc4765 we have a pickle file with Series[Sparse] and DataFrame[Sparse], but no SparseSeries and SparseDataFrame. That converted the use of SparseSeries/Frame to Series/Frame with sparse values. With 3cc4765, the pickle tests on this branch pass. It would also pass on master, so I could split those changes out. |
I don't have a strong opinion on splitting or not, either way. For older pickle files with SparseDataFrame in it, you currently get:
With our pickle_compat, I suppose we could actually map this to a function that creates a DataFrame[sparse] of it. But not sure that's worth the effort. |
Huh, 766a2f2 has a bit of compat code to allow reading a file saved with SparseSeries / SparseFrame. I don't really know how it works, but it seems to. I've restored the old pickle file with SparseSeries / SparseFrame and it passes locally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
For the docs, we need to see if it is needed to add actual output to the code-blocks
For the issues you are closing with this, we actually need to see if some of them are not also valid for Series[sparse] / SparseArray
@@ -53,10 +53,7 @@ | |||
Period, | |||
RangeIndex, | |||
Series, | |||
SparseDataFrame, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this file now be restored now you added the pickle compat? (I also think you now no longer update the old pickle files, right?)
I already went through the most recent ones, and crossed a few in the top post. |
@@ -53,10 +53,7 @@ | |||
Period, | |||
RangeIndex, | |||
Series, | |||
SparseDataFrame, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes here are correct; you just need to re-run this in an environment with pandas 0.20.3 ideally using the minimum supported Python version, which I guess right now is 3.5.3 though by the time this goes live should be 3.6, so maybe stick to the latter (I see there is one for 3.5.6 already)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally nice! This is quite the clean up. Just some comments to consider but I think this looks really good
@@ -91,8 +91,7 @@ Interaction with scipy.sparse | |||
|
|||
Added :meth:`SparseSeries.to_coo` and :meth:`SparseSeries.from_coo` methods (:issue:`8048`) for converting to and from ``scipy.sparse.coo_matrix`` instances (see :ref:`here <sparse.scipysparse>`). For example, given a SparseSeries with MultiIndex we can convert to a `scipy.sparse.coo_matrix` by specifying the row and column labels as index levels: | |||
|
|||
.. ipython:: python | |||
:okwarning: | |||
.. code-block:: python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be done as a follow up and probably something easy for the community to contribute to, but ideally since we are converting these from ipython
to python
directives should add the original output (applicable in a few places below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly wouldn't object, but I think that's fairly low-value since, e.g. the 0.16 version of the docs at https://pandas.pydata.org/pandas-docs/version/0.16.0/whatsnew.html will have the correct rendering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I don’t think you should do here as there’s already enough - just a nice to have a first time contributor could pick up.
I can open an issue for it later
Maybe rather add files for a different (later) version, since we already have them for 0.21? |
Just FYI, since the last round of reviews I've added
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last changes look good to me!
Going through the rest of the issues to validate that they are specific to the sparse subclasses. Edit: done. The list at the top should be accurate. All the remaining sparse issues still open should be relevant to SparseArray. |
Let's merge this? |
lgtm. I am -0 on providing the pickle compat. but I guess ok. |
merge when ready. |
Gave another glance through, and I think we're good. Mostly agreed about the pickle compat. But since it turned out to be so easy (same internal representation) I think we're OK. |
Woohoo! :)
I think a reason to have it (apart from that it was easy to add) is because this is a very recent deprecation, and that it also maps nicely to a supported data structure. |
As discussed on the dev call.
Closes #28407
Closes #28025
Closes #26777
Closes #26227
Not #26123Closes #25694
Closes #25378
Not #25270Not #24817Closes #24270
Closes #22630
Closes #21993
Closes #21909
Not #21818Closes #21231
Closes #20766
Closes #20172
Not #19278Closes #18969
Not #18414Closes #17993
Closes #15737
Closes #15634
Closes #14427
Closes #14310
Closes #13665
Not #11726Closes #10627
Closes #9275
Not #9265Closes #6427
Closes #5470
Closes #5078