Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subclassed reshape clean #15655

Closed
wants to merge 25 commits into from
Closed

Subclassed reshape clean #15655

wants to merge 25 commits into from

Conversation

delgadom
Copy link

This PR enables reshape operations on subclassed DataFrame and Series objects to preserve subclass families through the use of _constructor* properties.

See discussion on PR #15564. Thanks for the help @jreback.

This PR has a cleaner implementation of subclassed unstack operations by modifying the _Unstacker initializer to allow an optional constructor argument.

Additionally, this PR implements tests for a wider set of reshape operations. It now covers:

  • DataFrame.stack(), DataFrame.unstack(), DataFrame.pivot(), and series.unstack() for containers with Index and MultiIndex indices and/or columns
  • pd.melt()
  • pd.wide_to_long()

Finally, the pandas internals docs have been edited for clarity and additional examples have been added to showcase subclassed reshape and math operations.

@delgadom
Copy link
Author

delgadom commented Mar 11, 2017

Oops just saw #13563/#15601. Want me to remove SubclassedPanel from the subclassing docs or is that something for another PR once panels are officially on their way out?

@jreback
Copy link
Contributor

jreback commented Mar 11, 2017

yeah don't add anything about panel

@delgadom
Copy link
Author

I left the Panel column in the default _constructor* property table but removed references to panels from the examples

@codecov-io
Copy link

codecov-io commented Mar 11, 2017

Codecov Report

Merging #15655 into master will increase coverage by 0.63%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15655      +/-   ##
==========================================
+ Coverage   90.38%   91.01%   +0.63%     
==========================================
  Files         161      143      -18     
  Lines       50916    49383    -1533     
==========================================
- Hits        46019    44947    -1072     
+ Misses       4897     4436     -461
Flag Coverage Δ
#multiple ?
#single ?
Impacted Files Coverage Δ
pandas/core/reshape.py 99.27% <100%> (ø)
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/tools/plotting.py 71.79% <0%> (-10.03%) ⬇️
pandas/util/_tester.py 35.29% <0%> (-3.6%) ⬇️
pandas/types/concat.py 98.06% <0%> (-1.94%) ⬇️
pandas/conftest.py 94.11% <0%> (-1.72%) ⬇️
pandas/compat/pickle_compat.py 68.29% <0%> (-1.22%) ⬇️
pandas/tools/hashing.py 99.02% <0%> (-0.98%) ⬇️
pandas/io/packers.py 87.57% <0%> (-0.94%) ⬇️
pandas/core/base.py 95.51% <0%> (-0.67%) ⬇️
... and 189 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0ea0f25...dc3b07e. Read the comment docs.

@delgadom
Copy link
Author

Any thoughts as to why coverage would have decreased when the diff was 100% covered and I only added (didn't modify) tests? I don't like all that red.

@@ -152,44 +152,110 @@ Below example shows how to define ``SubclassedSeries`` and ``SubclassedDataFrame
def _constructor_sliced(self):
return SubclassedSeries


Overriding constructor properties allows subclass families to be preserved across slice and reshape operations:

.. code-block:: python

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I am puzzled why these are code-blocks at all, they should all simply be ipython-blocks. much more simple and will show an error if something is wrong. can you change.

@@ -308,6 +308,9 @@ Other enhancements
- ``pd.types.concat.union_categoricals`` gained the ``ignore_ordered`` argument to allow ignoring the ordered attribute of unioned categoricals (:issue:`13410`). See the :ref:`categorical union docs <categorical.union>` for more information.
- ``pandas.io.json.json_normalize()`` with an empty ``list`` will return an empty ``DataFrame`` (:issue:`15534`)
- ``pd.DataFrame.to_latex`` and ``pd.DataFrame.to_string`` now allow optional header aliases. (:issue:`15536`)
- Reshape operations now preserve subclass family. This includes ``DataFrame``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reshape operations on Series/DataFrame will now preserve subclasses. The following operations ......

@@ -37,9 +37,29 @@ class _Unstacker(object):

Parameters
----------
values : ndarray
Values of DataFrame to "Unstack"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't put blank lines in-between parameters

Values of DataFrame to "Unstack"

index : object
Pandas ``Index`` or ``MultiIndex``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index : Index is all that is needed here. MultiIndex is a valid Index.

level : int or str, default last level
Level to "unstack". Accepts a name for the level.

value_columns : object, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value_columns: Index, optional
A MultiIndex must be specified to unstack a DataFrame

@@ -69,7 +89,8 @@ class _Unstacker(object):
"""

def __init__(self, values, index, level=-1, value_columns=None,
fill_value=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so what I would do instead of this, is simply change values -> obj. Where obj is a Series/DataFrame. Then you don't need a constructor argument at all, and if needed you can always .values. Though to be honest this should be completely changed. That coerces all dtypes to a common one.

@@ -452,7 +475,8 @@ def unstack(obj, level, fill_value=None):
return obj.T.stack(dropna=False)
else:
unstacker = _Unstacker(obj.values, obj.index, level=level,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. it makes much more sense to simply pass in obj

@@ -920,7 +945,7 @@ def lreshape(data, groups, dropna=True, label=None):
if not mask.all():
mdata = dict((k, v[mask]) for k, v in compat.iteritems(mdata))

return DataFrame(mdata, columns=id_cols + pivot_cols)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way to really do this nicely is to add two properties to _Unstacker

def _constructor(self):
    return self.obj._constructor

def _constructor_sliced(self):
    return self.obj._constructor_sliced

@jreback jreback added API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Mar 12, 2017
@jreback
Copy link
Contributor

jreback commented Apr 3, 2017

can you rebase / update

@delgadom
Copy link
Author

Will do. I was traveling for the last few weeks but I'll update this when I find a bit of time. Thanks!

@jreback
Copy link
Contributor

jreback commented Jul 26, 2017

needs a rebase / update. pls comment if you would like to continue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Preserve subclass family on reshape operations
3 participants