Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define boxplot inside DataFrameGroupBy definition #17179

Closed
wants to merge 7 commits into from

Conversation

jbrockmendel
Copy link
Member

instead of pinning it on right after the definition; move the import of plotting._core.boxplot_frame_groupby to the top of the module.

Start getting rid of the exec(...) statements. In particular, the ones that define
properties. Still need to figure out the ones defining methods.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

instead of pinning it on right after the definition; move the import of boxplot_frame_groupby
to the top of the module.

Start getting rid of the exec(foo) statements.  In particular, the ones that define
properties.  Still need to figure out the ones defining methods.
@jbrockmendel
Copy link
Member Author

I can't get CircleCI to load the error messages, so no idea why it's failing.

@gfyoung gfyoung added the Clean label Aug 7, 2017
@jbrockmendel
Copy link
Member Author

@gfyoung Can you see what the problem is with CircleCI?

@gfyoung
Copy link
Member

gfyoung commented Aug 10, 2017

@jbrockmendel : you can see for yourself in fact. Under the test summary, for each failing test, there's a "more" button to the right-hand side that you can click.

@gfyoung
Copy link
Member

gfyoung commented Aug 10, 2017

As for why your tests are failing, they're failing because they aren't catching a UserWarning somewhere, at least according to pytest.

@jbrockmendel
Copy link
Member Author

It was stuck in spinning-umbrella mode last time I checked, but I'm getting the error messages now. Thanks.

@codecov
Copy link

codecov bot commented Aug 10, 2017

Codecov Report

Merging #17179 into master will decrease coverage by 0.02%.
The diff coverage is 89.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17179      +/-   ##
==========================================
- Coverage   91.02%      91%   -0.03%     
==========================================
  Files         162      162              
  Lines       49517    49523       +6     
==========================================
- Hits        45075    45068       -7     
- Misses       4442     4455      +13
Flag Coverage Δ
#multiple 88.78% <89.28%> (-0.01%) ⬇️
#single 40.28% <85.71%> (-0.04%) ⬇️
Impacted Files Coverage Δ
pandas/core/groupby.py 92.03% <89.28%> (-0.05%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/indexes/accessors.py 88.09% <0%> (-0.67%) ⬇️
pandas/core/internals.py 94.04% <0%> (-0.17%) ⬇️
pandas/core/frame.py 97.66% <0%> (-0.16%) ⬇️
pandas/core/computation/eval.py 96.93% <0%> (-0.04%) ⬇️
pandas/core/categorical.py 95.46% <0%> (-0.03%) ⬇️
pandas/core/resample.py 96.13% <0%> (-0.03%) ⬇️
pandas/core/computation/pytables.py 92.35% <0%> (-0.03%) ⬇️
pandas/core/computation/align.py 97.91% <0%> (-0.03%) ⬇️
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f165b90...1d59eff. Read the comment docs.

@codecov
Copy link

codecov bot commented Aug 10, 2017

Codecov Report

Merging #17179 into master will decrease coverage by 0.03%.
The diff coverage is 84.21%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17179      +/-   ##
==========================================
- Coverage   91.02%   90.99%   -0.04%     
==========================================
  Files         162      162              
  Lines       49517    49655     +138     
==========================================
+ Hits        45075    45185     +110     
- Misses       4442     4470      +28
Flag Coverage Δ
#multiple 88.78% <84.21%> (-0.01%) ⬇️
#single 40.27% <51.31%> (-0.05%) ⬇️
Impacted Files Coverage Δ
pandas/core/groupby.py 91.63% <84.21%> (-0.44%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.72% <0%> (-0.1%) ⬇️
pandas/core/tools/datetimes.py 66.85% <0%> (ø) ⬆️
pandas/core/tools/numeric.py 100% <0%> (ø) ⬆️
pandas/util/_print_versions.py 15.71% <0%> (ø) ⬆️
pandas/core/generic.py 92.03% <0%> (ø) ⬆️
pandas/core/tools/timedeltas.py 98.33% <0%> (ø) ⬆️
pandas/core/reshape/pivot.py 95.4% <0%> (+0.1%) ⬆️
pandas/util/_decorators.py 66% <0%> (+0.34%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f165b90...c3c0e0a. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fixing the general case should happen first (IOW actually generate the groupby whitelisted methods rather than using exec). then this would be simpler and not as special casey.

@jbrockmendel
Copy link
Member Author

actually generate the groupby whitelisted methods rather than using exec

That sounds great. I'll try to put that together this afternoon.

@pep8speaks
Copy link

pep8speaks commented Aug 11, 2017

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 12, 2017 at 01:18 Hours UTC

@jbrockmendel
Copy link
Member Author

Just pushed a commit that explicitly defines all of the inherited methods instead of generating them with exec.

closes #16959

Aside from using Appender to inherit docstrings, the definitions here are identical to how they were in the exec version. If we can reach a consensus, I'd like to go a step further and get rid of the __getattr__ magic these methods use. e.g.

    def __getattr__(self, attr):
        if attr in self._internal_names_set:
            return object.__getattribute__(self, attr)
        if attr in self.obj:
            return self[attr]
        if hasattr(self.obj, attr):
            return self._make_wrapper(attr)

        raise AttributeError("%r object has no attribute %r" %
                             (type(self).__name__, attr))

[...]
    @Appender(Series.idxmax.__doc__)
    def idxmax(self, axis=None, skipna=True, *args, **kwargs):
        f = self.__getattr__('idxmax')
        return f(axis=axis, skipna=skipna)

A casual reader could easily look at f = self.__getattr__('idxmax') and think it is equivalent to self.idxmax or getattr(self, 'idxmax'). This confusion can be avoided by just writing f = self._make_wrapper('idxmax'). Ideally we could take the _make_wrapper branch out of __getattr__ altogether.

In fact, gb.idxmax() will raise a TypeError if there is ever a column named idxmax. There was a related issue around not long ago but I can't seem to find it.

Side-note: Most of the inherited methods allow for **kwargs, but then ignore anything that is passed. Is this intentional?

return f(other=other, method=method, min_periods=min_periods)

@Appender(Series.fillna.__doc__)
def fillna(self, value=None, method=None, axis=None, inplace=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

many of these keywords need to be eliminated. e.g. axis and inplace, they don't make sense on groupby objects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I suggest this should happen in a follow-up. This is not a new problem, but an existing problem that was hidden by exec until now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is not enough testing around this. i am concerned you are actually changing the signatures. can you add covereage for all of the exposed functions (just assert that they work with a full spread of arguments). xfail the ones that are problematic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you didn't address this point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. This point in my "get back to it later" pile, as it is several steps removed from the original goal of the PR.

_series_apply_whitelist):
exec(_def_str)

@Appender(Series.all.__doc__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just adding the doc-string is not really enough. you need to have a bit of substitution and/or mention that this is actually a groupby method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Under the status-quo the docstring is attached verbatim but with some extra whitespace. My preference would be to address this in a follow-up PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is fine

@@ -3964,13 +4012,117 @@ def filter(self, func, dropna=True, *args, **kwargs): # noqa

class DataFrameGroupBy(NDFrameGroupBy):
_apply_whitelist = _dataframe_apply_whitelist
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this concept of a whitelist may not be necessary anymore

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still referenced in _make_wrapper and some of the tests. It might make sense to remove it at the same time as the __getattr__ change suggested above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not be referenced in the tests at all

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See tests/groupby/test_whitelist.py

Looks like it also shows up in _dir_additions:

def _dir_additions(self):
        return self.obj._dir_additions() | self._apply_whitelist

AFAICT _apply_whitelist is a historical artifact that may not be needed anymore, but is not actively causing any problems at the moment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would remove the whitelist entirely; since you are defining the whitelist the methods no longer are referenced thru that mechanism (and instead thru explicit methods). you can simply setup a list in the tests that is a list of these methods and check that they are defined.

return f(other=other, method=method, min_periods=min_periods)

@Appender(Series.fillna.__doc__)
def fillna(self, value=None, method=None, axis=None, inplace=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you didn't address this point.

@@ -3964,13 +4012,117 @@ def filter(self, func, dropna=True, *args, **kwargs): # noqa

class DataFrameGroupBy(NDFrameGroupBy):
_apply_whitelist = _dataframe_apply_whitelist
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would remove the whitelist entirely; since you are defining the whitelist the methods no longer are referenced thru that mechanism (and instead thru explicit methods). you can simply setup a list in the tests that is a list of these methods and check that they are defined.

"""
return the dtype object of the underlying data
"""
return self.__getattr__('dtype')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this indirection is extremely odd to do, simply use getattr(self.obj, 'dtype')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Again: this is the code that is generated by exec in the status quo.

@jbrockmendel
Copy link
Member Author

This PR has gotten stuck in the mud. I propose swiss-cheesing the problem into smaller pieces.

  1. A new PR that addresses the original issue by just taking boxplot out of _dataframe_apply_whitelist.

  2. A PR that removes the whitelist properties generated using exec (and the weird __getattr__ redirection). With properties we won't have to worry about the signatures being changed.

  3. Remaining methods that don't have *args, **kwargs.

  4. The rest.

@gfyoung
Copy link
Member

gfyoung commented Aug 26, 2017

@jbrockmendel : Given how the conversation has gone, I think that sounds like a good idea. Keep this PR open until you're done dividing it up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants