API: Return axes from boxplot #7096

Closed
wants to merge 3 commits into from

Conversation

TomAugspurger
Contributor

Closes: #4264
Will go in place of #4472

In #985 we intentionally decided to have boxplot return a dict with the Lines of the boxplot.

I've added a kwarg return_dict. By default, return_dict is False. In this case only a matplotlib.axes is returned. When return_dict is True, we should either

  • return the matplotlib.axes and the dict in a namedtuple
  • return just the dict

Both of these are API breaking. Right now I've got both in the named tuple, but I may change that to return just the dict. If we return just the dict, the only change people will have to make to their code is to set return_dict=True in their boxplot calls. Thoughts?

cc @jseabold, @fonnesbeck

@TomAugspurger
Contributor Author

FYI I don't see any uses of boxplot in Wes's book.

@cpcloud
Member

cpcloud commented May 12, 2014

I think it should return the dict by default ... unless there's a reason to break the API more than necessary (I think it's ok to break by returning the named tuple), but defaulting to not returning the boxplot dict anymore seems too much ... not that big of a deal

@jreback
Contributor

jreback commented May 12, 2014

@TomAugspurger I think your soln seems reasonable.

@cpcloud I think returning BOTH breaks much more here; with @TomAugspurger's PR, you can get both just by adding a kw, otherwise it doesn't break

@jseabold ?

@jreback
Contributor

jreback commented May 14, 2014

@TomAugspurger ?

@TomAugspurger
Contributor Author

Shall we have a vote?

  1. Return dict only (no change)
  2. Default: Return ax and dict in named tuple (API breaking)
  3. Default: return dict only, add kwarg return_ax (True or False)
  4. Default: return dict only, add kwarg return_ax (True, False, or 'both')
  5. Default: return ax only, add kwarg return_dict (True or False) (API breaking)
  6. Default: return ax only, add kwarg return_dict (True, False, or 'both') (API breaking)

2, 5, and 6 will break people's code.
3 & 4 are similar, just whether to take True/False or True/False/both; likewise with 5 & 6.

I vote for 5 or 6.

@TomAugspurger
Contributor Author

I should note that this only applies when by is None. When by is not None, there's no change and an array of axes will be returned.

@jreback
Contributor

jreback commented May 14, 2014

@jseabold ?

@fonnesbeck

5 gets my vote also, FWIW

@cpcloud
Member

cpcloud commented May 14, 2014

👍 on 5 from me too.

@jreback
Contributor

jreback commented May 14, 2014

@TomAugspurger since a dict is currently returned, and the proposal (option 5) is to return an ax instead (with a return_dict kw), this should raise in user code that tries to use the result as a dict, right? IOW the break in the API will produce an obvious error

just thinking that user upgrades and doesn't read the API changes in whatsnew (doesn't everyone??? ) ... lol
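
A minimal sketch of why option 5 would fail loudly rather than silently. `FakeAxes` and `new_boxplot` are hypothetical stand-ins invented for illustration, not pandas or matplotlib code; the assumption is only that the returned axes object, like a matplotlib Axes, does not support item access.

```python
class FakeAxes:
    """Hypothetical stand-in for a matplotlib Axes.

    Like the real class, it does not define __getitem__.
    """


def new_boxplot():
    # Under option 5 the default return value is the axes, not the dict.
    return FakeAxes()


result = new_boxplot()
try:
    # Code written against the old API indexes the result like a dict.
    medians = result["medians"]
except TypeError as exc:
    error_message = str(exc)
    print("old-style code fails loudly:", error_message)
```

Code that only calls boxplot for its side effect (drawing the plot) and ignores the return value would keep working unchanged.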

@jseabold
Contributor

5 is ok I suppose. I will only offer as anecdotal evidence that whenever I update my pandas, parts of my large projects inevitably break and it's a huge time-sink to track things down (happened as recently as yesterday). Breaking code is "bad." I've been busy lately, so I've just stopped trying to report these types of things. OTOH, this is a bit of a wart, and it should be pretty clear what's going on. If you decide on 5, then I'd issue a warning for a few releases to help people figure out what's going on.

Another option, that's a pain, but is the way we went with statsmodels in a few cases. Keep returning a dict, so as not to break people's code and add return_ax keyword that is False by default. Using inspect, if return_ax is not explicitly specified, then issue a deprecation warning that in the next release (or whenever) boxplot will return a tuple (or whatever you decide to move to). If it is explicitly given, then assume the user knows what they're doing.
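
A rough sketch of the "warn only when the keyword was not given" idea. It swaps the `inspect`-based check mentioned above for a sentinel default, which is a different technique with the same goal. `boxplot_sketch`, `_UNSET`, and the placeholder return values are hypothetical names for illustration, and `FutureWarning` is an assumption chosen because Python shows it to end users by default.

```python
import warnings

_UNSET = object()  # hypothetical sentinel meaning "caller did not pass return_ax"


def boxplot_sketch(values, return_ax=_UNSET):
    """Illustrative stand-in for boxplot; not the pandas implementation."""
    lines = {"medians": sorted(values)}  # placeholder for the dict of Line2D objects
    ax = "axes-placeholder"              # placeholder for the matplotlib Axes
    if return_ax is _UNSET:
        # Keyword omitted: keep the old return value, but warn about the change.
        warnings.warn(
            "boxplot will return a different type in a future release; "
            "pass return_ax explicitly to silence this warning",
            FutureWarning,
            stacklevel=2,
        )
        return lines
    # Keyword given explicitly: assume the caller knows what they want.
    return (ax, lines) if return_ax else lines
```

With this shape, `boxplot_sketch(data)` warns and returns the dict, while `boxplot_sketch(data, return_ax=False)` returns the same dict silently.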

@TomAugspurger
Contributor Author

Thanks @jseabold. Should we try to deprecate / warn for a release?

I could add a return_type keyword, which expects one of {'dict', 'axes', 'both'}. In .14 the default is 'dict' and will behave exactly as .13, but it will warn that in a future version the default will change to axes. They can future proof themselves now by setting return_type='axes'. Thoughts?

@jseabold
Contributor

That sounds reasonable to me, but I'll defer to whatever others want to do. API breakage isn't the end of the world in this case. As has been pointed out this isn't exactly a subtle change. IMO deprecation is just a good software habit for such a heavily used package.

@jorisvandenbossche
Member

I am +1 on (eventually) returning the axis by default.
I also think you should have the possibility to return both (so not only either axes or dict), in case you want to use both to customize your plot.

And +1 on having a deprecation cycle, keeping the default for now but raising a warning that this will change in the future. return_type is good for me. Are there precedents of keywords that determine the return type? (to see what name we used there?)

@jreback
Contributor

jreback commented May 15, 2014

I am with @jorisvandenbossche on first point return_type

but +0 on deprecation cycle, I think this would break loudly and this has been a wart for a long time

@jreback
Contributor

jreback commented May 15, 2014

so since the majority want deprecation @TomAugspurger why don't we use the return_type and deprecate for 0.14, changing in 0.15 then

@TomAugspurger
Contributor Author

OK. So to summarize:

  • This PR adds a new kwarg: return_type (tentative name; pending investigation if we have any other similar cases).
  • return_type can be 'dict' (0.14 and earlier behavior), 'axes' (0.15+ behavior), or 'both' (a namedtuple)
  • In 0.14 the default is return_type=None so that I can detect if a warning needs to be raised.
    • If return_type is None, a FutureWarning is issued and a dict is returned
    • If return_type is 'dict', a warning is not raised
  • In the Future (0.15?) the default of return_type will be changed to 'axes'

And I should note that all of this goes out the window if 'by' is not None, in which case we return an array of axes. I suspect some documentation is in order.

I pretty much have this done btw. I'm just in the midst of a messy rebase.
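
The summary above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `select_return` is a hypothetical helper, the warning text is invented, and the `Boxplot` namedtuple with fields `ax` and `lines` is taken from the repr that appears later in this conversation.

```python
import warnings
from collections import namedtuple

# Field names follow the Boxplot(ax=..., lines=...) repr shown in this thread.
Boxplot = namedtuple("Boxplot", ["ax", "lines"])


def select_return(ax, lines, return_type=None):
    """Hypothetical helper: choose what a boxplot call hands back."""
    if return_type is None:
        # Default not overridden: warn, then keep the 0.13-style dict.
        warnings.warn(
            "The default return type of boxplot will change to 'axes' in a "
            "future release; pass return_type='dict' to keep the current "
            "behavior or return_type='axes' to adopt the new one.",
            FutureWarning,
            stacklevel=2,
        )
        return lines
    if return_type == "dict":
        return lines  # explicit request, so no warning
    if return_type == "axes":
        return ax
    if return_type == "both":
        return Boxplot(ax=ax, lines=lines)
    raise ValueError("return_type must be 'dict', 'axes', 'both', or None")
```

Using `None` as the default is what lets the function tell "caller said nothing" apart from "caller asked for 'dict'", so only the first case warns.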

@jreback
Contributor

jreback commented May 15, 2014

@TomAugspurger sounds good to me; create a new issue for DEPR review in 0.15 (the other is for 0.16); just because this has been outstanding for SO long

@jorisvandenbossche
Member

good summary!

@TomAugspurger TomAugspurger mentioned this pull request May 15, 2014
3 tasks
@TomAugspurger
Contributor Author

Added to issue for 0.15 deprecations: #7136

What I just pushed up should implement the "keep the same default for now, but deprecate and warn" strategy.

I also made a shared docstring for DataFrame.boxplot and pandas.tools.plotting.boxplot.

Finally, I was getting an error on one of the tests that checked a figsize, since I changed my default figsize in my .matplotlib/matplotlibrc file. So I reset the matplotlib rcParams in the setUp of the plotting tests.

fontsize : int or string
rot : label rotation angle
figsize : A tuple (width, height) in inches
grid : Setting this to True will show the grid
layout : tuple (optional)
    (rows, columns) for the layout of the plot
return_dict : {'axes', 'dict', 'both'}, default 'dict'
Contributor

return_type here?

Contributor Author

yes

@TomAugspurger
Contributor Author

Oh my, the return type gets more complicated. df.groupby('g').boxplot() will always return a dict whose values are whatever return_type specifies: dicts, axes, or namedtuples. I'll add a note to the plotting docs.
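
A small sketch of the shape of that grouped return value. `grouped_boxplot_sketch` and the placeholder axes and line values are hypothetical, invented only to show that the outer container is a dict keyed by group while `return_type` controls what each value is.

```python
from collections import namedtuple

Boxplot = namedtuple("Boxplot", ["ax", "lines"])


def grouped_boxplot_sketch(groups, return_type="dict"):
    """Illustrative only: `groups` maps a group key to a list of values."""
    result = {}
    for key, values in groups.items():
        ax = f"axes-for-{key}"                # placeholder for a matplotlib Axes
        lines = {"n_points": len(values)}     # placeholder for the Line2D dict
        if return_type == "dict":
            result[key] = lines
        elif return_type == "axes":
            result[key] = ax
        elif return_type == "both":
            result[key] = Boxplot(ax=ax, lines=lines)
        else:
            raise ValueError("return_type must be 'dict', 'axes', or 'both'")
    return result


groups = {"a": [1, 2, 3], "b": [4, 5]}
by_axes = grouped_boxplot_sketch(groups, return_type="axes")
by_both = grouped_boxplot_sketch(groups, return_type="both")
print(by_axes)  # {'a': 'axes-for-a', 'b': 'axes-for-b'}
```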

I also should look into changing the repr of a named tuple. This isn't great.

In [16]: df.boxplot(return_type='both')
Out[16]: Boxplot(ax=<matplotlib.axes._subplots.AxesSubplot object at 0x112e64090>, lines={'boxes': [<matplotlib.lines.Line2D object at 0x112f024d0>, <matplotlib.lines.Line2D object at 0x112f09590>, <matplotlib.lines.Line2D object at 0x112f04f10>, <matplotlib.lines.Line2D object at 0x110af9dd0>, <matplotlib.lines.Line2D object at 0x110b02a10>], 'fliers': [<matplotlib.lines.Line2D object at 0x112f07f10>, <matplotlib.lines.Line2D object at 0x112f04b50>, <matplotlib.lines.Line2D object at 0x110af9790>, <matplotlib.lines.Line2D object at 0x110b023d0>, <matplotlib.lines.Line2D object at 0x110b09fd0>], 'medians': [<matplotlib.lines.Line2D object at 0x112f078d0>, <matplotlib.lines.Line2D object at 0x112f04510>, <matplotlib.lines.Line2D object at 0x110af9150>, <matplotlib.lines.Line2D object at 0x110afdd50>, <matplotlib.lines.Line2D object at 0x110b09990>], 'means': [], 'whiskers': [<matplotlib.lines.Line2D object at 0x112f02750>, <matplotlib.lines.Line2D object at 0x11590a290>, <matplotlib.lines.Line2D object at 0x112f09b90>, <matplotlib.lines.Line2D object at 0x112f0c210>, <matplotlib.lines.Line2D object at 0x112f117d0>, <matplotlib.lines.Line2D object at 0x112f11e10>, <matplotlib.lines.Line2D object at 0x110afb410>, <matplotlib.lines.Line2D object at 0x110afba50>, <matplotlib.lines.Line2D object at 0x110b05050>, <matplotlib.lines.Line2D object at 0x110b05690>], 'caps': [<matplotlib.lines.Line2D object at 0x112f02c10>, <matplotlib.lines.Line2D object at 0x112f07290>, <matplotlib.lines.Line2D object at 0x112f0c850>, <matplotlib.lines.Line2D object at 0x112f0ce90>, <matplotlib.lines.Line2D object at 0x112f15490>, <matplotlib.lines.Line2D object at 0x112f15ad0>, <matplotlib.lines.Line2D object at 0x110afd0d0>, <matplotlib.lines.Line2D object at 0x110afd710>, <matplotlib.lines.Line2D object at 0x110b05cd0>, <matplotlib.lines.Line2D object at 0x110b09350>]})
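
One possible way to shorten that repr, offered as a sketch rather than what this PR does: subclass the namedtuple and summarize each entry of `lines` by its length. The subclass and its output format are hypothetical.

```python
from collections import namedtuple


class Boxplot(namedtuple("Boxplot", ["ax", "lines"])):
    """Hypothetical variant of the 'both' return value with a compact repr."""

    __slots__ = ()  # keep instances as lightweight as a plain namedtuple

    def __repr__(self):
        # Show how many artists each key holds instead of listing every one.
        counts = ", ".join(
            f"{name}: {len(items)}" for name, items in sorted(self.lines.items())
        )
        return f"Boxplot(ax={self.ax!r}, lines={{{counts}}})"


bp = Boxplot(ax="AX", lines={"boxes": [1, 2], "caps": [1, 2, 3, 4]})
summary = repr(bp)
ax, lines = bp  # tuple unpacking still works as before
```

Field access (`bp.ax`, `bp.lines`) and unpacking are unchanged; only the printed form differs.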

@TomAugspurger
Contributor Author

Fixed that wrong kwarg in the docstring and added a note about the return type from a grouped boxplot.

This should be ready

@jreback
Contributor

jreback commented May 16, 2014

looks good

@@ -258,6 +261,10 @@ Deprecations
Use the `percentiles` keyword instead, which takes a list of percentiles to display. The
default output is unchanged.

- The default return type of :func:`boxplot` will change from a dict to a matpltolib Axes
in a future release. You can use the future behavior now by passing ``return_type='dict'``
Member

shouldn't this be 'axes'?

@TomAugspurger
Contributor Author

Thanks.

On May 16, 2014, at 3:26 PM, "jreback" <notifications@github.com> wrote:

merged via a0692be (https://github.com/pydata/pandas/commit/a0692be4ed56ab6c2eb91462a03d992d48ecb915)



@jreback
Contributor

jreback commented Aug 23, 2015

@TomAugspurger this is deprecated; shall we fix for 0.17.0?

TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Sep 3, 2016
Part of pandas-dev#6581
Deprecation started in pandas-dev#7096

Changes the default value of `return_type` in DataFrame.boxplot
and DataFrame.plot.box from None to 'axes'.
jorisvandenbossche pushed a commit that referenced this pull request Sep 4, 2016
* DEPR: Change boxplot return_type kwarg

Part of #6581
Deprecation started in #7096

Changes the default value of `return_type` in DataFrame.boxplot
and DataFrame.plot.box from None to 'axes'.

* API: Change faceted boxplot return_type

Aligns behavior of `Groupby.boxplot` and DataFrame.boxplot(by=.)
to return a Series.
@TomAugspurger TomAugspurger deleted the boxplot-ax branch November 3, 2016 12:38
Development

Successfully merging this pull request may close these issues.

DataFrame.boxplot returns a dict instead of axes
6 participants