Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Restore implicit converter registration #18307

Merged
merged 26 commits into from
Dec 7, 2017

Conversation

TomAugspurger
Copy link
Contributor

Closes #18301

TODO:

  • find best home for the import
  • update deprecation message

@TomAugspurger
Copy link
Contributor Author

I added the "special" section in the whatsnew for this and the changes we're making to NaN/empty sum.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me!


This caused problems for some users, so we're temporarily reverting that change;
pandas will again register the converters on import. Using the converters
without explicitly registering the formatters will cause a ``FutureWarning``:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not use 'the formatters' as it actually points to the same as 'converters' but with another word, just making it confusing. I would either just leave it out or use 'them' to avoid repeating converters

Copy link
Contributor Author

@TomAugspurger TomAugspurger Nov 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meant to use formatters converters everywhere. Fixing.

>>> from pandas.tseries import converter
>>> converter.register()

As the error message says, you'll need to register the converts if you intend to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

converts -> converters


As the error message says, you'll need to register the converts if you intend to
use them with matplotlib plotting functions. Pandas plotting functions, such as
``Series.plot``, will register them for you. (:issue:`18301`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a bit more explicit like: "When using pandas plotting functions, such as Series.plot, the converters will be registered for you and you do not need to do this explicitly"

@@ -2352,7 +2354,6 @@ def boxplot_frame_groupby(grouped, subplots=True, column=None, fontsize=None,
>>> grouped = df.unstack(level='lvl1').groupby(level=0, axis=1)
>>> boxplot_frame_groupby(grouped, subplots=False)
"""
_setup()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one doesn't need to be replaced with _converter.WARN = False ?

plt = pytest.importorskip('matplotlib.pyplot')


def test_warns():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this duplicate with the other test file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, forgot to remove it from my first commit.

@TomAugspurger
Copy link
Contributor Author

What are people's thoughts on moving the register import to pandas.plotting instead of pandas.tseries? I'm not sure I see the benefit, unless we're planning to remove pandas.tseries in the future.

@codecov
Copy link

codecov bot commented Nov 15, 2017

Codecov Report

Merging #18307 into master will decrease coverage by 0.01%.
The diff coverage is 84.61%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18307      +/-   ##
==========================================
- Coverage   91.39%   91.38%   -0.02%     
==========================================
  Files         164      164              
  Lines       49854    49869      +15     
==========================================
+ Hits        45566    45571       +5     
- Misses       4288     4298      +10
Flag Coverage Δ
#multiple 89.18% <84.61%> (ø) ⬆️
#single 39.45% <30.76%> (-0.05%) ⬇️
Impacted Files Coverage Δ
pandas/plotting/_core.py 82.37% <66.66%> (-0.09%) ⬇️
pandas/plotting/_converter.py 65.92% <94.11%> (+0.67%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/tseries/converter.py 100% <0%> (+100%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9c799e2...ba88a04. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 15, 2017

Codecov Report

Merging #18307 into master will increase coverage by 0.08%.
The diff coverage is 92.06%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18307      +/-   ##
==========================================
+ Coverage   91.35%   91.43%   +0.08%     
==========================================
  Files         164      157       -7     
  Lines       49802    51424    +1622     
==========================================
+ Hits        45496    47019    +1523     
- Misses       4306     4405      +99
Flag Coverage Δ
#multiple 89.3% <92.06%> (+0.16%) ⬆️
#single 40.58% <42.85%> (-0.3%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/converter.py 100% <100%> (+100%) ⬆️
pandas/core/config_init.py 98.46% <100%> (+0.11%) ⬆️
pandas/plotting/_core.py 82.37% <66.66%> (-0.13%) ⬇️
pandas/plotting/_converter.py 66.52% <97.14%> (+1.26%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/util/testing.py 81.6% <0%> (-18.4%) ⬇️
pandas/io/json/json.py 91.75% <0%> (-8.25%) ⬇️
pandas/io/formats/style.py 96.21% <0%> (-3.79%) ⬇️
pandas/core/indexes/interval.py 93.05% <0%> (-0.17%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7627cca...ee7b457. Read the comment docs.

@jorisvandenbossche
Copy link
Member

Should we also add some language on how to avoid this warning without registering the converters by default? So if you in the future don't want to manually register them, but rely on the matplotlib ones (eg when working with datetime.datetime) ?

@TomAugspurger
Copy link
Contributor Author

I guess a pandas option?

pd.options.register_formatters : bool

And we check that before ever calling register?

@TomAugspurger
Copy link
Contributor Author

Though, future-proofing a bit, should that be spread across several options, one per type? We currently register for

    units.registry[lib.Timestamp] = DatetimeConverter()
    units.registry[Period] = PeriodConverter()
    units.registry[pydt.datetime] = DatetimeConverter()
    units.registry[pydt.date] = DatetimeConverter()
    units.registry[pydt.time] = TimeConverter()
    units.registry[np.datetime64] = DatetimeConverter()

In the ideal future, we would I think just do Timestamp and Period, and matplotlib would have datetime64, I don't know who wants to own python dates / times / datetimes.

cc @tacaswell if any of this interests you.

@TomAugspurger
Copy link
Contributor Author

Added a rough implementation of the option in 16c33f1.

In [1]: import pandas as pd

In [2]: import matplotlib.pyplot as plt

In [3]: fig, ax = plt.subplots()

In [4]: s = pd.Series(range(12), index=pd.date_range('2017', periods=12))

In [5]: pd.options.plotting.matplotlib.register_formatters = False

In [6]: ax.plot(s)
Out[6]: [<matplotlib.lines.Line2D at 0x1131ce588>]

@jreback
Copy link
Contributor

jreback commented Nov 16, 2017

What are people's thoughts on moving the register import to pandas.plotting instead of pandas.tseries? I'm not sure I see the benefit, unless we're planning to remove pandas.tseries in the future.

if its not too much trouble. I would like to at least deprecate pandas.tseries

@tacaswell
Copy link
Contributor

Thanks ❤️ .


As the error message says, you'll need to register the converters if you intend
to use them with matplotlib plotting functions. Pandas plotting functions, such
as ``Series.plot``, will register them for you calling ``converter.register()``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"for you calling" -> "for you, and calling"

as ``Series.plot``, will register them for you calling ``converter.register()``
first is not necessary.

Finally, control the formatters, we've added a new option:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"to" control

@jorisvandenbossche
Copy link
Member

Some other things related to the option:

  • It seems you kept the terminology of 'converters' (and not 'formatters') throughout the whatsnew and code? (which is I think the terminology of matplotlib, as the converter calls a formatter). But the option is called 'register_formatters'. So we should choose here.

  • When you set the pd.options.plotting.matplotlib.register_formatters = False, they will also not get registered when using pandas' plot methods ? I don't think that is the case, but shouldn't it be?

  • The "unregistering" (setting pd.options.plotting.matplotlib.register_formatters = False) will also remove the converters for datetime.datetime, which are actually the default ones. Ideally they are set back as well. @tacaswell Is there a way to restore the default units? (like with rcParams). Of course we can also save them when we overwrite them, to set it back properly.

  • I would make the option maybe a bit shorter, like pd.options.plotting.register_matplotlib_formatters (not shorter, but a '.' less so quicker to tab-complete) ?

converter.register()
with ctx:
with tm.assert_produces_warning(None) as w:
ax.plot(s.index, s.values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you did set the option to False, it should not warn?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forget, that is what is being tested :-) (and they have been registered manually afterwards)

Should we test what happens if you do not register manually after you set the option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean set the option to True or False? I think my latest commit does that.

After setting it to False, I think we should not warn even if they don't register manually (which is what my tests checks hopefully)

@TomAugspurger
Copy link
Contributor Author

It seems you kept the terminology of 'converters' (and not 'formatters')

Switched to formatters everywhere.

they will also not get registered when using pandas' plot methods ?

Correct. To be more specific, they're always registered when pandas is imported. The option toggles whether or not they're in the matplotlib.units.registry dict. True adds them, False removes them. That means that doing

In [8]: pd.options.plotting.matplotlib.register_formatters = False

In [9]: s.plot()

Will fall back to whatever matplotlib does with datetimes / periods (potentially an exception, e.g. for periods).

will also remove the converters for datetime.datetime, which are actually the default ones

Hmm, not sure about that. My registry is empty upon import:

In [1]: import matplotlib.units

In [2]: matplotlib.units.registry
Out[2]: {}

Though maybe I'm misunderstanding. The intent is return the registry to the state before we mucked with it.

I would make the option maybe a bit shorter

Yeah, thoughts on pd.options.plotting.mpl.converters? I want the .mpl / matplotlib namespace for if / when we allow different engines for .plot.

@tacaswell
Copy link
Contributor

The unit registry gets filled on import. If you import matplotlib.dates they will be added. These imports are done in pyplot so most users have this import done.

@TomAugspurger
Copy link
Contributor Author

Ah thanks.

In [1]: import matplotlib.units

In [2]: import matplotlib.pyplot as plt

In [3]: matplotlib.units.registry
Out[3]:
{str: <matplotlib.category.StrCategoryConverter at 0x10beb7f60>,
 numpy.str_: <matplotlib.category.StrCategoryConverter at 0x10beb7f28>,
 bytes: <matplotlib.category.StrCategoryConverter at 0x10beb7ef0>,
 numpy.bytes_: <matplotlib.category.StrCategoryConverter at 0x10beb7f98>,
 datetime.date: <matplotlib.dates.DateConverter at 0x10bf2c208>,
 datetime.datetime: <matplotlib.dates.DateConverter at 0x10bf2c240>}

In [4]: import pandas as pd

In [5]: matplotlib.units.registry
Out[5]:
{str: <matplotlib.category.StrCategoryConverter at 0x10beb7f60>,
 numpy.str_: <matplotlib.category.StrCategoryConverter at 0x10beb7f28>,
 bytes: <matplotlib.category.StrCategoryConverter at 0x10beb7ef0>,
 numpy.bytes_: <matplotlib.category.StrCategoryConverter at 0x10beb7f98>,
 datetime.date: <pandas.plotting._converter.DatetimeConverter at 0x10bf2c240>,
 datetime.datetime: <pandas.plotting._converter.DatetimeConverter at 0x10eaa3828>,
 pandas._libs.tslib.Timestamp: <pandas.plotting._converter.DatetimeConverter at 0x10eaa3780>,
 pandas._libs.period.Period: <pandas.plotting._converter.PeriodConverter at 0x10eaa37b8>,
 datetime.time: <pandas.plotting._converter.TimeConverter at 0x10bf2c208>,
 numpy.datetime64: <pandas.plotting._converter.DatetimeConverter at 0x10eaac0f0>}

In [6]: pd.options.plotting.matplotlib.register_formatters = False

In [7]: matplotlib.units.registry
Out[7]:
{str: <matplotlib.category.StrCategoryConverter at 0x10beb7f60>,
 numpy.str_: <matplotlib.category.StrCategoryConverter at 0x10beb7f28>,
 bytes: <matplotlib.category.StrCategoryConverter at 0x10beb7ef0>,
 numpy.bytes_: <matplotlib.category.StrCategoryConverter at 0x10beb7f98>}

So not great... We could of course maintain our own global state when we overwrite them... Though we'll have to make sure matplotlib.pyplot is imported first. Let me give that a shot.

@jorisvandenbossche
Copy link
Member

I have no problem with providing this optinon, but why would you make this True by default? it is conceptually false now.

It has always been conceptually True (up to 0.21.0), so if we revert, then the default would be True. And from 0.21.0 you could say that it is conceptually False, but that is also not really correct, as it is False on import but True once you do plotting with pandas. In this PR, the False option will also not register them when plotting datetimes (so that will raise an error with current matplotlib)

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Nov 29, 2017

Another issue for down the road matplotlib/matplotlib#9779 doesn't handle DatetimeIndexes:

screen shot 2017-11-29 at 10 09 36 am

Though hopefully this can be handled on the matplotlib side by getting the values.

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import units

x = pd.date_range("2017", periods=120, freq='H')
y = range(120)
z = x.tz_localize("US/Central")
%matplotlib inline

plt.plot(x, y);

@jorisvandenbossche
Copy link
Member

Yes, I think that should be easily solvable in matplotlib. I was still planning to run all the different reported examples in pandas issues related to this with matplotlib master to see if it would solve those cases.

@TomAugspurger
Copy link
Contributor Author

For MPL 2.2+, do we want to overwrite their converters with our own? I think no, but I'll have to check the differences.

@jorisvandenbossche
Copy link
Member

To summarize, I think in the long term we want those options:

  • False: never register our converters (also not when pandas plotting is used)
  • True: register our converters, so also available for pure matplotlib plots
  • 'auto': only use our converters for plots made by our .plot(..) method. Here I think we still want to overwrite matplotlib's converters (but that's maybe up for discussion)

Before 0.21.0, 'True' was the default, and in the long term we want 'auto' to become the default.

But as a transition period, I think we want a hybrid of True/'auto', which would be 'auto' + deprecation warning for implicitly used converters (i.e. register our converters but with warn=True). And indeed there I would only do this warning for those types that are in the meantime not supported by matplotlib itself, and thus not overwrite the ones matplotlib in the meantime added.
Not overwriting those can still mean some breaking differences for users, but much less as is the case now (the underlying plotted float data are different, but in principle a normal user never cares about that + the formatting of the axes will be a bit different)

But, my proposal it to now do a complete revert for 0.21.1, and start the transition period only in 0.22.0 (or the first release when matplotlib 2.2 is out).

@TomAugspurger
Copy link
Contributor Author

Sorry for the delay on this. I'm finally done with traveling and conferences for a bit.

This PR now

  • Adds the options for easily adding / removing our converters
  • Moved pandas.tseries.register to pandas.plotting.register_matplotlib_converters
  • Adds the scaffolding for warning in the next version about implicitly registered formatters, but doesn't actually use this yet, as the default is to not warn

So, after MPL 2.2, we should just need to change the default in register to be True, in which case we'll warn on matplotlib plotting methods.

- actually switch the default to not warn
- We do overwrite matplotlib's formatters
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good to me, thanks a lot!

Some minor comments.
Should we add a test that currenlty on import they are already registered?

for regular ``matplotlib.pyplot`` plotting methods, so we're temporarily
reverting that change; pandas will again register the converters on import.
Pandas plotting functions, such as ``Series.plot``, will continue to register
them for you.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence now feels a bit out of place, as the previous one already said that they are registered again on import.

.. code-block:: python

>>> from pandas.plotting import register_matplotlib_converters
>>> register_matplotlib_converters()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I would show that here. Because people who glance over it might think they need to add that to their code, which is not the case.

def register():
from pandas.plotting._converter import register as register_
msg = ("'pandas.tseries.converter.register' has been moved and renamed to "
"'pandas.plotting.register_converters'. ")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

register_converters -> register_matplotlib_converters()

@jreback
Copy link
Contributor

jreback commented Dec 1, 2017

@tacaswell can you have a look

@TomAugspurger
Copy link
Contributor Author

Should we add a test that currenlty on import they are already registered?

I thought about that, but there are test-order issues with that. Do you know a clean way of running a single test in it's own process? I could use subprocess, but that seems fragile.

@jorisvandenbossche
Copy link
Member

I thought about that, but there are test-order issues with that. Do you know a clean way of running a single test in it's own process? I could use subprocess, but that seems fragile.

Not directly, googling said pytest does not have a way to specify this, but found this: https://stackoverflow.com/questions/45462374/mark-test-to-be-run-in-independent-process (not sure how robust it is)

@TomAugspurger
Copy link
Contributor Author

Added ee7b457 with a test using subprocess. I worry that's fragile to PATH issues, but let's see.

Copy link
Contributor

@tacaswell tacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy with this long term plan.

@jorisvandenbossche
Copy link
Member

@TomAugspurger Let's merge this?

@TomAugspurger TomAugspurger merged commit 24b8f1e into pandas-dev:master Dec 7, 2017
@TomAugspurger TomAugspurger deleted the depr-converter branch December 7, 2017 15:04
TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017
* API: Restore implicit converter registration

* Remove matplotlib from blacklist

* fixup! Remove matplotlib from blacklist

* Add option for toggling formatters

* Remove move

* Handle no matplotlib

* Cleanup

* Test no register

* Restore original state

* Added deregister

* Doc, naming

* Naming

* Added deprecation

* PEP8

* Fix typos

* Rename it all

* Missed one

* Check version

* No warnings by default

* Update release notes

* Test fixup

- actually switch the default to not warn
- We do overwrite matplotlib's formatters

* Doc update

* Fix deprecation message

* Test added by default
@jorisvandenbossche
Copy link
Member

I opened #18720 as a follow-up issue about the 'auto' behaviour (only modifying matplotlibs units registry in pandas' plotting context)

TomAugspurger added a commit that referenced this pull request Dec 11, 2017
* API: Restore implicit converter registration

* Remove matplotlib from blacklist

* fixup! Remove matplotlib from blacklist

* Add option for toggling formatters

* Remove move

* Handle no matplotlib

* Cleanup

* Test no register

* Restore original state

* Added deregister

* Doc, naming

* Naming

* Added deprecation

* PEP8

* Fix typos

* Rename it all

* Missed one

* Check version

* No warnings by default

* Update release notes

* Test fixup

- actually switch the default to not warn
- We do overwrite matplotlib's formatters

* Doc update

* Fix deprecation message

* Test added by default
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants