Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds custom plot formatting for TimedeltaIndex. #15067

Closed
wants to merge 16 commits into from

Conversation

jgoppert
Copy link
Contributor

@jgoppert jgoppert commented Jan 5, 2017

Instead of hacking the PeriodIndex plotting code to work for TimedeltaIndex, this just creates a simple formatter unique to TimedeltaIndex. It does fine for regular intervals up to a day, then the labels and run into eachothter. If this happens, you can use plt.autofmt_xdate() to fix it.

d.t_sensor_gyro_0__f_x.plot()
plt.grid()

image

index =  pd.timedelta_range('1 day 2 hr 30 min 10 s', periods=10, freq='1 d')
s = pd.Series(np.random.randn(len(index)), index)
s.plot()
plt.gcf().autofmt_xdate()
plt.grid()

image

index =  pd.timedelta_range('1 day 2 hr 30 min 10 s', periods=10, freq='1 ns')
s = pd.Series(np.random.randn(len(index)), index)
s.plot()
plt.gcf().autofmt_xdate()
plt.grid()

image

index =  pd.timedelta_range('0s', periods=10, freq='100000 ns')
s = pd.Series(np.random.randn(len(index)), index)
s.plot()
plt.gcf().autofmt_xdate()
plt.grid()
plt.xlabel('time')
plt.ylabel('y')

image

@sinhrks sinhrks added Timedelta Timedelta data type Visualization plotting labels Jan 5, 2017
@jgoppert
Copy link
Contributor Author

jgoppert commented Jan 5, 2017

I think this is ready to merge, it is functional for me now.

@codecov-io
Copy link

codecov-io commented Jan 5, 2017

Codecov Report

Merging #15067 into master will decrease coverage by -4.85%.
The diff coverage is 73.46%.

@@            Coverage Diff             @@
##           master   #15067      +/-   ##
==========================================
- Coverage   90.37%   85.52%   -4.85%     
==========================================
  Files         135      145      +10     
  Lines       49476    51325    +1849     
==========================================
- Hits        44713    43896     -817     
- Misses       4763     7429    +2666
Impacted Files Coverage Δ
pandas/tseries/converter.py 62.95% <100%> (+1.1%)
pandas/tools/plotting.py 68.48% <100%> (-3.31%)
pandas/tseries/plotting.py 61.78% <55.17%> (-2.96%)
pandas/io/sas/sas_constants.py 0% <ø> (-100%)
pandas/io/sas/sas7bdat.py 0% <ø> (-90.67%)
pandas/io/sas/sas_xport.py 0% <ø> (-90.28%)
pandas/io/sas/sasreader.py 7.4% <ø> (-77.78%)
pandas/computation/init.py 76.92% <ø> (-13.99%)
pandas/util/decorators.py 60.78% <ø> (-7.83%)
... and 93 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f62e8f2...945ec14. Read the comment docs.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this @jgoppert. A few inline comments.

Could you also

  • Add a note in doc/source/visualization.rst showing how to use plt.gcf().autofmt_xdate() after plotting
  • Add a note in doc/source/whatsnew/v0.20.0.txt

raise IOError('index type not supported')

# Note, DatetimeIndex does not use this
# interface. DatetimeIndex uses matplotlib.date directly

# x and y coord info
subplot.format_coord = lambda t, y: (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this code be running when index is a TimedeltaIndex? I think it's controlling the stuff in the bottom right of the attached screenshot.

screen shot 2017-01-05 at 8 30 12 am

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the format_coord section to the period index section, the draw_if_interactive I left common.

subplot.xaxis.set_major_formatter(majformatter)
subplot.xaxis.set_minor_formatter(minformatter)

elif isinstance(index, TimedeltaIndex):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add tests for this in pandas/tests/plotting/test_datetimelike.py?

You should just need a couple of datasets

  1. with n_decimals > 9
  2. with n_decimals < 9

In the tests, if you could check that the formatting is correct, and maybe check the number of ticklabels.

@@ -278,8 +280,22 @@ def _maybe_convert_index(ax, data):
# Patch methods for subplot. Only format_dateaxis is currently used.
# Do we need the rest for convenience?


def format_dateaxis(subplot, freq):
def timeTicks(x, pos, n_decimals):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add tests for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, needs more work though, see comment below.

@@ -278,8 +280,22 @@ def _maybe_convert_index(ax, data):
# Patch methods for subplot. Only format_dateaxis is currently used.
# Do we need the rest for convenience?


def format_dateaxis(subplot, freq):
def timeTicks(x, pos, n_decimals):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> timeTicks -> format_time_ticks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched it to format_timedelta_ticks to be very clear it isn't being used for period index.

d, h = divmod(h, 24)
decimals = int(ns * 10**(n_decimals - 9))
s = r'{:02d}:{:02d}:{:02d}'.format(int(h), int(m), int(s))
if n_decimals > 0:
Copy link
Contributor

@jreback jreback Jan 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are pretty much reinveting the wheel here.

In [2]: pd.Timedelta(20,unit='s')
Out[2]: Timedelta('0 days 00:00:20')

In [3]: pd.Timedelta(20.5,unit='s')
Out[3]: Timedelta('0 days 00:00:20.500000')

In [4]: pd.Timedelta(120.5,unit='s')
Out[4]: Timedelta('0 days 00:02:00.500000')

# if you want to get the data out for specific formatting
In [5]: pd.Timedelta(120.5,unit='s').components
Out[5]: Components(days=0, hours=0, minutes=2, seconds=0, milliseconds=500, microseconds=0, nanoseconds=0)

Copy link
Contributor

@jreback jreback Jan 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can simply call Timedelta(..)._repr_base(format='...') with a new format if you really want it

In [16]: pd.Timedelta('1 day')._repr_base()
Out[16]: '1 days'

In [17]: pd.Timedelta('1 day')._repr_base(format='long')
Out[17]: '1 days 00:00:00'

In [18]: pd.Timedelta('1 day')._repr_base(format='all')
Out[18]: '1 days 00:00:00.000000000'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, python timedelta doesn't support nanoseconds.

In [4]: pd.Timedelta(1e-9, unit='s')
Out[4]: Timedelta('0 days 00:00:00')

In [5]: pd.Timedelta(1e-3, unit='s')
Out[5]: Timedelta('0 days 00:00:00.001000')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course it does. This is pandas Timedelta. (you are looking at the printing), which by-default suppres it

In [6]: pd.Timedelta(1, unit='ns')._repr_base(format='all')
Out[6]: '0 days 00:00:00.000000001'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok great, got confused by the output. So how do I say I want n decimals, because that would be a drop-in for what I've already written.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, suppressing days when they are 0 would be nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the

Components(days=0, hours=0, minutes=2, seconds=0, milliseconds=500, microseconds=0, nanoseconds=0)

could compress the code a bit, but I'm not sure it would save significant space over what I've already got.

jgoppert added a commit to jgoppert/pandas that referenced this pull request Jan 5, 2017
Still need to improve unit testing.
for l in xaxis.get_ticklabels():
if len(l.get_text()) > 0:
self.assertEqual(l.get_rotation(), 30)

Copy link
Contributor Author

@jgoppert jgoppert Jan 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test needs more work.

For this test, we basically want to check that for a narrow window (as shown below) we get nanosecond precision, and if at 0 to start without days.

rng = timedelta_range('0', periods=3, freq='ns')

# rough code to follow
expected_labels = [
 '00:00:00.000000001',
 '00:00:00.000000002,
 '00:00:00.000000001']
for label, expected_label in zip(ax.get_xtick_labels(), expected_labels):
   # check that there are no days
   self.assertEqual(label, expected_label)

We also want to check for a wide window that we have second precision.

rng = timedelta_range('0', periods=10, freq='1 d')

# test issue #8711
s = Series(range(5), timedelta_range('1day', periods=5))
_check_plot_works(s.plot)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea if check_plots_works will actually do anything useful here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basically just tests that the plot call does not error. So, for example, that means that it will test that it does not raise a MemoryError like it does now in some cases

@@ -116,6 +116,7 @@ Other enhancements

- ``.select_dtypes()`` now allows the string 'datetimetz' to generically select datetimes with tz (:issue:`14910`)

- ``pd.TimedeltaIndex`` now has a custom datetick formatter specifically designed for nanosecond level precision (:issue:`8711`)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a link to visualization.rst?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is certainly possible, we only have not really a section about timedelta plotting currently.

@jgoppert
Copy link
Contributor Author

jgoppert commented Jan 5, 2017

@TomAugspurger @jreback do you guys think you can help me clean this up. I'm not familiar with your docs and unit tests and have to get back to what I wrote this for today.

I gave you push access to my branch since I can't push the branch to upstream.

@jgoppert
Copy link
Contributor Author

jgoppert commented Jan 5, 2017

I just pushed the full implementation of the unit test. Can we merge this now? It is definitely an improvement over master's mem allocation error and we can revisit the small details later.

@jreback
Copy link
Contributor

jreback commented Jan 5, 2017

@jgoppert things are merged when they pass all tests, have documentation, and unit tests to the satisfaction of the maintainers. This has not been reviewed as yet.

of course if you or really anyone wants to pull in your commits on top of master, they can certainly do this.

@jgoppert
Copy link
Contributor Author

jgoppert commented Jan 5, 2017

I understand, I'm just running out of development bandwidth and don't want it to stall. Let me know what the delta is and I'll try to do what I can.

@jgoppert
Copy link
Contributor Author

jgoppert commented Jan 5, 2017

@jreback Any idea why clang 3.5 failed to set the xticks label?

@jreback
Copy link
Contributor

jreback commented Jan 5, 2017

cc @tacaswell

@jgoppert
Copy link
Contributor Author

jgoppert commented Jan 6, 2017

So weird that osx with python 3.5 keeps failing and all of the others work. I'm running out of ideas.

date tick adjustment from matplotlib for figures whose ticklabels overlap.

See the :meth:`autofmt_xdate <matplotlib.figure.autofmt_xdate>` method and the
`matplotlib documentation <http://matplotlib.org/api/figure_api.html#matplotlib.figure.Figure.autofmt_xdate>`__ for more.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would refer here to http://matplotlib.org/users/recipes.html#fixing-common-date-annoyances instead (the link to the api docs will also be generated through the :meth:.. you did, and is also included in this recipe section.

@@ -116,6 +116,7 @@ Other enhancements

- ``.select_dtypes()`` now allows the string 'datetimetz' to generically select datetimes with tz (:issue:`14910`)

- ``pd.TimedeltaIndex`` now has a custom datetick formatter specifically designed for nanosecond level precision (:issue:`8711`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is certainly possible, we only have not really a section about timedelta plotting currently.

def test_timedelta_plot(self):
# test issue #8711
s = Series(range(5), timedelta_range('1day', periods=5))
_check_plot_works(s.plot)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add here the three examples you showed in the top post of the PR? (just with the same _check_plot call as you already did for 1 series). At least one of them was producing a memory error currently, so it would be good to include that here to ensure this does not happen again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, just added the 2 easy ones above.

@jgoppert
Copy link
Contributor Author

@jorisvandenbossche I think I addressed your review. Hopefully mac passes CI now.

lambda x, pos: format_timedelta_ticks(x, pos, n_decimals))
subplot.xaxis.set_major_formatter(formatter)
else:
raise IOError('index type not supported')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use TypeError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

def format_dateaxis(subplot, freq):
def format_timedelta_ticks(x, pos, n_decimals):
''' Convert seconds to 'D days HH:MM:SS.F' '''
s, ns = divmod(x, 1e9)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use triple double quotes not single quotes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

elif isinstance(index, TimedeltaIndex):
from matplotlib import ticker
(vmin, vmax) = tuple(subplot.xaxis.get_view_interval())
n_decimals = int(np.ceil(np.log10(100 * 1e9 / (vmax - vmin))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason not to make this interface similar to the above?

e.g. why not create a TimeSeries_TimedeltaFormatter ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in latest commit.

@jgoppert
Copy link
Contributor Author

@jreback Any idea how to proceed with this. Can we put a skip in for mac and file an issue?

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, it would be OK to skip those two tests on travis for mac for now.
Tests that check the exact labels are not that robust anyway, as it can change with matplotlib versions.

What you could test additionaly (and what should pass on mac as well), is that the ax.xaxis.get_major_formatter is the expected formatter (the one you now coded), and pass in a known value to check the formatting.

"""

@staticmethod
def format_timedelta_ticks(x, pos, n_decimals):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove the pos here

def format_timedelta_ticks(x, pos, n_decimals):
"""
Convert seconds to 'D days HH:MM:SS.F'
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function can be removed no that you moved it to a Formatter?

df = DataFrame(np.random.randn(len(rng), 3), rng)
ax = df.plot(fontsize=2)
fig = ax.get_figure()
fig.canvas.draw()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think those two lines above are needed?

(and the same for the test above)

@jorisvandenbossche
Copy link
Member

Roughly I mean something like:

# this is part of what you have now in a test
rng = timedelta_range('0', periods=10, freq='ns')
df = DataFrame(np.random.randn(len(rng), 3), rng)
ax = df.plot(fontsize=2)

# this could also test the formatting in another way that should not be skipped on mac osx
formatter =  ax.xaxis.get_major_formatter()
assert isinstance(formatter, TimeSeries_TimedeltaFormatter)
self.assertEqual(formatter(1.), '00:00:00.000000001')

@jreback
Copy link
Contributor

jreback commented Jan 15, 2017

I replicated the exact travis env for 3.5 / OSX and these test passed for me. so ok to skip i guess (though I wonder if these have something to do with some output settings, e.g. LOCALE).

@tacaswell
Copy link
Contributor

tacaswell commented Jan 18, 2017

This is not unreasonable. It might be worth adding a format_data_short method to the formatter to customize how the mouse-over gets displayed (instead of format_coords)

@jreback jreback closed this in 1400305 Feb 22, 2017
jreback added a commit that referenced this pull request Feb 22, 2017
@jreback
Copy link
Contributor

jreback commented Feb 22, 2017

thanks @jgoppert I merged this (and skip those 2 comparisons on mac).

They seem kind of odd, though I couldn't reproduce them with a very similar local environment.

If you want to keep looking at them great (otherwise ok too).

Similarly if you want to do a mouse-over formatter as @tacaswell suggest would be great as well.

@jreback jreback added this to the 0.20.0 milestone Feb 22, 2017
@jreback
Copy link
Contributor

jreback commented Feb 22, 2017

image

hmm the tick locations look a bit odd, is this normal? (this is the just merged PR)

cc @tacaswell
@jgoppert

@jgoppert jgoppert deleted the tdi_plot_fix branch February 22, 2017 18:21
@jgoppert
Copy link
Contributor Author

Thanks for getting this in:

image

When you do rot=45, it works pretty well, but it does't align to tics, the autofmt_xdate does. Also, internally the tics are just set by the nanoseconds, so it isn't intelligent about frequency. If you look at the plot you will notice that the tic locations are at meaningful locations for nanoseconds. Staring at 1 day with a period of 5 starts at an odd place in terms of nanoseconds. If you look at when I just do seconds, it works well because seconds are just related to nano-seconds via a power of 10.

@jreback
Copy link
Contributor

jreback commented Feb 22, 2017

thanks for the expl @jgoppert

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
Author: James Goppert <james.goppert@gmail.com>
Author: James Goppert <jgoppert@users.noreply.github.com>

Closes pandas-dev#8711
Closes pandas-dev#15067 from jgoppert/tdi_plot_fix and squashes the following commits:

945ec14 [James Goppert] Merge branch 'master' into tdi_plot_fix
7db61ec [James Goppert] Create TimeSeries_TimedeltaFormatter.
232efe6 [James Goppert] Fix comment format and exception type for tdi plotting.
4eff697 [James Goppert] Add more time delta series plotting tests.
f5f32bc [James Goppert] Link time delta index docs to better matplotlib docs.
d588c2c [James Goppert] Fixes test for tdi w/o autofmt_xdate.
b6e6a81 [James Goppert] Disables autofmt_xdate testing.
c7851e3 [James Goppert] Adjusts tdi test draw calls to try to fix CI issue.
7d28842 [James Goppert] Switch to draw_idle to try to fix bug on xticks update.
3abc310 [James Goppert] Try plt.draw() instead of canvas.draw() to fix issue on osx 3.5.
91954bd [James Goppert] Finished unit test for timedelta plotting.
41ebc85 [James Goppert] Fixes for review comments from pandas-dev#15067.
f021cbd [James Goppert] Support nano-second level precision x-axis labels.
5ec65fa [James Goppert] Plot fix for tdi and added more comments.
b967d24 [James Goppert] flake8 fixes for tdi plotting.
efe5636 [James Goppert] Adds custom plot formatting for TimedeltaIndex.
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
@arpheno
Copy link

arpheno commented Nov 15, 2017

Is it intentional that this doesn't work for histograms?
image

@timb82
Copy link

timb82 commented Sep 6, 2018

it works only for fixed time step between points on the plot

consider following code:
idx2[0] was changed from 00:00 to 00:01 and xlabels on the plot are missing

import pandas as pd
import matplotlib.pyplot as plt

idx1 = pd.to_timedelta(['00:00:00', '00:00:30', '00:01:00', '00:01:30', '00:02:00'])
idx2 = pd.to_timedelta(['00:00:01', '00:00:30', '00:01:00', '00:01:30', '00:02:00'])
vals = range(5)
s1= pd.Series(vals, idx1)
s2= pd.Series(vals, idx2)

# Labels on x are ok
plt.figure()
plt.gca().set_title('fixed frequency f=30s')
s1.plot()
plt.gcf().autofmt_xdate()
plt.show()

# Labels on x are messed up
plt.figure()
plt.gca().set_title('variable frequency')
s2.plot()
plt.gcf().autofmt_xdate()
plt.show()

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Timedelta Timedelta data type Visualization plotting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH/BUG: support TimedeltaIndex plotting
9 participants