Can't plot pd.Timedelta (or numpy.timedelta64) versus time #9154

Closed
s-celles opened this issue Dec 26, 2014 · 12 comments
Labels: Duplicate Report, Timedelta, Visualization
@s-celles
Contributor

s-celles commented Dec 26, 2014

Hello,

I did

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

idx = pd.date_range('20140101', '20140201')
df = pd.DataFrame(index=idx)
df['col0'] = np.random.randn(len(idx))
s_idx = pd.Series(idx, index=idx) # need to do this because we can't shift index
diff_idx = (s_idx-s_idx.shift(1)).fillna(pd.Timedelta(0))
df['diff_dt'] = diff_idx
df['diff_dt'].plot()

but it raises Empty 'Series': no numeric data to plot

In [78]: df.dtypes
Out[78]:
col0               float64
diff_dt    timedelta64[ns]
dtype: object


In [79]: type(df['diff_dt'][1])
Out[79]: pandas.tslib.Timedelta

I don't understand whether the data in the diff_dt column are numpy.timedelta64 or pd.Timedelta

df['diff_dt'].map(lambda x: x.value)

raises AttributeError: 'numpy.timedelta64' object has no attribute 'value'

but it seems that I can get value for a given entry (let's say row 10)

In [97]: df['diff_dt'][10].value
Out[97]: 86400000000000

I don't understand why...

But I also don't understand how I could plot the diff_dt column without doing something ugly like:

df['diff_dt'].map(lambda x: x/np.timedelta64(1, 'ns')).plot()

But I don't know how I could automatically get this np.timedelta64(1, 'ns')

Maybe pandas could plot Timedelta (or np.timedelta64) out of the box?
That would be useful, for example, to check whether the sampling period is constant.

Kind regards
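For readers landing here, a minimal sketch of one way to avoid hard-coding np.timedelta64(1, 'ns'): the .dt.total_seconds() accessor converts a timedelta64 Series to plain floats, which any plotting code can handle. The variable names (diff_seconds, is_constant) are illustrative, and Series.diff() is swapped in for the shift-and-subtract step of the report; this is a sketch of the idea, not the fix that pandas eventually shipped.

```python
import pandas as pd

idx = pd.date_range("2014-01-01", "2014-02-01")
s_idx = pd.Series(idx, index=idx)

# Difference between consecutive timestamps; the first entry is NaT,
# so it is filled with a zero-length Timedelta as in the original report.
diff_dt = s_idx.diff().fillna(pd.Timedelta(0))

# Convert the timedelta values to float seconds (numeric, hence plottable).
diff_seconds = diff_dt.dt.total_seconds()

# The sampling period is constant if all differences after the first are equal.
is_constant = diff_seconds.iloc[1:].nunique() == 1

# diff_seconds.plot()  # uncomment to plot; requires matplotlib
```

Working in seconds keeps the axis values readable; dividing the result by 3600 or 86400 gives hours or days if that suits the data better.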

@jreback
Contributor

jreback commented Dec 26, 2014

dupe of #8711

jreback closed this as completed Dec 26, 2014
jreback added the Timedelta and Visualization labels Dec 26, 2014
@drevicko

@jreback I don't think this is a dupe of #8711. That's about formatting, this one is about getting an exception when you try to plot timedelta64 data. Or rather, the first code snippet and subsequent exception is not yet solved, and still present in 0.20.3

@jreback
Contributor

jreback commented Sep 14, 2017

pandas doesn't support plotting timedeltas out of the box

@drevicko

ok. thanks for the quick response.

Is there an easy/best practice workaround?

I tried with astype('timedelta64[m]') and it worked - other units could be used there. There's an SO question about the exception - care to answer it (or I will if you're not keen).

@pratapvardhan
Contributor

@drevicko -- try (x5.task_a / np.timedelta64(1, 'h')).plot.kde()? This would be hours on x-axis distribution, use np.timedelta64(1, 'm') for minutes.
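The division suggested above can be sketched as follows. The durations Series is a hypothetical stand-in for x5.task_a, which is not shown in the thread. Dividing a timedelta Series by a one-unit np.timedelta64 yields ordinary floats expressed in that unit.

```python
import numpy as np
import pandas as pd

# Hypothetical durations standing in for x5.task_a (30 min, 1 h, 2 h 15 min).
durations = pd.Series(pd.to_timedelta([30, 60, 135], unit="min"))

# Dividing by a one-unit timedelta gives float values in that unit.
hours = durations / np.timedelta64(1, "h")
minutes = durations / np.timedelta64(1, "m")

# hours.plot.kde()  # uncomment to plot; requires matplotlib and scipy
```

Dividing by pd.Timedelta(hours=1) instead of np.timedelta64(1, "h") should give the same floats, if a pandas-only spelling is preferred.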

@drevicko

drevicko commented Sep 15, 2017

@pratapvardhan Thanks, though for me, x5.task_a.astype('timedelta64[h]') is a bit tidier, but it rounds the conversion. That is perhaps not as big a problem if you use minutes as units ('timedelta64[m]'). Rounding actually perturbs the resulting plot/histogram, and can give you an incorrect idea about the data!

I like plotting with the kernel density estimate, .plot.kde(). For others that come here, you can also do a histogram with x5.task_a.astype('timedelta64[m]').hist() and the usual matplotlib hist() parameters if you don't like the defaults.
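To illustrate why rounding to a coarse unit can distort a distribution, here is a small sketch comparing exact division with floor division. Floor division is used only as a stand-in for the rounding described above: how astype('timedelta64[h]') behaves depends on the pandas version, so it is not called here. The sample durations are made up for illustration.

```python
import pandas as pd

# Made-up durations: 10, 50, 70 and 110 minutes.
durations = pd.Series(pd.to_timedelta([10, 50, 70, 110], unit="min"))
one_hour = pd.Timedelta(hours=1)

# True division keeps the fractional hours.
exact_hours = durations / one_hour

# Floor division drops the fractional part, mimicking a rounded conversion.
whole_hours = durations // one_hour

# The rounded values collapse onto a few integers and shift the mean.
exact_mean = exact_hours.mean()
whole_mean = whole_hours.mean()
```

With these values, the rounded data takes only two distinct values and its mean is half the exact mean, which is the kind of distortion a histogram or KDE would then display.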

@jorisvandenbossche
Member

@jreback #8711 was merged for 0.20.0, so we do support timedelta plotting, no?

@jorisvandenbossche
Member

jorisvandenbossche commented Sep 15, 2017

Actually, this works on master now (only the formatting of the axis is not that informative, since it uses the integer values).

I suppose this was closed by #17430.

@scls19fr @drevicko could you try on master to verify ?

@jorisvandenbossche
Member

So closing as duplicate of #16953 (which was actually also opened by @scls19fr).

jorisvandenbossche added the Duplicate Report label Sep 15, 2017
jorisvandenbossche added this to the 0.21.0 milestone Sep 15, 2017
@s-celles
Contributor Author

I didn't remember this 2014 bug when I opened #16953. I'm sorry about the duplicate.
But I'd like to say that plotting only integer values is not very informative (not informative enough, I should probably say).

@jorisvandenbossche
Member

Ah sorry, didn't see this issue was from 2014, I thought it was opened today :-)

You are certainly correct about it not being very informative. So PR #15067 added better formatting, but I suppose only for the x axis. Do you want to open an issue to generalize this better timedelta formatting to all cases ?

@s-celles
Contributor Author

I'm currently building the latest pandas (on a MacBook Air)... so it may take a while.
If I open an issue to generalize this better timedelta formatting to all cases, I'd prefer to do it with the latest pandas version (master) so I can show some screenshots.
