
Resample yields empty groups #10603

Closed
JonasAbernot opened this issue Jul 16, 2015 · 9 comments · Fixed by #35799
Labels: good first issue, Needs Tests (unit test(s) needed to prevent regressions)


@JonasAbernot
Contributor

With some parameters, the last group yielded by resample is empty. Example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.normal(size=(10000,4)))
df.index = pd.timedelta_range(start='0s', periods=10000, freq='3906250n')

df.loc['1s':,:].resample('3s',how=lambda x : len(x))

Depending on the 'how' function used, this can lead to surprising bugs.
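To see which bins are empty before applying a function that cannot cope with zero rows, the per-bin row count can be inspected first. The sketch below is a minimal illustration in current pandas syntax (`.resample(...).size()` rather than the `how=` argument used above); the small series with a gap between 3 s and 10 s is made-up data, chosen so that one bin in the middle is genuinely empty.

```python
import numpy as np
import pandas as pd

# Made-up data: 1-second samples with a gap, so the 6 s to 9 s bin has no rows.
idx = pd.to_timedelta([0, 1, 2, 3, 10, 11], unit="s")
s = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Number of rows falling into each 3-second bin (0 means an empty group).
sizes = s.resample("3s").size()

# Labels of the bins whose group contains no rows.
empty_bins = sizes.index[sizes == 0]

print(sizes.tolist())    # rows per bin
print(list(empty_bins))  # bins that would be passed to the function as empty groups
```

Checking `sizes` first makes it explicit which labels would reach an aggregation function with no data.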

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8

pandas: 0.16.0-294-g45f69cd
nose: 1.3.6
Cython: 0.20.2
numpy: 1.9.2
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 3.0.0-dev
sphinx: 1.2.2
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.3
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 0.8.2
pymysql: None
psycopg2: 2.5.3 (dt dec mx pq3 ext)
@jreback
Contributor

jreback commented Jul 16, 2015

In [7]: pd.set_option('max_rows',12)

In [8]: df.loc['1s':,:]
Out[8]: 
                        0         1         2         3
00:00:01         0.767847 -1.805006  0.513914 -0.533759
00:00:01.003906  1.034297 -0.873930  1.254777 -0.460738
00:00:01.007812 -1.905457 -0.497061 -0.550036 -0.400423
00:00:01.011718  0.526214 -0.569812 -0.817764  1.204511
00:00:01.015625  0.061491  0.939611  0.308094  1.300434
00:00:01.019531 -0.147869  0.971442  1.239615  0.637635
...                   ...       ...       ...       ...
00:00:39.039062  1.664856 -0.821650 -0.551620 -0.442644
00:00:39.042968  1.133944  0.797726 -0.677378 -0.488098
00:00:39.046875 -0.343148 -0.123394 -1.010421  1.476257
00:00:39.050781  0.311632 -0.418035 -1.200112 -1.735927
00:00:39.054687  0.291330 -0.559795 -0.516269  1.088944
00:00:39.058593  0.918740 -0.516714 -0.415188  0.106167

[9744 rows x 4 columns]

In [9]: df.loc['1s':,:].resample('3s',how=lambda x : len(x))
Out[9]: 
            0    1    2    3
00:00:01  768  768  768  768
00:00:04  768  768  768  768
00:00:07  768  768  768  768
00:00:10  768  768  768  768
00:00:13  768  768  768  768
00:00:16  768  768  768  768
...       ...  ...  ...  ...
00:00:25  768  768  768  768
00:00:28  768  768  768  768
00:00:31  768  768  768  768
00:00:34  768  768  768  768
00:00:37  528  528  528  528
00:00:40    0    0    0    0

[14 rows x 4 columns]

Looks correct. The last group is just a point, as this is evenly divisible.
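The row counts in that output can be checked by hand. The reported frequency, 3906250 ns, is exactly 1/256 s, so the plain-Python arithmetic below (no pandas involved; the variable names are illustrative) reproduces the 768-row bins, the 528-row bin at 37 s, and where the last sample falls relative to a bin that would start at 40 s.

```python
from fractions import Fraction

STEP_NS = 3_906_250      # sampling step from freq='3906250n'
N_ROWS = 10_000          # periods=10000
NS_PER_S = 1_000_000_000

# The step is exactly 1/256 s, so there are 256 samples per second.
samples_per_s = NS_PER_S // STEP_NS       # 256
per_bin = 3 * samples_per_s               # rows in a full 3 s bin

# df.loc['1s':] keeps the rows whose offset is >= 1 s, i.e. row numbers >= 256.
kept = N_ROWS - samples_per_s

# Full 3 s bins starting at 1 s, plus whatever is left for the last bin.
full_bins, remainder = divmod(kept, per_bin)
last_bin_start_s = 1 + 3 * full_bins

# Exact time of the final sample, in seconds.
last_sample_s = Fraction((N_ROWS - 1) * STEP_NS, NS_PER_S)

print(per_bin, kept, full_bins, remainder, last_bin_start_s, float(last_sample_s))
```

Since the final sample lies before `last_bin_start_s + 3`, any bin labelled at that point would contain no rows.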

@jreback
Contributor

jreback commented Jul 16, 2015

FYI, when reporting a bug, please also show the actual data (the output generated by the code); then it's clear just by looking what the problem is.

@jreback
Contributor

jreback commented Jul 16, 2015

Any reason you are not using how='count'? It gives the same result, just much faster.
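The `how=` argument was deprecated and later removed from pandas; in current versions the counterpart of `how='count'` is the `.count()` method on the resampler. A minimal sketch, assuming the same index layout as the report (built here with `pd.to_timedelta` instead of `timedelta_range`): note that `.count()` counts non-missing values per column while `len` (like `.size()`) counts rows, so they agree only when there are no NaNs, as in this data.

```python
import numpy as np
import pandas as pd

# Same layout as the report: 10000 rows spaced 3906250 ns (1/256 s) apart.
idx = pd.to_timedelta(np.arange(10_000) * 3_906_250, unit="ns")
df = pd.DataFrame(np.random.default_rng(0).normal(size=(10_000, 4)), index=idx)

r = df.resample("3s")
counts = r.count()  # non-missing values per column and bin
sizes = r.size()    # rows per bin, missing or not

print(sizes.tail(2))
```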

@sinhrks added the Usage Question and Resample labels Jul 16, 2015
@JonasAbernot
Contributor Author

Yep, OK, I wasn't clear enough. The last group contains no points: the index ends at 00:00:39.058593, and the group begins at 00:00:40. For the 'count' task this is actually not a problem, but for other functions that can't handle zero-length objects it can be annoying.
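One way to keep such functions from failing on an empty bin is to wrap them so that zero-length input yields NaN instead of reaching the function. `skip_empty` and `value_range` below are hypothetical helpers, not part of pandas, and the series with a gap is made-up data; Python's built-in `max` stands in for a function that cannot handle an empty group, since it raises `ValueError` on an empty sequence.

```python
import numpy as np
import pandas as pd


def skip_empty(func, fill=np.nan):
    """Hypothetical helper: return `fill` for zero-length groups,
    otherwise delegate to `func`."""
    def wrapper(x):
        if len(x) == 0:
            return fill
        return func(x)
    return wrapper


def value_range(x):
    # Built-in max/min raise ValueError on an empty sequence.
    return max(x) - min(x)


# Made-up data with a gap, so the bin starting at 6 s is empty.
idx = pd.to_timedelta([0, 1, 2, 3, 10, 11], unit="s")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

result = s.resample("3s").apply(skip_empty(value_range))
print(result)
```

The empty bin then shows up as NaN in the result rather than as an exception inside the aggregation.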

Example:

df = pd.DataFrame(np.random.normal(size=(10000,4)))
df.index = pd.timedelta_range(start='0s', periods=10000, freq='3906250n')
from scipy import fft

Something that works:

In [25]: df.resample('3s',how=lambda x : max(fft(x)))
Out[25]: 
                  0          1          2          3
00:00:00  55.527131  63.876320  50.189927  60.702282
00:00:03  53.586627  63.214890  55.694863  55.196211
00:00:06  63.159294  51.598472  61.389132  60.393747
00:00:09  73.133776  63.760377  69.555783  64.445265
00:00:12  60.349962  48.913074  50.045405  57.562742
00:00:15  58.858030  49.733304  55.012356  62.641561
...             ...        ...        ...        ...
00:00:24  59.661202  61.519860  49.886808  49.105434
00:00:27  48.506358  55.936740  52.039330  57.650969
00:00:30  52.030271  58.446403  59.234081  64.254844
00:00:33  57.767135  56.672450  52.793359  69.297208
00:00:36  56.431251  64.871565  63.356116  67.926122
00:00:39   5.341347   5.263054   5.745918   4.918816

[14 rows x 4 columns]

Something that doesn't:

In [26]: df.loc['1s':,:].resample('3s',how=lambda x : max(fft(x)))
Out[26]: 
Empty DataFrame
Columns: []
Index: []

Just because an empty group is generated (which is weird), fft fails, and this (silently) leads to this empty DataFrame.
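The underlying failure can be reproduced without pandas: NumPy's FFT rejects zero-length input with a `ValueError`. The sketch below uses `numpy.fft.fft` rather than the SciPy import from the report, and takes the magnitude with `np.abs` so the result is real; both are assumptions about what `max(fft(x))` was meant to compute. `peak_fft_magnitude` is a hypothetical name, and returning NaN for an empty group is one way to keep a single empty bin from derailing the whole aggregation.

```python
import numpy as np


def peak_fft_magnitude(x):
    """Hypothetical helper: largest FFT magnitude of x, or NaN for an empty group."""
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        # np.fft.fft rejects zero-length input with a ValueError.
        return np.nan
    return float(np.abs(np.fft.fft(x)).max())


try:
    np.fft.fft(np.array([], dtype=float))
except ValueError as exc:
    print("empty input:", exc)

print(peak_fft_magnitude([1.0, 1.0, 1.0, 1.0]))
print(peak_fft_magnitude([]))
```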

I hope this is clearer now.

(As for not using 'count', the only reason is my lack of knowledge.)

@mroeschke added the Apply and Bug labels and removed the Usage Question label Oct 9, 2019
@mroeschke
Member

This appears to work on master now. Could use a test.

In [17]: import pandas as pd
    ...: import numpy as np
    ...:
    ...: df = pd.DataFrame(np.random.normal(size=(10000,4)))
    ...: df.index = pd.timedelta_range(start='0s', periods=10000, freq='3906250n')
    ...:
    ...: df.loc['1s':,:].resample('3s').apply(lambda x: len(x))
Out[17]:
                     0      1      2      3
0 days 00:00:01  768.0  768.0  768.0  768.0
0 days 00:00:04  768.0  768.0  768.0  768.0
0 days 00:00:07  768.0  768.0  768.0  768.0
0 days 00:00:10  768.0  768.0  768.0  768.0
0 days 00:00:13  768.0  768.0  768.0  768.0
0 days 00:00:16  768.0  768.0  768.0  768.0
0 days 00:00:19  768.0  768.0  768.0  768.0
0 days 00:00:22  768.0  768.0  768.0  768.0
0 days 00:00:25  768.0  768.0  768.0  768.0
0 days 00:00:28  768.0  768.0  768.0  768.0
0 days 00:00:31  768.0  768.0  768.0  768.0
0 days 00:00:34  768.0  768.0  768.0  768.0
0 days 00:00:37  528.0  528.0  528.0  528.0
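A regression test along the lines requested here could assert both the bin sizes and that no bin is empty. This is a sketch only, not necessarily the test that was merged in #35799; the test name is hypothetical, the data is all zeros because only the index matters, and it uses `.size()` rather than `apply(lambda x: len(x))` so the expected values are plain integers.

```python
import numpy as np
import pandas as pd


def test_resample_timedelta_no_trailing_empty_group():
    # Hypothetical test name; mirrors the reproduction from this issue.
    idx = pd.to_timedelta(np.arange(10_000) * 3_906_250, unit="ns")
    df = pd.DataFrame(np.zeros((10_000, 4)), index=idx)

    sizes = df[df.index >= pd.Timedelta("1s")].resample("3s").size()

    # 12 full bins of 768 rows starting at 1 s, then 528 rows from 37 s.
    expected = pd.Series(
        [768] * 12 + [528],
        index=pd.timedelta_range(start="1s", periods=13, freq="3s"),
    )
    assert sizes.tolist() == expected.tolist()
    assert sizes.index.equals(expected.index)
    assert (sizes > 0).all()  # no empty groups
```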

@mroeschke added the good first issue and Needs Tests labels and removed the Apply, Bug, and Resample labels May 11, 2020
@rmsmani

rmsmani commented Jul 29, 2020

@mroeschke Tested in the latest version; I get the same result as above.

@simonjayhawkins added this to the Contributions Welcome milestone Jul 31, 2020
@rmsmani

rmsmani commented Aug 2, 2020

I think we can close this issue

@simonjayhawkins
Member

> I think we can close this issue

The issue is tagged as Needs Tests. If you can raise a PR adding a test to prevent regressions, we can then close this issue.

@tkmz-n
Contributor

tkmz-n commented Aug 19, 2020

take

mroeschke pushed a commit that referenced this issue Aug 31, 2020
* REF: remove unnecesary try/except

* TST: add test for agg on ordered categorical cols (#35630)

* TST: resample does not yield empty groups (#10603) (#35799)

* revert accidental rebase

* REF: use BlockManager.apply for Rolling.count

Co-authored-by: Karthik Mathur <22126205+mathurk1@users.noreply.github.com>
Co-authored-by: tkmz-n <60312218+tkmz-n@users.noreply.github.com>
jreback pushed a commit that referenced this issue Sep 2, 2020
…ce (#35899)

* REF: remove unnecesary try/except

* TST: add test for agg on ordered categorical cols (#35630)

* TST: resample does not yield empty groups (#10603) (#35799)

* revert accidental rebase

* REF: handle axis=None cases inside DataFrame.all/any

* annotate

* dummy commit to force Travis

Co-authored-by: Karthik Mathur <22126205+mathurk1@users.noreply.github.com>
Co-authored-by: tkmz-n <60312218+tkmz-n@users.noreply.github.com>
jreback pushed a commit that referenced this issue Sep 2, 2020
* REF: remove unnecesary try/except

* TST: add test for agg on ordered categorical cols (#35630)

* TST: resample does not yield empty groups (#10603) (#35799)

* revert accidental rebase

* BUG: BlockSlider not clearing index._cache

* update whatsnew

Co-authored-by: Karthik Mathur <22126205+mathurk1@users.noreply.github.com>
Co-authored-by: tkmz-n <60312218+tkmz-n@users.noreply.github.com>
jreback pushed a commit that referenced this issue Sep 2, 2020
…36045)

* REF: remove unnecesary try/except

* TST: add test for agg on ordered categorical cols (#35630)

* TST: resample does not yield empty groups (#10603) (#35799)

* revert accidental rebase

* BUG: NDFrame.replace wrong exception type, wrong return when size==0

* bool->bool_t

* whatsnew

Co-authored-by: Karthik Mathur <22126205+mathurk1@users.noreply.github.com>
Co-authored-by: tkmz-n <60312218+tkmz-n@users.noreply.github.com>

7 participants