DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. #20374

114 changes: 110 additions & 4 deletions pandas/core/groupby/groupby.py
@@ -1294,12 +1294,118 @@ def describe(self, **kwargs):
return result.T
return result.unstack()

@Substitution(name='groupby')
@Appender(_doc_template)
def resample(self, rule, *args, **kwargs):
"""
Provide resampling when using a TimeGrouper
Return a new grouper with our resampler appended
Provide resampling when using a TimeGrouper.

Given a grouper, the function resamples it according to a string
that specifies the target frequency.

See the :ref:`frequency aliases <timeseries.offset-aliases>`
documentation for more details.

Parameters
----------
rule : str or Offset
Member

Offset --> DateOffset

The offset string or object representing target grouper conversion.
*args, **kwargs
For compatibility with other groupby methods. Available keyword
arguments are:
datapythonista marked this conversation as resolved.

* closed : {'right', 'left'}
Which side of bin interval is closed.
* label : {'right', 'left'}
Which bin edge label to label bucket with.
* loffset : timedelta
Member

is "loffset" right? I don't know this section of the code all that well.

Member

Not an expert myself, I guess the type should be DateOffset, tseries.offsets, timedelta, or str?

From what I can see, the valid keywords are how, fill_method, limit, kind and on (in get_resampler_for_grouping); closed, label, how, axis, fill_method, limit, loffset, kind, convention and base (in TimeGrouper); and key, level, freq, axis and sort (in Grouper).

What I'd do is add the ones from get_resampler_for_grouping as explicit arguments, and then document that **kwargs will be passed to TimeGrouper.

@jreback is it ok to change the signature?

Contributor

level, axis, freq, key, sort are all part of the grouper and not args to .resample() or any aggregation function.

Member

@pandres can you replace the description with something like:

Possible arguments are `how`, `fill_method`, `limit`, `kind` and `on`, and other arguments of `TimeGrouper`.

We can improve that later in a separate PR, but I think we can merge all the rest of the changes for now.

Thanks!
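
To illustrate the distinction drawn in this thread, here is a minimal sketch (not part of the PR) that passes `freq` to a `pd.Grouper` and passes `closed` and `label` through `.resample()`. The frame mirrors the docstring example; the variable names `by_grouper` and `by_resample` are made up for illustration, and `'3min'` is used in place of `'3T'` on the assumption that a recent pandas version is installed.

```python
import pandas as pd

# Frame equivalent to the one in the docstring example.
idx = pd.date_range("2000-01-01", periods=4, freq="min")
df = pd.DataFrame({"a": [0, 0, 5, 0], "b": [1, 1, 1, 1]}, index=idx)

# freq is an argument of the Grouper itself: group by column "a" and by
# 3-minute bins of the datetime index in a single groupby call.
by_grouper = df.groupby(["a", pd.Grouper(freq="3min")])["b"].sum()

# closed and label are binning options forwarded by .resample().
# The values given here are the defaults for a minute-based frequency.
by_resample = (
    df.groupby("a")["b"]
    .resample("3min", closed="left", label="left")
    .sum()
)

print(by_grouper)
print(by_resample)
```

For this data both spellings should produce the same three sums, which suggests the two routes are interchangeable for simple cases.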

Adjust the resampled time labels.

Returns
-------
Grouper
Return a new grouper with our resampler appended.

See Also
--------
pandas.Grouper : Specify a frequency to resample with when
grouping by a key.
DatetimeIndex.resample : Frequency conversion and resampling of
time series.

Examples
--------
>>> idx = pd.date_range('1/1/2000', periods=4, freq='T')
>>> df = pd.DataFrame(data=4 * [range(2)],
... index=idx,
... columns=['a', 'b'])
>>> df.iloc[2, 0] = 5
>>> df
                     a  b
2000-01-01 00:00:00  0  1
2000-01-01 00:01:00  0  1
2000-01-01 00:02:00  5  1
2000-01-01 00:03:00  0  1

Downsample the DataFrame into 3 minute bins and sum the values of
the timestamps falling into a bin.

>>> df.groupby('a').resample('3T').sum()
                         a  b
a
0   2000-01-01 00:00:00  0  2
    2000-01-01 00:03:00  0  1
5   2000-01-01 00:00:00  5  1

Upsample the series into 30 second bins.

>>> df.groupby('a').resample('30S').sum()
                         a  b
a
0   2000-01-01 00:00:00  0  1
    2000-01-01 00:00:30  0  0
    2000-01-01 00:01:00  0  1
    2000-01-01 00:01:30  0  0
    2000-01-01 00:02:00  0  0
    2000-01-01 00:02:30  0  0
    2000-01-01 00:03:00  0  1
5   2000-01-01 00:02:00  5  1

Resample by month. Values are assigned to the month of the period.

>>> df.groupby('a').resample('M').sum()
              a  b
a
0 2000-01-31  0  3
5 2000-01-31  5  1

Downsample the series into 3 minute bins as above, but close the right
side of the bin interval.

>>> df.groupby('a').resample('3T', closed='right').sum()
                         a  b
a
0   1999-12-31 23:57:00  0  1
    2000-01-01 00:00:00  0  2
5   2000-01-01 00:00:00  5  1

Downsample the series into 3 minute bins and close the right side of
the bin interval, but label each bin using the right edge instead of
the left.

>>> df.groupby('a').resample('3T', closed='right', label='right').sum()
                         a  b
a
0   2000-01-01 00:00:00  0  1
    2000-01-01 00:03:00  0  2
5   2000-01-01 00:03:00  5  1

Add an offset of twenty seconds.

>>> df.groupby('a').resample('3T', loffset='20s').sum()
                         a  b
a
0   2000-01-01 00:00:20  0  2
    2000-01-01 00:03:20  0  1
5   2000-01-01 00:00:20  5  1
"""
from pandas.core.resample import get_resampler_for_grouping
return get_resampler_for_grouping(self, rule, *args, **kwargs)
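
The downsampling and `closed`/`label` examples in the docstring above can be checked with a short script. This is a sketch rather than part of the PR: it selects column `b` explicitly so the result does not depend on how a given pandas version treats the grouping column, and it spells the minute alias as `'min'`, on the assumption that the `'T'` alias used in the docstring may warn in recent pandas releases. The names `left_closed` and `right_closed` are illustrative only.

```python
import pandas as pd

# Same data as the docstring example: four one-minute rows, with the
# third row placed in its own group (a == 5).
idx = pd.date_range("2000-01-01", periods=4, freq="min")
df = pd.DataFrame({"a": [0, 0, 5, 0], "b": [1, 1, 1, 1]}, index=idx)

# Downsample each group into 3-minute bins (left-closed, left-labelled
# by default for this frequency) and sum column "b".
left_closed = df.groupby("a")["b"].resample("3min").sum()

# Close the right side of each bin and label it with its right edge.
right_closed = (
    df.groupby("a")["b"]
    .resample("3min", closed="right", label="right")
    .sum()
)

print(left_closed)
print(right_closed)
```

The sums in column `b` should match the corresponding docstring outputs: 2, 1, 1 for the default binning and 1, 2, 1 with `closed='right', label='right'`.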