-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix some PeriodIndex resampling issues #16153
BUG: Fix some PeriodIndex resampling issues #16153
Conversation
pandas/tests/test_resample.py
Outdated
|
||
with pytest.raises(NotImplementedError): | ||
df.resample('2D', level='d') | ||
for freq in ['H', '12H', '2D', 'W']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these tests are more clear with parametrize
but we still inherit from TestCase so can't change this
but you can start a new test class that does this
again if it's more clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if I understood correctly - pushed 48e324f as an example, are you thinking along those lines of using pytest fixtures/mark.parametrize decorators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented directly on the commit
lmk when you need review. |
Codecov Report
@@ Coverage Diff @@
## master #16153 +/- ##
==========================================
- Coverage 91.27% 91.23% -0.05%
==========================================
Files 163 163
Lines 49765 49774 +9
==========================================
- Hits 45421 45409 -12
- Misses 4344 4365 +21
Continue to review full report at Codecov.
|
ok, gave parametrized tests a shot in 68f02f7. If you agree, I'd be happy to help moving/refactoring more of the tests in TestPeriodIndex. I'd like to tackle some more issues with PeriodIndex in a follow-up PR ( |
pandas/tests/test_resample.py
Outdated
return DataFrame({'value': np.arange(len(index))}, index=index) | ||
|
||
@pytest.fixture(params=[Series, DataFrame], scope='class') | ||
def pandas_obj(self, request, index): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is probably a more elegant way to create such "meta-fixtures"? ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do it more like this
@pytest.fixture(params=[PI, DTI, TDI])
def index(klass):
# construct an index
return klass(.....)
@pytest.fixture
def series(index):
return Series(.....)
@pytest.fixture
def frame(index):
return DataFrame(....)
it will cartesian product the indexes separately for series/frame which would now be the fixture arg to a test function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tried another approach by keeping the existing inheritance structure - we can inherit fixtures defined in the Base
class, index
fixture defined in Base
dispatches to the subclass-specific type via the existing _index_factory
. If you want to take a look:
https://github.com/winklerand/pandas/blob/bbf01b312a2d8be0b3c7681b9a46a039f33fdcc/pandas/tests/test_resample_pytest.py
Seems easier to me than parametrizing the index fixture, there's a lot of index type-specific test logic.
Should we move out this discussion to another issue or PR? This looks like a more general issue on how to (gradually) transition to pytest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NO
we are removing the base classes soon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry u meant the Base class and not TestCase base class
in any event it's not more clear to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, was maybe a bit confusing ;-)
pushed e497fc2, should illustrate what I meant: basically move fixtures to a Base_pytest(object)
class which can be inherited by TestPeriodIndex_pytest
, and DatetimeIndex_pytest
. This resembles the existing test organization structure, but allows for parametrization using the common fixtures.
pandas/core/resample.py
Outdated
|
||
if self.loffset is not None: | ||
if self.kind == 'period': | ||
print('Warning: loffset -> convert PeriodIndex to timestamps') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use warnings.warn
rather than print
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this here before? we don't warn on things like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a similar print-style warning is currently in master: https://github.com/pandas-dev/pandas/blob/master/pandas/core/resample.py#L800:L802
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I defer to @jreback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the warning in 8a4182e - any objections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good. I have to do another look on the actual tests.
pandas/core/resample.py
Outdated
elif is_superperiod(ax.freq, self.freq): | ||
if how == 'ohlc': | ||
# upsampling to subperiods is handled as an asfreq, which works |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the issue reference here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/resample.py
Outdated
'an instance of %r' % type(ax).__name__) | ||
|
||
memb = ax.asfreq(self.freq, how=self.convention) | ||
# NaT handling as in pandas._lib.lib.generate_bins_dt64() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/resample.py
Outdated
nat_count = 0 | ||
if memb.hasnans: | ||
import warnings | ||
with warnings.catch_warnings(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need this
memb._isnan
already has this mask.
pandas/core/resample.py
Outdated
|
||
i8 = memb.asi8 | ||
freq_mult = self.freq.n | ||
# when upsampling to subperiods, we need to generate enough bins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/resample.py
Outdated
freq_mult = self.freq.n | ||
# when upsampling to subperiods, we need to generate enough bins | ||
expected_bins_count = len(binner) * freq_mult | ||
i8_extend = expected_bins_count - (i8[-1] - i8[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will fail if not len(memb)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some lines above (after masking out NaTs), we check for if not len(memb)
and return and empty index, so this case should be caught.
Maybe put in an assert len(memb)
to make it explicit?
pandas/tests/test_resample.py
Outdated
freq=self._index_fixture_freq) | ||
|
||
@pytest.fixture(params=[Series, DataFrame], scope='class') | ||
def pandas_obj(self, request, index): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this 2 fixtures, series and frame, much simpler, and easier to grok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can then make another fixture like series_and_frame
if you want, though I find that more confusing, rather have you use params.
pandas/tests/test_resample.py
Outdated
@@ -2777,6 +2694,211 @@ def test_evenly_divisible_with_no_extra_bins(self): | |||
assert_frame_equal(result, expected) | |||
|
|||
|
|||
def assert_series_or_frame_equal(result, expected): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is prob ok, but move to util.testing
pandas/tests/test_resample.py
Outdated
pass | ||
|
||
|
||
class TestPeriodIndex_pytest(Base_pytest): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather name these TestPeriodIndex(Base)
and rename the original tests I think (and then we should move them in other PR's)
8a4182e
to
09d7ac3
Compare
Rebased on current master, so we can put the parametrized test in the existing test classes (which just inherit from |
cc @MaximilianR can you have a look. |
Looks great. Fixes a couple of important issues. Whether or not we reveal But this definitely makes things better! Thanks @winklerand |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really just some minor stylistic changes.
can you fully close all of the mentioned issues at the top of the PR, and then create a new issue with any remnants. (ideally with check boxes).
Also pls create a new issue for test_resample reorg as I suggested. thanks.
bins = memb.searchsorted(rng, side='left') | ||
|
||
if nat_count > 0: | ||
# NaT handling as in pandas._lib.lib.generate_bins_dt64() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this path tested sufficiently, e.g. 0, 1, 2 NaT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a test case for consecutive NaTs in the index (1cad7fa)
Should be sufficiently tested, cases covered:
- 0 NaT: basically all other resampling tests
- multiple single NaTs (at beginning, inside and end of index)
- consecutive NaTs (at beginning, inside and end of index)
Any ideas for more exhaustive test cases?
pandas/tests/test_resample.py
Outdated
@@ -696,6 +696,32 @@ def create_index(self, *args, **kwargs): | |||
factory = self._index_factory() | |||
return factory(*args, **kwargs) | |||
|
|||
_index_fixture_start = datetime(2005, 1, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a followup PR. test_resample is getting sufficiently long that it makes sense to split this into some sub-dirs e.g.
pandas/tests/resample/test_api, test_grouping, test_datetimeindex, test_period
etc. IOW split out the classes to separate files (to start).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we would normally co-locate fixtures into a common.py
pandas/tests/test_resample.py
Outdated
_index_fixture_end = datetime(2005, 1, 10) | ||
_index_fixture_freq = 'D' | ||
|
||
@pytest.fixture(scope='class') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
normally don't use scoping like this (and instead use default function scoping). The issue is that we could have mutating things which will then propogate (we shouldn't in any event).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, changed to default function scoping (6fe6c0c)
pandas/tests/test_resample.py
Outdated
@@ -696,6 +696,32 @@ def create_index(self, *args, **kwargs): | |||
factory = self._index_factory() | |||
return factory(*args, **kwargs) | |||
|
|||
_index_fixture_start = datetime(2005, 1, 1) | |||
_index_fixture_end = datetime(2005, 1, 10) | |||
_index_fixture_freq = 'D' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these can be fixtures themselves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/tests/test_resample.py
Outdated
@pytest.mark.parametrize('freq', ['2D']) | ||
@pytest.mark.parametrize('kind', ['period', None, 'timestamp']) | ||
def test_asfreq_downsample(self, series_and_frame, freq, kind): | ||
# GH 12884, 15944 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
give a 1-liner about the issue that you are specifiying test when not obvious (e.g. here)
pandas/tests/test_resample.py
Outdated
start = obj.index[0].to_timestamp(how='start') | ||
end = (obj.index[-1] + 1).to_timestamp(how='start') | ||
if kind == 'timestamp': | ||
expected = obj.to_timestamp().resample(freq).asfreq() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this test be combined with prior test? (looks like you can just add to the freq), or is there a reason not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, parametrization makes this obvious. In fact, both tests cover almost the same code path, so I consolidated them into one (f8f4157).
Reason basically was not to change the existing structure - the same methods are defined in the Base
class. I consolidated them as well in the Base
class - PTAL.
pandas/tests/test_resample.py
Outdated
expected_index = self.create_index(df.index[0], | ||
periods=len(df.index) / 2, | ||
freq='2D') | ||
# loffset coreces PeriodIndex to DateTimeIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line
pandas/tests/test_resample.py
Outdated
result_how = df.resample('2D', how=agg_arg, loffset='2H', | ||
kind=kind) | ||
if isinstance(agg_arg, list): | ||
expected.columns = pd.MultiIndex .from_tuples([('value', 'mean')]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you have an odd space after pd.MultiIndex
pandas/tests/test_resample.py
Outdated
pi = PeriodIndex(start='2000', freq='D', periods=10) | ||
s = Series(range(len(pi)), index=pi) | ||
expected = s.to_timestamp().resample(freq).ohlc().to_period(freq) | ||
# timestamp-based resampling doesn't include all sub-periods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/tests/test_resample.py
Outdated
assert_frame_equal(result, expected) | ||
|
||
def test_resample_with_only_nat(self): | ||
pi = PeriodIndex([pd.NaT] * 3, freq='S') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue number for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done (not mentioned in the issue explicitly, is just an edge case)
For now this is fine. Yes I think we should remove this as PI becomes more flexible. But that's for another time. |
@winklerand biggest thing is going to need a small section in 0.21.0 to show these changes. In fact I find that writing these examples forces organization on describing the actual changes. Then (in this issue or followup), ideally like to example the timeseries.rst docs w.r.t. PI resampling. |
pandas/tests/test_resample.py
Outdated
expected = s.reindex(new_index) | ||
assert_series_equal(result, expected) | ||
@pytest.mark.parametrize('freq', ['2D', '1H']) | ||
def test_asfreq(self, series_and_frame, freq): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback
I consolidated the test_asfreq_downsample()
and test_asfreq_upsample()
tests in the Base
class as well (so we don't have to override them in TestPeriodIndex
). PTAL if everything's still correct and covered as before.
@jreback sorry for the delay on this one. I think I made the requested changes, PTAL. Thanks for the great suggestions. I put a small PeriodIndex resampling paragraph into the v0.21 whatsnew documentation, in the 'Backward incompatible API changes' section - is this the right place? Going to organise the issues (closing the solved ones, open new one with the remaining issues) and create a test_resample reorg issue later tonight. |
@jreback : Were we waiting for anything on this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good. couple of comments.
|
||
.. code-block:: ipython | ||
|
||
In [1]: pi = pd.period_range('2017-01', periods=12, freq='M') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the fixed things that are not changed (e.g. [1], [2]), do in a separate ipython block above (to avoid repeating in previous/new).
doc/source/whatsnew/v0.21.0.txt
Outdated
``PeriodIndex`` resampling | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
In previous versions of pandas, resampling a ``Series``/``DataFrame`` indexed by a ``PeriodIndex`` returned a ``DatetimeIndex`` in some cases (:issue:`12884`). Resampling to a multiplied frequency now returns a ``PeriodIndex`` (:issue:`15944`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the only issue?
|
||
if self.loffset is not None: | ||
# Cannot apply loffset/timedelta to PeriodIndex -> convert to | ||
# timestamps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we show a warning for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should, given that we kind of did that before (it was print
statement with a warning!?)
side issue, maybe we should split up test_resample.py similar to groupby, e.g. make a sub-dir with common, test_datetime, test_period, test_timedelta. to make this a bit more understandable (let's make a separate issue for that). |
can you rebase. this looked really good. need to review once more. |
@winklerand sorry let this linger. can you rebase. |
48f1e76
to
0f63098
Compare
ok I rebased, will look after passing |
@MaximilianR can you have a look. in particular can you look at the original issues and see if we are fully / partially covering and make a note. |
|
||
# if index contains no valid (non-NaT) values, return empty index | ||
if not len(memb): | ||
binner = labels = PeriodIndex( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use _shallow_copy
here, but this is OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left this, ok for now.
Added tests to cover all code paths, moved the check to _convert_obj() and removed then-redundant check from _upsample() method.
Exceptions: - force conversion to DatetimeIndex by kind='timestamp' param - if loffset is given, convert to timestamps in any case
Removed previously defined helper method assert_series_or_frame_equal(), which behaved identical to assert_almost_equal()
0f63098
to
6084e0c
Compare
thanks @winklerand nice fixes! |
* 'master' of github.com:pandas-dev/pandas: (188 commits) Separate out _convert_datetime_to_tsobject (pandas-dev#17715) DOC: remove whatsnew note for xref pandas-dev#17131 BUG: Regression in .loc accepting a boolean Index as an indexer (pandas-dev#17738) DEPR: Deprecate cdate_range and merge into bdate_range (pandas-dev#17691) CLN: replace %s syntax with .format in pandas.core: categorical, common, config, config_init (pandas-dev#17735) Fixed the memory usage explanation of categorical in gotchas from O(nm) to O(n+m) (pandas-dev#17736) TST: add backward compat for offset testing for pickles (pandas-dev#17733) remove unused time conversion funcs (pandas-dev#17711) DEPR: Deprecate convert parameter in take (pandas-dev#17352) BUG:Time Grouper bug fix when applied for list groupers (pandas-dev#17587) BUG: Fix some PeriodIndex resampling issues (pandas-dev#16153) BUG: Fix unexpected sort in groupby (pandas-dev#17621) DOC: Fixed typo in documentation for 'pandas.DataFrame.replace' (pandas-dev#17731) BUG: Fix series rename called with str altering name rather index (GH17407) (pandas-dev#17654) DOC: Add examples for MultiIndex.get_locs + cleanups (pandas-dev#17675) Doc improvements for IntervalIndex and Interval (pandas-dev#17714) BUG: DataFrame sort_values and multiple "by" columns fails to order NaT correctly Last of the timezones funcs (pandas-dev#17669) Add missing file to _pyxfiles, delete commented-out (pandas-dev#17712) update imports of DateParseError, remove unused imports from tslib (pandas-dev#17713) ...
thanks @jreback for pushing this one over the finish line, completely forgot about it (works for me syndrome ;-)), sorry! |
closes #15944
xref partially #12884
closes #13083
closes #13224
This PR addresses some of the issues related to PeriodIndex resampling.
As I'm new to the pandas codebase, I appreciate any advice.