Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: try coerce result back to DatetimeBlock #22008

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -436,6 +436,7 @@ Datetimelike

- Fixed bug where two :class:`DateOffset` objects with different ``normalize`` attributes could evaluate as equal (:issue:`21404`)
- Fixed bug where :meth:`Timestamp.resolution` incorrectly returned 1-microsecond ``timedelta`` instead of 1-nanosecond :class:`Timedelta` (:issue:`21336`,:issue:`21365`)
- Fixed bug where :class:`DataFrame` with ``dtype='datetime64[ns]'`` operating with :class:`DateOffset` could cast to ``dtype='object'`` (:issue:`21610`)

Timedelta
^^^^^^^^^
Expand Down
19 changes: 17 additions & 2 deletions pandas/core/internals/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,9 @@
ABCSeries,
ABCDatetimeIndex,
ABCExtensionArray,
ABCIndexClass)
ABCIndexClass,
ABCDateOffset,
)
import pandas.core.common as com
import pandas.core.algorithms as algos

Expand Down Expand Up @@ -2737,7 +2739,7 @@ def _try_coerce_args(self, values, other):

def _try_coerce_result(self, result):
""" reverse of try_coerce_args """
if isinstance(result, np.ndarray):
if isinstance(result, (np.ndarray, Block)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a good change, please remove. the point is that these are scalars/arrays NEVER blocks.

if result.dtype.kind in ['i', 'f', 'O']:
try:
result = result.astype('M8[ns]')
Expand Down Expand Up @@ -2785,6 +2787,17 @@ def set(self, locs, values, check=False):

self.values[locs] = values

def eval(self, func, other, try_cast=False, **kwargs):
block = super(DatetimeBlock, self).eval(func, other, try_cast=try_cast,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens here if instead of calling super eval you delegate to self.holder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In DatetimeBlock - datetime case and DatetimeBlock + DateOffset case, it require the result block changing (DatetimeBlock -> TimeDeltaBlock , ObjectBlock -> DatetimeBlock or DatetimeTZBlock).
Delegating to self.holder couldn't change block type. Besides, self._holder is unaware of other.

In fact, what should be blamed is the last line in Block.eval carelessly make a wrong block. A proper fix should be

  1. spliting a method to make block regarding to other.
  2. override make_block with other as argument. But this will lead to self.make_block returning another type of block.
    return [self.make_block(result)]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, what should be blamed is the last line in Block.eval carelessly make a wrong block. A proper fix should be

There has been an effort recently to fix mismatches between the behaviors of arithmetic ops in Index vs Series vs DataFrame. For the timedelta and datetime cases, the canonical behavior is defined in TimedeltaIndex/DatetimeIndex.

The best case here to to avoid making this fix in core.internals but instead do it in core.DataFrame (or core.ops). The next base case is to have the FooBlock dispatch to the appropriate Index subclass to ensure the behavior is canonical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will be great, but it may need a big rewrite of eval and _try_coerce_args, cause now values is all coerced to ndarray[i8] for DatetimeLike. (May be benefited from numpy bi-ops, but confused to convert back.)

Is there an issue to xfer? This PR would depand on it.

**kwargs)[0]
if try_cast:
if isinstance(other, (np.datetime64, date)):
block = TimeDeltaBlock(block.values, block.mgr_locs,
ndim=block.ndim)
elif isinstance(other, ABCDateOffset):
block = self._try_coerce_result(block)
return [block]


class DatetimeTZBlock(NonConsolidatableMixIn, DatetimeBlock):
""" implement a datetime64 block with a tz attribute """
Expand Down Expand Up @@ -2920,6 +2933,8 @@ def _try_coerce_result(self, result):
if isinstance(result, np.ndarray):
if result.dtype.kind in ['i', 'f', 'O']:
result = result.astype('M8[ns]')
elif isinstance(result, Block):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, NEVER do this

result = self.make_block_same_class(result.values.flat)
elif isinstance(result, (np.integer, np.float, np.datetime64)):
result = tslibs.Timestamp(result, tz=self.values.tz)
if isinstance(result, np.ndarray):
Expand Down
24 changes: 24 additions & 0 deletions pandas/tests/frame/test_arithmetic.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
# -*- coding: utf-8 -*-
import datetime
import pytest
import numpy as np

Expand Down Expand Up @@ -211,6 +212,29 @@ def test_df_sub_datetime64_not_ns(self):
pd.Timedelta(days=2)])
tm.assert_frame_equal(res, expected)

def test_timestamp_df_add_dateoffset(self):
# GH 21610
expected = pd.DataFrame([pd.Timestamp('2019')])
result = pd.DataFrame([pd.Timestamp('2018')]) + pd.DateOffset(years=1)
tm.assert_frame_equal(expected, result)

expected = pd.DataFrame([pd.Timestamp('2019', tz='Asia/Shanghai')])
result = (pd.DataFrame([pd.Timestamp('2018', tz='Asia/Shanghai')])
+ pd.DateOffset(years=1))
tm.assert_frame_equal(expected, result)

@pytest.mark.parametrize('other', [
pd.Timestamp('2017'),
np.datetime64('2017'),
datetime.datetime(2017, 1, 1),
datetime.date(2017, 1, 1),
])
def test_timestamp_df_sub_timestamp(self, other):
# GH 8554 12437
expected = pd.DataFrame([pd.Timedelta('365d')])
result = pd.DataFrame([pd.Timestamp('2018')]) - other
tm.assert_frame_equal(expected, result)

@pytest.mark.parametrize('data', [
[1, 2, 3],
[1.1, 2.2, 3.3],
Expand Down
2 changes: 0 additions & 2 deletions pandas/tests/internals/test_internals.py
Original file line number Diff line number Diff line change
Expand Up @@ -1238,7 +1238,6 @@ class TestCanHoldElement(object):
(2**63, 'complex128'),
(True, 'bool'),
(np.timedelta64(20, 'ns'), '<m8[ns]'),
(np.datetime64(20, 'ns'), '<M8[ns]'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why removed? (Besides no is exempt sub working)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR lead to datetime64 OP datetime64 = timedelta64, regardless of OP. Although meaningless, it will result datetime64 + datetime64 = timedelta64.

The assumption datetime64 OP datetime64 = datetime64 is meaningless for all OP. If allow this line, it will require extra OP check in DatetimeBlock.eval, which I don't think worth.

])
@pytest.mark.parametrize('op', [
operator.add,
Expand All @@ -1255,7 +1254,6 @@ def test_binop_other(self, op, value, dtype):
(operator.truediv, 'bool'),
(operator.mod, 'i8'),
(operator.mod, 'complex128'),
(operator.mod, '<M8[ns]'),
(operator.mod, '<m8[ns]'),
(operator.pow, 'bool')}
if (op, dtype) in skip:
Expand Down