Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: try coerce result back to DatetimeBlock #22008

Closed

Conversation

holymonson
Copy link
Contributor

@holymonson holymonson commented Jul 21, 2018


Hang up until Block.eval reformed.

@WillAyd WillAyd added Datetime Datetime data dtype Timedelta Timedelta data type labels Jul 21, 2018
@WillAyd
Copy link
Member

WillAyd commented Jul 21, 2018

@jbrockmendel . Should this work with tz-aware date times as well?

@jbrockmendel
Copy link
Member

Should this work with tz-aware date times as well?

Yes.

If there's any way at all to do this without poking into core.internals, that would be preferred. The Series version of this works correctly right? This should be doing something like operating column-wise to dispatch to the Series implementation (which itself dispatches to the DatetimeIndex implementation).

Can you include tests for subtraction while you're at it?

@codecov
Copy link

codecov bot commented Jul 21, 2018

Codecov Report

Merging #22008 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22008      +/-   ##
==========================================
+ Coverage   91.99%   91.99%   +<.01%     
==========================================
  Files         167      167              
  Lines       50578    50588      +10     
==========================================
+ Hits        46530    46540      +10     
  Misses       4048     4048
Flag Coverage Δ
#multiple 90.4% <100%> (ø) ⬆️
#single 42.17% <18.18%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/internals/__init__.py 95.51% <100%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0828c25...21c1c7c. Read the comment docs.

@@ -1362,10 +1368,15 @@ def eval(self, func, other, errors='raise', try_cast=False, mgr=None):
values, values_mask, other, other_mask = self._try_coerce_args(
transf(values), other)
except TypeError:
# `Timestamp` operate with `DateOffset` must cast to `object`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would rather see you override eval in the appropriate blocks

@holymonson holymonson force-pushed the timestamp_df-add-offset branch from ad96a28 to a75f375 Compare July 22, 2018 09:44
@holymonson holymonson changed the title core: coerce result to dtype in Block.eval if dtype explicitly given core: try coerce result back to DatetimeBlock Jul 22, 2018
@holymonson
Copy link
Contributor Author

v2.0, override DatetimeBlock.eval.

Below test fails, but not a regression. Will look deeper later.

    def test_timestamp_df_sub_timestamp(self):
        expected = pd.DataFrame([pd.Timedelta('365d')])
        result = pd.DataFrame([pd.Timestamp('2018')]) - pd.Timestamp('2017')
        tm.assert_frame_equal(expected, result)

@holymonson
Copy link
Contributor Author

docs said returning a new block, but indeed it returns a singleton list.

Returns
-------
a new block, the result of the func
"""

result = _block_shape(result, ndim=self.ndim)
return [self.make_block(result)]

block = super().eval(func, other, try_cast=try_cast, **kwargs)[0]
if try_cast:
if isinstance(other,
(tslibs.Timestamp, np.datetime64, datetime, date)):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an ABCTimestamp to check? At least 3 times use similar isinstance().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an ABCTimestamp to check?

No, but Timestamp subclasses datetime, so could actually be removed from this check.

@holymonson holymonson force-pushed the timestamp_df-add-offset branch from 4c36407 to 8a3a20e Compare July 22, 2018 15:00
@@ -2785,6 +2787,17 @@ def set(self, locs, values, check=False):

self.values[locs] = values

def eval(self, func, other, try_cast=False, **kwargs):
block = super(DatetimeBlock, self).eval(func, other, try_cast=try_cast,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens here if instead of calling super eval you delegate to self.holder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In DatetimeBlock - datetime case and DatetimeBlock + DateOffset case, it require the result block changing (DatetimeBlock -> TimeDeltaBlock , ObjectBlock -> DatetimeBlock or DatetimeTZBlock).
Delegating to self.holder couldn't change block type. Besides, self._holder is unaware of other.

In fact, what should be blamed is the last line in Block.eval carelessly make a wrong block. A proper fix should be

  1. spliting a method to make block regarding to other.
  2. override make_block with other as argument. But this will lead to self.make_block returning another type of block.
    return [self.make_block(result)]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, what should be blamed is the last line in Block.eval carelessly make a wrong block. A proper fix should be

There has been an effort recently to fix mismatches between the behaviors of arithmetic ops in Index vs Series vs DataFrame. For the timedelta and datetime cases, the canonical behavior is defined in TimedeltaIndex/DatetimeIndex.

The best case here to to avoid making this fix in core.internals but instead do it in core.DataFrame (or core.ops). The next base case is to have the FooBlock dispatch to the appropriate Index subclass to ensure the behavior is canonical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will be great, but it may need a big rewrite of eval and _try_coerce_args, cause now values is all coerced to ndarray[i8] for DatetimeLike. (May be benefited from numpy bi-ops, but confused to convert back.)

Is there an issue to xfer? This PR would depand on it.

@holymonson
Copy link
Contributor Author

Hang up until Block.eval reformed. This will also close #8554 #12437.

@@ -1238,7 +1238,6 @@ class TestCanHoldElement(object):
(2**63, 'complex128'),
(True, 'bool'),
(np.timedelta64(20, 'ns'), '<m8[ns]'),
(np.datetime64(20, 'ns'), '<M8[ns]'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why removed? (Besides no is exempt sub working)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR lead to datetime64 OP datetime64 = timedelta64, regardless of OP. Although meaningless, it will result datetime64 + datetime64 = timedelta64.

The assumption datetime64 OP datetime64 = datetime64 is meaningless for all OP. If allow this line, it will require extra OP check in DatetimeBlock.eval, which I don't think worth.

(operator.mod, 'complex128'),
(operator.mod, '<m8[ns]'),
(operator.pow, 'bool'),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls avoid formatting changes where convenient, makes it harder for reviewers to identify the actual changes.

@holymonson holymonson force-pushed the timestamp_df-add-offset branch from c82ea08 to 21c1c7c Compare July 23, 2018 05:55
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a lot of concerns over the changes in the impl here.

@@ -2737,7 +2739,7 @@ def _try_coerce_args(self, values, other):

def _try_coerce_result(self, result):
""" reverse of try_coerce_args """
if isinstance(result, np.ndarray):
if isinstance(result, (np.ndarray, Block)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a good change, please remove. the point is that these are scalars/arrays NEVER blocks.

@@ -2920,6 +2933,8 @@ def _try_coerce_result(self, result):
if isinstance(result, np.ndarray):
if result.dtype.kind in ['i', 'f', 'O']:
result = result.astype('M8[ns]')
elif isinstance(result, Block):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, NEVER do this

@jbrockmendel
Copy link
Member

Two of the bugs this addresses (#21610, #8554) were closed by #22163. The remaining issue #12437 is still open. @holymonson can this be simplified if only aimed at that issue?

@holymonson
Copy link
Contributor Author

Two of the bugs this addresses (#21610, #8554) were closed by #22163. The remaining issue #12437 is still open. @holymonson can this be simplified if only aimed at that issue?

This Block-level fix is not good if dispatching to Series could already fix some of them. I would rather close this (Or you might want to take the testcases).

@holymonson holymonson closed this Aug 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dtype changed when DataFrame add DateOffset
4 participants