BUG: Series.where casting dt64 to int64 #38073

Merged
merged 18 commits into from
Dec 29, 2020
Changes from 11 commits
Commits
24dd5dd
ENH: support 2D in DatetimeArray._from_sequence
jbrockmendel Nov 23, 2020
ac47fab
Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…
jbrockmendel Nov 25, 2020
18f1671
BUG: Series.where casting dt64 to int64
jbrockmendel Nov 25, 2020
95d7aa3
whatsnew
jbrockmendel Nov 25, 2020
8862b0e
Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…
jbrockmendel Dec 2, 2020
eb686bf
Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…
jbrockmendel Dec 8, 2020
e0fcbe5
move whatsnew
jbrockmendel Dec 8, 2020
48e99c4
Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…
jbrockmendel Dec 13, 2020
75ce3cd
Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…
jbrockmendel Dec 17, 2020
5197e96
Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…
jbrockmendel Dec 19, 2020
0d0165c
Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…
jbrockmendel Dec 23, 2020
f9786c6
Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…
jbrockmendel Dec 23, 2020
1c04797
Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…
jbrockmendel Dec 27, 2020
d797c21
Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…
jbrockmendel Dec 27, 2020
d9f1e3a
Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…
jbrockmendel Dec 28, 2020
08c06e9
use fixture, remove unnecessary check
jbrockmendel Dec 28, 2020
4cef5ea
Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…
jbrockmendel Dec 29, 2020
1e70e20
Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…
jbrockmendel Dec 29, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.0.rst
@@ -180,6 +180,7 @@ Datetimelike
 ^^^^^^^^^^^^
 - Bug in :class:`DataFrame` and :class:`Series` constructors sometimes dropping nanoseconds from :class:`Timestamp` (resp. :class:`Timedelta`) ``data``, with ``dtype=datetime64[ns]`` (resp. ``timedelta64[ns]``) (:issue:`38032`)
 - Bug in :meth:`DataFrame.first` and :meth:`Series.first` returning two months for offset one month when first day is last calendar day (:issue:`29623`)
+- Bug in :meth:`Series.where` incorrectly casting ``datetime64`` values to ``int64`` (:issue:`37682`)
 -

 Timedelta
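
For context, the behaviour named in the new whatsnew entry can be exercised through the public API alone. The snippet below is a minimal sketch adapted from the test added in this PR; the variable names are illustrative, and it assumes a pandas version that already contains the fix (1.3 or later).

```python
import numpy as np
import pandas as pd

# Three consecutive days; the left-hand values are missing the last one.
dr = pd.date_range("2001-01-01", periods=3)
lvals = pd.Series(pd.DatetimeIndex([dr[0], dr[1], pd.NaT]))

# The replacement values arrive as a Categorical of timestamps.
rvals = pd.Categorical([dr[0], pd.NaT, dr[2]])

mask = np.array([True, True, False])

# Keep lvals where mask is True and take rvals elsewhere.
res = lvals.where(mask, rvals)

# With the fix, the result should keep a datetime64 dtype rather than int64.
print(res.dtype)
print(res.tolist())
```

Only the last position is taken from `rvals`, so the result should hold the same three timestamps as `dr`.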
3 changes: 2 additions & 1 deletion pandas/core/arrays/numpy_.py
@@ -161,7 +161,8 @@ def __init__(self, values: Union[np.ndarray, "PandasArray"], copy: bool = False)
                 f"'values' must be a NumPy array, not {type(values).__name__}"
             )

-        if values.ndim != 1:
+        if values.ndim == 0:
+            # Technically we support 2, but do not advertise that fact.
             raise ValueError("PandasArray must be 1-dimensional.")

         if copy:
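
The relaxed check rejects only zero-dimensional input. The sketch below shows what that means in NumPy terms. The helper `check_ndim` is hypothetical: it mirrors the condition in the diff and is not part of pandas.

```python
import numpy as np


def check_ndim(values: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in that mirrors the relaxed dimensionality check."""
    if values.ndim == 0:
        raise ValueError("PandasArray must be 1-dimensional.")
    return values


one_d = check_ndim(np.array([1, 2, 3]))         # accepted before and after
two_d = check_ndim(np.array([[1, 2], [3, 4]]))  # accepted only after the change

try:
    check_ndim(np.array(0))                     # zero-dimensional array
    rejected = False
except ValueError:
    rejected = True
```

This matches the updated `test_nd_raises` parametrization further down, which now passes only `np.array(0)`.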
53 changes: 41 additions & 12 deletions pandas/core/internals/blocks.py
@@ -1389,6 +1389,25 @@ def shift(self, periods: int, axis: int = 0, fill_value=None):

         return [self.make_block(new_values)]

+    def _maybe_reshape_where_args(self, values, other, cond, axis):
+        transpose = self.ndim == 2
+
+        cond = _extract_bool_array(cond)
+
+        # If the default broadcasting would go in the wrong direction, then
+        # explicitly reshape other instead
+        if getattr(other, "ndim", 0) >= 1:
+            if values.ndim - 1 == other.ndim and axis == 1:
+                other = other.reshape(tuple(other.shape + (1,)))
+            elif transpose and values.ndim == self.ndim - 1:
+                # TODO(EA2D): not necessary with 2D EAs
+                cond = cond.T
+
+        if not hasattr(cond, "shape"):
+            raise ValueError("where must have a condition that is ndarray like")
Contributor

is this actually hit? Shouldn't this be an assertion (I know this is friendlier, of course), but still.

Contributor

actually you explicitly check this on L1347, no?

Member Author

good catch, will remove

Member Author

updated + green

+        return other, cond
+
     def where(
         self, other, cond, errors="raise", try_cast: bool = False, axis: int = 0
     ) -> List["Block"]:
@@ -1411,7 +1430,6 @@ def where(
         """
         import pandas.core.computation.expressions as expressions

-        cond = _extract_bool_array(cond)
         assert not isinstance(other, (ABCIndex, ABCSeries, ABCDataFrame))

         assert errors in ["raise", "ignore"]
@@ -1422,17 +1440,7 @@ def where(
         if transpose:
             values = values.T

-        # If the default broadcasting would go in the wrong direction, then
-        # explicitly reshape other instead
-        if getattr(other, "ndim", 0) >= 1:
-            if values.ndim - 1 == other.ndim and axis == 1:
-                other = other.reshape(tuple(other.shape + (1,)))
-            elif transpose and values.ndim == self.ndim - 1:
-                # TODO(EA2D): not necessary with 2D EAs
-                cond = cond.T
-
-        if not hasattr(cond, "shape"):
-            raise ValueError("where must have a condition that is ndarray like")
+        other, cond = self._maybe_reshape_where_args(values, other, cond, axis)

         if cond.ravel("K").all():
             result = values
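
The reshaping comment in `_maybe_reshape_where_args` ("if the default broadcasting would go in the wrong direction") can be illustrated with plain NumPy. This is a sketch of the broadcasting idea only: the array names and values are made up, and it does not reproduce the transposition that `Block.where` applies to its 2D values.

```python
import numpy as np

values = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]
cond = values % 2 == 0                # keep the even entries
other = np.array([-1, -2, -3])        # one replacement value per row

# np.where(cond, values, other) would fail at this point: NumPy aligns
# trailing axes, and shape (3,) cannot be broadcast against (3, 2).
if values.ndim - 1 == other.ndim:
    # Append a length-1 axis so that each value lines up with one row.
    other = other.reshape(other.shape + (1,))   # shape (3, 1)

result = np.where(cond, values, other)
print(result)
```

After the reshape, each row's odd entry is replaced by that row's value from `other`.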
@@ -2189,6 +2197,26 @@ def to_native_types(self, na_rep="NaT", **kwargs):
         result = arr._format_native_types(na_rep=na_rep, **kwargs)
         return self.make_block(result)

+    def where(
+        self, other, cond, errors="raise", try_cast: bool = False, axis: int = 0
+    ) -> List["Block"]:
+        # TODO(EA2D): reshape unnecessary with 2D EAs
+        arr = self.array_values().reshape(self.shape)
Contributor

you can almost use your ravel_compat decorator

+        other, cond = self._maybe_reshape_where_args(arr, other, cond, axis)
+
+        try:
+            res_values = arr.T.where(cond, other).T
+        except (ValueError, TypeError):
+            return super().where(
+                other, cond, errors=errors, try_cast=try_cast, axis=axis
+            )
+
+        # TODO(EA2D): reshape not needed with 2D EAs
+        res_values = res_values.reshape(self.values.shape)
+        nb = self.make_block_same_class(res_values)
+        return [nb]
+
     def _can_hold_element(self, element: Any) -> bool:
         arr = self.array_values()

@@ -2257,6 +2285,7 @@ class DatetimeTZBlock(ExtensionBlock, DatetimeBlock):
     fillna = DatetimeBlock.fillna  # i.e. Block.fillna
     fill_value = DatetimeBlock.fill_value
     _can_hold_na = DatetimeBlock._can_hold_na
+    where = DatetimeBlock.where

     array_values = ExtensionBlock.array_values

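
The new `DatetimeBlock.where` first tries the datetime-aware array path and, on `ValueError` or `TypeError`, defers to the generic `Block.where`. From the user's side the two paths should look roughly like the sketch below. It assumes pandas 1.3 or later, the variable names are illustrative, and the exact dtype produced by the fallback may depend on the pandas version.

```python
import numpy as np
import pandas as pd

ser = pd.Series(pd.date_range("2001-01-01", periods=3))
mask = np.array([True, False, True])

# A datetime-compatible replacement can be held by the datetime array,
# so the result is expected to stay datetime64.
kept = ser.where(mask, pd.Timestamp("2020-01-01"))

# An incompatible replacement cannot be held, so pandas is expected to
# fall back to a more general dtype instead of raising.
fallback = ser.where(mask, "not a date")

print(kept.dtype, fallback.dtype)
```

In both calls the values at the `True` positions are left untouched; only the handling of the replacement differs.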
2 changes: 1 addition & 1 deletion pandas/tests/arrays/test_array.py
@@ -278,7 +278,7 @@ def test_array_inference_fails(data):
     tm.assert_extension_array_equal(result, expected)


-@pytest.mark.parametrize("data", [np.array([[1, 2], [3, 4]]), [[1, 2], [3, 4]]])
+@pytest.mark.parametrize("data", [np.array(0)])
Contributor

we pass thru 2-D pandas arrays?

Member Author

DTA/TDA._validate_listlike calls pd.array on possibly-2D inputs, which go through PandasArray

 def test_nd_raises(data):
     with pytest.raises(ValueError, match="PandasArray must be 1-dimensional"):
         pd.array(data, dtype="int64")
31 changes: 31 additions & 0 deletions pandas/tests/series/indexing/test_where.py
@@ -464,3 +464,34 @@ def test_where_categorical(klass):
     df = klass(["A", "A", "B", "B", "C"], dtype="category")
     res = df.where(df != "C")
     tm.assert_equal(exp, res)
+
+
+@pytest.mark.parametrize("tz", [None, "US/Pacific"])
Contributor

could use the timezone fixtures, but nbd

Contributor

?

Member Author

sure

+def test_where_datetimelike_categorical(tz):
+    # GH#37682
+    dr = pd.date_range("2001-01-01", periods=3, tz=tz)._with_freq(None)
+    lvals = pd.DatetimeIndex([dr[0], dr[1], pd.NaT])
+    rvals = pd.Categorical([dr[0], pd.NaT, dr[2]])
+
+    mask = np.array([True, True, False])
+
+    # DatetimeIndex.where
+    res = lvals.where(mask, rvals)
+    tm.assert_index_equal(res, dr)
+
+    # DatetimeArray.where
+    res = lvals._data.where(mask, rvals)
+    tm.assert_datetime_array_equal(res, dr._data)
+
+    # Series.where
+    res = Series(lvals).where(mask, rvals)
+    tm.assert_series_equal(res, Series(dr))
+
+    # DataFrame.where
+    if tz is None:
+        res = pd.DataFrame(lvals).where(mask[:, None], pd.DataFrame(rvals))
+    else:
+        with pytest.xfail(reason="frame._values loses tz"):
+            res = pd.DataFrame(lvals).where(mask[:, None], pd.DataFrame(rvals))
+
+    tm.assert_frame_equal(res, pd.DataFrame(dr))