Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: make "closed" part of IntervalDtype #38394

Merged
merged 24 commits into from
Jan 10, 2021
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
747926e
ENH: make closed part of IntervalDtype
jbrockmendel Dec 7, 2020
25d5925
TST: raise on mismatched closed
jbrockmendel Nov 21, 2020
15613e6
typo fixup
jbrockmendel Nov 21, 2020
7bd2db6
pickle
jbrockmendel Dec 8, 2020
10c0225
warn on deprcated
jbrockmendel Dec 9, 2020
2a82a78
test for warnings
jbrockmendel Dec 9, 2020
1b93617
Merge branch 'master' of https://github.com/pandas-dev/pandas into en…
jbrockmendel Dec 9, 2020
f2bb5a1
Merge branch 'master' of https://github.com/pandas-dev/pandas into en…
jbrockmendel Dec 10, 2020
ded6c31
update doctests
jbrockmendel Dec 10, 2020
9816690
remove now-unnecessary _closed attr
jbrockmendel Dec 10, 2020
aae7d84
catch warnings
jbrockmendel Dec 11, 2020
b262bdc
Merge branch 'master' of https://github.com/pandas-dev/pandas into en…
jbrockmendel Dec 11, 2020
c2efb79
whatnsew
jbrockmendel Dec 11, 2020
5ef58db
Merge branch 'master' of https://github.com/pandas-dev/pandas into en…
jbrockmendel Dec 13, 2020
bf86746
test for closed mismatch
jbrockmendel Dec 14, 2020
70c7de6
test for mismatch
jbrockmendel Dec 14, 2020
517c8f1
comment
jbrockmendel Dec 14, 2020
922ce1a
mypy fixup
jbrockmendel Dec 14, 2020
82ee698
Merge branch 'master' of https://github.com/pandas-dev/pandas into en…
jbrockmendel Dec 16, 2020
e783761
Merge branch 'master' of https://github.com/pandas-dev/pandas into en…
jbrockmendel Jan 8, 2021
6e47156
Merge branch 'master' of https://github.com/pandas-dev/pandas into en…
jbrockmendel Jan 8, 2021
367feee
API: allow closed=None in IntervalDtype
jbrockmendel Jan 8, 2021
b10b0bf
revert deprecation in whatsnew
jbrockmendel Jan 8, 2021
370a843
revert warning-catching
jbrockmendel Jan 8, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.3.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -122,7 +122,7 @@ Other API changes

Deprecations
~~~~~~~~~~~~

- Deprecated construction of :class:`IntervalDtype` without specifying ``closed`` (:issue:`38394`)
-
-

Expand Down
4 changes: 2 additions & 2 deletions pandas/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -703,8 +703,8 @@ def float_frame():
# ----------------------------------------------------------------
@pytest.fixture(
params=[
(Interval(left=0, right=5), IntervalDtype("int64")),
(Interval(left=0.1, right=0.5), IntervalDtype("float64")),
(Interval(left=0, right=5), IntervalDtype("int64", "right")),
(Interval(left=0.1, right=0.5), IntervalDtype("float64", "right")),
(Period("2012-01", freq="M"), "period[M]"),
(Period("2012-02-01", freq="D"), "period[D]"),
(
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/arrays/_arrow_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -127,7 +127,7 @@ def __hash__(self):
def to_pandas_dtype(self):
import pandas as pd

return pd.IntervalDtype(self.subtype.to_pandas_dtype())
return pd.IntervalDtype(self.subtype.to_pandas_dtype(), self.closed)

# register the type with a dummy instance
_interval_type = ArrowIntervalType(pyarrow.int64(), "left")
Expand Down
35 changes: 23 additions & 12 deletions pandas/core/arrays/interval.py
Original file line number Diff line number Diff line change
Expand Up @@ -146,7 +146,7 @@
>>> pd.arrays.IntervalArray([pd.Interval(0, 1), pd.Interval(1, 5)])
<IntervalArray>
[(0, 1], (1, 5]]
Length: 2, closed: right, dtype: interval[int64]
Length: 2, closed: right, dtype: interval[int64, right]
jreback marked this conversation as resolved.
Show resolved Hide resolved

It may also be constructed using one of the constructor
methods: :meth:`IntervalArray.from_arrays`,
Expand Down Expand Up @@ -213,6 +213,9 @@ def _simple_new(
):
result = IntervalMixin.__new__(cls)

if closed is None and isinstance(dtype, IntervalDtype):
closed = dtype.closed

closed = closed or "right"
left = ensure_index(left, copy=copy)
right = ensure_index(right, copy=copy)
Expand All @@ -227,6 +230,12 @@ def _simple_new(
left = left.astype(dtype.subtype)
right = right.astype(dtype.subtype)

if dtype.closed is None:
# possibly loading an old pickle
dtype = IntervalDtype(dtype.subtype, closed)
elif closed != dtype.closed:
raise ValueError("closed keyword does not match dtype.closed")

# coerce dtypes to match if needed
if is_float_dtype(left) and is_integer_dtype(right):
right = right.astype(left.dtype)
Expand Down Expand Up @@ -268,9 +277,11 @@ def _simple_new(
# If these share data, then setitem could corrupt our IA
right = right.copy()

dtype = IntervalDtype(left.dtype, closed=closed)
result._dtype = dtype

jreback marked this conversation as resolved.
Show resolved Hide resolved
result._left = left
result._right = right
result._closed = closed
if verify_integrity:
result._validate()
return result
Expand Down Expand Up @@ -330,7 +341,7 @@ def _from_factorized(cls, values, original):
>>> pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3])
<IntervalArray>
[(0, 1], (1, 2], (2, 3]]
Length: 3, closed: right, dtype: interval[int64]
Length: 3, closed: right, dtype: interval[int64, right]
"""
),
}
Expand Down Expand Up @@ -399,7 +410,7 @@ def from_breaks(cls, breaks, closed="right", copy=False, dtype=None):
>>> pd.arrays.IntervalArray.from_arrays([0, 1, 2], [1, 2, 3])
<IntervalArray>
[(0, 1], (1, 2], (2, 3]]
Length: 3, closed: right, dtype: interval[int64]
Length: 3, closed: right, dtype: interval[int64, right]
"""
),
}
Expand Down Expand Up @@ -456,7 +467,7 @@ def from_arrays(cls, left, right, closed="right", copy=False, dtype=None):
>>> pd.arrays.IntervalArray.from_tuples([(0, 1), (1, 2)])
<IntervalArray>
[(0, 1], (1, 2]]
Length: 2, closed: right, dtype: interval[int64]
Length: 2, closed: right, dtype: interval[int64, right]
"""
),
}
Expand Down Expand Up @@ -534,7 +545,7 @@ def _shallow_copy(self, left, right):

@property
def dtype(self):
return IntervalDtype(self.left.dtype)
return self._dtype

@property
def nbytes(self) -> int:
Expand Down Expand Up @@ -1155,7 +1166,7 @@ def mid(self):
>>> intervals
<IntervalArray>
[(0, 1], (1, 3], (2, 4]]
Length: 3, closed: right, dtype: interval[int64]
Length: 3, closed: right, dtype: interval[int64, right]
"""
),
}
Expand Down Expand Up @@ -1184,7 +1195,7 @@ def closed(self):
Whether the intervals are closed on the left-side, right-side, both or
neither.
"""
return self._closed
return self.dtype.closed

_interval_shared_docs["set_closed"] = textwrap.dedent(
"""
Expand Down Expand Up @@ -1219,11 +1230,11 @@ def closed(self):
>>> index
<IntervalArray>
[(0, 1], (1, 2], (2, 3]]
Length: 3, closed: right, dtype: interval[int64]
Length: 3, closed: right, dtype: interval[int64, right]
>>> index.set_closed('both')
<IntervalArray>
[[0, 1], [1, 2], [2, 3]]
Length: 3, closed: both, dtype: interval[int64]
Length: 3, closed: both, dtype: interval[int64, both]
"""
),
}
Expand Down Expand Up @@ -1282,7 +1293,7 @@ def __array__(self, dtype=None) -> np.ndarray:
left = self._left
right = self._right
mask = self.isna()
closed = self._closed
closed = self.closed

result = np.empty(len(left), dtype=object)
for i in range(len(left)):
Expand Down Expand Up @@ -1422,7 +1433,7 @@ def repeat(self, repeats, axis=None):
>>> intervals
<IntervalArray>
[(0, 1], (1, 3], (2, 4]]
Length: 3, closed: right, dtype: interval[int64]
Length: 3, closed: right, dtype: interval[int64, right]
"""
),
}
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/dtypes/cast.py
Original file line number Diff line number Diff line change
Expand Up @@ -793,7 +793,7 @@ def infer_dtype_from_scalar(val, pandas_dtype: bool = False) -> Tuple[DtypeObj,
dtype = PeriodDtype(freq=val.freq)
elif lib.is_interval(val):
subtype = infer_dtype_from_scalar(val.left, pandas_dtype=True)[0]
dtype = IntervalDtype(subtype=subtype)
dtype = IntervalDtype(subtype=subtype, closed=val.closed)

return dtype, val

Expand Down
94 changes: 84 additions & 10 deletions pandas/core/dtypes/dtypes.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
Union,
cast,
)
import warnings

import numpy as np
import pytz
Expand Down Expand Up @@ -1002,37 +1003,75 @@ class IntervalDtype(PandasExtensionDtype):

Examples
--------
>>> pd.IntervalDtype(subtype='int64')
interval[int64]
>>> pd.IntervalDtype(subtype='int64', closed='both')
interval[int64, both]
"""

name = "interval"
kind: str_type = "O"
str = "|O08"
base = np.dtype("O")
num = 103
_metadata = ("subtype",)
_match = re.compile(r"(I|i)nterval\[(?P<subtype>.+)\]")
_metadata = (
"subtype",
"closed",
)
_match = re.compile(
r"(I|i)nterval\[(?P<subtype>[^,]+)(, (?P<closed>(right|left|both|neither)))?\]"
)
_cache: Dict[str_type, PandasExtensionDtype] = {}

def __new__(cls, subtype=None):
def __new__(cls, subtype=None, closed: Optional[str_type] = None):
from pandas.core.dtypes.common import is_string_dtype, pandas_dtype

if closed is not None and closed not in {"right", "left", "both", "neither"}:
raise ValueError("closed must be one of 'right', 'left', 'both', 'neither'")

if isinstance(subtype, IntervalDtype):
if closed is not None and closed != subtype.closed:
jreback marked this conversation as resolved.
Show resolved Hide resolved
raise ValueError(
"dtype.closed and 'closed' do not match. "
"Try IntervalDtype(dtype.subtype, closed) instead."
)
return subtype
elif subtype is None:
# we are called as an empty constructor
# generally for pickle compat
u = object.__new__(cls)
u._subtype = None
u._closed = closed
return u
elif isinstance(subtype, str) and subtype.lower() == "interval":
subtype = None
else:
if isinstance(subtype, str):
m = cls._match.search(subtype)
if m is not None:
subtype = m.group("subtype")
gd = m.groupdict()
subtype = gd["subtype"]
if gd.get("closed", None) is not None:
if closed is not None:
if closed != gd["closed"]:
raise ValueError(
"'closed' keyword does not match value "
"specified in dtype string"
)
closed = gd["closed"]
elif closed is not None:
# user passed eg. IntervalDtype("interval[int64]", "left")
pass
else:
warnings.warn(
"Constructing an IntervalDtype from a string without "
"specifying 'closed' is deprecated and will raise in "
"a future version. "
f"Use e.g. 'interval[{subtype}, left]'. "
"Defaulting to closed='right'.",
FutureWarning,
stacklevel=2,
)
# default to "right"
closed = "right"

try:
subtype = pandas_dtype(subtype)
Expand All @@ -1047,14 +1086,32 @@ def __new__(cls, subtype=None):
)
raise TypeError(msg)

if closed is None and subtype is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have this one do we need to warn on L1058? (e.g. which one is getting hit)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you respond to this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are both hit separately

warnings.warn(
"Constructing an IntervalDtype without "
"specifying 'closed' is deprecated and will raise in "
"a future version. "
"Use e.g. IntervalDtype(np.int64, 'left'). "
"Defaulting to closed='right'.",
FutureWarning,
stacklevel=2,
)
closed = "right"

key = str(subtype) + str(closed)
try:
return cls._cache[str(subtype)]
return cls._cache[key]
except KeyError:
u = object.__new__(cls)
u._subtype = subtype
cls._cache[str(subtype)] = u
u._closed = closed
cls._cache[key] = u
return u

@property
def closed(self):
return self._closed

@property
def subtype(self):
"""
Expand Down Expand Up @@ -1104,7 +1161,7 @@ def type(self):
def __str__(self) -> str_type:
if self.subtype is None:
return "interval"
return f"interval[{self.subtype}]"
return f"interval[{self.subtype}, {self.closed}]"

def __hash__(self) -> int:
# make myself hashable
Expand All @@ -1118,6 +1175,8 @@ def __eq__(self, other: Any) -> bool:
elif self.subtype is None or other.subtype is None:
# None should match any subtype
return True
elif self.closed != other.closed:
return False
else:
from pandas.core.dtypes.common import is_dtype_equal

Expand All @@ -1129,6 +1188,17 @@ def __setstate__(self, state):
# pickle -> need to set the settable private ones here (see GH26067)
self._subtype = state["subtype"]

# backward-compat older pickles won't have "closed" key
self._closed = state.pop("closed", None)
if self._closed is None:
warnings.warn(
"Unpickled legacy IntervalDtype does not specify 'closed' "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have tests for all of the newly-added warning/raising cases in the IntervalDtype constructor. (just added a missing consistency check https://github.com/pandas-dev/pandas/pull/38394/files#diff-f99ef42afad339d00e36197a60ccc76d74c6a94c30e05aef69d18e0ec4b10d4eR1055)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woops, dont have a test for this one. will update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, this is definitely reached, but tm.assert_produces_warning isnt seeing anything

"attribute. Set dtype._closed to one of 'left', 'right', 'both', "
"'neither' before using this IntervalDtype object.",
UserWarning,
stacklevel=2,
)

@classmethod
def is_dtype(cls, dtype: object) -> bool:
"""
Expand Down Expand Up @@ -1175,11 +1245,15 @@ def __from_arrow__(
def _get_common_dtype(self, dtypes: List[DtypeObj]) -> Optional[DtypeObj]:
# NB: this doesn't handle checking for closed match
if not all(isinstance(x, IntervalDtype) for x in dtypes):
return None

closed = cast("IntervalDtype", dtypes[0]).closed
if not all(cast("IntervalDtype", x).closed == closed for x in dtypes):
return np.dtype(object)

from pandas.core.dtypes.cast import find_common_type

common = find_common_type([cast("IntervalDtype", x).subtype for x in dtypes])
if common == object:
return np.dtype(object)
return IntervalDtype(common)
return IntervalDtype(common, closed=closed)
Loading