Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: make "closed" part of IntervalDtype #38394

Merged
merged 24 commits into from
Jan 10, 2021

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Dec 10, 2020

Most of the work here is in accomodating backwards-compat.

@jbrockmendel
Copy link
Member Author

added whatsnew for deprecation

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good a few questions. biggest issue is the II repr is now duplicative. should fix here, though would increase the diff a lot, followon ok.

pandas/core/arrays/interval.py Show resolved Hide resolved
pandas/core/dtypes/dtypes.py Show resolved Hide resolved
@@ -1047,14 +1080,32 @@ def __new__(cls, subtype=None):
)
raise TypeError(msg)

if closed is None and subtype is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have this one do we need to warn on L1058? (e.g. which one is getting hit)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you respond to this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are both hit separately

self._closed = state.pop("closed", None)
if self._closed is None:
warnings.warn(
"Unpickled legacy IntervalDtype does not specify 'closed' "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have tests for all of the newly-added warning/raising cases in the IntervalDtype constructor. (just added a missing consistency check https://github.com/pandas-dev/pandas/pull/38394/files#diff-f99ef42afad339d00e36197a60ccc76d74c6a94c30e05aef69d18e0ec4b10d4eR1055)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woops, dont have a test for this one. will update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, this is definitely reached, but tm.assert_produces_warning isnt seeing anything

@jreback jreback added Deprecate Functionality to remove in pandas Dtype Conversions Unexpected or buggy dtype conversions Interval Interval data type labels Dec 13, 2020
@jreback jreback added this to the 1.3 milestone Dec 13, 2020
@jreback
Copy link
Contributor

jreback commented Dec 13, 2020

cc @jschendel if you can have a look.

@jreback
Copy link
Contributor

jreback commented Dec 14, 2020

Performing static analysis using mypy
pandas/core/dtypes/dtypes.py:1250: error: Item "ExtensionDtype" of "Union[Any, ExtensionDtype]" has no attribute "closed" [union-attr]
pandas/core/dtypes/dtypes.py:1251: error: Item "ExtensionDtype" of "Union[Any, ExtensionDtype]" has no attribute "closed" [union-attr]

@jbrockmendel
Copy link
Member Author

There's an annoying new failure mode:

dtype = pd.IntervalDtype("interval[int64]")   # <-- now warns and sets closed="right" as default

idx = pd.IntervalIndex([], dtype=dtype, closed="left")   # <-- raises ValueError bc of mismatch

@jreback
Copy link
Contributor

jreback commented Dec 16, 2020

lgtm. @jorisvandenbossche @jschendel if you can take a look.

@jorisvandenbossche
Copy link
Member

Why is it needed to deprecate the default value for closed?

@jbrockmendel
Copy link
Member Author

Why is it needed to deprecate the default value for closed?

ATM there is (presumably) code in the wild calling the IntervalDtype constructor without specifying any closed. Eventually failure to specify closed will raise. Needs a deprecation cycle.

@jorisvandenbossche
Copy link
Member

Yes, but so the question is: why does it need to eventually raise?
Other constructors like Interval and interval_range also have a default.

Another option could also be to have it "not initialized" as default, to enable inferring it from the passed Interval objects

@jbrockmendel
Copy link
Member Author

Another option could also be to have it "not initialized" as default, to enable inferring it from the passed Interval objects

id be OK with this

@jreback
Copy link
Contributor

jreback commented Dec 17, 2020

Another option could also be to have it "not initialized" as default, to enable inferring it from the passed Interval objects

id be OK with this

then the default should be 'infer' (which we can do if we have an unitialized value)

@jbrockmendel
Copy link
Member Author

ok ill give this a shot and update

@jreback
Copy link
Contributor

jreback commented Dec 22, 2020

ping when ready here

@jreback
Copy link
Contributor

jreback commented Dec 29, 2020

this was ok, can you merge master

@jbrockmendel
Copy link
Member Author

Most recent discussion is that I need to do a partial re-write, which I haven't gotten around to yet.

@jreback
Copy link
Contributor

jreback commented Dec 29, 2020

Most recent discussion is that I need to do a partial re-write, which I haven't gotten around to yet.

ok!

@jbrockmendel
Copy link
Member Author

updated as discussed. agreed this is much nicer

@jreback
Copy link
Contributor

jreback commented Jan 8, 2021

I must be missing it, where exactly is the FutureWarning in the Dtype constructor triggered?

@jreback
Copy link
Contributor

jreback commented Jan 8, 2021

this no longer deprecates?

@jbrockmendel
Copy link
Member Author

this no longer deprecates?

correct. passing "closed" to IntervalDtype is optional now

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. I would add a whatsnew note in other enhancements or create an interval bug fix section and say that the repr for Interval is changing.

@jbrockmendel
Copy link
Member Author

say that the repr for Interval is changing.

let's save that for the follow-up that changes that is aimed specifically at the repr

@jreback jreback merged commit 839c1bd into pandas-dev:master Jan 10, 2021
@jbrockmendel jbrockmendel deleted the enh-interval-dtype branch January 10, 2021 02:07
luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Dtype Conversions Unexpected or buggy dtype conversions Interval Interval data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: Add closed to IntervalDtype
3 participants