API: ensure IntervalIndex.left/right are 64bit if numeric, part II #50195

topper-123 · 2022-12-12T04:04:42Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Follow-up to #50130. It turned out that IntervalArray.from_arrays could get around the 64-bit requirement, so we fix that by moving maybe_convert_numeric_to_64bit and using it in the IntervalArray constructor as well.

Also return the 64-bit index in IntervalIndex._maybe_convert_i8. Previously we returned original, which was the index before the 64-bit conversion (this changes a test, but only for an internal method).

phofl · 2022-12-16T09:54:14Z

pandas/core/arrays/interval.py

@@ -284,7 +306,10 @@ def _simple_new(
        from pandas.core.indexes.base import ensure_index

        left = ensure_index(left, copy=copy)


Is it possible to handle this somehow in the creation of the IntervalIndex?

I tried to remove ensure_index in various ways, but failed. It looks like there are some dtype issues that need to be passed through an Index to be resolved, but unfortunately I didn't manage to untangle them.

There is a lot going on in IntervalArray._simple_new. I've looked into moving all the validation/dtype wrangling there into a separate function. That would make _simple_new much simpler, and it would be much easier to instantiate an IntervalArray when we can be sure the input data is correct. I'll push a PR about this shortly.
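The split described above (validation and dtype wrangling in one helper, plus a _simple_new that trusts its inputs) might look roughly like the following. This is a toy sketch under stated assumptions, not pandas' IntervalArray: IntervalPairs and _validate_endpoints are hypothetical names, and the "store numeric endpoints as 64-bit" rule is an assumption chosen to mirror this PR.

```python
import numpy as np


def _validate_endpoints(left, right):
    """Hypothetical helper: all checks and dtype wrangling live here."""
    left = np.asarray(left)
    right = np.asarray(right)
    if left.ndim != 1 or left.shape != right.shape:
        raise ValueError("left and right must be 1-D and of equal length")
    dtype = np.result_type(left, right)
    # Assumed rule: numeric endpoints are always stored as 64-bit.
    if dtype.kind == "i":
        dtype = np.dtype(np.int64)
    elif dtype.kind == "u":
        dtype = np.dtype(np.uint64)
    elif dtype.kind == "f":
        dtype = np.dtype(np.float64)
    else:
        raise TypeError(f"unsupported endpoint dtype: {dtype}")
    left = left.astype(dtype, copy=False)
    right = right.astype(dtype, copy=False)
    if np.any(left > right):
        raise ValueError("left endpoints must be <= right endpoints")
    return left, right


class IntervalPairs:
    """Toy interval container (hypothetical, for illustration only)."""

    def __init__(self, left, right):
        # Public entry point: untrusted input goes through validation.
        left, right = _validate_endpoints(left, right)
        self._left = left
        self._right = right

    @classmethod
    def _simple_new(cls, left: np.ndarray, right: np.ndarray) -> "IntervalPairs":
        # Trusted fast path: the caller guarantees the arrays are already
        # validated, so no checks or conversions happen here.
        obj = cls.__new__(cls)
        obj._left = left
        obj._right = right
        return obj

    def take(self, indices) -> "IntervalPairs":
        # Subsetting valid data keeps it valid, so the fast path is safe.
        return type(self)._simple_new(self._left[indices], self._right[indices])
```

With this layout, internal operations such as take never re-run the validation, and the dtype rules only need to be changed in one place.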

I did change this in the newest version, could you take a look?

MarcoGorelli · 2022-12-17T10:25:02Z

pandas/tests/indexes/interval/test_interval.py

-        assert result is key
+        if not isinstance(result, NumericIndex):
+            assert result is key
+        else:
+            expected = NumericIndex(key)
+            tm.assert_index_equal(result, expected)


rather than adding logic to the test (which can hide bugs), is it possible to either:

include the expected result in the parametrisation

OR split the test out into two separate ones, one of which uses assert result is key and the other tm.assert_index_equal(result, expected)?
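A sketch that combines both suggestions: two tests, each with one unconditional assertion, and the expected result carried in the parametrization. It is written against a hypothetical stand-in, maybe_upcast_index, rather than the real IntervalIndex._maybe_convert_i8, and uses the public pd.testing.assert_index_equal in place of tm.assert_index_equal.

```python
import numpy as np
import pandas as pd
import pytest


def maybe_upcast_index(key):
    """Hypothetical function under test: integer Index -> int64 Index,
    any other key is returned unchanged (the very same object)."""
    if isinstance(key, pd.Index) and key.dtype.kind == "i":
        return pd.Index(key.to_numpy().astype(np.int64))
    return key


# Test 1: inputs whose contract is "returned as-is", so identity is asserted.
@pytest.mark.parametrize("key", [pd.Index([1.5, 2.5]), np.array([1, 2]), 3])
def test_maybe_upcast_index_passthrough(key):
    assert maybe_upcast_index(key) is key


# Test 2: inputs that get converted; the expected value is part of the
# parametrization, so the test body has no branching.
@pytest.mark.parametrize(
    "key, expected",
    [
        (
            pd.Index(np.array([1, 2], dtype=np.int32)),
            pd.Index(np.array([1, 2], dtype=np.int64)),
        ),
        (
            pd.Index(np.array([-3], dtype=np.int8)),
            pd.Index(np.array([-3], dtype=np.int64)),
        ),
    ],
)
def test_maybe_upcast_index_converts(key, expected):
    result = maybe_upcast_index(key)
    pd.testing.assert_index_equal(result, expected)
```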

Also, doesn't this test also pass on upstream/main? Is there a way to write it such that it fails there, but passes here?

I haven't had time to address this question yet, sorry. I'll get back to this tonight.

I've updated the PR.

thanks for updating - any chance you could address the logic in the test comment too please?

IIRC when I tried executing this, it was just one of the make_key inputs which required a different assertion

If so, then can the assertion either be included in the parametrisation, or the test be split into two?

For reference, this advice comes from: https://testing.googleblog.com/2014/07/testing-on-toilet-dont-put-logic-in.html

Yeah good points in the article, very nice to have it articulated.

I've changed the PR. I started by separating the test into two tests split by type of make_key. However, I didn't like having two very similar tests, and I didn't like having lambdas in the parametrization (lambdas are difficult to introspect, and with several lambdas it's difficult to see which test case you're looking at when debugging), so I've made a new version.
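One way to keep parametrized inputs readable without lambdas is to pass plain values, give each case an explicit id with pytest.param(..., id=...), and build the object inside the test. A sketch, with upcast_to_int64 as a hypothetical stand-in for the code under test:

```python
import numpy as np
import pytest


def upcast_to_int64(arr: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the code under test.
    return arr.astype(np.int64) if arr.dtype.kind == "i" else arr


# Plain, printable inputs instead of a make_key lambda per case; the
# explicit ids name each case in pytest's output and in -k selections.
@pytest.mark.parametrize(
    "values, dtype, expected_dtype",
    [
        pytest.param([1, 2], "int8", "int64", id="int8"),
        pytest.param([1, 2], "int32", "int64", id="int32"),
        pytest.param([1.5, 2.5], "float64", "float64", id="float64"),
    ],
)
def test_upcast_to_int64(values, dtype, expected_dtype):
    # The input is built inside the test from data that is visible at a glance.
    arr = np.array(values, dtype=dtype)
    assert upcast_to_int64(arr).dtype == np.dtype(expected_dtype)
```

Running pytest -v would then list the cases as test_upcast_to_int64[int8], test_upcast_to_int64[int32] and so on.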

I prefer the newest version (avoiding lambdas, clear inputs into the test function), but will await your comment.

sure, looks better, thanks for updating

MarcoGorelli · 2022-12-17T10:37:51Z

pandas/core/arrays/interval.py

@@ -284,7 +306,10 @@ def _simple_new(
        from pandas.core.indexes.base import ensure_index

        left = ensure_index(left, copy=copy)
+        left = maybe_convert_numeric_to_64bit(left)


just for my understanding, what's an example of where this makes a difference? the test you've modified passes even without this change

I think I took the wrong approach here originally.

The issue this was supposed to solve is that on 32-bit systems, e.g. IntervalArray.from_breaks([1, 2, 3]) should give an array with dtype interval[int64, right], to align with the pandas convention that lists passed to constructors are interpreted as 64-bit (e.g. Series([1, 2, 3]) and Index([1, 2, 3]) both give int64 dtype even on 32-bit systems). Previously (after #49560), passing lists to IntervalArray gave interval[int32, right]. This affected some tests in #49560, which is why I took this up.

In the newest version I moved this logic to _maybe_convert_platform_interval, which IMO is a better location for it.

Also, this issue on 32-bit systems only affects integer dtypes, as e.g. np.asarray([1.5]) always has float64 dtype. So I've made this simpler in the newest version by just checking for an integer dtype and converting to int64 if needed.
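A quick way to check the convention described above on a given machine. The NumPy dtype printed first depends on the platform and NumPy version; the pandas results are the ones the comment says should be int64 everywhere.

```python
import numpy as np
import pandas as pd

# NumPy infers its platform default integer from a list of Python ints;
# this is 32-bit on some platforms (e.g. 32-bit builds, and Windows
# before NumPy 2.0).
print(np.asarray([1, 2, 3]).dtype)

# pandas constructors infer int64 from a list of Python ints.
print(pd.Series([1, 2, 3]).dtype)  # int64
print(pd.Index([1, 2, 3]).dtype)  # int64

# The convention IntervalArray is meant to follow as well.
arr = pd.arrays.IntervalArray.from_breaks([1, 2, 3])
print(arr.dtype)  # interval[int64, right] is the intended result
print(arr.left.dtype)
```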
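A minimal sketch of such an upcast, assuming plain NumPy arrays as input. upcast_numeric_to_64bit is a hypothetical name and this is not the pandas implementation of maybe_upcast_numeric_to_64bit; it also upcasts unsigned integers and floats, going slightly beyond the integer-only check described above.

```python
import numpy as np


def upcast_numeric_to_64bit(arr: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: signed ints -> int64, unsigned ints -> uint64,
    floats -> float64; any other dtype is returned unchanged."""
    kind = arr.dtype.kind
    if kind == "i" and arr.dtype != np.int64:
        return arr.astype(np.int64)
    if kind == "u" and arr.dtype != np.uint64:
        return arr.astype(np.uint64)
    if kind == "f" and arr.dtype != np.float64:
        return arr.astype(np.float64)
    # Already 64-bit, or not numeric: hand back the same object, no copy.
    return arr
```

Returning the input unchanged when it is already 64-bit avoids an unnecessary copy, and keeping unsigned data unsigned avoids overflow for values above the int64 range.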

MarcoGorelli · 2022-12-17T10:42:54Z

pandas/core/arrays/interval.py

+    if not is_array_like(arr):
+        return arr


just for my understanding, what's an example of where this makes a difference?

This was just a short-circuit, so the function returns early if the value can't possibly be array-like; it made no functional difference. Overall it's probably not an improvement, as arr in the current version can no longer be a non-array and the function is typed, so I can remove it again.

I moved the function to core.dtypes.cast and renamed it maybe_upcast_numeric_to_64bit.

Yeah, I would remove this if it's not necessary. It would leave me guessing how to get here if I wanted to make a change in a couple of weeks.

topper-123 · 2022-12-19T09:06:06Z

Sorry for the late response, I had some things I had to attend to over the weekend.

I've responded to your comments individually above. I looked into this again and agree that parts of my original PR could be improved upon (especially the changes to IntervalArray), and I've uploaded a new version (with rebase).

topper-123 · 2022-12-20T22:32:57Z

The failed check is unrelated.

topper-123 · 2022-12-23T06:11:29Z

Rebased to make the CI run again. No other changes have been made.

topper-123 · 2022-12-30T10:42:50Z

Ping.

As far as I can see, all comments have been addressed?

MarcoGorelli

Thanks for sticking with this

My only comment is about

    if not is_array_like(arr):
        return arr

If this isn't covered by any tests, then TBH I'd prefer to keep it out.

Other than that, I don't have any objections, but I'm not familiar enough with this part of the codebase to merge, so I'll hand over to @phofl

topper-123 · 2022-12-30T13:23:57Z

👍 I've removed that code section.

MarcoGorelli

Thanks for updating! No objections - approving to remove my 'requested changes', but handing over to others with more expertise in this before merging

jbrockmendel · 2023-01-03T19:35:43Z

pandas/core/arrays/interval.py

+def maybe_convert_numeric_to_64bit(arr: NumpyIndexT) -> NumpyIndexT:
+    # IntervalTree only supports 64 bit numpy array
+    dtype = arr.dtype
+    if not np.issubclass_(dtype.type, np.number):


is this different from is_numeric_dtype?
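One concrete way to answer this: compare the diff's check (whether dtype.type is a subclass of np.number, written here with plain issubclass instead of np.issubclass_) with the public pandas.api.types.is_numeric_dtype. A sketch; the dtypes tried are just examples.

```python
import numpy as np
from pandas.api.types import is_numeric_dtype


def numpy_number_check(dtype) -> bool:
    # Same idea as the diff: is the dtype's scalar type a subclass of np.number?
    return issubclass(np.dtype(dtype).type, np.number)


for name in ["int32", "float32", "bool", "object"]:
    dt = np.dtype(name)
    print(f"{name:8} numpy: {numpy_number_check(dt)!s:5} pandas: {is_numeric_dtype(dt)}")
```

For these dtypes the two checks disagree only on bool: pandas' is_numeric_dtype treats bool as numeric, while np.bool_ is not a subclass of np.number.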


topper-123 · 2023-01-10T12:53:49Z

Ping. I've rebased because this has been standing still for a bit.

phofl · 2023-01-10T15:41:25Z

thx @topper-123

topper-123 force-pushed the IntervalIndex2 branch from 76e7938 to 0f96b05 December 12, 2022 17:12

This was referenced Dec 12, 2022

BUG: NumericIndex should not support float16 dtype #49536

Merged

DEPR: Remove int64 uint64 float64 index part 1 #49560

Merged

mroeschke added the Interval Interval data type label Dec 13, 2022

topper-123 added this to the 2.0 milestone Dec 13, 2022

topper-123 force-pushed the IntervalIndex2 branch from 0f96b05 to d38b66c December 15, 2022 16:38

topper-123 mentioned this pull request Dec 15, 2022

DEPR: deprecate Index.is_categorical #50225

Merged

phofl reviewed Dec 16, 2022

MarcoGorelli self-requested a review December 16, 2022 10:09

MarcoGorelli requested changes Dec 17, 2022

topper-123 force-pushed the IntervalIndex2 branch from d38b66c to 4117441 December 19, 2022 09:00

MarcoGorelli self-requested a review December 20, 2022 22:33

topper-123 force-pushed the IntervalIndex2 branch from 5123e34 to f58478b December 23, 2022 00:49

topper-123 mentioned this pull request Dec 28, 2022

DEPR: Remove int64 uint64 float64 index tests #50479

Closed

MarcoGorelli requested changes Dec 30, 2022

MarcoGorelli approved these changes Dec 30, 2022

topper-123 force-pushed the IntervalIndex2 branch from bd90f1b to 872c199 January 3, 2023 15:29

jbrockmendel reviewed Jan 3, 2023

topper-123 mentioned this pull request Jan 4, 2023

GH49560 without GH50195 (DEPR: Remove int64 uint64 float64 index part 1) #50550

Closed

topper-123 added 6 commits January 10, 2023 00:37

API:move use of maybe_convert_numeric_to_64bit to to also be used in IntervalIndex._engine

88923ad

move maybe_upcast_numeric_to_64bit to core.dtypes.cast

471362b

update

8cc670a

fix test_maybe_convert_i8_numeric

cc4859a

fix test_maybe_convert_i8_numeric II

2692932

fix precommit

d26cb7a

don't short-circuit

880d51f

topper-123 force-pushed the IntervalIndex2 branch from 872c199 to 880d51f January 10, 2023 00:52

phofl approved these changes Jan 10, 2023

phofl merged commit 939d0ba into pandas-dev:main Jan 10, 2023

topper-123 deleted the IntervalIndex2 branch January 10, 2023 15:48
