BUG: validate Index data is 1D + deprecate multi-dim indexing #30588

jbrockmendel · 2019-12-31T20:31:38Z

closes Assigning array of >1 dim to index produces inconsistent index #13601
closes BUG: Index constructor should not allow an ndarray with ndim > 2 #27125
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

This changes the behavior of idx[:, None] to return an ndarray instead of an invalid Index; this is needed to keep the matplotlib tests working.
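
For downstream code like the matplotlib use case, one version-independent way to get a 2D column vector is to convert to NumPy first and add the axis there. This matches the advice in the deprecation message quoted in this thread ("Convert to a numpy array before indexing instead"). This is a minimal sketch that assumes only a recent pandas and NumPy. The index values are made up.

```python
import numpy as np
import pandas as pd

idx = pd.Index([10, 20, 30])

# Convert to a plain ndarray first, then add the new axis with NumPy.
# This never calls Index.__getitem__ with a 2D key, so the result is the same
# whether a given pandas version returns an ndarray, warns, or raises.
col = np.asarray(idx)[:, None]

print(type(col).__name__, col.shape)  # ndarray (3, 1)
```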

See also #20285, which this does not entirely close. That becomes pretty easy to address, though, once a decision is made on whether to treat [[0, 1], [2, 3]] like [(0, 1), (2, 3)] (the latter becomes a MultiIndex; the former currently becomes an invalid Index).
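
The distinction above can be checked directly. This sketch assumes pandas >= 1.0, where the 1D validation from this PR applies. It only shows the two cases that are settled: a list of tuples and a true 2D ndarray. It deliberately says nothing about nested lists, because that behavior is the open question in #20285.

```python
import numpy as np
import pandas as pd

# A list of tuples is read as MultiIndex labels, one level per tuple position.
mi = pd.Index([(0, 1), (2, 3)])
print(type(mi).__name__, mi.nlevels)  # MultiIndex 2

# A genuinely 2D ndarray is rejected by the 1D validation instead of
# silently producing an invalid Index.
raised = False
try:
    pd.Index(np.array([[0, 1], [2, 3]]))
except ValueError as err:
    raised = True
    print("ValueError:", err)
```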

jbrockmendel · 2019-12-31T21:29:37Z

There is a feather test in which we do

         df.columns = (pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)]),)
         self.check_error_on_write(df, ValueError)

With this PR, we now raise on the df.columns = ... assignment instead of getting through to fail on the next line. Is there a different invalid index we can pass to restore this test? cc @jorisvandenbossche

jreback

LGTM. Ping on green. I think we should add a whatsnew note.

TomAugspurger · 2020-01-02T16:28:09Z

There is a feather test in which we do

@jbrockmendel that test setup looks bad. We're assigning a length-1 tuple whose only element is a MultiIndex to be the single key?

In [7]: df = pd.DataFrame({"A": [1, 2, 3]})

In [8]: df.columns = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("a", 3)]),

In [9]: df
Out[9]: ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/sandbox/pandas/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.map_locations()
   1637             raise KeyError(key)
   1638
-> 1639     def map_locations(self, ndarray[object] values):
   1640         cdef:
   1641             Py_ssize_t i, n = len(values)

ValueError: Buffer has wrong number of dimensions (expected 1, got 3)
Exception ignored in: 'pandas._libs.index.IndexEngine._call_map_locations'
Traceback (most recent call last):
  File "pandas/_libs/hashtable_class_helper.pxi", line 1639, in pandas._libs.hashtable.PyObjectHashTable.map_locations
    def map_locations(self, ndarray[object] values):
ValueError: Buffer has wrong number of dimensions (expected 1, got 3)

I think we'd rather have a length-1 MultiIndex

diff --git a/pandas/tests/io/test_feather.py b/pandas/tests/io/test_feather.py
index e06f2c31a2..3500470035 100644
--- a/pandas/tests/io/test_feather.py
+++ b/pandas/tests/io/test_feather.py
@@ -136,7 +136,7 @@ class TestFeather:
 
         # column multi-index
         df.index = [0, 1, 2]
-        df.columns = (pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)]),)
+        df.columns = pd.MultiIndex.from_tuples([("a", 1)])
         self.check_error_on_write(df, ValueError)
 
     def test_path_pathlib(self):
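
The reason the diff works is that a length-1 MultiIndex matches a single-column frame, so the assignment itself succeeds and the test can reach the writer's check. Here is a minimal sketch of just the assignment. It leaves out the feather write because that step needs pyarrow, and the column values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# One tuple -> one column with two levels; the length matches df's single column.
df.columns = pd.MultiIndex.from_tuples([("a", 1)])

print(type(df.columns).__name__, df.columns.nlevels, len(df.columns))  # MultiIndex 2 1
```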


jbrockmendel · 2020-01-02T16:36:21Z

I think we'd rather have a length-1 MultiIndex

That seems to do the trick, thanks.

jbrockmendel · 2020-01-02T20:49:15Z

ping, whatsnew added

TomAugspurger · 2020-01-02T21:47:23Z

This changes the behavior of idx[:, None] to return an ndarray instead of an invalid Index, needed to keep matplotlib tests working.

Do we have tests that ensure index[2darray] and idx[:, None] return an ndarray? Did we decide on that as a desirable API, or is that still under discussion?

jbrockmendel · 2020-01-02T22:02:31Z

Did we decide on that as a desirable API, or is that still under discussion?

It's definitely been decided on for DatetimeIndex, which does that ATM. For the rest, I think this is #27837.

TomAugspurger · 2020-01-02T22:11:25Z

It seems like in #27837 the preference of most was to deprecate indexing with 2-D indexers. I'm concerned about changing the behavior here to return an ndarray, only to deprecate it in the future.

I suppose that validating we don't have 2D data necessitates the change to Index.__getitem__ to return an ndarray?

jbrockmendel · 2020-01-02T22:43:57Z

I guess we _could_ call simple_new from getitem, but I’d prefer to just return an ndarray. We could also deprecate it, saying it will raise instead of returning an ndarray in the future.


TomAugspurger · 2020-01-03T14:12:56Z

That's also an option. @jreback @jorisvandenbossche thoughts on that?

I'm slightly OK with breaking API to return an ndarray rather than an Index here, since the behavior on master is so clearly broken. But I also don't want to establish Index.__getitem__ returning an ndarray as a documented / supported behavior.

jreback · 2020-01-05T21:56:51Z

Re-reading the original, I think we are all in favor of deprecate & raise. This PR now raises for construction with 2D data. I don't see any behavior change on getitem, so that still returns an ndarray?

jbrockmendel · 2020-01-05T22:44:09Z

I don't see any behavior change on getitem, so that still returns an ndarray?

This does change Index.__getitem__ to return ndarray[ndim=2] instead of constructing an invalid Index.

jreback · 2020-01-05T23:40:31Z

I don't see any behavior change on getitem, so that still returns an ndarray?

This does change Index.__getitem__ to return ndarray[ndim=2] instead of constructing an invalid Index.

ok and tests are added for this? or changed ones?

jbrockmendel · 2020-01-05T23:45:38Z

ok and tests are added for this? or changed ones?

Yes, in test_indexing tests are updated to check for the expected exception when indexing with non-1D.

jorisvandenbossche · 2020-01-06T08:31:08Z

I am personally fine with the 2D Index -> ndarray change in getitem now, if that makes it easier to fix the Index construction from 2D data bug (in #27837 (comment), I also mentioned such a change probably has not too much impact, at least not for the matplotlib use case).
But, since we agree we want to deprecate it, I would directly add such a DeprecationWarning as well while changing it.

TomAugspurger · 2020-01-06T13:37:26Z

DeprecationWarning or FutureWarning? I don't have a preference.

We'll need a doc note for the API change and the deprecation.

jorisvandenbossche · 2020-01-06T13:40:20Z

Let's start with a DeprecationWarning, as this is typically something that will be used by libraries I think. We have plenty of time to change it to FutureWarning before 2.0.

jreback · 2020-01-06T13:41:34Z

Let's start with a DeprecationWarning, as this is typically something that will be used by libraries I think. We have plenty of time to change it to FutureWarning before 2.0.

sgtm
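
The plan agreed here is DeprecationWarning first, FutureWarning later, then an error. Code that has to run against several pandas versions can probe which stage it is on. The helper below, `ndim_indexing_behavior`, is hypothetical. It is a sketch that assumes the enforced stage raises ValueError, which is what the error quoted at the end of this thread suggests.

```python
import warnings
import pandas as pd

def ndim_indexing_behavior(idx):
    """Hypothetical helper: classify what idx[:, None] does in this pandas version."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including DeprecationWarning that is hidden by default.
        warnings.simplefilter("always")
        try:
            idx[:, None]
        except (ValueError, IndexError):
            return "raises"  # deprecation enforced
    if any(issubclass(w.category, (DeprecationWarning, FutureWarning)) for w in caught):
        return "warns"  # deprecated, but still returns an ndarray
    return "allowed"  # pre-deprecation behavior

behavior = ndim_indexing_behavior(pd.Index([1, 2, 3]))
print(behavior)
```

The result depends on the installed pandas, so no fixed output is shown.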


doc/source/whatsnew/v1.0.0.rst

jorisvandenbossche · 2020-01-06T16:47:00Z

pandas/core/indexes/base.py

+                # Deprecation GH#30588
+                warnings.warn(
+                    "Support for Index[:, None] is deprecated and will be "
+                    "removed in a future version.",


Same comment here about Index[:, None] and recommended alternative as I posted above for the whatsnew


TomAugspurger · 2020-01-09T14:58:20Z

Planning to do the RC in ~5 hours.


pandas/core/arrays/interval.py

pandas/core/indexes/base.py

pandas/core/arrays/interval.py

jschendel · 2020-01-09T18:43:40Z

pandas/core/arrays/interval.py

@@ -500,8 +500,11 @@ def __getitem__(self, value):

        # scalar
        if not isinstance(left, ABCIndexClass):
-            if isna(left):
+            if is_scalar(left) and isna(left):


My intention with this block of code was to handle only scalars here. Since self.left and self.right are always indexes, and we're using __getitem__ on them to get left/right, my assumption at the time was that left/right would always be either a scalar or a 1D Index, so not isinstance(left, ABCIndexClass) would imply a scalar (I guess we could use is_scalar instead of the isinstance check).

With this PR, it looks like Index.__getitem__ can return a scalar, an Index, or an ndarray with ndim > 1, with the last case being temporary until we remove this behavior? Or am I omitting a case where something else could be returned? If these are the only three cases, could we handle the ndim > 1 case separately before this if block?

Something like:

    left = self.left[value]
    right = self.right[value]

    # TODO: remove this block when Index.__getitem__ returning ndim > 1 is deprecated
    if np.ndim(left) > 1:
        # GH#30588 multi-dimensional indexer disallowed
        raise ValueError(...)

    # scalar
    if not isinstance(left, ABCIndexClass):
        ....

Makes the logic less nested and easier to remove when we go through with the deprecation.
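
The suggested ordering puts the ndim > 1 rejection first, so the scalar branch only ever sees 0-d values. Here it is in a self-contained toy. `ToyIntervalArray` is a hypothetical stand-in built on plain NumPy endpoint arrays, not pandas' IntervalArray. It returns `(left, right)` tuples instead of Interval objects and uses NaN endpoints to mark a missing interval.

```python
import numpy as np

class ToyIntervalArray:
    """Hypothetical stand-in for IntervalArray: parallel left/right endpoint arrays."""

    def __init__(self, left, right, closed="right"):
        self.left = np.asarray(left, dtype=float)
        self.right = np.asarray(right, dtype=float)
        self.closed = closed

    def __len__(self):
        return len(self.left)

    def __getitem__(self, key):
        left = self.left[key]
        right = self.right[key]

        # Reject multi-dimensional results first, so the branches below only
        # see 0-d scalars or 1-d arrays; easy to delete later as one block.
        if np.ndim(left) > 1:
            raise ValueError("multi-dimensional indexing not allowed")

        # Scalar: a (left, right) pair, or None for a missing interval.
        if np.ndim(left) == 0:
            if np.isnan(left):
                return None
            return (float(left), float(right))

        # 1-d: wrap the sliced endpoints in a new array of the same type.
        return ToyIntervalArray(left, right, self.closed)

arr = ToyIntervalArray([0, 1, np.nan], [1, 2, np.nan])
print(arr[0])        # (0.0, 1.0)
print(arr[2])        # None
print(len(arr[:2]))  # 2
try:
    arr[:, None]
except ValueError as err:
    print(err)       # multi-dimensional indexing not allowed
```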

Updated so that this raises a ValueError for >1-dim indexing on IntervalArray.

jbrockmendel · 2020-01-09T19:27:44Z

@TomAugspurger I have to head out in about half an hour, which IIRC is about when you wanted to cut the RC. Anything else to do here? (assuming Travis finished green)

TomAugspurger · 2020-01-09T19:28:42Z

I think we're good. Will let it sit until just before I tag the rc, in case @jreback has a chance to look. But I think we can do followups after the RC.

jorisvandenbossche · 2020-01-09T19:28:50Z

Taking a look

jorisvandenbossche · 2020-01-09T19:38:01Z

pandas/core/arrays/categorical.py

-        )
+        result = self._codes[key]
+        if result.ndim > 1:
+            return result


This should also raise a warning?

(but adding the warning is not a blocker for the RC, since that's a future deprecation, as long as the behaviour change is already there)

Actually, we are returning plain integer codes here, which doesn't look good.

I also checked with 0.25: there, multi-dim indexing on Categorical/CategoricalIndex basically fails (it "returns", but anything you do with the result gives an error; even converting the result to a numpy array, the original matplotlib use case for 2D indexing, fails).
So I would also raise an error here, like you did for some other arrays.
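
The concern about "plain integer codes" is easy to see with public API. A 2D key applied to the codes gives category positions, while converting to a NumPy array of labels first keeps the actual values. This is a minimal sketch with made-up data. It uses `Categorical.codes` and `np.asarray` only, not the internal `_codes` from the diff.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "b", "a"])  # categories are ["a", "b"]

# Indexing the integer codes with a 2D key yields codes, not labels.
codes_2d = cat.codes[:, None]
print(codes_2d.tolist())   # [[0], [1], [0]]

# Converting to an ndarray of labels first keeps the values meaningful.
labels_2d = np.asarray(cat)[:, None]
print(labels_2d.tolist())  # [['a'], ['b'], ['a']]
```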

Agreed this should raise to match Interval. Will push an update.

It's not straightforward, as this actually already gets caught earlier in the boolean check...

(I would be fine with doing a clean-up for my comments in a follow-up PR)

Yeah, this is a bit tricky. Will look a bit longer.

This raised an error in 0.25 and raises a different (but wrong) error now, so I think it's fine to do in a follow-up.

I'm very confused by what's going on with Series.__getitem__ here. Are we OK with changing this for categorical after the RC?

jorisvandenbossche · 2020-01-09T19:48:53Z

pandas/tests/indexes/categorical/test_category.py

+    def test_getitem_2d_deprecated(self):
+        # GH#30588 multi-dim indexing is deprecated, but raising is also acceptable
+        idx = self.create_index()
+        with pytest.raises(ValueError, match="cannot mask with array containing NA"):


We shouldn't test wrong error messages ...

This may be a bug in how we're handling tuples. We convert the tuple (slice(None), None) to an ndarray

array([slice(None, None, None), None], dtype=object)

which gets interpreted as a boolean mask with missing values, and so it raises.

I'm going to update the test to use a 2D ndarray as the indexer, and open an issue for the bug.
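
The mechanism described above can be reproduced without touching pandas internals. The tuple `(slice(None), None)` becomes a 1D object array whose second element counts as missing. An explicit 2D integer array, by contrast, is an unambiguous multi-dimensional key. This sketch only mirrors the explanation in this comment and does not call the private boolean-indexer check.

```python
import numpy as np
import pandas as pd

key = (slice(None), None)  # what idx[:, None] passes to __getitem__

# Converting the tuple to an array gives a 1D object array, not a 2D key.
as_array = np.array(key, dtype=object)
print(as_array.shape, as_array.dtype)  # (2,) object

# The None element is treated as missing, so a mask check sees an NA value.
print(pd.isna(as_array).tolist())      # [False, True]

# A 2D integer ndarray is an explicit multi-dimensional indexer.
indexer_2d = np.array([[0, 1]])
print(indexer_2d.ndim)                 # 2
```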

jorisvandenbossche · 2020-01-09T19:51:46Z

pandas/tests/indexes/interval/test_base.py

+        idx = self.create_index()
+        with pytest.raises(ValueError, match="multi-dimensional indexing not allowed"):
+            with tm.assert_produces_warning(DeprecationWarning, check_stacklevel=False):
+                idx[:, None]


Shouldn't this only raise the error? (while trying out this branch, for me it also does)

Keep in mind, it's a deprecation warning and may not be visible.

Yep, that's it

In [11]: warnings.simplefilter("always", DeprecationWarning)

In [12]: idx[np.array([[0, 1]])]
/Users/taugspurger/sandbox/pandas/pandas/core/arrays/interval.py:498: DeprecationWarning: Support for multi-dimensional indexing (e.g. `index[:, None]`) on an Index is deprecated and will be removed in a future version. Convert to a numpy array before indexing instead.
  left = self.left[value]
/Users/taugspurger/sandbox/pandas/pandas/core/arrays/interval.py:499: DeprecationWarning: Support for multi-dimensional indexing (e.g. `index[:, None]`) on an Index is deprecated and will be removed in a future version. Convert to a numpy array before indexing instead.
  right = self.right[value]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-bd6337c200ee> in <module>
----> 1 idx[np.array([[0, 1]])]

~/sandbox/pandas/pandas/core/indexes/interval.py in __getitem__(self, value)
    990
    991     def __getitem__(self, value):
--> 992         result = self._data[value]
    993         if isinstance(result, IntervalArray):
    994             return self._shallow_copy(result)

~/sandbox/pandas/pandas/core/arrays/interval.py in __getitem__(self, value)
    505         if np.ndim(left) > 1:
    506             # GH#30588 multi-dimensional indexer disallowed
--> 507             raise ValueError("multi-dimensional indexing not allowed")
    508         return Interval(left, right, self.closed)
    509

ValueError: multi-dimensional indexing not allowed

I think that's OK for now, but can fix up later if desired.
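
The session above shows why the test sees both a warning and an error. The inner Index.__getitem__ calls warn, and then IntervalArray.__getitem__ raises. The sketch below reproduces that pattern with two hypothetical functions, `_inner_getitem` and `outer_getitem`, that stand in for those methods. It also shows the `simplefilter("always", DeprecationWarning)` step needed to make the warning visible. By default, Python ignores DeprecationWarning unless it is attributed to code in `__main__`, and test runners may change that default.

```python
import warnings
import numpy as np

def _inner_getitem(values, key):
    # Hypothetical stand-in for Index.__getitem__ during the deprecation period:
    # warn on a multi-dimensional result but still return it.
    result = values[key]
    if result.ndim > 1:
        warnings.warn(
            "multi-dimensional indexing is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
    return result

def outer_getitem(values, key):
    # Hypothetical stand-in for IntervalArray.__getitem__: call the inner
    # getitem (which warns), then reject the multi-dimensional result.
    left = _inner_getitem(values, key)
    if np.ndim(left) > 1:
        raise ValueError("multi-dimensional indexing not allowed")
    return left

error = None
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DeprecationWarning)
    try:
        outer_getitem(np.arange(3), (slice(None), None))
    except ValueError as err:
        error = str(err)

print(error)                                  # multi-dimensional indexing not allowed
print([w.category.__name__ for w in caught])  # ['DeprecationWarning']
```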

jorisvandenbossche · 2020-01-09T19:53:02Z

pandas/tests/indexing/test_indexing.py

                idxr[nd3]
+        else:
+            with pytest.raises(ValueError, match=msg):
+                with tm.assert_produces_warning(DeprecationWarning):


same here, is it needed to have both contexts?

jorisvandenbossche · 2020-01-09T19:54:25Z

pandas/tests/plotting/test_converter.py

@@ -66,11 +66,10 @@ def test_registering_no_warning(self):

        # Set to the "warn" state, in case this isn't the first test run
        register_matplotlib_converters()
-        with tm.assert_produces_warning(None) as w:
+        with tm.assert_produces_warning(DeprecationWarning, check_stacklevel=False):
+            # GH#30588 DeprecationWarning from 2D indexing


I think we shouldn't assert here, but just ignore warnings. Otherwise, once matplotlib fixes this (hopefully soon), this test would fail.
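
Ignoring instead of asserting keeps a test passing both before and after an upstream fix. Here is a minimal sketch using only the standard `warnings` module. `maybe_deprecated_call` is a hypothetical stand-in for a third-party call such as the converter registration. In a pytest suite, a `@pytest.mark.filterwarnings("ignore::DeprecationWarning")` marker expresses the same intent.

```python
import warnings

def maybe_deprecated_call(emit):
    # Hypothetical stand-in for a third-party call that may or may not warn,
    # depending on which version of the library is installed.
    if emit:
        warnings.warn("2D indexing on an Index is deprecated", DeprecationWarning)
    return "ok"

def call_ignoring_deprecations(func, *args, **kwargs):
    # Suppress DeprecationWarning only inside this block; the previous filter
    # state is restored when catch_warnings exits.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return func(*args, **kwargs)

with warnings.catch_warnings(record=True) as outer:
    warnings.simplefilter("always")
    result = call_ignoring_deprecations(maybe_deprecated_call, True)

print(result, len(outer))  # ok 0
```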

(can be fixed later)

Working on a fix on the mpl side now.

jbrockmendel · 2020-01-09T19:59:38Z

Off topic: running the tests has started leaving behind a bunch of gibberish .pickle files. Is that just me?

TomAugspurger · 2020-01-09T21:03:53Z

So I think the only potential behavior change is for categorical at #30588 (comment). We might want that to raise for non 1-d indexers, but I can't figure things out. I'm comfortable changing that after the RC, so I'm planning to merge this in a bit (30 minutes or so).

jbrockmendel · 2020-01-09T21:04:32Z

I’m AFK for a few more hours, feel free to push edits.


jorisvandenbossche · 2020-01-09T21:10:55Z

To be clear, the 2D indexing case on categorical is already completely broken in pandas 0.25.0 as well, so I think the (wrong) error now is fine. We can fix the error message after the RC.

TomAugspurger · 2020-01-09T21:17:13Z

Thanks! Opening followups now.

…ndexing-1row-df * upstream/master: (284 commits) CLN: leftover ix checks (pandas-dev#30951) CI: numpydev changed double to single quote (pandas-dev#30952) DOC: Fix whatsnew contributors section (pandas-dev#30926) STY: wrong placed space in strings (pandas-dev#30940) TYP: type up parts of series.py (pandas-dev#30761) DOC: Fix SS03 docstring error (pandas-dev#30939) CLN: remove unnecesary _date_check_type (pandas-dev#30932) DOC: Fixture docs in pandas/conftest.py (pandas-dev#30917) CLN: F-strings (pandas-dev#30916) replace syntax with f-string (pandas-dev#30919) BUG: pickle files left behind by tm.round_trip_pickle (pandas-dev#30906) TYP: offsets (pandas-dev#30897) TYP: typing annotations (pandas-dev#30901) WEB: Remove from roadmap moving the docstring script (pandas-dev#30893) WEB: Removing Discourse links (pandas-dev#30890) DOC: Encourage use of pre-commit in the docs (pandas-dev#30864) DEPR: fix missing stacklevel in pandas.core.index deprecation (pandas-dev#30878) CLN: remove unnecessary overriding in subclasses (pandas-dev#30875) RLS: 1.0.0rc0 BUG: validate Index data is 1D + deprecate multi-dim indexing (pandas-dev#30588) ... # Conflicts: # doc/source/whatsnew/v1.0.0.rst

cishwarya · 2023-04-05T10:18:10Z

Hi @jbrockmendel @TomAugspurger @jreback, I have a pandas dataframe with the following 4 columns:

ds | yhat | yhat_lower | yhat_upper
--- | --- | --- | ---
2023-01-28 | 144.54255 | 130.96604 | 157.49898

This is the predicted dataframe (called forecast) from a Prophet time series model.
I am trying to plot the dataframe with the following command:
fig1 = model.plot(forecast)
This no longer works and throws the error "Multi-dimensional indexing (e.g. obj[:, None]) is no longer supported. Convert to a numpy array before indexing instead."
I converted my dataframe to a numpy array:
forecast_np = forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].to_numpy()
but I still get the error:
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
Can you please help here?
Thanks

BUG: validate Index data is 1D

8e23296

jreback added the Constructors (Series/DataFrame/Index/pd.array Constructors) and Index (Related to the Index class or subclasses) labels Jan 1, 2020

jreback requested changes Jan 1, 2020


jbrockmendel added 2 commits January 2, 2020 08:31

Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…

bad608d

…g-idx-ndim

Use single-entry MultiIndex

181457a

whatsnew

2898896

jbrockmendel added 2 commits January 6, 2020 08:19

Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…

19d4347

…g-idx-ndim

deprecate 2d returns

e265c15

jorisvandenbossche reviewed Jan 6, 2020


jbrockmendel and others added 3 commits January 6, 2020 09:34

Update doc/source/whatsnew/v1.0.0.rst

c14f01a

Co-Authored-By: Joris Van den Bossche <jorisvandenbossche@gmail.com>

Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…

cdc9fda

…g-idx-ndim

Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…

ea67b33

…g-idx-ndim

Merge branch 'master' of https://github.com/pandas-dev/pandas into bu…

e488139

…g-idx-ndim

TomAugspurger reviewed Jan 9, 2020

pandas/core/arrays/interval.py

pandas/core/indexes/base.py

jbrockmendel added 2 commits January 9, 2020 09:28

catch warnings

e02dd13

Disallow multi-dim indexing for IntervalArray

d163da8

jschendel reviewed Jan 9, 2020


ignore stacklevel

74dffbe

TomAugspurger approved these changes Jan 9, 2020


jorisvandenbossche reviewed Jan 9, 2020


jorisvandenbossche changed the title ~~BUG: validate Index data is 1D~~ BUG: validate Index data is 1D + deprecate multi-dim indexing Jan 9, 2020

TomAugspurger merged commit 13858f6 into pandas-dev:master Jan 9, 2020

TomAugspurger mentioned this pull request Jan 9, 2020

Multi-dimensional Index indexing followups #30867

Closed

jbrockmendel deleted the bug-idx-ndim branch January 10, 2020 03:53

TomAugspurger mentioned this pull request Jan 13, 2020

DataFrame.set_index when setting a duplicate name now raises #30965

Open

jorisvandenbossche mentioned this pull request Jan 28, 2020

API: what should a 2D indexing operation into a 1D Index do? (eg idx[:, None]) #27837

Closed

This was referenced Feb 2, 2020

Backport PR: BUG: Array.__setitem__ failing with nullable boolean mask (#31484) #31562

Merged

CI: Remove warning raising after new matplotlib release #31573

Merged

h-vetinari mentioned this pull request Feb 11, 2020

REGR: changed return type for multi-dimensional indexing #31870

Closed

jorisvandenbossche mentioned this pull request Mar 10, 2021

[ArrayManager] TST: run (+fix/skip) pandas/tests/indexing tests #40325

Merged

tacaswell mentioned this pull request Jan 11, 2022

BUG: inconsistent behavior on multi-dimensional slicing based on type #45303

Open

