REGR: changed return type for multi-dimensional indexing #31870

Closed
h-vetinari opened this issue Feb 11, 2020 · 4 comments
Labels
Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@h-vetinari
Contributor

My code was free of deprecation warnings on 0.25.3, but upgrading still broke it.

I can now see on the tracker that multi-dimensional indexing has been deprecated (#27837, #30588, #30867), but it seems that introducing the deprecation warning also changed the behaviour itself.

>>> import numpy as np
>>> import pandas as pd
>>> pd.__version__
'0.25.3'
>>> idx = pd.Index([f'ID_{x}' for x in range(10)])
>>> selector = np.array(np.random.randint(0, 10, (3, 10)))
>>> selector
array([[2, 3, 4, 9, 1, 8, 5, 6, 3, 3],
       [3, 5, 0, 7, 1, 3, 2, 0, 2, 8],
       [8, 4, 6, 8, 0, 4, 3, 4, 5, 7]])
>>> idx[selector]
Index(['ID_2', 'ID_3', 'ID_4', 'ID_9', 'ID_1', 'ID_8', 'ID_5', 'ID_6', 'ID_3',
       'ID_3', 'ID_3', 'ID_5', 'ID_0', 'ID_7', 'ID_1', 'ID_3', 'ID_2', 'ID_0',
       'ID_2', 'ID_8', 'ID_8', 'ID_4', 'ID_6', 'ID_8', 'ID_0', 'ID_4', 'ID_3',
       'ID_4', 'ID_5', 'ID_7'],
      dtype='object')
>>> idx[selector].values
array([['ID_2', 'ID_3', 'ID_4', 'ID_9', 'ID_1', 'ID_8', 'ID_5', 'ID_6',
        'ID_3', 'ID_3'],
       ['ID_3', 'ID_5', 'ID_0', 'ID_7', 'ID_1', 'ID_3', 'ID_2', 'ID_0',
        'ID_2', 'ID_8'],
       ['ID_8', 'ID_4', 'ID_6', 'ID_8', 'ID_0', 'ID_4', 'ID_3', 'ID_4',
        'ID_5', 'ID_7']], dtype=object)

On v1.0.1, this returns an np.ndarray instead of a pd.Index, so the subsequent .values call fails.

>>> idx[selector]
__main__:1: DeprecationWarning: Support for multi-dimensional indexing (e.g. `index[:, None]`) on an Index is deprecated and will be removed in a future version.  Convert to a numpy array before indexing instead.
array([['ID_2', 'ID_3', 'ID_4', 'ID_9', 'ID_1', 'ID_8', 'ID_5', 'ID_6',
        'ID_3', 'ID_3'],
       ['ID_3', 'ID_5', 'ID_0', 'ID_7', 'ID_1', 'ID_3', 'ID_2', 'ID_0',
        'ID_2', 'ID_8'],
       ['ID_8', 'ID_4', 'ID_6', 'ID_8', 'ID_0', 'ID_4', 'ID_3', 'ID_4',
        'ID_5', 'ID_7']], dtype=object)

Note also that the warning is (at least for me) not raised on Python 3.6, only on 3.7.

If desired, I can flesh out the reasons for needing this multi-dimensional indexing. I wonder how I'll be able to replace it.

@jorisvandenbossche
Member

This wasn't documented well in the whatsnew note (which only mentioned the deprecation), but the change was intentional.
The problem was that before 1.0, you got an Index object in a "corrupted" state: it held a 2D array as .values, which would fail in many operations. So, in addition to deprecating this kind of indexing, there was a breaking change to return the actual 2D array instead of a 1D Index holding the 2D array. This gets rid of such "corrupted" Index objects, and it also makes Index consistent with Series, where multi-dimensional indexing returns an ndarray.
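To illustrate the distinction (a sketch, not code from this thread): indexing with a flattened 1D selector is ordinary indexing and yields a well-formed 1D Index, and the 2D shape can then be restored on the NumPy side, where a 2D array is legitimate. The small fixed `selector` is an assumption chosen so the result is reproducible.

```python
import numpy as np
import pandas as pd

idx = pd.Index([f"ID_{x}" for x in range(10)])
# Fixed 2-D selector (instead of a random one) so the output is reproducible.
selector = np.array([[2, 3, 4], [9, 1, 8]])

# A 1-D selector is plain positional indexing and returns a proper 1-D Index.
flat = idx[selector.ravel()]

# Restore the 2-D layout on the NumPy side, where 2-D arrays are expected.
grid = flat.to_numpy().reshape(selector.shape)

print(type(flat).__name__, flat.shape)  # Index (6,)
print(grid.shape)                        # (2, 3)
```

This keeps the Index strictly one-dimensional at every step, which is exactly the invariant the 1.0 change enforces.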

> Note also that the warning is (at least for me) not raised on Python 3.6, only on 3.7.

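That's because it is a DeprecationWarning rather than a FutureWarning: Python 3.7 changed the default warning filters (PEP 565) so that a DeprecationWarning is shown by default when it is triggered by code running in `__main__`, such as an interactive session. On 3.6 it stays hidden unless warnings are explicitly enabled.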
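A minimal sketch of making a DeprecationWarning visible regardless of Python version, using only the standard-library `warnings` module. `old_api` is a hypothetical stand-in for the deprecated pandas call, not part of pandas.

```python
import warnings


def old_api():
    # Hypothetical stand-in for a deprecated call such as 2-D indexing on an Index.
    warnings.warn("this call is deprecated", DeprecationWarning, stacklevel=2)
    return 42


# Record warnings and force DeprecationWarning to be shown, independent of the
# interpreter's default filters and of which module triggered the warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", DeprecationWarning)
    result = old_api()

print(result, [w.category.__name__ for w in caught])  # 42 ['DeprecationWarning']
```

From the command line, `python -W error::DeprecationWarning script.py` turns such warnings into exceptions, which is one way to surface deprecations like this one before upgrading.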

> If desired, I can flesh out the reasons for needing this multi-dimensional indexing. I wonder how I'll be able to replace it.

Please do.

@jorisvandenbossche added the Indexing label (Related to indexing on series/frames, not to indexes themselves) on Feb 11, 2020
@h-vetinari
Contributor Author

Thanks for the quick response, that explains things a bit.

After tracking down the differences and raising this issue, I now see that my case is easily solved (as the warning suggests) by applying the multi-dimensional indexer to the underlying .values instead. This solves my immediate problem, but -- annoyingly -- is not cross-compatible. In any case, it's enough to close this issue.
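A minimal sketch of that workaround, assuming the same `idx` and `selector` as in the original report. `Index.to_numpy()` (available since pandas 0.24) always returns a NumPy array, so the 2D indexing happens entirely in NumPy and never goes through the deprecated `Index.__getitem__` path; for this object-dtype Index, `idx.values` behaves the same.

```python
import numpy as np
import pandas as pd

idx = pd.Index([f"ID_{x}" for x in range(10)])
selector = np.random.randint(0, 10, (3, 10))

# Convert to a plain ndarray first, then apply the 2-D selector.
# The result is a 2-D ndarray of labels on pandas 0.25.x and 1.0.x alike.
labels = idx.to_numpy()[selector]

print(type(labels).__name__, labels.shape)  # ndarray (3, 10)
```

Because the result is already an ndarray, any later `.values` call on it has to be dropped; that is the only change needed in code written against the 0.25.x behaviour.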

@jorisvandenbossche
Member

> This solves my immediate problem, but -- annoyingly -- is not cross-compatible

How is that not cross-compatible? I would think it works for both pandas 0.25.x and 1.0.x.

@h-vetinari
Contributor Author

You're right: after rearranging the code, it now works for both. I was thinking of something else when I wrote that, and I was wrong.
