-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API/ERR: allow iterators in df.set_index & improve errors #24984
Changes from 12 commits
3e01681
1b71e68
caeb125
8bd5340
3c8b69a
cdfd86a
d76ecfb
d2ffb81
5863678
0a7d783
087d4f1
794f61d
0761633
c58e8b6
29fa8c0
b5c8fa8
5590433
7767ff7
37c12d0
2c4eaea
6c78816
2ccd9a9
b03c43b
ea10359
ca17895
a401eea
f4deacc
125b0ca
9bfcfde
6838613
ca2ac60
87bd0a6
759b369
40f1aaa
ecc7d03
5f99b15
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4143,13 +4143,8 @@ def set_index(self, keys, drop=True, append=False, inplace=False, | |
'array, or a list containing only valid column keys and ' | ||
'one-dimensional arrays.') | ||
|
||
if (is_scalar(keys) or isinstance(keys, tuple) | ||
or isinstance(keys, (ABCIndexClass, ABCSeries, np.ndarray))): | ||
# make sure we have a container of keys/arrays we can iterate over | ||
# tuples can appear as valid column keys! | ||
if not isinstance(keys, list): | ||
keys = [keys] | ||
elif not isinstance(keys, list): | ||
raise ValueError(err_msg) | ||
|
||
missing = [] | ||
for col in keys: | ||
|
@@ -4158,10 +4153,29 @@ def set_index(self, keys, drop=True, append=False, inplace=False, | |
# tuples are always considered keys, never as list-likes | ||
if col not in self: | ||
missing.append(col) | ||
elif (not isinstance(col, (ABCIndexClass, ABCSeries, | ||
np.ndarray, list)) | ||
or getattr(col, 'ndim', 1) > 1): | ||
raise ValueError(err_msg) | ||
elif isinstance(col, (ABCIndexClass, ABCSeries, | ||
np.ndarray, list)): | ||
# arrays are fine as long as they are one-dimensional | ||
if getattr(col, 'ndim', 1) > 1: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. != 1, is a 0-dim numpy scalar valid? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
raise ValueError(err_msg) | ||
elif is_list_like(col) and not hasattr(col, '__len__'): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. An alterative (in light of the general comment I made), an alternative might be to only specifically check for BTW, one example that still passes here is a range object (so |
||
# various iterators/generators are hashable, but should not | ||
# raise a KeyError; we identify them by their lack of __len__. | ||
# hashable listlikes with __len__ get tested as keys below. | ||
tipo = type(col) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just put type(col) directly rather than adding another line here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
raise TypeError(err_msg + ' Received column of ' | ||
'type {}'.format(tipo)) | ||
else: | ||
# everything else gets tried as a key; see GH 24969 | ||
try: | ||
found = col in self.columns | ||
except TypeError: | ||
tipo = type(col) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same as above |
||
raise TypeError(err_msg + ' Received column of ' | ||
'type {}'.format(tipo)) | ||
else: | ||
if not found: | ||
WillAyd marked this conversation as resolved.
Show resolved
Hide resolved
|
||
missing.append(col) | ||
jreback marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
if missing: | ||
raise KeyError('{}'.format(missing)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If supporting custom types, we need to go back to the state pre-#22486 that just puts everything that's not listlike in list. Otherwise we can't guarantee that iteration below will work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is best, let's just revert entirely to pre 22486
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pre-#22486 had some other problems we do not need to re-instate (e.g. weird KeyErrors if it's an iter).
I agree that the code complexity must stay maintainable, and so the most reasonable thing (IMO), ist to just not bother trying to "fix" unhashable custom types being used as keys. At that point, the user is so far gone off the reservation that it's really not our job to give them a good error message (again, even the repr of such a frame would crash).