-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API/ERR: allow iterators in df.set_index & improve errors #24984
Changes from 1 commit
3e01681
1b71e68
caeb125
8bd5340
3c8b69a
cdfd86a
d76ecfb
d2ffb81
5863678
0a7d783
087d4f1
794f61d
0761633
c58e8b6
29fa8c0
b5c8fa8
5590433
7767ff7
37c12d0
2c4eaea
6c78816
2ccd9a9
b03c43b
ea10359
ca17895
a401eea
f4deacc
125b0ca
9bfcfde
6838613
ca2ac60
87bd0a6
759b369
40f1aaa
ecc7d03
5f99b15
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4135,13 +4135,8 @@ def set_index(self, keys, drop=True, append=False, inplace=False, | |
'array, or a list containing only valid column keys and ' | ||
'one-dimensional arrays.') | ||
|
||
if (is_scalar(keys) or isinstance(keys, tuple) | ||
or isinstance(keys, (ABCIndexClass, ABCSeries, np.ndarray))): | ||
# make sure we have a container of keys/arrays we can iterate over | ||
# tuples can appear as valid column keys! | ||
if not isinstance(keys, list): | ||
keys = [keys] | ||
elif not isinstance(keys, list): | ||
raise ValueError(err_msg) | ||
|
||
missing = [] | ||
for col in keys: | ||
|
@@ -4150,10 +4145,20 @@ def set_index(self, keys, drop=True, append=False, inplace=False, | |
# tuples are always considered keys, never as list-likes | ||
if col not in self: | ||
missing.append(col) | ||
elif (not isinstance(col, (ABCIndexClass, ABCSeries, | ||
np.ndarray, list)) | ||
or getattr(col, 'ndim', 1) > 1): | ||
raise ValueError(err_msg) | ||
elif isinstance(col, (ABCIndexClass, ABCSeries, | ||
np.ndarray, list)): | ||
# arrays are fine as long as they are one-dimensional | ||
if getattr(col, 'ndim', 1) > 1: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. != 1, is a 0-dim numpy scalar valid? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
raise ValueError(err_msg) | ||
else: | ||
# everything else gets tried as a key; see GH 24969 | ||
try: | ||
self[col] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think doing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried, but
this does not catch the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is the hashing needed? We just need to know whether it is a column name or not? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And it certainly checks hashing for some cases:
(just not an iter because as you said, it is hashable) |
||
str(col) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The error might be different, that's true. On second thought, |
||
except KeyError: | ||
tipo = type(col) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same as above |
||
raise ValueError(err_msg, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the reason for not treating it as a missing key? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because at that point we're dealing with an object we have absolutely no idea about, and assuming it is a key is worse (see the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, I mean assuming it is a missing key. If it cannot be a key, it certainly is a missing key. Anyway, this is exactly the same behaviour as what There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I disagree with that quite strongly, again based on the |
||
'Received column of type {}'.format(tipo)) | ||
|
||
if missing: | ||
raise KeyError('{}'.format(missing)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If supporting custom types, we need to go back to the state pre-#22486 that just puts everything that's not listlike in list. Otherwise we can't guarantee that iteration below will work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is best, let's just revert entirely to pre 22486
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pre-#22486 had some other problems we do not need to re-instate (e.g. weird KeyErrors if it's an iter).
I agree that the code complexity must stay maintainable, and so the most reasonable thing (IMO), ist to just not bother trying to "fix" unhashable custom types being used as keys. At that point, the user is so far gone off the reservation that it's really not our job to give them a good error message (again, even the repr of such a frame would crash).