-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: update DF.set_index #24762
DOC: update DF.set_index #24762
Conversation
Codecov Report
@@ Coverage Diff @@
## master #24762 +/- ##
===========================================
- Coverage 92.38% 42.91% -49.48%
===========================================
Files 166 166
Lines 52363 52363
===========================================
- Hits 48376 22471 -25905
- Misses 3987 29892 +25905
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24762 +/- ##
==========================================
- Coverage 92.38% 92.38% -0.01%
==========================================
Files 166 166
Lines 52379 52377 -2
==========================================
- Hits 48392 48389 -3
- Misses 3987 3988 +1
Continue to review full report at Codecov.
|
Before #22486, Strictly speaking this was orthogonal to the goal of fixing #22484, which is likely part of the reason why other devs like @toobaz missed it and now object. #24697 would rectify this, but since that is on hold, the least one can do (as noted in the OP) is to not advertise the addition of these ambiguous (and contested) list-likes. Hence, I'm changing the whatsnew-note added by #22486 to only reflect what it was supposed to fix: the three points from #22484. |
@TomAugspurger can you have a look here, I think we may need to revert #22486 to avoid any changes |
Sorry I can't keep this straight. What behavior changed? Something to do with |
I thought there was no changes, but I guess there are in #22486. @h-vetinari can you give show what changed. There was a lot of back and forth on that PR. I don't think anything should have changed but it seems it did. |
For the sake of discussion let's assume we have a list
if (is_scalar(col) or isinstance(col, tuple)) and col in frame:
# all good here
elif isinstance(col, (ABCSeries, ABCIndexClass, np.ndarray, list)):
# all good here as well (ABCIndexClass includes MultiIndex)
else:
# raise
for col in keys:
if (is_scalar(col) or isinstance(col, tuple)) and col in self:
# tuples can be both column keys or list-likes
# if they are valid column keys, everything is fine
continue
elif is_scalar(col) and col not in self:
# tuples that are not column keys are considered list-like,
# not considered missing
missing.append(col)
elif (not is_list_like(col, allow_sets=False)
or getattr(col, 'ndim', 1) > 1):
raise TypeError('The parameter "keys" may only contain a '
'combination of valid column keys and '
'one-dimensional list-likes') The source of the back-and-forth in #22486 was your orthogonal review requirement. I warned about this and went to great length (i.e. #23065) to avoid introducing something fundamentally broken (i.e. sets), but you were crystal-clear about wanting to avoid (the pre-existing) instance checks. But ok, you're reviewing basically everything here - oversights and misunderstandings can happen. To me, the right approach would be merging #24697, which is a small change, and also paves the way to (eventually) solve #24046 and #22225. |
@h-vetinari do you have a short example that behaves differently between 0.23.4 and master? |
Let's start with: >>> df = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=['A', 'B', ('t', 'p', 'l')])
>>> df
A B (t, p, l)
0 0 1 2
1 3 4 5
2 6 7 8
It's the tuples that are now ambiguous (although with well-defined precedence; length must match of course for the list-like case). #24697 would take the fixes of #22484, keep the master-behaviour for tuples, and solve the list-ambiguity:
|
It's still a bit hard to follow, but I don't see anything in there that needs to be reverted. |
I also didn't follow everything, but
while this used (until last release) to raise |
Not sure how I can make it easier. I gave single line examples with behaviour before/after. Another way to look at it is that:
I'd suggest to keep that inspection step, but change the requirements the list-elements have to satisfy. I agree with you that there's nothing fundamentally broken (behaviour has well-defined rules, plus DFs are usually way longer than tuples), just that now there's more ambiguity instead of less (there's already |
Sorry, I should have more honestly written "I didn't have time to read everything". Will do it before the end of the week. |
@h-vetinari ok so this looks like this change slipped thru. Pls update to switch back to the original inspection code: isinstance(col, (ABCIndexClass, ABCSeries, np.ndarray, list)) to is_list_like(col, allow_sets=False), and only this change. |
I pondered opening a new PR for this, but in any case, this would have to be coupled with doc changes, so why not do it here. I removed the types that were added by #22486, and reinstated the instance-checks. This needs to come in two points though - first for the case that The changes in the test are just for removing the |
if (is_scalar(keys) or isinstance(keys, tuple) | ||
or isinstance(keys, (ABCIndexClass, ABCSeries, np.ndarray))): | ||
# make sure we have a container of keys/arrays we can iterate over | ||
# tuples can appear as valid column keys! | ||
keys = [keys] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strictly speaking, it would be possible to just keep wrapping everything that's not a list into a list, and raise in the for-loop below. But that's a bit hard to grok, and explicit is better than implicit, no?
@jreback |
thanks @h-vetinari |
Glad we could fix this before the release. Thanks. |
This reverts commit e984947.
Split off from #24697 by request of @jorisvandenbossche & @jreback
I kept the change for the whatsnew of #22486, to at least not emphasize that there are now ambiguous list-likes available for
DataFrame.set_index
(which haven't seen a release yet and would be removed again by #24697), which would/will make moving forward on this a bit easier. @toobaz