-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Follow-up #20347: incorporate review about _get_series_list #20923
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Responding to review of @jreback
# without copy, this could change "others" | ||
# that was passed to str.cat | ||
others = others.copy() | ||
others.index = idx | ||
return ([others[x] for x in others], fu_wrn) | ||
return ([others[x] for x in others], warn) | ||
elif isinstance(others, np.ndarray) and others.ndim == 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback :
this is wrong
i don’t think we can align a ndarray at all like this
let’s can ndarray a that are > 1 dim
My response:
The DF-constructor works as expected for a 2-dim ndarray, but I haven't checked if this is tested behaviour. (essentially, df == DataFrame(df.values, columns=df.columns, index=df.index)
)
I would suggest not to can 2-dim ndarrays, because they are necessary to avoid alignment on the deprecation path for join
:
[...] To disable alignment (the behavior before v.0.23) and silence this warning, use
.values
on any Series/Index/DataFrame inothers
. [...]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test for this, basically see what happens when I have a non-default index with the Series (e.g. 2,3, 4) or something and it gets aligned with the 0,1,2 of the ndarray-converted-to-DataFrame. It will 'work' but is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a general comment before all the cases for others
because it seems this point wasn't clear:
# Generally speaking, all objects without an index inherit the index
# `idx` of the calling Series/Index - i.e. must have matching length.
# Objects with an index (i.e. Series/Index/DataFrame) keep their own
# index, *unless* ignore_index is set to True.
So, the case you mention does not happen, because an ndarray is always automatically aligned with the calling Series (of course, the lengths must match for this to work).
There are several tests with objects with different indexes, both with join is None
(so with force-realign; successful but raising a warning), as well as with the different join
-types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i c, ok then
pandas/core/strings.py
Outdated
elif isinstance(others, np.ndarray) and others.ndim == 2: | ||
others = DataFrame(others, index=idx) | ||
return ([others[x] for x in others], False) | ||
elif is_list_like(others): | ||
others = list(others) # ensure iterators do not get read twice etc | ||
if all(is_list_like(x) for x in others): | ||
los = [] | ||
fu_wrn = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fu_wrn = fu_wrn or fwn | ||
return (los, fu_wrn) | ||
# test if there is a mix of list-like and non-list-like (e.g. str) | ||
elif (any(is_list_like(x) for x in others) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can make this simpler by just checking for all is not list like (eg strings)
anything else will fail thru to the TypeError
Done
pandas/core/strings.py
Outdated
while others: | ||
nxt = others.pop(0) # list-like as per check above | ||
# safety for iterators and other non-persistent list-likes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback :
this whole section needs some work it’s way too hard to read and follow
I tried to comment in detail what's happening - which part is unclear?
pandas/core/strings.py
Outdated
is_legal = ((no_deep and nxt.dtype == object) | ||
or all((isinstance(x, compat.string_types) | ||
or (not is_list_like(x) and isnull(x)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback :
isnull already checks for None
only 1d objects are valid here (or all scalars)
do this check up front
I simplified the check, but I can't put it any further upfront (unless I remove the fast track). Need to convert iterators first, and the checking for fast-track for Series/Index etc. also needs to be done before
@TomAugspurger from #20922
What do you mean by being strict. Could you maybe edit the table from above to reflect what you think should be allowed? |
@TomAugspurger This PR will (almost certainly) be fully green in ~1h. |
Nope, just doing build stuff right now. |
Codecov Report
@@ Coverage Diff @@
## master #20923 +/- ##
==========================================
- Coverage 91.81% 91.81% -0.01%
==========================================
Files 153 153
Lines 49471 49470 -1
==========================================
- Hits 45422 45421 -1
Misses 4049 4049
Continue to review full report at Codecov.
|
# without copy, this could change "others" | ||
# that was passed to str.cat | ||
others = others.copy() | ||
others.index = idx | ||
return ([others[x] for x in others], fu_wrn) | ||
return ([others[x] for x in others], warn) | ||
elif isinstance(others, np.ndarray) and others.ndim == 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test for this, basically see what happens when I have a non-default index with the Series (e.g. 2,3, 4) or something and it gets aligned with the 0,1,2 of the ndarray-converted-to-DataFrame. It will 'work' but is wrong.
other than my comment above the cleanup looks good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback. I think the point you mention is a misunderstanding, but I pushed another commit to make it clearer what's happening.
# without copy, this could change "others" | ||
# that was passed to str.cat | ||
others = others.copy() | ||
others.index = idx | ||
return ([others[x] for x in others], fu_wrn) | ||
return ([others[x] for x in others], warn) | ||
elif isinstance(others, np.ndarray) and others.ndim == 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a general comment before all the cases for others
because it seems this point wasn't clear:
# Generally speaking, all objects without an index inherit the index
# `idx` of the calling Series/Index - i.e. must have matching length.
# Objects with an index (i.e. Series/Index/DataFrame) keep their own
# index, *unless* ignore_index is set to True.
So, the case you mention does not happen, because an ndarray is always automatically aligned with the calling Series (of course, the lengths must match for this to work).
There are several tests with objects with different indexes, both with join is None
(so with force-realign; successful but raising a warning), as well as with the different join
-types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we close this, maybe this question from the last PR merits some further discussion. @TomAugspurger @jreback @jorisvandenbossche
@@ -311,7 +311,7 @@ All one-dimensional list-likes can be arbitrarily combined in a list-like contai | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another point of @TomAugspurger in #20347, responding to the line
All one-dimensional list-likes can be arbitrarily combined in a list-like container (including iterators, ``dict``-views, etc.):
(which is just above this line pointer):
question: what happens with a dict? Do we use the keys, or does it transform to a series and get aligned?
Response 1:
I was talking about
d.keys()
ord.values()
. For passing adict
directly, currently the keys would get read (as that's whatx in d
would return).
Response 2:
I could of course add another safety that maps
d
tod.values()
.
However, I tend to think that this could be left to the python-skills of the user as well -- a dictionary is not "list-like" from the POV of normal python usage (whereas its keys and values are, seed = dict(zip(keys, values))
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a dict is treated consistently here how we handle list-likes else where, we effectively call list
on it
@@ -311,7 +311,7 @@ All one-dimensional list-likes can be arbitrarily combined in a list-like contai | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a dict is treated consistently here how we handle list-likes else where, we effectively call list
on it
# without copy, this could change "others" | ||
# that was passed to str.cat | ||
others = others.copy() | ||
others.index = idx | ||
return ([others[x] for x in others], fu_wrn) | ||
return ([others[x] for x in others], warn) | ||
elif isinstance(others, np.ndarray) and others.ndim == 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i c, ok then
In that case, I think this PR is good to go. |
@jreback @TomAugspurger @jorisvandenbossche
OTOH, I'm honestly fine to leave this PR as it is. Special treatment for dicts should be a separate issue, IMHO. |
thanks @h-vetinari good followup! as far as dicts go, certainly welcome clarification if needed (PR / issue all ok), or can just wait and see if that usecase is seen in real life |
@jreback |
Closes #20922
recap of #20347 :
Enabling alignment on top of the (v.0.22-)legal list of lists was relatively complex. The allowed argument types implemented in #20347 are as follows:
** to be permitted, list-likes (say
nxt
) within a list-likeothers
must pass (essentially)all(not is_list_like(x) for x in nxt)
and not be a DF.
Open review points from #20347 from @jreback (and my responses) reproduced in review comments.
PS. Managed to have a brainfart and there's a bug in the RC docs - very sorry. Fixed in this PR.