API: str.cat will align on Series #20347
Hello @h-vetinari! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on May 02, 2018 at 05:33 Hours UTC
I just realised that due to the API-change, a what's-new entry is probably needed somewhere? |
Codecov Report
@@            Coverage Diff             @@
##           master   #20347      +/-   ##
==========================================
+ Coverage   91.79%   91.81%   +0.01%
==========================================
  Files         153      153
  Lines       49411    49478      +67
==========================================
+ Hits        45359    45429      +70
+ Misses       4052     4049       -3
9ff31f8 to 8ec2f1a
There are some useless commits in the appveyor-queue - how can those be cancelled? I'm guessing I don't have sufficient rights to do it. If someone who has the rights sees this, the builds for the following commits (starting with) can be cancelled:
pandas/tests/test_strings.py
Outdated
@@ -2760,6 +2761,17 @@ def test_str_cat_raises_intuitive_error(self):
        with tm.assert_raises_regex(ValueError, message):
            s.str.cat(' ')

    def test_str_cat_align(self):
        # https://github.com/pandas-dev/pandas/issues/18657
needs another case where this would produce nans on some elements (e.g. the original issue)
Will add
pandas/core/strings.py
Outdated
@@ -65,6 +78,11 @@ def str_cat(arr, others=None, sep=None, na_rep=None):
        If None, concatenates without any separator.
    na_rep : string or None, default None
        If None, NA in the series are ignored.
    align : bool or None, default None
        If used between two Series, determines whether they are aligned
add a versionadded tag
UserWarning -> FutureWarning
Will add
pandas/core/strings.py
Outdated
        and len(others) and isinstance(others, Series)):
    if align is None:
        align = False
        warnings.warn("A future version of pandas will perform alignment "
FutureWarning
Done
pandas/core/strings.py
Outdated
if align is None:
    align = False
    warnings.warn("A future version of pandas will perform alignment "
                  "when others is a series. To disable alignment (the "
this is a bit unclear, there is no 'previous behavior'; the default is to not align, but it will change in the future
The text is the suggestion of @TomAugspurger in the original issue #18657
need a whatsnew subsection to explain this change, also pls update io.rst
@jreback, I assume you meant |
A few doc comments.
Have you checked the log output from Travis to see if any warnings in the test suite are uncaught? They're printed at the bottom of the test output.
doc/source/text.rst
Outdated
their indexes will be aligned before concatenation (if ``align=True``) or not (if ``align=False``). As usual, alignment will expand to the union of both
indexes, while introducing ``NaN`` for missing values in the respective other series (which can be easily handled with the ``na_rep``-keyword).

If the ``align`` keyword is not passed, the method will currently fall back to the previous behavior (i.e. ``align=False``),
Put this in a .. warning:: directive.
Which part? I added one starting at "If the align keyword is not passed"
doc/source/text.rst
Outdated
.. ipython:: python

    base = Series(['a', 'b', 'c', 'd', 'e'])
    s = base.reindex([0, 1, 2, 3])
Can you indent all these the same?
Fixed
doc/source/text.rst
Outdated
    s.str.cat(t, align=True)
    s.str.cat(t, align=False, na_rep='')

.. versionadded:: 0.23.0
It's unclear what this versionadded is referring to. The keyword? The .str.cat method? I think it's best to leave that to the API documentation.
Fixed
doc/source/whatsnew/v0.23.0.txt
Outdated
``Series.str.cat`` has gained the ``align`` kwarg
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

So far, the method :meth:`Series.str.cat` did not -- in contrast to most of ``pandas`` -- align :class:`Series` on their index before concatenation (see :issue:`18657`).
"So far, the method" -> "Previously"
Fixed
doc/source/whatsnew/v0.23.0.txt
Outdated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

So far, the method :meth:`Series.str.cat` did not -- in contrast to most of ``pandas`` -- align :class:`Series` on their index before concatenation (see :issue:`18657`).
The method has now gained a keyword ``align`` which controls this behavior. If ``False``, the behavior will be as previously. If ``True`` and ``others``
"which controls this behavior" -> "to control alignment"
I think the second sentence and the next paragraph can be simplified.
The default behavior, not aligning, has not changed. If `align` is not specified, a ``FutureWarning``
is issued and the series are not aligned.
To silence the warning and not align, specify ``align=False``. To silence the warning and align
the Series before concatenating, specify ``align=True``.
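The deprecation path suggested in this comment (default unchanged, a FutureWarning when the keyword is omitted, silenced by passing either value) can be sketched in plain Python. This is only an illustration under assumptions: `cat_sketch` is a hypothetical stand-in that works on dicts mapping index labels to strings, not the pandas implementation, and the warning text is invented for the example.

```python
import warnings


def cat_sketch(left, right, align=None):
    """Hypothetical stand-in for str.cat on dicts of index label -> str."""
    if align is None:
        # Keyword not passed: keep the old behavior, but announce the change.
        warnings.warn(
            "A future version will align on the index; pass align=True or "
            "align=False to silence this warning.",
            FutureWarning,
            stacklevel=2,
        )
        align = False
    if align:
        # Aligned: pair values by shared index label.
        return {k: left[k] + right[k] for k in left if k in right}
    # Not aligned: pair values by position, keeping the labels of `left`.
    return {k: a + b for (k, a), b in zip(left.items(), right.values())}
```

Calling `cat_sketch(s, t)` without the keyword warns and pairs by position, while `align=True` and `align=False` are both silent and differ only in how the values are paired.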
Not sure if I understood correctly; removed part with na_rep.
doc/source/whatsnew/v0.23.0.txt
Outdated
.. ipython:: python

    base = Series(['a', 'b', 'c', 'd', 'e'])
    s = base.reindex([0, 1, 2, 3])
Ensure these have the same indentation.
Fixed
@TomAugspurger @jreback Can one of you (tell me how to) remove all but the last of my commits from the appveyor-queue? It's quite far behind as it is, so no need to choke it with unnecessary commits (in Travis new commits automatically supersede old ones - why not in appveyor...?). |
Appveyor will auto cancel a build if there are newer commits on the PR
branch, but it doesn't show up as canceled until its turn in the queue.
@TomAugspurger , re:appveyor: this is not the case, at least not immediately. I can see in https://ci.appveyor.com/project/pandas-dev/pandas/history that some old commits were tried (while the new ones already existed) and ran for 1-2 minutes. I asked because I saw that some other commits were explicitly cancelled by users, but I don't know how to do that. |
Can you also add a test where others is a list of multiple Series?
@jorisvandenbossche I wasn't even aware that's a legal signature? I'm guessing all Series would be concatenated with the same |
Yes, I think so
I suppose we should just process each element in the list separately, so then it does not really matter if it is a mixture. |
pandas/core/strings.py
Outdated
"'align=True'", FutureWarning, stacklevel=4) | ||
if align:
    arr, others = arr.align(others, join='outer')
arrays = [list(arr), list(others)]
I don't think the conversion to list here is needed (on line 60 they get converted to an array anyhow)
This was part of the _get_array_list-function as I found it - I didn't investigate how it interplays with str_cat (which is the only place it is called from), so I left it as it was.
more comments. don't worry about the CI.
doc/source/text.rst
Outdated
Concatenating Series
--------------------

The method :meth:`Series.str.cat` can be used to concatenate the records of two :class:`Series`. Depending on the value given to the ``align`` keyword,
can you describe the simpler usecase first (IOW just concat with no other, or other is a simple list). These can just be examples, doesn't have to be so much text.
In my defense, so far there was no description about str.cat in text.rst. But I will try to write up an overview. Where should it be placed, in your opinion? I would say directly after the splitting-section (natural opposites).
doc/source/text.rst
Outdated
their indexes will be aligned before concatenation (if ``align=True``) or not (if ``align=False``). As usual, alignment will expand to the union of both
indexes, while introducing ``NaN`` for missing values in the respective other series (which can be easily handled with the ``na_rep``-keyword).

.. warning::
you don't need this you are already passing the align keyword. These docs should be written as if a user is seeing them w/o benefit of any past history. Just show what they should do.
Do you mean the warning? The text was verbatim from @TomAugspurger, but I will rework this as well.
it's not necessary here is my point.
doc/source/text.rst
Outdated
    base = Series(['a', 'b', 'c', 'd', 'e'])
    s = base.reindex([0, 1, 2, 3])
    t = base.reindex([3, 0, 4, 1])
rather than showing a bunch of lines like this. Break this up into a conversation. E.g. show the construction of the Series (call it s), then do a cat with no other, then one with a list, finally with a Series.
Wrote a long overview - conversation-style -- in text.rst
doc/source/whatsnew/v0.23.0.txt
Outdated
``Series.str.cat`` has gained the ``align`` kwarg
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, :meth:`Series.str.cat` did not -- in contrast to most of ``pandas`` -- align :class:`Series` on their index before concatenation (see :issue:`18657`).
make this simpler. This should be just the first section and a previous and new section, see the other whatsnew entries for hints on structure. Add a reference to the docs in text.rst.
I'm not sure what "This should be just the first section and a previous and new section" means exactly, but I tried to copy the style of other whatsnew entries. Reference added.
doc/source/text.rst
Outdated
@@ -429,6 +429,27 @@ String ``Index`` also supports ``get_dummies`` which returns a ``MultiIndex``.

See also :func:`~pandas.get_dummies`.
need a reference tag here
I don't understand what should be added. Reference to what? And where? - In the "concatenation" section I'm writing?
pandas/core/strings.py
Outdated
@@ -35,19 +35,32 @@
 _shared_docs = dict()


-def _get_array_list(arr, others):
+def _get_array_list(arr, others, align=True):
     from pandas.core.series import Series
can you add a mini-doc string here (its an internal function so doesn't have to be full fledged, but Parameters / Returns)
In the current version, I'm not touching anything about _get_array_list anymore. I agree that docstrings would be good, might add a draft.
pandas/core/strings.py
Outdated
-    def cat(self, others=None, sep=None, na_rep=None):
+    def cat(self, others=None, sep=None, na_rep=None, align=None):
         from pandas.core.series import Series
         # FutureWarning for align=None emitted in one place only: str_cat
you don't need any of these comments
pandas/core/strings.py
Outdated
if align is not None and align and isinstance(others, Series):
    # str_cat deals with arrays only;
    # make sure index is correct here as well for using _wrap_result
    self._orig, others = self._orig.align(others, join='outer')
why is this needed here? is this not handled in str_cat? (which dispatched to _get_array_list)?
pandas/tests/test_strings.py
Outdated
s = base.reindex([1, 3, 0, 2])
t = base.reindex([3, 0, 4, 1])
expect_rs_aligned = Series(['aa', 'bb', 'cc', 'dd'])
expect_rs_unaligned = Series(['ab', 'bd', 'ca', 'dc'])
put a blank line between cases. Add a comment to cases if needed. If you are writing things more than once, pls parameterize.
Overlooked this, will change in next commit. You mean parametrisation of things like the following example?
def rt(**kwargs):
    r.str.cat(t, **kwargs)
I updated the tests, so that they are separated by lines, commented where necessary, and much easier to read.
Is there some pandas default how often a |
@jorisvandenbossche It was a substantial rewrite, but the code got better through it (found some bugs, and now it's cleaner).
@jreback Is |
@jorisvandenbossche @TomAugspurger @jreback I added functionality and tests for it. If you like how everything so far works, I'll update the docs (just wanna leave the doc-writing for when the code has converged). |
That's normal. FutureWarnings only show up the first time you use something (you can use warnings.filterwarnings to have them always shown).
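As a small illustration of that point, the standard-library `warnings` module lets the filter action be switched so that repeated warnings are not suppressed. The sketch below uses `warnings.simplefilter` (a shorthand for `warnings.filterwarnings` without a message or module pattern); `deprecated_call` and `count_warnings` are hypothetical names made up for the example.

```python
import warnings


def deprecated_call():
    # Hypothetical function that warns on every call.
    warnings.warn("behavior will change", FutureWarning)
    return "ok"


def count_warnings(action, n_calls=3):
    """Call deprecated_call n_calls times and count the warnings let through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        for _ in range(n_calls):
            deprecated_call()
    return len(caught)
```

With `count_warnings("always")` every call is reported, whereas the "default" action only reports the first occurrence for a given code location, which is why a repeated call in a session appears to stop warning.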
pandas/core/strings.py
Outdated
return self._wrap_result(result, use_codes=(not self._is_categorical))
if align and isinstance(others, Series):
    # str_cat deals with arrays only
    data, others = data.align(others, join='outer')
I think we should actually align using a left join. The result of s.str.cat(others) should always preserve the shape and index of s IMO.
Current behaviour:
In [19]: s = pd.Series(['a', 'b'])
In [20]: s.str.cat(pd.Series(['a', 'b'], index=[1, 2]), align=True)
Out[20]:
0 NaN
1 ba
2 NaN
dtype: object
That's how I started out (see OP), but it would be inconsistent with how index-alignment is handled elsewhere -- and being consistent in that trumps shape preservation, IMHO. str.cat is already special anyway, in that it is the only str-method that allows other Series as input.
And with na_rep, the behavior gets very intuitive again, IMO.
In [0]: s = pd.Series(['a', 'b'])
In [1]: t = pd.Series(['a', 'b'], index=[1, 2])
In [2]: s.astype(bool) & t.astype(bool)
Out[2]:
0 False
1 True
2 False
dtype: bool
In [3]: s.str.cat(t, align=True)
Out[3]:
0 NaN
1 ba
2 NaN
dtype: object
In [4]: s.str.cat(t, align=True, na_rep='')
Out[4]:
0 a
1 ba
2 b
dtype: object
In [5]: s.str.cat(t, align=True, na_rep='x')
Out[5]:
0 ax
1 ba
2 xb
dtype: object
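The outer alignment discussed in this comment can also be reproduced step by step with `Series.align`, the same method the diff calls. This is a minimal sketch, assuming a pandas installation; filling the gaps with `fillna` is used here as a stand-in for what `na_rep` does and is not how the PR implements it.

```python
import pandas as pd

s = pd.Series(['a', 'b'])
t = pd.Series(['a', 'b'], index=[1, 2])

# Outer alignment expands both Series to the union of the two indexes,
# inserting NaN where a label is missing on one side.
s_al, t_al = s.align(t, join='outer')

# Fill the gaps with a placeholder, then concatenate element-wise.
result = s_al.fillna('x') + t_al.fillna('x')
```

After alignment both Series share the labels 0, 1 and 2, so the element at label 1 is the value of `s` at 1 followed by the value of `t` at 1.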
I think it depends a lot on the application which behavior is desired. How about exposing a join="inner"|"left"|"right"|"outer" keyword, with default "left" (or "outer"...)?
A keyword with join='left' as the default makes the most sense to me.
Added join-keyword with default 'left'.
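For comparison, a short sketch of the two join styles debated above, written against the `join` keyword of `Series.str.cat` as it exists in released pandas versions (an assumption about the reader's installed version, since the keyword was still under discussion at this point in the thread). The placeholder passed to `na_rep` is arbitrary.

```python
import pandas as pd

s = pd.Series(['a', 'b'])
t = pd.Series(['a', 'b'], index=[1, 2])

# join='left' keeps exactly the index of the calling Series.
left = s.str.cat(t, join='left', na_rep='-')

# join='outer' expands the result to the union of both indexes.
outer = s.str.cat(t, join='outer', na_rep='-')
```

The left join preserves the shape of `s`, which was the argument made for it as the default, while the outer join matches how alignment behaves in most other pandas operations.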
doc/source/text.rst
Outdated
their indexes will be aligned before concatenation (if ``align=True``) or not (if ``align=False``). As usual, alignment will expand to the union of both
indexes, while introducing ``NaN`` for missing values in the respective other series (which can be easily handled with the ``na_rep``-keyword).

.. warning::
it's not necessary here is my point.
pandas/core/strings.py
Outdated
# first achieve maximum extent of data
for x in others:
    data, _ = data.align(x, join='outer')
# then bring elements of others to same size
woa, why are you adding all of this code????
I removed the code additions from str_cat and _get_array_list because that was messy and for several other reasons not a good way to do it -- all the changes are now in str.cat itself.
To be able to align in all the different cases (list of Series, list of np.ndarrays, mixture of both, plus all the other variants), it's necessary to add (some variant of) the code I added. I tried to keep it clean, non-redundant, documented, and nicely recursive. I don't believe the desired functionality can be added with substantially less code (have a look at the test cases). Currently working on the requested doc changes.
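To make the "mixture" case concrete, here is a small sketch of passing a list that combines a Series and a 1-dimensional ndarray to `Series.str.cat`. It assumes a released pandas version that has the `join` keyword; the variable names and values are made up for the example.

```python
import numpy as np
import pandas as pd

s = pd.Series(['a', 'b', 'c'])
t = pd.Series(['x', 'y', 'z'], index=[2, 1, 0])  # same labels, different order
arr = np.array(['1', '2', '3'])                  # no index, must match len(s)

# The Series in the list is aligned on its index, while the ndarray
# is matched to the calling Series by position.
result = s.str.cat([t, arr], sep='-', join='left')
```

The ndarray carries no labels, so it can only be combined positionally, which is why its length has to match the calling Series.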
@jreback @TomAugspurger @jorisvandenbossche If you would be so kind, please compile and read Feedback welcome. |
The failure in the Travis-CI is an artefact -- the linter (only in the 2.7 run) complained about |
A comment why this is not so easy (and why I align all elements in a list before starting concatenation):
Even worse, this would mean that any other arrays following in the list would be of the wrong length and trigger a warning... |
Circle-CI failed due to #20906. Rebased onto upstream and restarted. |
Annoyingly, there are still unrelated failures in "ci/script_single.sh" of the travis "3.6, NumPy dev" job, but at least that job doesn't fail the build.
|
@TomAugspurger Green! =) |
the checking code needs some work
        others = others.copy()
        others.index = idx
    return ([others[x] for x in others], fu_wrn)
elif isinstance(others, np.ndarray) and others.ndim == 2:
this is wrong
i don’t think we can align an ndarray at all like this
let’s can ndarrays that are > 1 dim
The DF-constructor works as expected for a 2-dim ndarray, but I haven't checked if this is tested behaviour (essentially, df == DataFrame(df.values, columns=df.columns, index=df.index)).
I would suggest not to can 2-dim ndarrays, because they are necessary to avoid alignment on the deprecation path for join:
[...] To disable alignment (the behavior before v.0.23) and silence this warning, use .values on any Series/Index/DataFrame in others. [...]
    return (los, fu_wrn)
# test if there is a mix of list-like and non-list-like (e.g. str)
elif (any(is_list_like(x) for x in others)
      and any(not is_list_like(x) for x in others)):
you can make this simpler by just checking for all is not list like (eg strings)
anything else will fail thru to the TypeError
others = list(others)  # ensure iterators do not get read twice etc
if all(is_list_like(x) for x in others):
    los = []
    fu_wrn = False
can u name this parameter just warn
fu_wrn = False
while others:
    nxt = others.pop(0)  # list-like as per check above
    # safety for iterators and other non-persistent list-likes
this whole section needs some work it’s way too hard to read and follow
is_legal = ((no_deep and nxt.dtype == object)
            or all((isinstance(x, compat.string_types)
                    or (not is_list_like(x) and isnull(x))
                    or x is None)
isnull already checks for None
only 1d objects are valid here (or all scalars)
do this check up front
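The first remark can be checked directly: `pandas.isna` (the current name for `isnull`) already returns True for `None`, so a separate `x is None` test adds nothing. The sketch below is a hypothetical element check in that spirit, not the code that was merged; it uses `str` instead of the Python 2 era `compat.string_types` from the diff, and `is_valid_element` is a name invented for the example.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


def is_valid_element(x):
    """Hypothetical helper: True for a string or a scalar missing value."""
    if is_list_like(x):
        # Reject nested list-likes up front; pd.isna on a list-like
        # returns an array, which has no unambiguous truth value.
        return False
    # pd.isna covers None as well as NaN, so no extra None check is needed.
    return isinstance(x, str) or bool(pd.isna(x))
```

Doing the list-like test first is what keeps the missing-value check from ever seeing a nested container.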
@h-vetinari, I'll merge this shortly. I'm opening a followup issue.
On Wed, May 2, 2018 at 6:03 AM, Jeff Reback wrote:
Jeff Reback requested changes on this pull request.
the checking code needs some work
------------------------------
In pandas/core/strings.py
<#20347 (comment)>:
> + if ignore_index and fu_wrn else others]
+ return (los, fu_wrn)
+ elif isinstance(others, Index):
+ fu_wrn = not others.equals(idx)
+ los = [Series(others.values,
+ index=(idx if ignore_index else others))]
+ return (los, fu_wrn)
+ elif isinstance(others, DataFrame):
+ fu_wrn = not others.index.equals(idx)
+ if ignore_index and fu_wrn:
+ # without copy, this could change "others"
+ # that was passed to str.cat
+ others = others.copy()
+ others.index = idx
+ return ([others[x] for x in others], fu_wrn)
+ elif isinstance(others, np.ndarray) and others.ndim == 2:
this is wrong
i don’t think we can align an ndarray at all like this
let’s can ndarrays that are > 1 dim
------------------------------
In pandas/core/strings.py
<#20347 (comment)>:
> + or (not is_list_like(x) and isnull(x))
+ or x is None)
+ for x in nxt))
+ # DataFrame is false positive of is_legal
+ # because "x in df" returns column names
+ if not is_legal or isinstance(nxt, DataFrame):
+ raise TypeError(err_msg)
+
+ nxt, fwn = self._get_series_list(nxt,
+ ignore_index=ignore_index)
+ los = los + nxt
+ fu_wrn = fu_wrn or fwn
+ return (los, fu_wrn)
+ # test if there is a mix of list-like and non-list-like (e.g. str)
+ elif (any(is_list_like(x) for x in others)
+ and any(not is_list_like(x) for x in others)):
you can make this simpler by just checking for all is not list like (eg
strings)
anything else will fail thru to the TypeError
------------------------------
In pandas/core/strings.py
<#20347 (comment)>:
> + elif isinstance(others, DataFrame):
+ fu_wrn = not others.index.equals(idx)
+ if ignore_index and fu_wrn:
+ # without copy, this could change "others"
+ # that was passed to str.cat
+ others = others.copy()
+ others.index = idx
+ return ([others[x] for x in others], fu_wrn)
+ elif isinstance(others, np.ndarray) and others.ndim == 2:
+ others = DataFrame(others, index=idx)
+ return ([others[x] for x in others], False)
+ elif is_list_like(others):
+ others = list(others) # ensure iterators do not get read twice etc
+ if all(is_list_like(x) for x in others):
+ los = []
+ fu_wrn = False
can u name this parameter just warn
------------------------------
In pandas/core/strings.py
<#20347 (comment)>:
> + # without copy, this could change "others"
+ # that was passed to str.cat
+ others = others.copy()
+ others.index = idx
+ return ([others[x] for x in others], fu_wrn)
+ elif isinstance(others, np.ndarray) and others.ndim == 2:
+ others = DataFrame(others, index=idx)
+ return ([others[x] for x in others], False)
+ elif is_list_like(others):
+ others = list(others) # ensure iterators do not get read twice etc
+ if all(is_list_like(x) for x in others):
+ los = []
+ fu_wrn = False
+ while others:
+ nxt = others.pop(0) # list-like as per check above
+ # safety for iterators and other non-persistent list-likes
this whole section needs some work it’s way too hard to read and follow
------------------------------
In pandas/core/strings.py
<#20347 (comment)>:
> + # safety for iterators and other non-persistent list-likes
+ # do not map indexed/typed objects; would lose information
+ if not isinstance(nxt, (DataFrame, Series,
+ Index, np.ndarray)):
+ nxt = list(nxt)
+
+ # known types without deep inspection
+ no_deep = ((isinstance(nxt, np.ndarray) and nxt.ndim == 1)
+ or isinstance(nxt, (Series, Index)))
+ # Nested list-likes are forbidden - elements of nxt must be
+ # strings/NaN/None. Need to robustify NaN-check against
+ # x in nxt being list-like (otherwise ambiguous boolean)
+ is_legal = ((no_deep and nxt.dtype == object)
+ or all((isinstance(x, compat.string_types)
+ or (not is_list_like(x) and isnull(x))
+ or x is None)
isnull already checks for None
only 1d objects are valid here (or all scalars)
do this check up front
#20922 for the followup. Thanks! |
@TomAugspurger @jreback |
Fixes issue #18657, fixed existing tests, added new test; all pass.
After I pushed everything and thought about it some more, I realised that one may argue about the default alignment-behavior, and whether it should be changed to join=outer. The behavior as implemented is compatible with the current requirement that everything be of the same length. To me, it is more intuitive that the concatenated other is added to the current series without enlarging it, but I can also see the argument why that restriction is unnecessary.
PS. This is my first PR, tried to follow all the rules. Sorry if I overlooked something.
Edit: Also fixes #20842