API: change default behaviour of str.match from deprecated extract to match (GH5224) #15257
pandas/core/strings.py
Outdated
@@ -444,60 +442,33 @@ def str_match(arr, pat, case=True, flags=0, na=np.nan, as_indexer=False):
     flags : int, default 0 (no flags)
         re module flags, e.g. re.IGNORECASE
     na : default NaN, fill value for missing values.
-    as_indexer : False, by default, gives deprecated behavior better achieved
-        using str_extract. True return boolean indexer.
+    as_indexer : ignored
I would just take this out.
Alternatively, accept kwargs and raise if the keyword is passed (to be helpful), but that is more complicated.
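The "accept kwargs and raise" alternative could look roughly like the sketch below. It is a simplified stand-in that works on a plain list of strings, not the pandas implementation; the function name `str_match_sketch`, the error type, and the error messages are assumptions made for illustration.

```python
import re


def str_match_sketch(values, pat, case=True, flags=0, **kwargs):
    """Hypothetical stand-in for str_match on a plain list of strings."""
    # 'as_indexer' is no longer a named parameter, so it does not show up
    # in the signature or tab completion, but we can still catch it here
    # and give a helpful message instead of a generic error.
    if "as_indexer" in kwargs:
        raise TypeError(
            "'as_indexer' is no longer accepted; "
            "match always returns a boolean result"
        )
    if kwargs:
        unexpected = ", ".join(sorted(kwargs))
        raise TypeError(f"unexpected keyword argument(s): {unexpected}")

    if not case:
        flags |= re.IGNORECASE
    regex = re.compile(pat, flags=flags)
    return [bool(regex.match(v)) for v in values]
```

The trade-off discussed in this thread still applies: every existing caller that passes `as_indexer=True` would start getting an error with this approach.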
Yes, I was not sure what to do with the keyword:
- remove -> but this will give a lot of errors, as up to now you had to specify the keyword to get the right behaviour (if you had groups in the regex)
- raise warning -> changed default to None, and when user specifies it, raise a warning to say it is ignored
- just ignore -> but then there is potential confusion on why it does nothing + people will keep specifying it although not needed
For now I chose the 'raise warning' option. The 'just ignore' option would have the least impact, but is also less informative.
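A minimal sketch of the 'raise warning' option, assuming a `None` sentinel as the new default so that an explicitly passed value can be detected. The function name `match_with_ignored_keyword` is hypothetical and the function operates on a plain list rather than a pandas object; the warning text is the one from the diff under review.

```python
import re
import warnings


def match_with_ignored_keyword(values, pat, as_indexer=None):
    """Hypothetical sketch: warn when the ignored keyword is specified."""
    # None means "not specified"; both True and False trigger the warning.
    if as_indexer is not None:
        warnings.warn(
            "'as_indexer' keyword was specified but will be ignored;"
            " match now returns a boolean indexer by default.",
            UserWarning,
            stacklevel=2,
        )
    regex = re.compile(pat)
    return [bool(regex.match(v)) for v in values]
```

With this shape, existing calls that pass `as_indexer=True` keep working and get the same result, while the warning tells users the keyword is no longer needed.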
Actually, maybe I should give a more specific explanation when as_indexer=False (the previous default behaviour) is specified explicitly, because in those cases there is actually a breaking change in behaviour.
It has been deprecated for quite some time.
I think taking it out is fine (and, as I said, kwargs can still be captured so there is a nice message); that way it is not listed in tab completion or in the docstring.
Codecov Report
@@            Coverage Diff             @@
##           master   #15257     +/-   ##
==========================================
- Coverage   91.02%   90.99%   -0.03%
==========================================
  Files         143      143
  Lines       49403    49396      -7
==========================================
- Hits        44967    44950     -17
- Misses       4436     4446     +10
821f128 to a2bae51
Rebased this (and removed the "as_indexer : ignored" line from the docstring).
@@ -464,11 +464,9 @@ def rep(x, r):
     return result


-def str_match(arr, pat, case=True, flags=0, na=np.nan, as_indexer=False):
+def str_match(arr, pat, case=True, flags=0, na=np.nan, as_indexer=None):
shouldn't you take this arg out?
No, this keyword needs to stay, because it was how people could specify the 'new' behaviour before (although we said we would change this in 0.14, we never did).
So all people still using match are probably specifying this keyword, AFAIU.
See the removed warning from the documentation in the diff for some context.
In principle we could make it a FutureWarning instead of UserWarning, so we can remove it later on.
OK, this should have been changed a long time ago. There is no reason to keep a dead API around.
And change it to FutureWarning; it can be removed in the next major version.
> no reason to keep a dead API around.
To be clear, this is not a dead API. Although it is ignored after this PR, everybody using this function uses that keyword.
So I certainly won't raise an error (FutureWarning is fine, and probably even better than UserWarning anyway).
Well, it's going to be removed, so it should for sure use FutureWarning. UserWarning is pretty useless as a warning IMHO (not that FutureWarning is much better, but at least it signals that we are going to remove it).
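One practical difference between the two categories is that callers can filter on them separately. The sketch below uses only the standard `warnings` module; the helper names `warn_ignored` and `escalates` are hypothetical. It shows a filter that turns only `FutureWarning` into an error, which is one way downstream code can find usages before the keyword is removed.

```python
import warnings


def warn_ignored(category):
    """Emit the 'keyword is ignored' warning with the given category."""
    warnings.warn("'as_indexer' is ignored", category, stacklevel=2)


def escalates(category):
    """Return True if a FutureWarning-only 'error' filter raises for it."""
    with warnings.catch_warnings():
        # Silence everything, then escalate FutureWarning only. Filters
        # added later take precedence, so the 'error' rule is checked first.
        warnings.simplefilter("ignore")
        warnings.simplefilter("error", FutureWarning)
        try:
            warn_ignored(category)
        except FutureWarning:
            return True
        return False


print(escalates(FutureWarning))  # True
print(escalates(UserWarning))    # False
```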
-    if as_indexer and regex.groups > 0:
-        warnings.warn("This pattern has match groups. To actually get the"
-                      " groups, use str.extract.", UserWarning, stacklevel=3)
+    if (as_indexer is False) and (regex.groups > 0):
why aren't you taking this out?
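For context on why this branch exists: per the squashed commit list, an explicit `as_indexer=False` combined with a pattern that has groups raises an error, since that is the one case where the result actually changes. A rough sketch of such a check follows; the function name `check_as_indexer`, the choice of `ValueError`, and the message wording are assumptions rather than the merged code.

```python
import re


def check_as_indexer(pat, as_indexer=None):
    """Hypothetical check for the breaking case; returns the compiled regex."""
    regex = re.compile(pat)
    # regex.groups is the number of capturing groups in the pattern.
    # Only an explicit False (not None, not True) asked for the old
    # group-extracting behaviour, so only that case is rejected.
    if (as_indexer is False) and (regex.groups > 0):
        raise ValueError(
            "as_indexer=False with a pattern that has groups is not "
            "supported; use str.extract to get the groups"
        )
    return regex
```

Note the `is False` comparison: `not as_indexer` would also be true for the new default of `None`, which must not raise.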
pandas/core/strings.py
Outdated
+        # Previously, this keyword was used for changing the default but
+        # deprecated behaviour. This keyword is now no longer needed.
+        warnings.warn("'as_indexer' keyword was specified but will be ignored;"
+                      " match now returns a boolean indexer by default.",
This should for sure be a FutureWarning. To be honest, I would just raise; there is really no reason to continue supporting this. But if you want to keep it for one more cycle, that's OK too.
thanks!
… match (GH5224)

This PR changes the default behaviour of `str.match` from extracting groups to just a match (True/False). The previous default behaviour was deprecated since 0.13.0 (pandas-dev#5224)

Author: Joris Van den Bossche <jorisvandenbossche@gmail.com>

Closes pandas-dev#15257 from jorisvandenbossche/str-match and squashes the following commits:

0ab36b6 [Joris Van den Bossche] Raise FutureWarning instead of UserWarning for as_indexer
a2bae51 [Joris Van den Bossche] raise error in case of regex with groups and as_indexer=False
87446c3 [Joris Van den Bossche] fix test
0788de2 [Joris Van den Bossche] API: change default behaviour of str.match from deprecated extract to match (GH5224)
I just stumbled on this, and it seems we didn't have this in our deprecations to-do list (#6581).
This PR changes the default behaviour of str.match from extracting groups to just a match (True/False). The previous default behaviour has been deprecated since 0.13.0 (#5224).
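To illustrate the difference between "extracting groups" and "just a match", here is a small sketch using only the standard `re` module on a plain list of strings. The helper names `match_as_bool` and `match_as_groups` are hypothetical, and using `None` for non-matching entries is a simplification, not a claim about how pandas represents missing values.

```python
import re


def match_as_bool(values, pat):
    """New-style result: one True/False per string."""
    regex = re.compile(pat)
    return [bool(regex.match(v)) for v in values]


def match_as_groups(values, pat):
    """Old-style result: the captured groups, or None when nothing matches."""
    regex = re.compile(pat)
    out = []
    for v in values:
        m = regex.match(v)
        out.append(m.groups() if m else None)
    return out


values = ["a1", "b2", "c"]
pattern = r"([ab])(\d)"
print(match_as_bool(values, pattern))    # [True, True, False]
print(match_as_groups(values, pattern))  # [('a', '1'), ('b', '2'), None]
```

The boolean form can be used directly as an indexer, which is why the group-returning behaviour is better left to str.extract.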