-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: change default behaviour of str.match from deprecated extract to match (GH5224) #15257
Changes from 3 commits
0788de2
87446c3
a2bae51
0ab36b6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -464,11 +464,9 @@ def rep(x, r): | |
return result | ||
|
||
|
||
def str_match(arr, pat, case=True, flags=0, na=np.nan, as_indexer=False): | ||
def str_match(arr, pat, case=True, flags=0, na=np.nan, as_indexer=None): | ||
""" | ||
Deprecated: Find groups in each string in the Series/Index | ||
using passed regular expression. | ||
If as_indexer=True, determine if each string matches a regular expression. | ||
Determine if each string matches a regular expression. | ||
|
||
Parameters | ||
---------- | ||
|
@@ -479,60 +477,35 @@ def str_match(arr, pat, case=True, flags=0, na=np.nan, as_indexer=False): | |
flags : int, default 0 (no flags) | ||
re module flags, e.g. re.IGNORECASE | ||
na : default NaN, fill value for missing values. | ||
as_indexer : False, by default, gives deprecated behavior better achieved | ||
using str_extract. True return boolean indexer. | ||
|
||
Returns | ||
------- | ||
Series/array of boolean values | ||
if as_indexer=True | ||
Series/Index of tuples | ||
if as_indexer=False, default but deprecated | ||
|
||
See Also | ||
-------- | ||
contains : analogous, but less strict, relying on re.search instead of | ||
re.match | ||
extract : now preferred to the deprecated usage of match (as_indexer=False) | ||
extract : extract matched groups | ||
|
||
Notes | ||
----- | ||
To extract matched groups, which is the deprecated behavior of match, use | ||
str.extract. | ||
""" | ||
|
||
if not case: | ||
flags |= re.IGNORECASE | ||
|
||
regex = re.compile(pat, flags=flags) | ||
|
||
if (not as_indexer) and regex.groups > 0: | ||
# Do this first, to make sure it happens even if the re.compile | ||
# raises below. | ||
warnings.warn("In future versions of pandas, match will change to" | ||
" always return a bool indexer.", FutureWarning, | ||
stacklevel=3) | ||
|
||
if as_indexer and regex.groups > 0: | ||
warnings.warn("This pattern has match groups. To actually get the" | ||
" groups, use str.extract.", UserWarning, stacklevel=3) | ||
if (as_indexer is False) and (regex.groups > 0): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why aren't you taking this out? |
||
raise ValueError("as_indexer=False with a pattern with groups is no " | ||
"longer supported. Use '.str.extract(pat)' instead") | ||
elif as_indexer is not None: | ||
# Previously, this keyword was used for changing the default but | ||
# deprecated behaviour. This keyword is now no longer needed. | ||
warnings.warn("'as_indexer' keyword was specified but will be ignored;" | ||
" match now returns a boolean indexer by default.", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should for sure be a FutureWarning. to be honest I would just raise. really no reason to continue supporting this. but if you want to make for 1 more cycle ok too. |
||
UserWarning, stacklevel=3) | ||
|
||
# If not as_indexer and regex.groups == 0, this returns empty lists | ||
# and is basically useless, so we will not warn. | ||
|
||
if (not as_indexer) and regex.groups > 0: | ||
dtype = object | ||
|
||
def f(x): | ||
m = regex.match(x) | ||
if m: | ||
return m.groups() | ||
else: | ||
return [] | ||
else: | ||
# This is the new behavior of str_match. | ||
dtype = bool | ||
f = lambda x: bool(regex.match(x)) | ||
dtype = bool | ||
f = lambda x: bool(regex.match(x)) | ||
|
||
return _na_map(f, arr, na, dtype=dtype) | ||
|
||
|
@@ -1587,7 +1560,7 @@ def contains(self, pat, case=True, flags=0, na=np.nan, regex=True): | |
return self._wrap_result(result) | ||
|
||
@copy(str_match) | ||
def match(self, pat, case=True, flags=0, na=np.nan, as_indexer=False): | ||
def match(self, pat, case=True, flags=0, na=np.nan, as_indexer=None): | ||
result = str_match(self._data, pat, case=case, flags=flags, na=na, | ||
as_indexer=as_indexer) | ||
return self._wrap_result(result) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't you take this arg out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this keyword needs to stay, because it was how people could specify the 'new' behaviour before (although we said we would change this in 0.14, we never did).
So all people still using
match
are probably specifying this keyword, AFAIU.See the removed warning from the documentation in the diff for some context.
In principle we could make it a FutureWarning instead of UserWarning, so we can remove it later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, this should have been changed a long time ago. no reason to keep a dead API around.
and change to
FutureWarning
. can remove in next major version.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, this is no dead API. Although it is ignored after this PR, everybody using this function uses that keyword.
So I certainly won't raise (FutureWarning is fine, probably even better as UserWarning anyway)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well its going to be removed. So should for sure use
FutureWarning
.UserWarning
is pretty useless as a warning IMHO. (not thatFutureWarning
is much better but at least signals that we are going to remove it).