-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: change default behaviour of str.match from deprecated extract to match (GH5224) #15257
Closed
jorisvandenbossche
wants to merge
4
commits into
pandas-dev:master
from
jorisvandenbossche:str-match
Closed
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
0788de2
API: change default behaviour of str.match from deprecated extract to…
jorisvandenbossche 87446c3
fix test
jorisvandenbossche a2bae51
raise error in case of regex with groups and as_indexer=False
jorisvandenbossche 0ab36b6
Raise FutureWarning instead of UserWarning for as_indexer
jorisvandenbossche File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -464,11 +464,9 @@ def rep(x, r): | |
return result | ||
|
||
|
||
def str_match(arr, pat, case=True, flags=0, na=np.nan, as_indexer=False): | ||
def str_match(arr, pat, case=True, flags=0, na=np.nan, as_indexer=None): | ||
""" | ||
Deprecated: Find groups in each string in the Series/Index | ||
using passed regular expression. | ||
If as_indexer=True, determine if each string matches a regular expression. | ||
Determine if each string matches a regular expression. | ||
|
||
Parameters | ||
---------- | ||
|
@@ -479,60 +477,37 @@ def str_match(arr, pat, case=True, flags=0, na=np.nan, as_indexer=False): | |
flags : int, default 0 (no flags) | ||
re module flags, e.g. re.IGNORECASE | ||
na : default NaN, fill value for missing values. | ||
as_indexer : False, by default, gives deprecated behavior better achieved | ||
using str_extract. True return boolean indexer. | ||
as_indexer : ignored and deprecated | ||
|
||
Returns | ||
------- | ||
Series/array of boolean values | ||
if as_indexer=True | ||
Series/Index of tuples | ||
if as_indexer=False, default but deprecated | ||
|
||
See Also | ||
-------- | ||
contains : analogous, but less strict, relying on re.search instead of | ||
re.match | ||
extract : now preferred to the deprecated usage of match (as_indexer=False) | ||
extract : extract matched groups | ||
|
||
Notes | ||
----- | ||
To extract matched groups, which is the deprecated behavior of match, use | ||
str.extract. | ||
""" | ||
|
||
if not case: | ||
flags |= re.IGNORECASE | ||
|
||
regex = re.compile(pat, flags=flags) | ||
|
||
if (not as_indexer) and regex.groups > 0: | ||
# Do this first, to make sure it happens even if the re.compile | ||
# raises below. | ||
warnings.warn("In future versions of pandas, match will change to" | ||
" always return a bool indexer.", FutureWarning, | ||
stacklevel=3) | ||
|
||
if as_indexer and regex.groups > 0: | ||
warnings.warn("This pattern has match groups. To actually get the" | ||
" groups, use str.extract.", UserWarning, stacklevel=3) | ||
if (as_indexer is False) and (regex.groups > 0): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why aren't you taking this out? |
||
raise ValueError("as_indexer=False with a pattern with groups is no " | ||
"longer supported. Use '.str.extract(pat)' instead") | ||
elif as_indexer is not None: | ||
# Previously, this keyword was used for changing the default but | ||
# deprecated behaviour. This keyword is now no longer needed. | ||
warnings.warn("'as_indexer' keyword was specified but is ignored " | ||
"(match now returns a boolean indexer by default), " | ||
"and will be removed in a future version.", | ||
FutureWarning, stacklevel=3) | ||
|
||
# If not as_indexer and regex.groups == 0, this returns empty lists | ||
# and is basically useless, so we will not warn. | ||
|
||
if (not as_indexer) and regex.groups > 0: | ||
dtype = object | ||
|
||
def f(x): | ||
m = regex.match(x) | ||
if m: | ||
return m.groups() | ||
else: | ||
return [] | ||
else: | ||
# This is the new behavior of str_match. | ||
dtype = bool | ||
f = lambda x: bool(regex.match(x)) | ||
dtype = bool | ||
f = lambda x: bool(regex.match(x)) | ||
|
||
return _na_map(f, arr, na, dtype=dtype) | ||
|
||
|
@@ -1587,7 +1562,7 @@ def contains(self, pat, case=True, flags=0, na=np.nan, regex=True): | |
return self._wrap_result(result) | ||
|
||
@copy(str_match) | ||
def match(self, pat, case=True, flags=0, na=np.nan, as_indexer=False): | ||
def match(self, pat, case=True, flags=0, na=np.nan, as_indexer=None): | ||
result = str_match(self._data, pat, case=case, flags=flags, na=na, | ||
as_indexer=as_indexer) | ||
return self._wrap_result(result) | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't you take this arg out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this keyword needs to stay, because it was how people could specify the 'new' behaviour before (although we said we would change this in 0.14, we never did).
So all people still using
match
are probably specifying this keyword, AFAIU.See the removed warning from the documentation in the diff for some context.
In principle we could make it a FutureWarning instead of UserWarning, so we can remove it later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, this should have been changed a long time ago. no reason to keep a dead API around.
and change to
FutureWarning
. can remove in next major version.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, this is no dead API. Although it is ignored after this PR, everybody using this function uses that keyword.
So I certainly won't raise (FutureWarning is fine, probably even better as UserWarning anyway)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well its going to be removed. So should for sure use
FutureWarning
.UserWarning
is pretty useless as a warning IMHO. (not thatFutureWarning
is much better but at least signals that we are going to remove it).