-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: update the Series.str.ismethods docstring #20913
DOC: update the Series.str.ismethods docstring #20913
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really good job, added some comments with ideas, and about the standards followed.
pandas/core/strings.py
Outdated
@@ -2401,11 +2401,137 @@ def rindex(self, sub, start=0, end=None): | |||
|
|||
_shared_docs['ismethods'] = (""" | |||
Check whether all characters in each string in the Series/Index | |||
are %(type)s. Equivalent to :meth:`str.%(method)s`. | |||
are %(type)s. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The short summary should fit in the first line. I'd get rid of in the Series/Index
.
pandas/core/strings.py
Outdated
|
||
Returns | ||
------- | ||
is : Series/array of boolean values | ||
Series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in most cases we use Series or Index of bool
as the return type. I'd keep the same for consistency.
pandas/core/strings.py
Outdated
**Checks for Whitespace** | ||
|
||
>>> # All characters represent whitespace | ||
>>> s4 = pd.Series([' ','\\t\\r\\n ', '']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd find a bit clearer to use r'\t\r\n'
than '\\t\\r\\n'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So using r'\t\r\n'
seems to interfere with the validation script causing an error, likely because the newline and whitespace characters are still rendered as the docstring is parsed. I can correct for this if I make the whole docstring a raw string, but that seems a bit drastic, and not sure if it would have any other consequences. Would sticking with \\t\\r\\n
be preferable over this?
pandas/core/strings.py
Outdated
|
||
>>> s1 = pd.Series(['AB', 'C12', '42', '']) | ||
|
||
>>> # All are alphabetic characters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find these comments before each function unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can strip them out where it's more self-explanatory, though I'm thinking to keep the ones in the section on More Detailed Checks for Numeric Characters as I found this the most confusing part. i.e. why there are so many checks for numeric values and what the difference between them is. Or I could put these as text instead of comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me, the ones that you think the example is not clear by itself I think it's better to have the explanations as text and not as code comments.
pandas/core/strings.py
Outdated
-------- | ||
**Checks for Alphabetic and Numeric Characters** | ||
|
||
>>> s1 = pd.Series(['AB', 'C12', '42', '']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven't done it anywhere yet, but I think it could make sense to add an index with the same values. It'd make very easy to follow the examples.
As a personal opinion, I find a bit distracting using too arbitrary examples. I'd prefer a real-world example, but as seems difficult for this case, I'd prefer something like ['one', 'one1', '1', '']
, which makes it obvious what is being shown, and don't let the users guessing why 42
and not 43
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking I understand, do you mean having the series values and index be the same so the values are displayed side by side like below?
>>> s1 = pd.Series(data=['one', 'one1', '1', ''],
index=['one', 'one1', '1', ''])
>>> s1.str.isalpha()
one True
one1 False
1 False
False
dtype: bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just an idea, but yes, that's what I meant. We haven't done it in any docstring yet, afaik. But I think it makes very easy to see which value is true for each method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does make comparisons easier, though I find having a blank in the last index position spoils it a bit. Could explicitly label it with 'empty string' as below?
>>> s1.str.isalpha()
one True
one1 False
1 False
empty string False
dtype: bool
- Made docstring header fit on one line - Expanded return value dtype - Switched explanations in comments to text - Simplified examples - Fixed typos
Codecov Report
@@ Coverage Diff @@
## master #20913 +/- ##
==========================================
+ Coverage 91.79% 91.82% +0.02%
==========================================
Files 153 153
Lines 49411 49490 +79
==========================================
+ Hits 45359 45443 +84
+ Misses 4052 4047 -5
Continue to review full report at Codecov.
|
Thanks for the great docstring @MrKriss, and sorry for the delay in merging it. |
added/ passed (using scripts/validate_docstrings.py)git diff upstream/master -u -- "*.py" | flake8 --diff
A result of the Pandas Sprint at PyData London 2018.