DOC: update the Series.str.ismethods docstring #20913

MrKriss · 2018-05-01T22:34:50Z

closes #xxxx
tests ~~added~~ / passed (using scripts/validate_docstrings.py)
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

A result of the Pandas Sprint at PyData London 2018.

datapythonista

Really good job. I added some comments with ideas, and some notes about the standards we follow.

datapythonista · 2018-05-02T10:54:37Z

pandas/core/strings.py

@@ -2401,11 +2401,137 @@ def rindex(self, sub, start=0, end=None):

    _shared_docs['ismethods'] = ("""
    Check whether all characters in each string in the Series/Index
-    are %(type)s. Equivalent to :meth:`str.%(method)s`.
+    are %(type)s.


The short summary should fit on the first line. I'd get rid of "in the Series/Index".
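The `%(type)s` and `%(method)s` placeholders in the hunk suggest that the shared template is filled with %-style formatting from a mapping. A minimal sketch of how the shortened summary would render, assuming a simplified one-line template and hypothetical substitution values for the `isalpha` variant:

```python
# Simplified, hypothetical stand-in for the shared 'ismethods' template;
# the real docstring has more sections (Returns, See Also, Examples).
template = """
Check whether all characters in each string are %(type)s.
"""

# Hypothetical values for the isalpha variant. With mapping-based
# %-formatting, unused keys such as 'method' are simply ignored.
doc = template % {'type': 'alphabetic', 'method': 'isalpha'}

summary = doc.strip().splitlines()[0]
print(summary)  # Check whether all characters in each string are alphabetic.
```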

datapythonista · 2018-05-02T10:59:05Z

pandas/core/strings.py


    Returns
    -------
-    is : Series/array of boolean values
+    Series


I think in most cases we use Series or Index of bool as the return type. I'd keep the same for consistency.

datapythonista · 2018-05-02T11:02:06Z

pandas/core/strings.py

+    **Checks for Whitespace**
+
+    >>> # All characters represent whitespace
+    >>> s4 = pd.Series([' ','\\t\\r\\n ', ''])


I'd find it a bit clearer to use r'\t\r\n' than '\\t\\r\\n'.

So using r'\t\r\n' seems to interfere with the validation script, causing an error, likely because the \t, \r and \n escapes are still converted into real whitespace characters when the docstring itself is parsed. I can work around this by making the whole docstring a raw string, but that seems a bit drastic, and I'm not sure whether it would have other consequences. Would sticking with \\t\\r\\n be preferable?
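The behaviour described above comes from Python itself rather than from the validation script: in a non-raw triple-quoted string, escape sequences are processed when the module is compiled, so an `r` prefix inside an example is just text by the time doctest reads it. A minimal sketch, with each docstring line written as a plain string literal (no pandas needed):

```python
# Each literal stands in for one line of a docstring example.
doubled = """>>> s4 = pd.Series([' ', '\\t\\r\\n ', ''])"""
r_inside = """>>> s4 = pd.Series([' ', r'\t\r\n ', ''])"""
whole_raw = r""">>> s4 = pd.Series([' ', '\t\r\n ', ''])"""

# Doubled backslashes survive compilation as a literal backslash + 't'.
print('\\t' in doubled, '\t' in doubled)      # True False

# In a non-raw string, '\t' becomes a real tab before doctest sees it;
# the r prefix is then only a character in the example source.
print('\\t' in r_inside, '\t' in r_inside)    # False True

# A raw docstring keeps the backslashes, so the example source is unchanged.
print('\\t' in whole_raw, '\t' in whole_raw)  # True False
```

If the example is written with single backslashes, only the raw-docstring form keeps them in the text that doctest executes; the doubled form gives the same executed source without changing the docstring prefix.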

datapythonista · 2018-05-02T11:03:26Z

pandas/core/strings.py

+
+    >>> s1 = pd.Series(['AB', 'C12', '42', ''])
+
+    >>> # All are alphabetic characters


I find these comments before each function unnecessary.

Sure, I can strip them out where the example is self-explanatory, though I'm thinking of keeping the ones in the section on More Detailed Checks for Numeric Characters, as I found this the most confusing part, i.e. why there are so many checks for numeric values and what the differences between them are. Alternatively, I could put these explanations as text instead of comments.

Sounds good to me. For the ones where you think the example is not clear by itself, I think it's better to have the explanations as text rather than as code comments.
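The difference between the three numeric checks can be seen with the built-in string methods that the pandas `str` accessor applies to each element. A small sketch, with sample values chosen here for illustration (not taken from the PR):

```python
# Hypothetical sample values that separate the three numeric checks:
# ASCII digits, SUPERSCRIPT THREE, VULGAR FRACTION ONE FIFTH, empty string.
samples = ['23', '\u00b3', '\u2155', '']

checks = ('isdecimal', 'isdigit', 'isnumeric')

# Run every check on every value, as Series.str.<method>() does per element.
table = {s: tuple(getattr(s, name)() for name in checks) for s in samples}

for s, row in table.items():
    print(repr(s), dict(zip(checks, row)))
```

Each check is more permissive than the previous one (every decimal character is a digit, and every digit is numeric), which is why the superscript passes `isdigit` but not `isdecimal`, and the fraction passes only `isnumeric`.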

datapythonista · 2018-05-02T11:06:53Z

pandas/core/strings.py

+    --------
+    **Checks for Alphabetic and Numeric Characters**
+
+    >>> s1 = pd.Series(['AB', 'C12', '42', ''])


We haven't done it anywhere yet, but I think it could make sense to add an index with the same values. It'd make the examples very easy to follow.

As a personal opinion, I find overly arbitrary examples a bit distracting. I'd prefer a real-world example, but as that seems difficult in this case, I'd go for something like ['one', 'one1', '1', ''], which makes it obvious what is being shown and doesn't leave users guessing why 42 and not 43.
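With the suggested values, each position gives a distinct combination of results. A quick sketch in plain Python, using the built-in methods that the pandas accessors apply element-wise (the choice of three methods is only for illustration):

```python
values = ['one', 'one1', '1', '']

# Apply each built-in str predicate to every value, as
# Series.str.isalpha() and friends do element by element.
results = {
    method: [getattr(v, method)() for v in values]
    for method in ('isalpha', 'isnumeric', 'isalnum')
}

for method, row in results.items():
    print(method, row)
```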

Just checking I understand, do you mean having the series values and index be the same so the values are displayed side by side like below?

>>> s1 = pd.Series(data=['one', 'one1', '1', ''], index=['one', 'one1', '1', ''])
>>> s1.str.isalpha()
one      True
one1    False
1       False
        False
dtype: bool

It's just an idea, but yes, that's what I meant. We haven't done it in any docstring yet, afaik. But I think it makes it very easy to see which value is true for each method.

It does make comparisons easier, though I find that the blank in the last index position spoils it a bit. Could I explicitly label it with 'empty string', as below?

>>> s1.str.isalpha()
one              True
one1            False
1               False
empty string    False
dtype: bool
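A minimal sketch of building the labelled Series, assuming the index should mirror the values except for the empty string (the list comprehension is just one way to derive the labels, not necessarily how the final docstring does it):

```python
import pandas as pd

values = ['one', 'one1', '1', '']

# Mirror the values in the index, but give the empty string a visible label.
labels = [v if v else 'empty string' for v in values]

s1 = pd.Series(data=values, index=labels)
result = s1.str.isalpha()
print(result)
```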

- Made docstring header fit on one line
- Expanded return value dtype
- Switched explanations in comments to text
- Simplified examples
- Fixed typos

codecov · 2018-05-08T19:27:57Z

Codecov Report

Merging #20913 into master will increase coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #20913      +/-   ##
==========================================
+ Coverage   91.79%   91.82%   +0.02%     
==========================================
  Files         153      153              
  Lines       49411    49490      +79     
==========================================
+ Hits        45359    45443      +84     
+ Misses       4052     4047       -5

Flag	Coverage Δ
#multiple	`90.22% <100%> (+0.02%)`	⬆️
#single	`41.85% <56.89%> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/strings.py	`98.62% <100%> (+0.28%)`	⬆️
pandas/core/dtypes/missing.py	`91.95% <0%> (-0.99%)`	⬇️
pandas/core/series.py	`94.02% <0%> (-0.01%)`	⬇️
pandas/core/indexing.py	`93.55% <0%> (ø)`	⬆️
pandas/core/reshape/reshape.py	`100% <0%> (ø)`	⬆️
pandas/core/reshape/merge.py	`94.25% <0%> (ø)`	⬆️
pandas/core/indexes/interval.py	`93.08% <0%> (ø)`	⬆️
pandas/core/algorithms.py	`94.5% <0%> (+0.01%)`	⬆️
pandas/core/indexes/datetimes.py	`95.76% <0%> (+0.02%)`	⬆️
pandas/core/groupby/groupby.py	`92.66% <0%> (+0.03%)`	⬆️
... and 3 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c4da79b...7f066c9.

datapythonista · 2018-07-22T00:23:25Z

Thanks for the great docstring @MrKriss, and sorry for the delay in merging it.

Enhance docstrings for Series.str.ismethods

916d183

datapythonista reviewed May 2, 2018

jreback added the Docs and Strings (String extension data type and string data) labels May 4, 2018

cmusselle added 2 commits May 8, 2018 20:21

Specify file encoding to fix python 2.7 tests/checks

e03a463

Ammend docstring following feedback

7f066c9

datapythonista merged commit 37c7458 into pandas-dev:master Jul 22, 2018

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

DOC: update the Series.str.ismethods docstring (pandas-dev#20913)

55ffe0f
