Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: update the Series.str.ismethods docstring #20913

Merged

Conversation

MrKriss
Copy link
Contributor

@MrKriss MrKriss commented May 1, 2018

  • closes #xxxx
  • tests added / passed (using scripts/validate_docstrings.py)
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

A result of the Pandas Sprint at PyData London 2018.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good job, added some comments with ideas, and about the standards followed.

@@ -2401,11 +2401,137 @@ def rindex(self, sub, start=0, end=None):

_shared_docs['ismethods'] = ("""
Check whether all characters in each string in the Series/Index
are %(type)s. Equivalent to :meth:`str.%(method)s`.
are %(type)s.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The short summary should fit in the first line. I'd get rid of in the Series/Index.


Returns
-------
is : Series/array of boolean values
Series
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in most cases we use Series or Index of bool as the return type. I'd keep the same for consistency.

**Checks for Whitespace**

>>> # All characters represent whitespace
>>> s4 = pd.Series([' ','\\t\\r\\n ', ''])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd find a bit clearer to use r'\t\r\n' than '\\t\\r\\n'

Copy link
Contributor Author

@MrKriss MrKriss May 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So using r'\t\r\n' seems to interfere with the validation script causing an error, likely because the newline and whitespace characters are still rendered as the docstring is parsed. I can correct for this if I make the whole docstring a raw string, but that seems a bit drastic, and not sure if it would have any other consequences. Would sticking with \\t\\r\\n be preferable over this?


>>> s1 = pd.Series(['AB', 'C12', '42', ''])

>>> # All are alphabetic characters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find these comments before each function unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can strip them out where it's more self-explanatory, though I'm thinking to keep the ones in the section on More Detailed Checks for Numeric Characters as I found this the most confusing part. i.e. why there are so many checks for numeric values and what the difference between them is. Or I could put these as text instead of comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, the ones that you think the example is not clear by itself I think it's better to have the explanations as text and not as code comments.

--------
**Checks for Alphabetic and Numeric Characters**

>>> s1 = pd.Series(['AB', 'C12', '42', ''])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't done it anywhere yet, but I think it could make sense to add an index with the same values. It'd make very easy to follow the examples.

As a personal opinion, I find a bit distracting using too arbitrary examples. I'd prefer a real-world example, but as seems difficult for this case, I'd prefer something like ['one', 'one1', '1', ''], which makes it obvious what is being shown, and don't let the users guessing why 42 and not 43.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking I understand, do you mean having the series values and index be the same so the values are displayed side by side like below?

>>> s1 = pd.Series(data=['one', 'one1', '1', ''], 
                   index=['one', 'one1', '1', ''])

>>> s1.str.isalpha()
one      True
one1    False
1       False
        False
dtype: bool

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just an idea, but yes, that's what I meant. We haven't done it in any docstring yet, afaik. But I think it makes very easy to see which value is true for each method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does make comparisons easier, though I find having a blank in the last index position spoils it a bit. Could explicitly label it with 'empty string' as below?

>>> s1.str.isalpha()
one              True
one1            False
1               False
empty string    False
dtype: bool

@jreback jreback added Docs Strings String extension data type and string data labels May 4, 2018
cmusselle added 2 commits May 8, 2018 20:21
- Made docstring header fit on one line
- Expanded return value dtype
- Switched explanations in comments to text
- Simplified examples
- Fixed typos
@codecov
Copy link

codecov bot commented May 8, 2018

Codecov Report

Merging #20913 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20913      +/-   ##
==========================================
+ Coverage   91.79%   91.82%   +0.02%     
==========================================
  Files         153      153              
  Lines       49411    49490      +79     
==========================================
+ Hits        45359    45443      +84     
+ Misses       4052     4047       -5
Flag Coverage Δ
#multiple 90.22% <100%> (+0.02%) ⬆️
#single 41.85% <56.89%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/strings.py 98.62% <100%> (+0.28%) ⬆️
pandas/core/dtypes/missing.py 91.95% <0%> (-0.99%) ⬇️
pandas/core/series.py 94.02% <0%> (-0.01%) ⬇️
pandas/core/indexing.py 93.55% <0%> (ø) ⬆️
pandas/core/reshape/reshape.py 100% <0%> (ø) ⬆️
pandas/core/reshape/merge.py 94.25% <0%> (ø) ⬆️
pandas/core/indexes/interval.py 93.08% <0%> (ø) ⬆️
pandas/core/algorithms.py 94.5% <0%> (+0.01%) ⬆️
pandas/core/indexes/datetimes.py 95.76% <0%> (+0.02%) ⬆️
pandas/core/groupby/groupby.py 92.66% <0%> (+0.03%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c4da79b...7f066c9. Read the comment docs.

@datapythonista datapythonista merged commit 37c7458 into pandas-dev:master Jul 22, 2018
@datapythonista
Copy link
Member

Thanks for the great docstring @MrKriss, and sorry for the delay in merging it.

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants