DOC: Formatting in Series.str.extractall #22565

Merged
merged 3 commits into from
Sep 18, 2018

Conversation

lucadonini96
Contributor

@lucadonini96 lucadonini96 commented Sep 1, 2018

In Series.str.extractall, I corrected the formatting of the return value and added a period at the end of the parameter descriptions. I can also clarify the descriptions if useful.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
################################################################################
################### Docstring (pandas.Series.str.extractall) ###################
################################################################################

For each subject string in the Series, extract groups from all
matches of regular expression pat. When each subject string in the
Series has exactly one match, extractall(pat).xs(0, level='match')
is the same as extract(pat).

.. versionadded:: 0.18.0

Parameters
----------
pat : str
    Regular expression pattern with capturing groups.
flags : int, default 0 (no flags)
    re module flags, e.g. re.IGNORECASE.

Returns
-------
DataFrame
    A DataFrame with one row for each match, and one column for each
    group. Its rows have a MultiIndex with first levels that come from
    the subject Series. The last level is named 'match' and indexes the
    matches in each item of the Series. Any capture group names in regular
    expression pat will be used for column names; otherwise capture
    group numbers will be used.

See Also
--------
extract : returns first match only (not all matches)

Examples
--------
A pattern with one group will return a DataFrame with one column.
Indices with no matches will not appear in the result.

>>> s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
>>> s.str.extractall(r"[ab](\d)")
         0
  match
A 0      1
  1      2
B 0      1

Capture group names are used for column names of the result.

>>> s.str.extractall(r"[ab](?P<digit>\d)")
        digit
  match
A 0         1
  1         2
B 0         1

A pattern with two groups will return a DataFrame with two columns.

>>> s.str.extractall(r"(?P<letter>[ab])(?P<digit>\d)")
        letter digit
  match
A 0          a     1
  1          a     2
B 0          b     1

Optional groups that do not match are NaN in the result.

>>> s.str.extractall(r"(?P<letter>[ab])?(?P<digit>\d)")
        letter digit
  match
A 0          a     1
  1          a     2
B 0          b     1
C 0        NaN     1

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Closing quotes should be placed in the line after the last text in the docstring (do not close the quotes in the same line as the text, or leave a blank line between the last text and the quotes)
	Use only one blank line to separate sections or paragraphs
	Errors in parameters section
		Parameter "flags" description should start with a capital letter
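
The equivalence stated at the top of the docstring, and the effect of the flags parameter, can be checked with a short sketch. The sample Series and variable names below are made up for illustration and are not part of the docstring:

```python
import re

import pandas as pd

# Illustrative data: every subject string contains exactly one match.
s = pd.Series(["a1", "b2", "c3"], index=["A", "B", "C"])
pat = r"(?P<letter>[abc])(?P<digit>\d)"

all_matches = s.str.extractall(pat)  # rows indexed by (original index, 'match')
first_match = s.str.extract(pat)     # one row per subject string

# Selecting match number 0 drops the 'match' level, so the result is
# indexed like the original Series and can be compared with extract().
same = all_matches.xs(0, level="match").equals(first_match)

# flags takes re module flags; here the pattern also matches upper case.
upper = pd.Series(["A1a2", "B1"])
ci = upper.str.extractall(r"[ab](\d)", flags=re.IGNORECASE)
```

Here `same` is expected to be True only because each string has exactly one match; with zero or several matches per string the two results differ in shape.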

@datapythonista datapythonista added the Docs and Strings (String extension data type and string data) labels Sep 1, 2018
@datapythonista
Member

Can you check comments in #22562 and see if they make sense for the changes here too?

@pep8speaks

pep8speaks commented Sep 1, 2018

Hello @lucadonini96! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 01, 2018 at 14:15 Hours UTC

@codecov

codecov bot commented Sep 1, 2018

Codecov Report

Merging #22565 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #22565   +/-   ##
=======================================
  Coverage   92.04%   92.04%           
=======================================
  Files         169      169           
  Lines       50787    50787           
=======================================
  Hits        46745    46745           
  Misses       4042     4042

Flag         Coverage Δ
#multiple    90.45% <ø> (ø) ⬆️
#single      42.29% <ø> (ø) ⬆️

Impacted Files            Coverage Δ
pandas/core/strings.py    98.63% <ø> (ø) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 98fb53c...c964407. Read the comment docs.

Member

@datapythonista datapythonista left a comment

Good changes; I added just a couple of minor things.

@@ -1000,7 +1004,6 @@ def str_extractall(arr, pat, flags=0):
  1          a     2
B 0          b     1
C 0        NaN     1

"""
Member

Can you remove this blank line at the end, before the quotes?

A ``DataFrame`` with one row for each match, and one column for each
group. Its rows have a ``MultiIndex`` with first levels that come from
the subject ``Series``. The last level is named 'match' and indexes the
matches in each item of the ``Series``. Any capture group names in
Member

I think for DataFrame... it is better to use backticks, as they are more "links to other pages" than "code", in my opinion.

Member

I meant single backticks in the previous comment, instead of double backticks.
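
For illustration, the two review points (no blank line before the closing quotes, and single backticks around names such as DataFrame) could look like the sketch below. The function `first_digits` is a hypothetical helper invented for this example; it is not part of pandas or of this pull request:

```python
import pandas as pd


def first_digits(s):
    """
    Extract the first digit from each string in a Series.

    Hypothetical helper, shown only to illustrate docstring layout.

    Parameters
    ----------
    s : Series
        Subject strings.

    Returns
    -------
    DataFrame
        A `DataFrame` with one row per item of the `Series` and a single
        column named 'digit'.
    """
    # The named capture group supplies the column name of the result.
    return s.str.extract(r"(?P<digit>\d)")
```

The closing quotes sit on the line directly after the last text, and the object names use single backticks so that the documentation build can treat them as references rather than code literals.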

@jreback
Contributor

jreback commented Sep 18, 2018

@datapythonista

@datapythonista datapythonista merged commit c994e80 into pandas-dev:master Sep 18, 2018
aeltanawy pushed a commit to aeltanawy/pandas that referenced this pull request Sep 20, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018