DOC: update the pandas.DataFrame.any docstring #20217

Merged
merged 13 commits into from Mar 12, 2018

@ghost commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • [+] PR title is "DOC: update the docstring"
  • [+] The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • [+] The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • [+] The html version looks good: python doc/make.py --single <your-function-or-method>
  • [+] It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
####################### Docstring (pandas.DataFrame.any) #######################
################################################################################

Return whether any element is True over requested axis.

Unlike :meth:`DataFrame.all`, this performs an *or* operation. If any of the
values along the specified axis is True, this will return True.

Parameters
----------
axis : int, default 0
    Select the axis which can be 0 for indices and 1 for columns.
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA.
level : int or level name, default None
    If the axis is a MultiIndex (hierarchical), count along a
    particular level, collapsing into a Series.
bool_only : boolean, default None
    Include only boolean columns. If None, will attempt to use everything,
    then use only boolean data. Not implemented for Series.
**kwargs : any, default None
    Additional keywords have no affect but might be accepted for
    compatibility with numpy.

Returns
-------
any : Series or DataFrame (if level specified)

See Also
--------
pandas.DataFrame.all : Return whether all elements are True.

Examples
--------
**Series**

For Series input, the output is a scalar indicating whether any element
is True.

>>> pd.Series([True, False]).any()
True

**DataFrame**

Whether each column contains at least one True element (the default).

>>> pd.DataFrame({
...     "A": [1, 2, 3],
...     "B": [4, 5, 6]
... }).any()
A    True
B    True
dtype: bool

Aggregating over the columns.

>>> pd.DataFrame({
...     "A": [True, False, True],
...     "B": [4, 5, 6]
... }).any(axis='columns')
0    True
1    True
2    True
dtype: bool

>>> pd.DataFrame({
...     "A": [True, False, True],
...     "B": [4, 0, 6]
... }).any(axis='columns')
0    True
1    False
2    True
dtype: bool

`any` for an empty DataFrame is an empty Series.

>>> pd.DataFrame([]).any()
Series([], dtype: bool)

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Use only one blank line to separate sections or paragraphs
	Errors in parameters section
		Parameters {'kwargs'} not documented
		Unknown parameters {'**kwargs'}

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Contributor

@TomAugspurger left a comment

Hmm, looks like some extra files were committed. I think we have another PR adding savefig to our gitignore.

Can you remove those files with rm -rf doc/source/savefig and then update your PR? I think git rm doc/source/savefig should do it.

Reviewers: we should wait for at least one CI to finish since this is changing parameters passed through the functions making the docstrings.

_any_also = """\
See Also
--------
pandas.DataFrame.all : Return whether all elements are True \
Contributor

I don't think you need the trailing \ here, do you?

Contributor

To be clear, you do need the one on the first line, just not these.

Member

yes, and the same for the ones below as well

Author

@TomAugspurger if I don't put \, the line becomes longer than 79 characters and doesn't pass the git diff origin/master -u -- "*.py" | flake8 --diff check...

Contributor

We want them, else we get long lines in the text docstring like:

One dimensional boolean pandas.Series is returned. Unlike pandas.DataFrame.all, pandas.DataFrame.any performs OR operation; in other word, if any of the values along the specified axis is True, pandas.DataFrame.any will return True.

Author

@TomAugspurger & @jorisvandenbossche, let's say I removed the \: the result of git diff origin/master -u -- "*.py" | flake8 --diff would then be pandas/core/generic.py:7834:80: E501 line too long (83 > 79 characters), since the line is pandas.DataFrame.all : Return whether all elements are True over requested axis. I don't want to break the line with \n, so I'm using \ instead. The same reasoning applies to the other places where I used \.
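
For readers following the trade-off discussed above, here is a minimal sketch of what a trailing backslash does inside a triple-quoted string. The two template strings are hypothetical, modelled on the snippets quoted in this thread rather than copied from pandas/core/generic.py. The point is only that the backslash keeps the source line short while the rendered docstring line still ends up longer than 79 characters.

```python
# Hypothetical templates for illustration; not the actual pandas source.

# A backslash at the end of a line inside a regular string literal
# swallows the newline, so the two source lines become one text line.
with_continuation = """\
See Also
--------
pandas.DataFrame.all : Return whether all elements are True \
over requested axis."""

# Without the trailing backslash the newline is kept, and the
# description continues on an indented second line.
without_continuation = """\
See Also
--------
pandas.DataFrame.all : Return whether all elements are True
    over requested axis."""

for name, text in [("with \\", with_continuation),
                   ("without \\", without_continuation)]:
    longest = max(len(line) for line in text.splitlines())
    print(name, "-> lines:", len(text.splitlines()),
          "longest:", longest, "over 79:", longest > 79)
```

The backslash on the opening line of each template is a different case: it only prevents the string from starting with an empty line.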

_any_examples = """\
Examples
--------
By default, any from an empty DataFrame is empty Series::
Contributor

No double colon, just a .

Author

@TomAugspurger according to this documentation, double colon is required to show code samples.

--------
By default, any from an empty DataFrame is empty Series::

>> pd.DataFrame([]).any()
Contributor

Three >. Doesn't need to be indented.

Author

@TomAugspurger, you mean that without the double colon and with three >, it will still show the code samples as required?
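
As a rough illustration of why the prompt style matters, the sketch below uses the standard-library doctest parser on two hypothetical docstring fragments that differ only in the prompt. It uses the built-in any rather than pandas so that it stays self-contained, and it says nothing about how Sphinx renders the result; it only shows which fragment doctest recognises as an example.

```python
import doctest

# Hypothetical docstring fragments; only the prompt style differs.
three_prompt = """
Whether any element is truthy.

>>> any([0, 1, 0])
True
"""

two_prompt = """
Whether any element is truthy::

>> any([0, 1, 0])
True
"""

parser = doctest.DocTestParser()

# Only lines starting with ">>> " are treated as interactive examples.
found_three = parser.get_examples(three_prompt)
found_two = parser.get_examples(two_prompt)

print("examples found with '>>>':", len(found_three))
print("examples found with '>>': ", len(found_two))
```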


Non-boolean values will always give True::

>> pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}).any()
Contributor

same.


It is performing OR along the specified axis::

>> pd.DataFrame({"A": [1, False, 3], "B": [4, 5, 6]}).any(axis=1)
Contributor

same

2 True
dtype: bool

>> pd.DataFrame({"A": [1, False, 3], "B": [4, False, 6]}).any(axis=1)
Contributor

same.

Member

@jorisvandenbossche left a comment

Can you also add an example with Series? (The docstring is shared by both Series and DataFrame.)

_any_desc = """\
Return whether any element is True over requested axis.

One dimensional pandas.Series having boolean values will be returned. \
Member

pandas.Series -> Series

Member

I would also say "boolean Series" instead of "Series having boolean values"

One dimensional pandas.Series having boolean values will be returned. \
Unlike pandas.DataFrame.all, pandas.DataFrame.any performs OR operation; \
in other word, if any of the values along the specified axis is True, \
pandas.DataFrame.any will return True."""
Member

Can you also mention here that for Series the return value is a single boolean value?
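
A small sketch of the difference raised in this comment, assuming a reasonably recent pandas: on a Series, any collapses to a single boolean-like scalar, while on a DataFrame it returns one value per column as a Series. The variable names are illustrative only.

```python
import pandas as pd

# Series: the reduction yields a single scalar.
series_result = pd.Series([True, False]).any()

# DataFrame: the reduction yields one value per column, as a Series.
frame_result = pd.DataFrame(
    {"A": [True, False], "B": [False, False]}
).any()

print(bool(series_result))     # True
print(frame_result.to_dict())  # {'A': True, 'B': False}
```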

@jreback added the Docs and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Mar 10, 2018
Contributor

jreback commented Mar 11, 2018

this needs a rebase now

@TomAugspurger
Contributor

@smusali I'm doing the rebase / merge. 1 minute.

@TomAugspurger
Contributor

@smusali fixed the merge conflict. Also made an update to the grammar and examples.

I changed the examples to have consistent types for the columns. In general, having a mix like [4, False, 6] is less common than having all bools or all ints like [4, 0, 6].
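
To make the consistent-types point concrete, here is a minimal sketch with all-integer columns, where 0 plays the role of False. The frame is a hypothetical one in the spirit of the updated examples, not the exact one from the docstring.

```python
import pandas as pd

# All-int columns; 0 is the only falsy value.
df = pd.DataFrame({"A": [1, 0, 3], "B": [4, 0, 6]})

per_column = df.any()             # default axis: one result per column
per_row = df.any(axis="columns")  # one result per row

print(per_column.tolist())  # [True, True]
print(per_row.tolist())     # [True, False, True]
```

Only the middle row is False, because it is the only row in which every value is 0.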

@ghost
Author

ghost commented Mar 11, 2018

Made the requested changes and some fixes. Please review, @TomAugspurger, @jreback and @jorisvandenbossche. Thanks in advance!

Split Series and DataFrame

Edgecase last in examples.

Use axis='columns'

Simplify extended description.
@TomAugspurger
Contributor

################################################################################
####################### Docstring (pandas.DataFrame.any) #######################
################################################################################

Return whether any element is True over requested axis.

Unlike :meth:`DataFrame.all`, this performs an *or* operation. If any of the
values along the specified axis is True, this will return True.

Parameters
----------
axis : int, default 0
    Select the axis which can be 0 for indices and 1 for columns.
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA.
level : int or level name, default None
    If the axis is a MultiIndex (hierarchical), count along a
    particular level, collapsing into a Series.
bool_only : boolean, default None
    Include only boolean columns. If None, will attempt to use everything,
    then use only boolean data. Not implemented for Series.
**kwargs : any, default None
    Additional keywords have no affect but might be accepted for
    compatibility with numpy.

Returns
-------
any : Series or DataFrame (if level specified)

See Also
--------
pandas.DataFrame.all : Return whether all elements are True.

Examples
--------
**Series**

For Series input, the output is a scalar indicating whether any element
is True.

>>> pd.Series([True, False]).any()
True

**DataFrame**

Whether each column contains at least one True element (the default).

>>> pd.DataFrame({
...     "A": [1, 2, 3],
...     "B": [4, 5, 6]
... }).any()
A    True
B    True
dtype: bool

Aggregating over the columns.

>>> pd.DataFrame({
...     "A": [True, False, True],
...     "B": [4, 5, 6]
... }).any(axis='columns')
0    True
1    True
2    True
dtype: bool

>>> pd.DataFrame({
...     "A": [True, False, True],
...     "B": [4, 0, 6]
... }).any(axis='columns')
0    True
1    False
2    True
dtype: bool

`any` for an empty DataFrame is an empty Series.

>>> pd.DataFrame([]).any()
Series([], dtype: bool)

################################################################################
################################## Validation ##################################
################################################################################

@ghost
Author

ghost commented Mar 11, 2018

@TomAugspurger, do you have any more change requests?

@TomAugspurger
Contributor

Just updated the examples a tad to show the dataframes.
