Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC iteritems docstring update and examples #22658

Merged
merged 12 commits into from
Sep 27, 2018

Conversation

Ecboxer
Copy link
Contributor

@Ecboxer Ecboxer commented Sep 10, 2018

Updated iteritems docstring to start with an infinitive and added a short example

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Updated iteritems docstring to start with an infinitive and added a short example
@pep8speaks
Copy link

pep8speaks commented Sep 10, 2018

Hello @Ecboxer! Thanks for updating the PR.

Line 784:80: E501 line too long (83 > 79 characters)
Line 796:80: E501 line too long (82 > 79 characters)
Line 797:80: E501 line too long (88 > 79 characters)

Comment last updated on September 25, 2018 at 14:12 Hours UTC

--------

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]},
index=['a', 'b'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuation lines need to start with ... (right below the >>>)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, make sure the i in index is right below the opening { on the line above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, I will make the changes.

@TomAugspurger
Copy link
Contributor

You can run the validation script locally FYI. python scripts/validate_docstring.py pandas.DataFrame.iteritems I think.

@@ -726,13 +726,32 @@ def style(self):

def iteritems(self):
"""
Iterator over (column name, Series) pairs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this returns an Iterator (or more specifically a Generator). So I would leave this as Iterator or change this to Generator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought to make the change to 'Iterate over DataFrame ... as (..., Series) pairs to stay within the style of the iterrows and itertuples functions, but I can revert back to 'Iterator over (column name, Series) pairs.

@jschendel jschendel added the Docs label Sep 11, 2018
@codecov
Copy link

codecov bot commented Sep 12, 2018

Codecov Report

Merging #22658 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #22658   +/-   ##
=======================================
  Coverage   92.18%   92.18%           
=======================================
  Files         169      169           
  Lines       50820    50820           
=======================================
  Hits        46850    46850           
  Misses       3970     3970
Flag Coverage Δ
#multiple 90.6% <ø> (ø) ⬆️
#single 42.38% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/frame.py 97.2% <ø> (ø) ⬆️
pandas/core/generic.py 96.67% <0%> (ø) ⬆️
pandas/core/reshape/melt.py 97.34% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27de8e6...d8e5370. Read the comment docs.

@TomAugspurger
Copy link
Contributor

@Ecboxer do you have time to update?

Copy link
Contributor Author

@Ecboxer Ecboxer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ecboxer do you have time to update?

The edited commit passed my local tests, I would like to try to update though. Can you direct me on how to understand/correct the issues in the checks?

--------

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]},
index=['a', 'b'])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, I will make the changes.

@@ -726,13 +726,32 @@ def style(self):

def iteritems(self):
"""
Iterator over (column name, Series) pairs.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought to make the change to 'Iterate over DataFrame ... as (..., Series) pairs to stay within the style of the iterrows and itertuples functions, but I can revert back to 'Iterator over (column name, Series) pairs.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments

@@ -728,11 +728,35 @@ def iteritems(self):
"""
Iterator over (column name, Series) pairs.

Iterates over columns as key, value dict-like pairs with columns name as keys and Series as values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description is a bit complex, and can give the impression that it's returning a dictionary.

Something like "Iterates over the DataFrame columns, returning a tuple with the label and the content as a Series" or something similar would be clearer IMHO.

Returns
-------
it : generator
A generator that iterates over the columns of the frame.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of a genrator, instead of documenting that it returns a generator, we use the section Yields. In this case something like:

Yields
-------
label : object
    Description of label
content : Series
   Description of content

-------
it : generator
A generator that iterates over the columns of the frame.

See also
--------
iterrows : Iterate over DataFrame rows as (index, Series) pairs.
itertuples : Iterate over DataFrame rows as namedtuples of the values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A in See Also should be capitalized.

Use DataFrame.iterrows instead of iterrows and same for itertuples.

Examples
--------
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]},
... index=['a', 'b'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use something that looks more like a real world example? I think it makes things easier to read.

See for example the example DataFrame in https://pandas-docs.github.io/pandas-docs-travis/generated/pandas.DataFrame.reset_index.html

a 1 0.1
b 2 0.2
>>> for col in df.iteritems():
... print(col)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something like next would make things easier to understand:

for label, content in df.iteritems():
    print('label:', label)
    print('content:' content)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the edits! I rephrased the long description, changed the Yields section to cover label and content, prepended DataFrame. to the See Also examples, and reworked the Examples section to hopefully be more engaging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you push the changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got my git commands mixed up sheepishly, but it's pushed now

…t, edits to See Also, and made a more engaging example
Edited iteritems documentation: long description, Yields, See Also and Examples
koala marsupial 80000
>>> for label, content in df.iteritems():
... print('label:', label)
... print('content:', content)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for this second print we could have print('content:', content, sep='\n', end='\n') and the output would be easier to read.

And use population istead of pop.

Other than that, looks great, thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I didn't know that print could take sep and end args, that's neat!

@datapythonista
Copy link
Member

Can you run the validation script? I think the output in the tests is not accurate.

You can do it by ./scripts/validate_docstrings.py pandas.DataFrame.iteritems (you need pandas compiled), and it should say everything is ok.

@Ecboxer
Copy link
Contributor Author

Ecboxer commented Sep 25, 2018

Can you run the validation script? I think the output in the tests is not accurate.

You can do it by ./scripts/validate_docstrings.py pandas.DataFrame.iteritems (you need pandas compiled), and it should say everything is ok.

When I run scripts/validate_docstrings.py pandas.DataFrame.iteritems I get the following error under #Validation#
Errors found:
No Returns section found
(When I renamed Yields to Returns, I would get No Yields section found). Should I make some sort of empty Returns section?

@datapythonista
Copy link
Member

datapythonista commented Sep 25, 2018

If that's the only error, ignore it, it may be an error in the validation script.

I'd just make a quick update if you don't mind. Instead of using \\n in the code to escape the backslash, I think it'd be more readable if we set the docstring as a raw string with (r""" at the beginning). And may be we can set end='\n\n'? that would add a blank line between the two items (I didn't realised that end='\n' is actually the default, so it's actually doing nothing now). If you think the \n\n it's adding too much complexity, just get rid of the end parameter.

Thanks!

@Ecboxer
Copy link
Contributor Author

Ecboxer commented Sep 25, 2018

If that's the only error, ignore it, it may be an error in the validation script.

I'd just make a quick update if you don't mind. Instead of using \\n in the code to escape the backslash, I think it'd be more readable if we set the docstring as a raw string with (r""" at the beginning). And may be we can set end=\n\n? that would add a blank line between the two items (I didn't realised thatend='\n'is actually the default, so it's actually doing nothing now). If you think the\n\nit's adding too much complexity, just get rid of theend` parameter.

Thanks!

I tried it with the end='\n\n' but there was another Validation error for
Errors found:
Closing quotes should be placed in the line after the last text in the docstring (do not close the quotes in the same line as the text, or leave a blank line between the last text and the quotes)
I've dropped the end parameter, because I'm not sure how to get around that test.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think this is very clear anyway. Looks great, really useful docstring. Thanks a lot @Ecboxer

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ecboxer thanks for working on this! Added some more comments

Iterator over (column name, Series) pairs.

See also
Iterates over the DataFrame columns, returning a tuple with the column name and the content as a Series.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't fit within 80 chars line length limit (PEP8), I assume the CI is failing due to that.

There are also some other lines below that are too long. I would recommend to activate a flake8 / linter plugin in your editor, or run flake8 locally. You can also check the output of the failing build on Travis.

--------
>>> df = pd.DataFrame({'species': ['bear', 'bear', 'bear', 'bear', 'marsupial'],
... 'population': [300000, 200000, 1864, 22000, 80000]},
... index=['black', 'brown', 'panda', 'polar', 'koala'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make it a little bit shorter (eg 3 rows seems enough), that will make the output below more clear I think

label : object
The column names for the DataFrame being iterated over.
content : Series
The column entries belonging to each label, as a Series.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista I think in this case, the above is actually a bit confusing. Typically, we use the formatting above if there are actually two return values (so if you could do label, content = df.iteritems()), which is not the case here.
So I think the original single item was better, but we could try to make it clearer that it the values of the iterator consist of those two items.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks clear to me, not sure why it doesn't to you. In this case you can do for label, content in df.iteritems(): which is equivalent to what you said.

Not a big deal changing this to a Returns saying it's a generator returning tuples. But I don't think that would be clearer to me, and feels a bit inconsistent.

What do you think is clearer for you @Ecboxer? Also, may be @WillAyd want to give an opinion, and he's doing a lot with the docstrings? Happy with whatever option is clearer to most people.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know what you think of the rephrased it under Yields. It may be too wordy?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for label, content in df.iteritems() is not the same as label, content = df.iteritems() ..

The thing is that otherwise we are using the same visual formatting to mean two different things. I would prefer that a user can know from the return type if there is a single or multiple return values (but maybe I am overestimating our users?)

We can maybe still combine both, something like:

Iterator over (label, content) pairs
    label : object
        The column names for the DataFrame being iterated over.
    content : Series
        The column entries belonging to each label, as a Series.

or does that only make it more complicated?

Copy link
Member

@jorisvandenbossche jorisvandenbossche Sep 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, I missed that it was a "Yields" section, and not a "Returns" section. In that case, it is correct that it yields two values in each iteration! (and how you did it here is consistent with the numpydoc guidelines)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ecboxer sorry, you can change it back to how it was before I commented :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hehe, I see what you meant now. Cool then. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it back :)

@jorisvandenbossche jorisvandenbossche merged commit a851401 into pandas-dev:master Sep 27, 2018
@jorisvandenbossche
Copy link
Member

@Ecboxer Thanks!

@jorisvandenbossche jorisvandenbossche added this to the 0.24.0 milestone Sep 27, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants