DOC iteritems docstring update and examples #22658

Ecboxer · 2018-09-10T12:34:25Z

Updated iteritems docstring to start with an infinitive and added a short example

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Updated iteritems docstring to start with an infinitive and added a short example

pep8speaks · 2018-09-10T12:34:30Z

Hello @Ecboxer! Thanks for updating the PR.

In the file pandas/core/frame.py, following are the PEP8 issues :

Line 784:80: E501 line too long (83 > 79 characters)
Line 796:80: E501 line too long (82 > 79 characters)
Line 797:80: E501 line too long (88 > 79 characters)

Comment last updated on September 25, 2018 at 14:12 Hours UTC

TomAugspurger · 2018-09-10T16:14:52Z

pandas/core/frame.py

+        --------
+
+        >>> df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]},
+                             index=['a', 'b'])


Continuation lines need to start with ... (right below the >>>)

Also, make sure the i in index is right below the opening { on the line above.

Gotcha, I will make the changes.

TomAugspurger · 2018-09-10T16:15:48Z

You can run the validation script locally FYI. python scripts/validate_docstring.py pandas.DataFrame.iteritems I think.

mroeschke · 2018-09-10T17:10:02Z

pandas/core/frame.py

@@ -726,13 +726,32 @@ def style(self):

    def iteritems(self):
        """
-        Iterator over (column name, Series) pairs.


Technically this returns an Iterator (or more specifically a Generator). So I would leave this as Iterator or change this to Generator.

I thought to make the change to 'Iterate over DataFrame ... as (..., Series) pairs to stay within the style of the iterrows and itertuples functions, but I can revert back to 'Iterator over (column name, Series) pairs.

…on to original

codecov · 2018-09-12T01:20:57Z

Codecov Report

Merging #22658 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #22658   +/-   ##
=======================================
  Coverage   92.18%   92.18%           
=======================================
  Files         169      169           
  Lines       50820    50820           
=======================================
  Hits        46850    46850           
  Misses       3970     3970

Flag	Coverage Δ
#multiple	`90.6% <ø> (ø)`	⬆️
#single	`42.38% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.2% <ø> (ø)`	⬆️
pandas/core/generic.py	`96.67% <0%> (ø)`	⬆️
pandas/core/reshape/melt.py	`97.34% <0%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27de8e6...d8e5370. Read the comment docs.

TomAugspurger · 2018-09-20T22:19:32Z

@Ecboxer do you have time to update?

Ecboxer

@Ecboxer do you have time to update?

The edited commit passed my local tests, I would like to try to update though. Can you direct me on how to understand/correct the issues in the checks?

Ecboxer · 2018-09-11T13:06:38Z

pandas/core/frame.py

+        --------
+
+        >>> df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]},
+                             index=['a', 'b'])


Gotcha, I will make the changes.

Ecboxer · 2018-09-11T13:08:45Z

pandas/core/frame.py

@@ -726,13 +726,32 @@ def style(self):

    def iteritems(self):
        """
-        Iterator over (column name, Series) pairs.


I thought to make the change to 'Iterate over DataFrame ... as (..., Series) pairs to stay within the style of the iterrows and itertuples functions, but I can revert back to 'Iterator over (column name, Series) pairs.

datapythonista

Added some comments

datapythonista · 2018-09-23T12:57:21Z

pandas/core/frame.py

@@ -728,11 +728,35 @@ def iteritems(self):
        """
        Iterator over (column name, Series) pairs.

+        Iterates over columns as key, value dict-like pairs with columns name as keys and Series as values.


This description is a bit complex, and can give the impression that it's returning a dictionary.

Something like "Iterates over the DataFrame columns, returning a tuple with the label and the content as a Series" or something similar would be clearer IMHO.

datapythonista · 2018-09-23T12:59:51Z

pandas/core/frame.py

+        Returns
+        -------
+        it : generator
+            A generator that iterates over the columns of the frame.


In the case of a genrator, instead of documenting that it returns a generator, we use the section Yields. In this case something like:

Yields ------- label : object Description of label content : Series Description of content

datapythonista · 2018-09-23T13:00:40Z

pandas/core/frame.py

+        -------
+        it : generator
+            A generator that iterates over the columns of the frame.
+
        See also
        --------
        iterrows : Iterate over DataFrame rows as (index, Series) pairs.
        itertuples : Iterate over DataFrame rows as namedtuples of the values.


A in See Also should be capitalized.

Use DataFrame.iterrows instead of iterrows and same for itertuples.

datapythonista · 2018-09-23T13:03:26Z

pandas/core/frame.py

+        Examples
+        --------
+        >>> df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]},
+        ...                   index=['a', 'b'])


Can you use something that looks more like a real world example? I think it makes things easier to read.

See for example the example DataFrame in https://pandas-docs.github.io/pandas-docs-travis/generated/pandas.DataFrame.reset_index.html

datapythonista · 2018-09-23T13:04:39Z

pandas/core/frame.py

+        a     1  0.1
+        b     2  0.2
+        >>> for col in df.iteritems():
+        ...     print(col)


I think something like next would make things easier to understand:

for label, content in df.iteritems(): print('label:', label) print('content:' content)

Thanks for the edits! I rephrased the long description, changed the Yields section to cover label and content, prepended DataFrame. to the See Also examples, and reworked the Examples section to hopefully be more engaging.

did you push the changes?

I got my git commands mixed up sheepishly, but it's pushed now

…t, edits to See Also, and made a more engaging example

Edited iteritems documentation: long description, Yields, See Also and Examples

datapythonista · 2018-09-25T08:55:42Z

pandas/core/frame.py

+        koala 	marsupial 80000
+        >>> for label, content in df.iteritems():
+        ...     print('label:', label)
+        ...     print('content:', content)


I think for this second print we could have print('content:', content, sep='\n', end='\n') and the output would be easier to read.

And use population istead of pop.

Other than that, looks great, thank you!

Gotcha. I didn't know that print could take sep and end args, that's neat!

datapythonista · 2018-09-25T12:33:16Z

Can you run the validation script? I think the output in the tests is not accurate.

You can do it by ./scripts/validate_docstrings.py pandas.DataFrame.iteritems (you need pandas compiled), and it should say everything is ok.

Ecboxer · 2018-09-25T12:58:04Z

Can you run the validation script? I think the output in the tests is not accurate.

You can do it by ./scripts/validate_docstrings.py pandas.DataFrame.iteritems (you need pandas compiled), and it should say everything is ok.

When I run scripts/validate_docstrings.py pandas.DataFrame.iteritems I get the following error under #Validation#
Errors found:
No Returns section found
(When I renamed Yields to Returns, I would get No Yields section found). Should I make some sort of empty Returns section?

datapythonista · 2018-09-25T13:07:24Z

If that's the only error, ignore it, it may be an error in the validation script.

I'd just make a quick update if you don't mind. Instead of using \\n in the code to escape the backslash, I think it'd be more readable if we set the docstring as a raw string with (r""" at the beginning). And may be we can set end='\n\n'? that would add a blank line between the two items (I didn't realised that end='\n' is actually the default, so it's actually doing nothing now). If you think the \n\n it's adding too much complexity, just get rid of the end parameter.

Thanks!

Ecboxer · 2018-09-25T13:16:50Z

If that's the only error, ignore it, it may be an error in the validation script.

I'd just make a quick update if you don't mind. Instead of using \\n in the code to escape the backslash, I think it'd be more readable if we set the docstring as a raw string with (r""" at the beginning). And may be we can set end=\n\n? that would add a blank line between the two items (I didn't realised thatend='\n'is actually the default, so it's actually doing nothing now). If you think the\n\nit's adding too much complexity, just get rid of theend` parameter.

Thanks!

I tried it with the end='\n\n' but there was another Validation error for
Errors found:
Closing quotes should be placed in the line after the last text in the docstring (do not close the quotes in the same line as the text, or leave a blank line between the last text and the quotes)
I've dropped the end parameter, because I'm not sure how to get around that test.

datapythonista

Ok, I think this is very clear anyway. Looks great, really useful docstring. Thanks a lot @Ecboxer

jorisvandenbossche

@Ecboxer thanks for working on this! Added some more comments

jorisvandenbossche · 2018-09-25T13:31:08Z

pandas/core/frame.py

        Iterator over (column name, Series) pairs.

-        See also
+        Iterates over the DataFrame columns, returning a tuple with the column name and the content as a Series.


this doesn't fit within 80 chars line length limit (PEP8), I assume the CI is failing due to that.

There are also some other lines below that are too long. I would recommend to activate a flake8 / linter plugin in your editor, or run flake8 locally. You can also check the output of the failing build on Travis.

jorisvandenbossche · 2018-09-25T13:33:29Z

pandas/core/frame.py

+        --------
+        >>> df = pd.DataFrame({'species': ['bear', 'bear', 'bear', 'bear', 'marsupial'],
+        ...                   'population': [300000, 200000, 1864, 22000, 80000]},
+        ...                   index=['black', 'brown', 'panda', 'polar', 'koala'])


I would make it a little bit shorter (eg 3 rows seems enough), that will make the output below more clear I think

jorisvandenbossche · 2018-09-25T13:34:40Z

pandas/core/frame.py

+        label : object
+            The column names for the DataFrame being iterated over.
+        content : Series
+            The column entries belonging to each label, as a Series.


@datapythonista I think in this case, the above is actually a bit confusing. Typically, we use the formatting above if there are actually two return values (so if you could do label, content = df.iteritems()), which is not the case here.
So I think the original single item was better, but we could try to make it clearer that it the values of the iterator consist of those two items.

Looks clear to me, not sure why it doesn't to you. In this case you can do for label, content in df.iteritems(): which is equivalent to what you said.

Not a big deal changing this to a Returns saying it's a generator returning tuples. But I don't think that would be clearer to me, and feels a bit inconsistent.

What do you think is clearer for you @Ecboxer? Also, may be @WillAyd want to give an opinion, and he's doing a lot with the docstrings? Happy with whatever option is clearer to most people.

Let me know what you think of the rephrased it under Yields. It may be too wordy?

for label, content in df.iteritems() is not the same as label, content = df.iteritems() ..

The thing is that otherwise we are using the same visual formatting to mean two different things. I would prefer that a user can know from the return type if there is a single or multiple return values (but maybe I am overestimating our users?)

We can maybe still combine both, something like:

Iterator over (label, content) pairs label : object The column names for the DataFrame being iterated over. content : Series The column entries belonging to each label, as a Series.

or does that only make it more complicated?

Ah sorry, I missed that it was a "Yields" section, and not a "Returns" section. In that case, it is correct that it yields two values in each iteration! (and how you did it here is consistent with the numpydoc guidelines)

@Ecboxer sorry, you can change it back to how it was before I commented :-)

hehe, I see what you meant now. Cool then. :)

Changed it back :)

…cription of one item it

jorisvandenbossche · 2018-09-27T12:54:22Z

@Ecboxer Thanks!

DOC

0521552

Updated iteritems docstring to start with an infinitive and added a short example

TomAugspurger reviewed Sep 10, 2018

View reviewed changes

mroeschke reviewed Sep 10, 2018

View reviewed changes

jschendel added the Docs label Sep 11, 2018

Ecboxer added 3 commits September 11, 2018 09:14

Continuation lines start with ... and are aligned, reverted descripti…

5447a1f

…on to original

Removed trailing space and added Returns/Yields section

8ccd554

Added extended summary

fcc27e8

Ecboxer commented Sep 21, 2018

View reviewed changes

datapythonista requested changes Sep 23, 2018

View reviewed changes

Ecboxer added 2 commits September 23, 2018 10:15

Edits to long description, split Yields section into label and conten…

6dad21c

…t, edits to See Also, and made a more engaging example

Merge remote-tracking branch 'upstream/master' into shiny-new-feature

25da7f8

Edited iteritems documentation: long description, Yields, See Also and Examples

datapythonista reviewed Sep 25, 2018

View reviewed changes

Ecboxer added 2 commits September 25, 2018 07:59

Changed pop to population and reformatted example

30026a4

Changed pop to population in example output

5110b7c

Changed docstring to string literal and removed escape chars

76243b3

Removed end="\n"

1b52a08

datapythonista approved these changes Sep 25, 2018

View reviewed changes

jorisvandenbossche reviewed Sep 25, 2018

View reviewed changes

Ecboxer added 2 commits September 25, 2018 09:53

Shortened example, fit long desc to char limit, changed Yields to des…

618318c

…cription of one item it

Returned to two part Yields section, label and contents

d8e5370

jorisvandenbossche approved these changes Sep 27, 2018

View reviewed changes

jorisvandenbossche merged commit a851401 into pandas-dev:master Sep 27, 2018

jorisvandenbossche added this to the 0.24.0 milestone Sep 27, 2018

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

DOC iteritems docstring update and examples (pandas-dev#22658)

eea97cb

DOC iteritems docstring update and examples #22658

DOC iteritems docstring update and examples #22658

Conversation

Ecboxer commented Sep 10, 2018

pep8speaks commented Sep 10, 2018 • edited Loading

Comment last updated on September 25, 2018 at 14:12 Hours UTC

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomAugspurger commented Sep 10, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Sep 12, 2018 • edited Loading

Codecov Report

TomAugspurger commented Sep 20, 2018

Ecboxer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

datapythonista left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

datapythonista commented Sep 25, 2018

Ecboxer commented Sep 25, 2018

datapythonista commented Sep 25, 2018 • edited Loading

Ecboxer commented Sep 25, 2018

datapythonista left a comment

Choose a reason for hiding this comment

jorisvandenbossche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jorisvandenbossche Sep 25, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jorisvandenbossche commented Sep 27, 2018

pep8speaks commented Sep 10, 2018 •

edited

Loading

codecov bot commented Sep 12, 2018 •

edited

Loading

datapythonista commented Sep 25, 2018 •

edited

Loading

jorisvandenbossche Sep 25, 2018 •

edited

Loading