-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC iteritems docstring update and examples #22658
DOC iteritems docstring update and examples #22658
Conversation
Hello @Ecboxer! Thanks for updating the PR.
Comment last updated on September 25, 2018 at 14:12 Hours UTC |
pandas/core/frame.py
Outdated
-------- | ||
|
||
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]}, | ||
index=['a', 'b']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Continuation lines need to start with ...
(right below the >>>
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, make sure the i
in index
is right below the opening {
on the line above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, I will make the changes.
You can run the validation script locally FYI. |
pandas/core/frame.py
Outdated
@@ -726,13 +726,32 @@ def style(self): | |||
|
|||
def iteritems(self): | |||
""" | |||
Iterator over (column name, Series) pairs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically this returns an Iterator
(or more specifically a Generator
). So I would leave this as Iterator
or change this to Generator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought to make the change to 'Iterate over DataFrame ... as (..., Series) pairs to stay within the style of the iterrows and itertuples functions, but I can revert back to 'Iterator over (column name, Series) pairs.
Codecov Report
@@ Coverage Diff @@
## master #22658 +/- ##
=======================================
Coverage 92.18% 92.18%
=======================================
Files 169 169
Lines 50820 50820
=======================================
Hits 46850 46850
Misses 3970 3970
Continue to review full report at Codecov.
|
@Ecboxer do you have time to update? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ecboxer do you have time to update?
The edited commit passed my local tests, I would like to try to update though. Can you direct me on how to understand/correct the issues in the checks?
pandas/core/frame.py
Outdated
-------- | ||
|
||
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]}, | ||
index=['a', 'b']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, I will make the changes.
pandas/core/frame.py
Outdated
@@ -726,13 +726,32 @@ def style(self): | |||
|
|||
def iteritems(self): | |||
""" | |||
Iterator over (column name, Series) pairs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought to make the change to 'Iterate over DataFrame ... as (..., Series) pairs to stay within the style of the iterrows and itertuples functions, but I can revert back to 'Iterator over (column name, Series) pairs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments
pandas/core/frame.py
Outdated
@@ -728,11 +728,35 @@ def iteritems(self): | |||
""" | |||
Iterator over (column name, Series) pairs. | |||
|
|||
Iterates over columns as key, value dict-like pairs with columns name as keys and Series as values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description is a bit complex, and can give the impression that it's returning a dictionary.
Something like "Iterates over the DataFrame columns, returning a tuple with the label and the content as a Series" or something similar would be clearer IMHO.
pandas/core/frame.py
Outdated
Returns | ||
------- | ||
it : generator | ||
A generator that iterates over the columns of the frame. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of a genrator, instead of documenting that it returns a generator, we use the section Yields
. In this case something like:
Yields
-------
label : object
Description of label
content : Series
Description of content
pandas/core/frame.py
Outdated
------- | ||
it : generator | ||
A generator that iterates over the columns of the frame. | ||
|
||
See also | ||
-------- | ||
iterrows : Iterate over DataFrame rows as (index, Series) pairs. | ||
itertuples : Iterate over DataFrame rows as namedtuples of the values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A
in See Also
should be capitalized.
Use DataFrame.iterrows
instead of iterrows
and same for itertuples
.
pandas/core/frame.py
Outdated
Examples | ||
-------- | ||
>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]}, | ||
... index=['a', 'b']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use something that looks more like a real world example? I think it makes things easier to read.
See for example the example DataFrame in https://pandas-docs.github.io/pandas-docs-travis/generated/pandas.DataFrame.reset_index.html
pandas/core/frame.py
Outdated
a 1 0.1 | ||
b 2 0.2 | ||
>>> for col in df.iteritems(): | ||
... print(col) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something like next would make things easier to understand:
for label, content in df.iteritems():
print('label:', label)
print('content:' content)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the edits! I rephrased the long description, changed the Yields section to cover label and content, prepended DataFrame. to the See Also examples, and reworked the Examples section to hopefully be more engaging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you push the changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got my git commands mixed up sheepishly, but it's pushed now
…t, edits to See Also, and made a more engaging example
Edited iteritems documentation: long description, Yields, See Also and Examples
pandas/core/frame.py
Outdated
koala marsupial 80000 | ||
>>> for label, content in df.iteritems(): | ||
... print('label:', label) | ||
... print('content:', content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for this second print we could have print('content:', content, sep='\n', end='\n')
and the output would be easier to read.
And use population
istead of pop
.
Other than that, looks great, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. I didn't know that print could take sep and end args, that's neat!
Can you run the validation script? I think the output in the tests is not accurate. You can do it by |
When I run scripts/validate_docstrings.py pandas.DataFrame.iteritems I get the following error under #Validation# |
If that's the only error, ignore it, it may be an error in the validation script. I'd just make a quick update if you don't mind. Instead of using Thanks! |
I tried it with the end='\n\n' but there was another Validation error for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think this is very clear anyway. Looks great, really useful docstring. Thanks a lot @Ecboxer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ecboxer thanks for working on this! Added some more comments
pandas/core/frame.py
Outdated
Iterator over (column name, Series) pairs. | ||
|
||
See also | ||
Iterates over the DataFrame columns, returning a tuple with the column name and the content as a Series. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't fit within 80 chars line length limit (PEP8), I assume the CI is failing due to that.
There are also some other lines below that are too long. I would recommend to activate a flake8 / linter plugin in your editor, or run flake8
locally. You can also check the output of the failing build on Travis.
pandas/core/frame.py
Outdated
-------- | ||
>>> df = pd.DataFrame({'species': ['bear', 'bear', 'bear', 'bear', 'marsupial'], | ||
... 'population': [300000, 200000, 1864, 22000, 80000]}, | ||
... index=['black', 'brown', 'panda', 'polar', 'koala']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make it a little bit shorter (eg 3 rows seems enough), that will make the output below more clear I think
label : object | ||
The column names for the DataFrame being iterated over. | ||
content : Series | ||
The column entries belonging to each label, as a Series. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@datapythonista I think in this case, the above is actually a bit confusing. Typically, we use the formatting above if there are actually two return values (so if you could do label, content = df.iteritems()
), which is not the case here.
So I think the original single item was better, but we could try to make it clearer that it the values of the iterator consist of those two items.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks clear to me, not sure why it doesn't to you. In this case you can do for label, content in df.iteritems():
which is equivalent to what you said.
Not a big deal changing this to a Returns
saying it's a generator returning tuples. But I don't think that would be clearer to me, and feels a bit inconsistent.
What do you think is clearer for you @Ecboxer? Also, may be @WillAyd want to give an opinion, and he's doing a lot with the docstrings? Happy with whatever option is clearer to most people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know what you think of the rephrased it under Yields. It may be too wordy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for label, content in df.iteritems()
is not the same as label, content = df.iteritems()
..
The thing is that otherwise we are using the same visual formatting to mean two different things. I would prefer that a user can know from the return type if there is a single or multiple return values (but maybe I am overestimating our users?)
We can maybe still combine both, something like:
Iterator over (label, content) pairs
label : object
The column names for the DataFrame being iterated over.
content : Series
The column entries belonging to each label, as a Series.
or does that only make it more complicated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, I missed that it was a "Yields" section, and not a "Returns" section. In that case, it is correct that it yields two values in each iteration! (and how you did it here is consistent with the numpydoc guidelines)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ecboxer sorry, you can change it back to how it was before I commented :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hehe, I see what you meant now. Cool then. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it back :)
@Ecboxer Thanks! |
Updated iteritems docstring to start with an infinitive and added a short example
git diff upstream/master -u -- "*.py" | flake8 --diff