DOC iteritems docstring update and examples #22658

Merged
merged 12 commits into from
Sep 27, 2018
44 changes: 40 additions & 4 deletions pandas/core/frame.py
@@ -778,14 +778,50 @@ def style(self):
         return Styler(self)

     def iteritems(self):
-        """
+        r"""
         Iterator over (column name, Series) pairs.

-        See also
+        Iterates over the DataFrame columns, returning a tuple with the column name
+        and the content as a Series.
+
+        Yields
+        ------
+        label : object
+            The column names for the DataFrame being iterated over.
+        content : Series
+            The column entries belonging to each label, as a Series.
Member

@datapythonista I think in this case, the above is actually a bit confusing. Typically, we use the formatting above if there are actually two return values (so if you could do label, content = df.iteritems()), which is not the case here.
So I think the original single item was better, but we could try to make it clearer that the values of the iterator consist of those two items.

Member

Looks clear to me, not sure why it doesn't to you. In this case you can do for label, content in df.iteritems(): which is equivalent to what you said.

Not a big deal changing this to a Returns saying it's a generator returning tuples. But I don't think that would be clearer to me, and feels a bit inconsistent.

What do you think is clearer for you, @Ecboxer? Also, maybe @WillAyd wants to give an opinion, since he's doing a lot with the docstrings. Happy with whatever option is clearer to most people.

Contributor Author

Let me know what you think of how I rephrased it under Yields. It may be too wordy?

Member

for label, content in df.iteritems() is not the same as label, content = df.iteritems().

The thing is that otherwise we are using the same visual formatting to mean two different things. I would prefer that a user can tell from the return type whether there is a single return value or several (but maybe I am overestimating our users?)

We can maybe still combine both, something like:

Iterator over (label, content) pairs
    label : object
        The column names for the DataFrame being iterated over.
    content : Series
        The column entries belonging to each label, as a Series.

or does that only make it more complicated?
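
To make the distinction raised in this comment concrete, here is a minimal sketch in plain Python. `iter_columns` is a hypothetical stand-in for `df.iteritems()`, not the pandas implementation. It only mimics the behaviour of yielding (label, content) pairs, and it reuses the sample data from the docstring example.

```python
def iter_columns(table):
    """Hypothetical stand-in for DataFrame.iteritems(): yield (label, content) pairs."""
    for label, content in table.items():
        yield label, content


table = {
    "species": ["bear", "bear", "marsupial"],
    "population": [1864, 22000, 80000],
}

# Looping unpacks each yielded pair into two names.
labels = [label for label, content in iter_columns(table)]

# Plain assignment unpacks the iterator itself, so each name receives a whole
# (label, content) pair. It only succeeds here because there are exactly two columns.
first, second = iter_columns(table)

print(labels)  # ['species', 'population']
print(first)   # ('species', ['bear', 'bear', 'marsupial'])
```

With any number of columns other than two, the plain assignment raises `ValueError`, so it is not a substitute for the loop.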

Member

@jorisvandenbossche Sep 25, 2018

Ah sorry, I missed that it was a "Yields" section, and not a "Returns" section. In that case, it is correct that it yields two values in each iteration! (and how you did it here is consistent with the numpydoc guidelines)
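
As a rough illustration of the convention referred to here, the sketch below documents a generator with a numpydoc-style "Yields" section that names both values produced on each iteration. The function `iter_pairs` and its parameter are hypothetical and exist only for this example.

```python
def iter_pairs(mapping):
    """
    Iterate over (label, content) pairs of a mapping.

    Yields
    ------
    label : object
        The key of each entry.
    content : object
        The value stored under that key.
    """
    for label, content in mapping.items():
        # Each iteration yields one two-item tuple, matching the two names above.
        yield label, content


pairs = list(iter_pairs({"a": 1, "b": 2}))
print(pairs)  # [('a', 1), ('b', 2)]
```

The two entries under "Yields" describe the items of each yielded tuple, whereas two entries under "Returns" would describe a single tuple returned once.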

Member

@Ecboxer sorry, you can change it back to how it was before I commented :-)

Member

hehe, I see what you meant now. Cool then. :)

Contributor Author

Changed it back :)


+        See Also
         --------
-        iterrows : Iterate over DataFrame rows as (index, Series) pairs.
-        itertuples : Iterate over DataFrame rows as namedtuples of the values.
+        DataFrame.iterrows : Iterate over DataFrame rows as (index, Series) pairs.
+        DataFrame.itertuples : Iterate over DataFrame rows as namedtuples of the values.

+        Examples
+        --------
+        >>> df = pd.DataFrame({'species': ['bear', 'bear', 'marsupial'],
+        ...                    'population': [1864, 22000, 80000]},
+        ...                   index=['panda', 'polar', 'koala'])
+        >>> df
+                 species  population
+        panda       bear        1864
+        polar       bear       22000
+        koala  marsupial       80000
+        >>> for label, content in df.iteritems():
+        ...     print('label:', label)
+        ...     print('content:', content, sep='\n')
+        ...
+        label: species
+        content:
+        panda         bear
+        polar         bear
+        koala    marsupial
+        Name: species, dtype: object
+        label: population
+        content:
+        panda     1864
+        polar    22000
+        koala    80000
+        Name: population, dtype: int64
         """
         if self.columns.is_unique and hasattr(self, '_item_cache'):
             for k in self.columns: