Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: improve doc string for .aggregate and .transform #22641

Merged
merged 4 commits into from
Sep 18, 2018
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 48 additions & 38 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -4545,17 +4545,16 @@ def pipe(self, func, *args, **kwargs):

Parameters
----------
func : function, string, dictionary, or list of string/functions
func : function, string, list of functions and/or strings or dict
Function to use for aggregating the data. If a function, must either
work when passed a %(klass)s or when passed to %(klass)s.apply. For
a DataFrame, can pass a dict, if the keys are DataFrame column names.
work when passed a %(klass)s or when passed to %(klass)s.apply.

Accepted combinations are:

- string function name.
- function.
- list of functions.
- dict of column names -> functions (or list of functions).
- string function name
- function
- list of functions and/or function names
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be you can add the example you wrote in the comments? I think that would make much clearer that we can mix both

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be you can add the example you wrote in the comment?

Not sure I understand whst you mean here, could you expand?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant that we could have an example like [np.exp, 'sqrt'] that you mentioned before, so it's easier to see that you can use both strings and functions together in the same list.

- dict of axis labels -> functions, function names or list of such
%(axis)s
*args
Positional arguments to pass to `func`.
Expand All @@ -4564,7 +4563,7 @@ def pipe(self, func, *args, **kwargs):

Returns
-------
aggregated : %(klass)s
pandas.%(klass)s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we prefix Series and DataFrame with pandas. I'd just leave with the class for consistency.


Notes
-----
Expand All @@ -4574,50 +4573,61 @@ def pipe(self, func, *args, **kwargs):
""")

_shared_docs['transform'] = ("""
Call function producing a like-indexed %(klass)s
and return a %(klass)s with the transformed values
Call ``func`` on self producing a %(klass)s with transformed values
and that has the same axis length as self.

.. versionadded:: 0.20.0

Parameters
----------
func : callable, string, dictionary, or list of string/callables
To apply to column
func : function, string, list of functions and/or strings or dict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str instead of string

Function to use for transforming the data. If a function, must either
work when passed a %(klass)s or when passed to %(klass)s.apply.

Accepted Combinations are:
Accepted combinations are:

- string function name
- function
- list of functions
- dict of column names -> functions (or list of functions)
- list of functions and/or function names
- dict of axis labels -> functions, function names or list of such.
%(axis)s
*args
Positional arguments to pass to `func`.
**kwargs
Keyword arguments to pass to `func`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you prefer, numpydoc also accepts having both in one line:

*args, **kwargs
    Arguments to pass to `func`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I did that, though my own preference would be to not use star arguments and use args=None, kwargs=None instead to better have a distinction what gets passed to func and what doesn't get passed on. But that's for a whole another discussion, and may be too late now :-)

Copy link
Contributor Author

@topper-123 topper-123 Sep 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just found out that scripts/validate_docstrings.py doesnt accept putting those on the same line, so I've reverted to the previous style.


Returns
-------
transformed : %(klass)s
pandas.%(klass)s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before.

A %(klass)s that must have the same length as self.

Examples
Raises
------
ValueError : if the returned %(klass)s has a different length than self.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capital I in if.


See Also
--------
>>> df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
... index=pd.date_range('1/1/2000', periods=10))
df.iloc[3:7] = np.nan

>>> df.transform(lambda x: (x - x.mean()) / x.std())
A B C
2000-01-01 0.579457 1.236184 0.123424
2000-01-02 0.370357 -0.605875 -1.231325
2000-01-03 1.455756 -0.277446 0.288967
2000-01-04 NaN NaN NaN
2000-01-05 NaN NaN NaN
2000-01-06 NaN NaN NaN
2000-01-07 NaN NaN NaN
2000-01-08 -0.498658 1.274522 1.642524
2000-01-09 -0.540524 -1.012676 -0.828968
2000-01-10 -1.366388 -0.614710 0.005378

See also
pandas.%(klass)s.agg : only perform aggregating type operations
pandas.%(klass)s.apply : Invoke function on a Series
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for pandas.. Capital O in only. Descriptions finishing with period.


Examples
--------
pandas.%(klass)s.aggregate
pandas.%(klass)s.apply
>>> df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})
>>> df.transform(lambda x: x + 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would display df first before the transform. So, it's immediate to compare the original data and the transformed data, even without reading the constructor of the DataFrame.

A B
0 1 2
1 2 3
2 3 4

Even though the resulting %(klass)s must have the length as the input
%(klass)s, it is possible to provide several input functions:

>>> s = pd.Series(range(3))
>>> s.transform([np.sqrt, np.exp])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also show s before the transform.

sqrt exp
0 0.000000 1.000000
1 1.000000 2.718282
2 1.414214 7.389056
""")

# ----------------------------------------------------------------------
Expand Down Expand Up @@ -9401,7 +9411,7 @@ def ewm(self, com=None, span=None, halflife=None, alpha=None,

cls.ewm = ewm

@Appender(_shared_docs['transform'] % _shared_doc_kwargs)
@Appender(_shared_docs['transform'] % dict(axis="", **_shared_doc_kwargs))
def transform(self, func, *args, **kwargs):
result = self.agg(func, *args, **kwargs)
if is_scalar(result) or len(result) != len(self):
Expand Down
6 changes: 6 additions & 0 deletions pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -3098,6 +3098,12 @@ def aggregate(self, func, axis=0, *args, **kwargs):

agg = aggregate

@Appender(generic._shared_docs['transform'] % _shared_doc_kwargs)
def transform(self, func, axis=0, *args, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm generally not sure its worth changing actual implementation for docstrings. If this is solely to isolate the various Examples I'd think it preferable to just have one shared Example docstring that covers Series and DataFrame rather than making code changes like this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc string is currently inherited from NDFrame, but it's not pretty IMO, see here

The issues are:

  • The current doc string discussed NDFrames, not Series, which is confusing,
  • the links in the SeeAlso don't work,
  • Series.transform's signature is in master different from the signature for Series.agg and Series.apply in that it misses the axis parameter that the other two have. This may/may not be a problem (I actually couldn't produce a bug caused by this), but the signature should be consistent amongst the three methods, and I agree with the current design for agg and apply (i.e. to have an axis parameter).

Thoughts? I can remove this if there's not consensus.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say just use substitution for the class name (i.e. Series or DataFrame) and update the See Also links to point to both the Series and DataFrame methods. One of those will obviously be self referencing, but we've done this in other places as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see what you mean. I've made a new commit with common examples but the SeeAlso can actually be made correct if Series gets it's own transform method (which is needed for the signature issue to be resolved).

# Validate the axis parameter
self._get_axis_number(axis)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of this statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checks that the value passed to axis is 0 or “index”, else an exception is raised. So, a minor check for consistency.

return super(Series, self).transform(func, *args, **kwargs)

def apply(self, func, convert_dtype=True, args=(), **kwds):
"""
Invoke function on values of Series. Can be ufunc (a NumPy function
Expand Down