-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: improve doc string for .aggregate and .transform #22641
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4545,17 +4545,16 @@ def pipe(self, func, *args, **kwargs): | |
|
||
Parameters | ||
---------- | ||
func : function, string, dictionary, or list of string/functions | ||
func : function, string, list of string/functions or dictionary | ||
Function to use for aggregating the data. If a function, must either | ||
work when passed a %(klass)s or when passed to %(klass)s.apply. For | ||
a DataFrame, can pass a dict, if the keys are DataFrame column names. | ||
work when passed a %(klass)s or when passed to %(klass)s.apply. | ||
|
||
Accepted combinations are: | ||
|
||
- string function name. | ||
- function. | ||
- list of functions. | ||
- dict of column names -> functions (or list of functions). | ||
- string function name | ||
- function | ||
- list of functions and/or function names | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. may be you can add the example you wrote in the comments? I think that would make much clearer that we can mix both There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Not sure I understand whst you mean here, could you expand? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, I meant that we could have an example like |
||
- dict of axis labels -> functions, function names or list of such | ||
%(axis)s | ||
*args | ||
Positional arguments to pass to `func`. | ||
|
@@ -4581,38 +4580,61 @@ def pipe(self, func, *args, **kwargs): | |
|
||
Parameters | ||
---------- | ||
func : callable, string, dictionary, or list of string/callables | ||
To apply to column | ||
func : function, string, list of string/functions or dictionary | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same as before |
||
Function to use for transforming the data. If a function, must either | ||
work when passed a %(klass)s or when passed to %(klass)s.apply. | ||
|
||
Accepted Combinations are: | ||
Accepted combinations are: | ||
|
||
- string function name | ||
- function | ||
- list of functions | ||
- dict of column names -> functions (or list of functions) | ||
- list of functions and/or function names | ||
- dict of axis labels -> functions, function names or list of such | ||
%(axis)s | ||
*args | ||
Positional arguments to pass to `func`. | ||
**kwargs | ||
Keyword arguments to pass to `func`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you prefer, numpydoc also accepts having both in one line:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alright, I did that, though my own preference would be to not use star arguments and use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just found out that |
||
|
||
Returns | ||
------- | ||
transformed : %(klass)s | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The |
||
|
||
Raises | ||
------ | ||
ValueError: if the returned %(klass)s has a different length than self. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure if sphinx requires a space before the colon. Did you render the docstring? Also, if you can capitalize the sentence Then, I don't find the description very clear on what the user did wrong, and how they should fix the problem. Do you think you can add a bit more information? |
||
|
||
Examples | ||
-------- | ||
>>> df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'], | ||
>>> df = pd.DataFrame({'A': range(10), 'B': range(10, 0, -1)}, | ||
... index=pd.date_range('1/1/2000', periods=10)) | ||
df.iloc[3:7] = np.nan | ||
>>> df.iloc[3:7] = np.nan | ||
|
||
>>> df.transform(lambda x: (x - x.mean()) / x.std()) | ||
A B C | ||
2000-01-01 0.579457 1.236184 0.123424 | ||
2000-01-02 0.370357 -0.605875 -1.231325 | ||
2000-01-03 1.455756 -0.277446 0.288967 | ||
2000-01-04 NaN NaN NaN | ||
2000-01-05 NaN NaN NaN | ||
2000-01-06 NaN NaN NaN | ||
2000-01-07 NaN NaN NaN | ||
2000-01-08 -0.498658 1.274522 1.642524 | ||
2000-01-09 -0.540524 -1.012676 -0.828968 | ||
2000-01-10 -1.366388 -0.614710 0.005378 | ||
A B | ||
2000-01-01 -1.143001 1.143001 | ||
2000-01-02 -0.889001 0.889001 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this example could be simplified, so users don't need to spend a lot of time understanding what's going on. I think 3 or 4 rows should be enough to illustrate So, I'd do:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, that's a good point, changed. For the Series, I've also shortened it, but kept |
||
2000-01-03 -0.635001 0.635001 | ||
2000-01-04 NaN NaN | ||
2000-01-05 NaN NaN | ||
2000-01-06 NaN NaN | ||
2000-01-07 NaN NaN | ||
2000-01-08 0.635001 -0.635001 | ||
2000-01-09 0.889001 -0.889001 | ||
2000-01-10 1.143001 -1.143001 | ||
|
||
It is only required for the axis specified in the ``axis`` parameter | ||
to have the same length for output and for self. The other axis may have a | ||
different length: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't quite understand what you mean here. We're not specifying the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Code-wise I've tried to word the doc string differently to highlight this better. |
||
|
||
>>> s = pd.Series(range(5)) | ||
>>> s.transform([np.sqrt, np.exp]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd also show |
||
sqrt exp | ||
0 0.000000 1.000000 | ||
1 1.000000 2.718282 | ||
2 1.414214 7.389056 | ||
3 1.732051 20.085537 | ||
4 2.000000 54.598150 | ||
|
||
See also | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
-------- | ||
|
@@ -9401,7 +9423,7 @@ def ewm(self, com=None, span=None, halflife=None, alpha=None, | |
|
||
cls.ewm = ewm | ||
|
||
@Appender(_shared_docs['transform'] % _shared_doc_kwargs) | ||
@Appender(_shared_docs['transform'] % dict(axis="", **_shared_doc_kwargs)) | ||
def transform(self, func, *args, **kwargs): | ||
result = self.agg(func, *args, **kwargs) | ||
if is_scalar(result) or len(result) != len(self): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3098,6 +3098,12 @@ def aggregate(self, func, axis=0, *args, **kwargs): | |
|
||
agg = aggregate | ||
|
||
@Appender(generic._shared_docs['transform'] % _shared_doc_kwargs) | ||
def transform(self, func, axis=0, *args, **kwargs): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm generally not sure its worth changing actual implementation for docstrings. If this is solely to isolate the various Examples I'd think it preferable to just have one shared Example docstring that covers Series and DataFrame rather than making code changes like this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The doc string is currently inherited from NDFrame, but it's not pretty IMO, see here The issues are:
Thoughts? I can remove this if there's not consensus. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd say just use substitution for the class name (i.e. Series or DataFrame) and update the See Also links to point to both the Series and DataFrame methods. One of those will obviously be self referencing, but we've done this in other places as well There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I see what you mean. I've made a new commit with common examples but the SeeAlso can actually be made correct if Series gets it's own transform method (which is needed for the signature issue to be resolved). |
||
# Validate the axis parameter | ||
self._get_axis_number(axis) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What's the point of this statement? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This checks that the value passed to axis is 0 or “index”, else an exception is raised. So, a minor check for consistency. |
||
return super(Series, self).transform(func, *args, **kwargs) | ||
|
||
def apply(self, func, convert_dtype=True, args=(), **kwds): | ||
""" | ||
Invoke function on values of Series. Can be ufunc (a NumPy function | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We try to use only Python types in this row, and be consistent with the format.
func : function, str, list or dict
would be the preferred format (or specifying the list types, likelist of str, list of function
).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, but the list can contain bot functions and strings (
[np.exp, 'sqrt']
). Maybefunc : function, string, list of functions and/or strings or dict
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point I'd like to validate that all the types provided in that line are from a subset, to avoid typos and inconsistencies. I understand your point about mixing both, but I'd prefer
function, str, list or dict
(notestr
, the Python type, instead ofstring
), and then provide the details about mixing strings and everything else in the description.