-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update the pandas.Series.str.repeat docstring #20634
Conversation
Codecov Report
@@ Coverage Diff @@
## master #20634 +/- ##
==========================================
- Coverage 92.04% 92.04% -0.01%
==========================================
Files 169 169
Lines 50787 50787
==========================================
- Hits 46746 46745 -1
- Misses 4041 4042 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, added some comments.
pandas/core/strings.py
Outdated
|
||
See also | ||
-------- | ||
numpy.ndarray.repeat: Repeat elements of an array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notes
and See Also
goes before examples. But in this case, I personally wouldn't have any. The comment in the notes could go in the extended summary, as well as an explanation that the value can be a literal or a list of the same size of the elements.
For the see also, I don't think ndarray.repeat
is actually related, even if they repeat things. The user cases are unrelated. May be we could have str.len
as it's slightly related. In that case See Also
should have a capital A
, and also should be placed before the examples.
pandas/core/strings.py
Outdated
2 | ||
3 | ||
4 | ||
dtype: object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this example could be merged with the previous, if you have a list with the positives and negative together (e.g. s.str.repeat([-3, 0, 1, 3, 5])
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - I'll merge with the previous example 👍
pandas/core/strings.py
Outdated
@@ -594,17 +594,59 @@ def str_replace(arr, pat, repl, n=-1, case=None, flags=0, regex=True): | |||
|
|||
def str_repeat(arr, repeats): | |||
""" | |||
Duplicate each string in the Series/Index by indicated number | |||
of times. | |||
Duplicate each string repeated by indicated number of times. | |||
|
|||
Parameters | |||
---------- | |||
repeats : int or array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int or array-like
is more standard
pandas/core/strings.py
Outdated
|
||
Parameters | ||
---------- | ||
repeats : int or array | ||
Same value for all (int) or different value per (array) | ||
Same value for all (int) or different value per (array). | ||
|
||
Returns | ||
------- | ||
repeated : Series/Index of objects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can get rid of repeated
and simply leave Series or Index
.
@manojpandey do you have time to make the changes based on the review? |
Hey @datapythonista - surely. I totally missed this. Thanks for reminding :) Let me make the changes! |
@manojpandey If you have time now that PyData Delhi is over, that would be great. :) Otherwise I'll try to find time to make the fixes and merge myself. |
Yes, I'm updating this. Thanks for understanding. I should have done it wayy before :( Let me send the PR this Mon/Tue! |
8b9fdbe
to
b023d7a
Compare
I just pushed all the changes requested. Please have a look when you get a moment. Thanks, and apologies again for the delay. |
Thanks for the update @manojpandey , seems like @JesperDramsch has been working on the same changes in #22571. Can the two of you discuss which PR makes more sense to keep? Sorry I've been encouraging too much to work on the |
Haha, the sprint-prophecy has been fulfilled. |
b023d7a
to
6ab42c6
Compare
Hi, I feel that this current PR has better examples than those of #22571 - and if no (or minimum) changes are required, can we move forward to merge this and close it? @datapythonista @JesperDramsch What are your thoughts? Thanks! |
I think both have some things that are better, whoever feels like integrating the best of both, and addressing any additional feedback is welcome to do so. :) |
Hej @manojpandey, I think it's great you document the effect of repeat values <1. I wonder why you think these are significantly better, but personally, I don't have too many stakes in my PR, so it's fine if we close mine and you update yours. Things I see that should be done here are:
You have been working at this much longer, so feel free to take it to the end. |
e4c25c5
to
a924e45
Compare
@JesperDramsch Thanks for your feedback on this, really appreciated - and I've incorporated the changes. I updated the @datapythonista can you see if everything is alright now? |
a924e45
to
1f6204a
Compare
Hello @manojpandey! Thanks for updating the PR.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments.
of times. | ||
Duplicate each string repeated by indicated number of times. | ||
|
||
Duplicate each string in the Series/Index by indicated number of times. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems redundant, can we get rid of it?
|
||
Parameters | ||
---------- | ||
repeats : int or array | ||
Same value for all (int) or different value per (array) | ||
repeats : int or array-like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use sequence
instead of array-like
(we use array-like
to refer to objects that follow numpy ndarray api)
|
||
Examples | ||
-------- | ||
>>> s = pd.Series(['a', 'b', 'c', 'd', 'e']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having 5 cases that illustrate exactly the same is not very useful. Can we have:
- A single letter (may be two or three, for the next example to show different number of repetitions)
- A word
- A
NaN
- A number
Can you also show the value of s
, so it's easier to compare with the result of str.repeat
Sure, updating with the changes requested. |
Closing in favor of #22571 Thanks @manojpandey for the work on this. Feel free to make improvements to the docstring in a new PR. |
Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.
Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):
git diff upstream/master -u -- "*.py" | flake8 --diff