update the pandas.Series.str.repeat docstring #20634

Closed

manojpandey

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant
################################################################################
##################### Docstring (pandas.Series.str.repeat) #####################
################################################################################

Duplicate each string repeated by indicated number of times.

Parameters
----------
repeats : int or array
    Same value for all (int) or different value per (array).

Returns
-------
repeated : Series/Index of objects
    Same type as the original object

Examples
--------
>>> s = pd.Series(['a', 'b', 'c', 'd', 'e'])

Using same value for all:

>>> s.str.repeat(4)
0    aaaa
1    bbbb
2    cccc
3    dddd
4    eeee
dtype: object

Using different value per element:

>>> s.str.repeat([3, 2, 5, 1, 4])
0      aaa
1       bb
2    ccccc
3        d
4     eeee
dtype: object

Passing zero or negative integer will return an empty string

>>> s.str.repeat([0, 0, -2, -1, 0])
0
1
2
3
4
dtype: object

Notes
--------
A passed value of zero or negative integer will return an empty string.

See also
--------
numpy.ndarray.repeat: Repeat elements of an array.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	No extended summary found

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@codecov

codecov bot commented Apr 8, 2018

Codecov Report

Merging #20634 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20634      +/-   ##
==========================================
- Coverage   92.04%   92.04%   -0.01%     
==========================================
  Files         169      169              
  Lines       50787    50787              
==========================================
- Hits        46746    46745       -1     
- Misses       4041     4042       +1
Flag         Coverage Δ
#multiple    90.45% <ø> (-0.01%) ⬇️
#single      42.29% <ø> (ø) ⬆️

Impacted Files                    Coverage Δ
pandas/core/strings.py            98.63% <ø> (ø) ⬆️
pandas/util/_depr_module.py       65.11% <0%> (-2.33%) ⬇️
pandas/core/indexes/multi.py      95.41% <0%> (ø) ⬆️
pandas/core/frame.py              97.2% <0%> (ø) ⬆️
pandas/core/indexes/base.py       96.45% <0%> (ø) ⬆️
pandas/core/resample.py           96.13% <0%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3285bdc...1f6204a.

Member

@datapythonista datapythonista left a comment

Looks good, added some comments.


See also
--------
numpy.ndarray.repeat: Repeat elements of an array.
Member

Notes and See Also go before the examples. But in this case, I personally wouldn't have either. The comment in the notes could go in the extended summary, along with an explanation that the value can be a single integer or a list with one value per element.

For the See Also, I don't think ndarray.repeat is actually related, even if both repeat things. The use cases are unrelated. Maybe we could have str.len, as it's slightly related. In that case, See Also should have a capital A and should also be placed before the examples.

2
3
4
dtype: object
Member

I think this example could be merged with the previous one if you use a list with positive and negative values together (e.g. s.str.repeat([-3, 0, 1, 3, 5])).
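
A rough sketch of how such a merged example might look. The variable name `merged` is only for illustration, and `dtype=object` is set explicitly here as an assumption, so that the result matches the `dtype: object` output shown in the docstring:

```python
import pandas as pd

# Same five single-letter strings as in the docstring under review.
s = pd.Series(['a', 'b', 'c', 'd', 'e'], dtype=object)

# One call covers negative, zero and positive repeat counts.
merged = s.str.repeat([-3, 0, 1, 3, 5])

# Negative and zero counts give empty strings, like 'a' * -3 in plain Python.
print(merged.tolist())  # ['', '', 'c', 'ddd', 'eeeee']
```

This keeps the "zero or negative gives an empty string" point in the docstring without needing a separate example block.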

Author

Sure - I'll merge with the previous example 👍

@@ -594,17 +594,59 @@ def str_replace(arr, pat, repl, n=-1, case=None, flags=0, regex=True):

def str_repeat(arr, repeats):
"""
- Duplicate each string in the Series/Index by indicated number
- of times.
+ Duplicate each string repeated by indicated number of times.

Parameters
----------
repeats : int or array
Member

int or array-like is more standard


Parameters
----------
repeats : int or array
- Same value for all (int) or different value per (array)
+ Same value for all (int) or different value per (array).

Returns
-------
repeated : Series/Index of objects
Member

You can get rid of repeated and simply leave Series or Index.

@datapythonista datapythonista added the Strings String extension data type and string data label Jul 29, 2018
@datapythonista
Member

@manojpandey do you have time to make the changes based on the review?

@manojpandey
Author

Hey @datapythonista - surely. I totally missed this. Thanks for reminding :) Let me make the changes!

@datapythonista
Member

@manojpandey If you have time now that PyData Delhi is over, that would be great. :) Otherwise I'll try to find time to make the fixes and merge myself.

@manojpandey
Author

Yes, I'm updating this. Thanks for understanding. I should have done it way earlier :(

Let me send the PR this Mon/Tue!

@manojpandey
Author

I just pushed all the changes requested. Please have a look when you get a moment. Thanks, and apologies again for the delay.

@datapythonista
Member

Thanks for the update @manojpandey, it seems like @JesperDramsch has been working on the same changes in #22571. Can the two of you discuss which PR makes more sense to keep? Sorry, I've been encouraging people too much to work on the Series.str methods, and we ended up with overlapping work.

@JesperDramsch
Contributor

Haha, the sprint-prophecy has been fulfilled.

@manojpandey manojpandey force-pushed the docstring_str_repeat branch 2 times, most recently from b023d7a to 6ab42c6 Compare September 2, 2018 19:13
@manojpandey
Author

Hi, I feel that this current PR has better examples than those of #22571, and if no (or minimal) changes are required, can we move forward to merge this and close it? @datapythonista

@JesperDramsch What are your thoughts?

Thanks!

@datapythonista
Member

I think both have some things that are better. Whoever feels like integrating the best of both and addressing any additional feedback is welcome to do so. :)

@JesperDramsch
Contributor

Hej @manojpandey, I think it's great you document the effect of repeat values <1. I wonder why you think these are significantly better, but personally, I don't have too many stakes in my PR, so it's fine if we close mine and you update yours.

Things I see that should be done here are:

  • Mention the keyword in the call
  • Update the Returns string
  • Remove the word array and replace it with sequence to make it package agnostic

You have been working at this much longer, so feel free to take it to the end.
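
For the first point, a minimal sketch of what "mention the keyword in the call" could look like in an example. The Series contents and the name `doubled` are made up for illustration; `repeats` is the parameter name from the docstring above:

```python
import pandas as pd

s = pd.Series(['a', 'b'], dtype=object)

# Passing the count by keyword makes the example self-explanatory.
doubled = s.str.repeat(repeats=2)

print(doubled.tolist())  # ['aa', 'bb']
```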

@manojpandey manojpandey force-pushed the docstring_str_repeat branch 3 times, most recently from e4c25c5 to a924e45 Compare September 4, 2018 10:01
@manojpandey
Author

@JesperDramsch Thanks for your feedback on this, really appreciated - and I've incorporated the changes. I updated the Returns string part from your PR, thanks for that!

@datapythonista can you see if everything is alright now?

@pep8speaks

Hello @manojpandey! Thanks for updating the PR.

Member

@datapythonista datapythonista left a comment

Added some comments.

- of times.
+ Duplicate each string repeated by indicated number of times.
+
+ Duplicate each string in the Series/Index by indicated number of times.
Member

This seems redundant, can we get rid of it?


Parameters
----------
- repeats : int or array
- Same value for all (int) or different value per (array)
+ repeats : int or array-like
Member

Can we use sequence instead of array-like? (We use array-like to refer to objects that follow the NumPy ndarray API.)
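
To illustrate why "sequence" may be the more accurate word, here is a small sketch assuming that a list, a tuple and a NumPy array are all accepted for `repeats` and give the same result. The variable names are illustrative only:

```python
import numpy as np
import pandas as pd

s = pd.Series(['a', 'b', 'c'], dtype=object)

# Plain Python sequences and a NumPy array, all with one count per element.
from_list = s.str.repeat([1, 2, 3])
from_tuple = s.str.repeat((1, 2, 3))
from_array = s.str.repeat(np.array([1, 2, 3]))

print(from_list.tolist())  # ['a', 'bb', 'ccc']
print(from_list.equals(from_tuple) and from_list.equals(from_array))  # True
```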


Examples
--------
>>> s = pd.Series(['a', 'b', 'c', 'd', 'e'])
Member

I think having 5 cases that illustrate exactly the same thing is not very useful. Can we have:

  • A single letter (maybe two or three, so the next example can show different numbers of repetitions)
  • A word
  • A NaN
  • A number

Can you also show the value of s, so it's easier to compare with the result of str.repeat?
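
A possible shape for such an example, as a sketch only. It covers the letter, word and NaN cases; a non-string number is left out here because its result is not something this thread confirms. The names `same` and `per_element` are illustrative, and `dtype=object` is an assumption to match the docstring output:

```python
import numpy as np
import pandas as pd

# A single letter, two letters, a word and a missing value.
s = pd.Series(['a', 'bb', 'word', np.nan], dtype=object)
print(s)  # show the input so it can be compared with the results

# Same count for every element; the missing value stays missing.
same = s.str.repeat(2)
print(same)

# One count per element.
per_element = s.str.repeat([3, 2, 1, 4])
print(per_element)
```

With this input, the first three results are `'aa'`, `'bbbb'`, `'wordword'` for the scalar call and `'aaa'`, `'bbbb'`, `'word'` for the per-element call, while the NaN entry remains NaN in both.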

@manojpandey
Author

Sure, updating with the changes requested.

@datapythonista
Member

Closing in favor of #22571

Thanks @manojpandey for the work on this. Feel free to make improvements to the docstring in a new PR.

Labels
Docs Strings String extension data type and string data
5 participants