Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some of [T]’s methods to strings and vice versa #1152

Merged
merged 2 commits into from
Jul 1, 2015

Conversation

ftxqxd
Copy link
Contributor

@ftxqxd ftxqxd commented Jun 6, 2015

Add some methods that already exist on slices to strings and vice versa. Specifically, the following methods should be added:

  • str::chunks
  • str::windows
  • str::into_string
  • String::into_boxed_slice
  • <[T]>::subslice_offset

Rendered

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jun 6, 2015
@Kimundi
Copy link
Member

Kimundi commented Jun 6, 2015

Is the reason for adding <[T]>::subslice_offset just symmetry? I'm asking because I'm not sure if that function is actually used by anyone.

@ftxqxd
Copy link
Contributor Author

ftxqxd commented Jun 6, 2015

@Kimundi I’ve wanted to use it in the past, but yeah, it’s mainly just to match the str API. I wouldn’t really care if subslice_offset got removed from str (it’s not very useful), but I’d certainly prefer to either have it for both types or neither, rather than just one.

@Gankra
Copy link
Contributor

Gankra commented Jun 6, 2015

Is there really a usecase for str::chunks or windows?

@gkoz
Copy link

gkoz commented Jun 6, 2015

Out of curiosity, what is Box<str> for? I haven't encountered any mentions of this type before.

@Gankra
Copy link
Contributor

Gankra commented Jun 6, 2015

@gkoz In principle it's just a runtime-constructed &'static str (string litteral). one usize thinner than a String, but effectively immutable as a result.

@ftxqxd
Copy link
Contributor Author

ftxqxd commented Jun 6, 2015

@gankro str::windows could be used to calculate the Sørensen–Dice coefficient of two strings, and str::chunks has been asked for on IRC a few times.

@huonw
Copy link
Member

huonw commented Jun 6, 2015

Hm, I suspect windows and chunks will encourage bad Unicode practice without a huge amount of benefit other than symmetry. Specifically they will split between combining characters.

Add the following methods to `str`, presumably as inherent methods:

- `chunks(&self, n: usize) -> Chunks`: Returns an iterator that yields the
*characters* (not bytes) of the string in groups of `n` at a time. Iterator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“Character” is kinda ambiguous. (Unicode has four different definitions.) Maybe use chars instead?

@krdln
Copy link
Contributor

krdln commented Jun 9, 2015

+1 to functions involving Box<str>. I was surprised they weren't implemented. And it's good to lose those 4/8 bytes sometimes (especially when within an enum).

-1 to windows and chunks. I second @huonw that they encourage bad practices. I think they belong in unicode crate with codepoint_ and grapheme_ variants.

@krdln
Copy link
Contributor

krdln commented Jun 10, 2015

I just stumbled upon the lack of str::shift_char() counterpart for a slice. I guess it would also be worth adding.

@ftxqxd
Copy link
Contributor Author

ftxqxd commented Jun 10, 2015

I just stumbled upon the lack of str::shift_char() counterpart for a slice. I guess it would also be worth adding.

@krdln #1058 already adds that, I believe.

@comex
Copy link

comex commented Jun 11, 2015

May be out of scope, but I'd like to see some form of find, rfind, replace, and other pattern-based methods added to [T] or Vec<T>. This would be useful, for example, when dealing with a &[u8] which represents a "string" that may not be valid UTF-8 (such as the contents of a Unix OsStr) - or just random binary data, for that matter.

@Gankra Gankra self-assigned this Jun 11, 2015
@Gankra
Copy link
Contributor

Gankra commented Jun 11, 2015

I concur with @huonw on the dubiousness of windows and chunks in the face of combining characters. It's not clear to me how the Sørensen–Dice coefficient is even supposed to interact with utf8 in that regard.

👍 on the Box conversions. Makes sense.

I'm completely indifferent on subslice_offset

@alexcrichton alexcrichton added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Jun 16, 2015
@alexcrichton
Copy link
Member

This RFC is now entering its final comment period.


- `subslice_offset(&self, inner: &[T]) -> usize`: Returns the offset (in
elements) of an inner slice relative to an outer slice. Panics of `inner` is
not contained within `self`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally prefer to not add this method to slices just yet, I suspect it won't survive stabilization of strings.

@alexcrichton
Copy link
Member

I also agree with @huonw about being hesitant to add windows/chunks to strings due to unicode fun stuff.

- `windows(&self, n: usize) -> Windows`: Returns an iterator over all contiguous
windows of character length `n`. Iterator element type: `&str`.

- `into_string(self: Box<str>) -> String`: Returns `self` as a `String`. This is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sfackler
Copy link
Member

The Box<str> methods seem good to me but I share the unicode weirdness concerns above with respect to chunks and windows and am not sold on the utility of subslice_offset.

@Gankra
Copy link
Contributor

Gankra commented Jun 22, 2015

@P1start the libs team seems pretty unanimous about Box<str> is great, everything else is not. Is this an acceptable result of this RFC, for you?

@ftxqxd
Copy link
Contributor Author

ftxqxd commented Jun 22, 2015

@gankro Yep, that’s fine by me. I’ll update the RFC soon to include just those methods (and rename into_boxed_slice to into_boxed_str unless anyone has any objections).

@ftxqxd
Copy link
Contributor Author

ftxqxd commented Jun 27, 2015

RFC updated, removing str::windows, str::chunks, and <[T]>::subslice_offset.

@alexcrichton
Copy link
Member

The consensus of the libs team is to merge this RFC, so I will do so. Thanks again @P1start!

@alexcrichton alexcrichton merged commit 566bb70 into rust-lang:master Jul 1, 2015
@aturon
Copy link
Member

aturon commented Jul 1, 2015

Tracking issue.

bluss added a commit to bluss/rust that referenced this pull request Aug 13, 2015
This is the name that was decided in rust-lang/rfcs#1152, and it's
better if we say “boxed str” for `Box<str>`.

The old name `String::into_boxed_slice` is deprecated.
bors added a commit to rust-lang/rust that referenced this pull request Aug 13, 2015
Rename String::into_boxed_slice -> into_boxed_str

This is the name that was decided in rust-lang/rfcs#1152, and it's
better if we say “boxed str” for `Box<str>`.

The old name `String::into_boxed_slice` is deprecated.
bors added a commit to rust-lang/rust that referenced this pull request Aug 14, 2015
Rename String::into_boxed_slice -> into_boxed_str

This is the name that was decided in rust-lang/rfcs#1152, and it's
better if we say “boxed str” for `Box<str>`.

The old name `String::into_boxed_slice` is deprecated.
@Centril Centril added the A-slice Slice related proposals & ideas label Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-slice Slice related proposals & ideas final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.