-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add str::{Split,RSplit,SplitN,RSplitN,SplitTerminator,RSplitTerminator,SplitInclusive}::as_str
methods
#75265
Conversation
r? @kennytm (rust_highfive has picked a reviewer for you, use r? to override) |
77f0f05
to
6f119d5
Compare
library/core/src/str/mod.rs
Outdated
fn as_str(&self) -> &'a str { | ||
// `Self::get_end` doesn't change `self.start` | ||
if self.finished { | ||
return ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will cause surprising result for those who want to calculate ptr offsets of this &str from the whole &str
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a use case when you need to do those kinds of tricks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the fix is trivial, but for me, it's not clear if we should provide such guarantees.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like the return to be a slice of the provided input.
When parsing a http header, I find myself wanting to see how much is remaining in the buffer to determine if I have all the contents (specified by Content-Length).
Example
let http_response_buffer: &str;
let mut header_lines = http_response_buffer.split("\r\n");
let mut content_length = None;
/* read through header and set content_length, header ends on empty header_line */
/* ... */
let unread_buffer_size = unsafe {
header_lines.as_str().as_ptr().offset_from(response_buffer.as_ptr())
};
if content_length.unwrap() < unread_buffer_size {
panic!("http_response_buffer does not contain response body");
}
This would break if http_response_buffer
was
Content-Length: 100
granted I am using unsafe
but this pattern doesn't seem too unreasonable (I might be biased as it's what I want to do).
Edit
Just realized I could just do header_lines.as_str().len()
. Late night for me, my justification isn't really holding water anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this today and thought that there should be a function (probably in a crate) like
fn substr_bounds(string: &str, sub: &str) -> Option<Range<usize>> { ... }
let string = "aabb";
let sub = &r[1..3];
assert_eq!(substr_bound(string, sub), Some(1..3));
assert_eq!(substr_bound(string, "ab"), None);
It feels natural, but are there any cases when it's useful? Anyway, as said before, for these kinds of things returning an empty substr may be a good idea.
If there is code that may benefit from this, I could make a PR with a fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be related to the semantics of raw pointer comparison.
rust-lang/unsafe-code-guidelines#239
r? @dtolnay |
☔ The latest upstream changes (presumably #76325) made this pull request unmergeable. Please resolve the merge conflicts. Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:
|
This commit introduses 2 methods under "str_split_as_str" gate with common signature of `&Split<'a, _> -> &'a str'`. Both of them work like `Chars::as_str` - return unyield part of the inner string.
…r` methods This commit entroduce 4 methods smililar to `Split::as_str` all under the same gate "str_split_as_str".
6f119d5
to
076514c
Compare
This commit entroduces `core::str::SplitInclusive::as_str` method similar to `core::str::Split::as_str`, but under different gate - "str_split_inclusive_as_str" (this is done so because `SplitInclusive` is itself unstable).
Hm. I've fixed the conflict, but is it normal that bors didn't change the labels ( |
@WaffleLapkin not sure why, the change has been reported last week as well, will check it accordingly :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks good!
Please make a tracking issue and update the stability attributes in the PR.
@dtolnay done :) |
What's an use case of this as_str method? |
@bors r=dtolnay |
📌 Commit 7bd6403 has been approved by |
…olnay Add `str::{Split,RSplit,SplitN,RSplitN,SplitTerminator,RSplitTerminator,SplitInclusive}::as_str` methods tl;dr this allows viewing unyelded part of str-split-iterators, like so: ```rust let mut split = "Mary had a little lamb".split(' '); assert_eq!(split.as_str(), "Mary had a little lamb"); split.next(); assert_eq!(split.as_str(), "had a little lamb"); split.by_ref().for_each(drop); assert_eq!(split.as_str(), ""); ``` -------------- This PR adds semi-identical `as_str` methods to most str-split-iterators with signatures like `&'_ Split<'a, P: Pattern<'a>> -> &'a str` (Note: output `&str` lifetime is bound to the `'a`, not the `'_`). The methods are similar to [`Chars::as_str`] `SplitInclusive::as_str` is under `"str_split_inclusive_as_str"` feature gate, all other methods are under `"str_split_as_str"` feature gate. Before this PR you had to sum `len`s of all yielded parts or collect into `String` to emulate `as_str`. [`Chars::as_str`]: https://doc.rust-lang.org/core/str/struct.Chars.html#method.as_str
Rollup of 14 pull requests Successful merges: - rust-lang#75023 (ensure arguments are included in count mismatch span) - rust-lang#75265 (Add `str::{Split,RSplit,SplitN,RSplitN,SplitTerminator,RSplitTerminator,SplitInclusive}::as_str` methods) - rust-lang#75675 (mangling: mangle impl params w/ v0 scheme) - rust-lang#76084 (Refactor io/buffered.rs into submodules) - rust-lang#76119 (Stabilize move_ref_pattern) - rust-lang#77493 (ICEs should always print the top of the query stack) - rust-lang#77619 (Use futex-based thread-parker for Wasm32.) - rust-lang#77646 (For backtrace, use StaticMutex instead of a raw sys Mutex.) - rust-lang#77648 (Static mutex is static) - rust-lang#77657 (Cleanup cloudabi mutexes and condvars) - rust-lang#77672 (Simplify doc-cfg rendering based on the current context) - rust-lang#77780 (rustc_parse: fix spans on cast and range exprs with attrs) - rust-lang#77935 (BTreeMap: make PartialCmp/PartialEq explicit and tested) - rust-lang#77980 (Fix intra doc link for needs_drop) Failed merges: r? `@ghost`
@leonardo-m I needed this feature while implementing very simple argument parser. Let's imagine that I want to parse the following command: So in |
Would it be possible to also implement this for SplitWhitespace? |
@crumblingstatue this should be possible, but it's not trivial. |
…r, r=m-ou-se Add `as_str` method for split whitespace str iterators This PR adds `as_str` methods to `SplitWhitespace` and `SplitAsciiWhitespace` str iterators. The methods return the remainder, similar to `as_str` methods on `Chars` and other split iterators. This PR is a continuation of rust-lang#75265, which added `as_str` for all other str split iterators. The feature gate for new methods is `#![feature(str_split_whitespace_as_str)]`. `SplitWhitespace` and `SplitAsciiWhitespace` use iterators under the hood, so to implement `as_str` it's required to either 1. Make fields of some iterators `pub(crate)` 2. Add getter methods (like `into_inner`, `inner`, `inner_mut`...) to some (all) iterators 3. Completely rewrite `SplitWhitespace` and `SplitAsciiWhitespace` This PR uses the 1. approach since it's easier to implement and requires fewer changes (and no changes to the public API). If you think that's not the right way, please, tell me. r? `@m-ou-se`
tl;dr this allows viewing unyelded part of str-split-iterators, like so:
This PR adds semi-identical
as_str
methods to most str-split-iterators with signatures like&'_ Split<'a, P: Pattern<'a>> -> &'a str
(Note: output&str
lifetime is bound to the'a
, not the'_
). The methods are similar toChars::as_str
SplitInclusive::as_str
is under"str_split_inclusive_as_str"
feature gate, all other methods are under"str_split_as_str"
feature gate.Before this PR you had to sum
len
s of all yielded parts or collect intoString
to emulateas_str
.