Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add str pointer methods #85816

Closed
wants to merge 4 commits into from

Conversation

WaffleLapkin
Copy link
Member

This PR adds the following functions for pointers to str, similar to already existing functions for pointers to [T].

impl *const str {
    pub const fn len(self) -> usize;
    pub const fn as_ptr(self) -> *const u8;
    pub unsafe fn get_unchecked<I>(self, index: I) -> *const I::Output
    where
        I: SliceIndex<str>;
}

impl *mut str {
    pub const fn len(self) -> usize;
    pub const fn as_mut_ptr(self) -> *mut u8;
    pub unsafe fn get_unchecked_mut<I>(self, index: I) -> *mut I::Output
    where
        I: SliceIndex<str>;
}

// mod ptr
pub const fn str_from_raw_parts(data: *const u8, len: usize) -> *const str;
pub const fn str_from_raw_parts_mut(data: *mut u8, len: usize) -> *mut str;

impl NonNull<str> {
    pub const fn str_from_raw_parts(data: NonNull<u8>, len: usize) -> Self;
    pub const fn len(self) -> usize;
    pub const fn as_non_null_ptr(self) -> NonNull<u8>;
    pub const fn as_mut_ptr(self) -> *mut u8;
    pub unsafe fn get_unchecked_mut<I>(self, index: I) -> NonNull<I::Output>
    where
        I: SliceIndex<str>;
}

I wasn't sure about feature gates and how to group them, so the advice is very appreciated.


This PR also adds const_str_ptr and mut_str_ptr lang items (they are required for *const str and *mut str impls).


See also:

@rust-highfive
Copy link
Collaborator

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 29, 2021
@jfrimmel
Copy link
Contributor

I'm unsure about ptr.as_ptr(), as we already have a pointer. Maybe use as_bytes() instead?

@WaffleLapkin
Copy link
Member Author

@jfager I would expect that as_bytes returns some sort of byte slice, e.g. it's *const str -> *const [u8], I don't think it's a good name for a *const str -> *const u8 function. Maybe as_ptr isn't a great name either, but it's consistent with <*const [T]>::as_ptr.

@jfager
Copy link
Contributor

jfager commented May 30, 2021

@WaffleLapkin looks like you tagged the wrong person!

@WaffleLapkin
Copy link
Member Author

WaffleLapkin commented May 30, 2021

Oops, right. I've meant to tag @jfrimmel 😅

@m-ou-se
Copy link
Member

m-ou-se commented May 31, 2021

The len functions look fine to me. For those, you can just use the same tracking issue as for [T]: #71146

The as_[mut_]ptr and get_unchecked ones also look fine. Feel free to just re-use #74265.

But I'm not sure about the str_from_raw_parts, because we don't have a &str variant (yet?). Right now, you'd have to use slice::from_raw_parts and str::from_utf8_unchecked to make a &str from raw parts. If we have a str_from_raw_parts for *const str, I'm worried that might be confusing, since users searching for a &str will only find the *const str version. Is this ptr::str_from_raw_parts that useful? Should we leave it out? Or maybe we need one for &str too? (Part of the reason it's good it needs two steps, is that it makes it clear there's two unsafe assumptions: One about the pointer+len being valid, and one about the UTF-8 encoding being valid.)

@m-ou-se m-ou-se added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 31, 2021
@WaffleLapkin
Copy link
Member Author

@m-ou-se my original motivation for ptr::str_from_raw_parts was that it seemed weird to have methods on *const str without a convenient way to create such pointers. &str on the other hand can be easily created, e.g. str::from_utf8(slice::from_raw_parts(ptr, len))?. Though for symmetry and convenience str::from_raw_parts seems nice.

Part of the reason it's good it needs two steps, is that it makes it clear there's two unsafe assumptions: One about the pointer+len being valid, and one about the UTF-8 encoding being valid.

Then maybe instead of ptr::str_from_raw_parts we could have str::from_raw_utf8(*const [u8]) -> *const str.


Not sure which way is better.

@joelpalmer joelpalmer added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 15, 2021
@crlf0710
Copy link
Member

crlf0710 commented Jul 4, 2021

@WaffleLapkin Ping from triage, what's next steps here?

@m-ou-se
Copy link
Member

m-ou-se commented Jul 5, 2021

Then maybe instead of ptr::str_from_raw_parts we could have str::from_raw_utf8(*const [u8]) -> *const str.

Yeah maybe. I'd still put that in ptr:: instead of str:: though. Most users of str aren't interested in raw pointers, so it's probably good to keep all the raw pointer functions inside the ptr module.

cc @rust-lang/libs-api Any thoughts on these functions?

@dtolnay
Copy link
Member

dtolnay commented Jul 5, 2021

I don't see why the safety requirement of get_unchecked involves utf8 boundaries. I would expect only the same safety requirements as https://doc.rust-lang.org/std/primitive.pointer.html#method.add.

Please document safety requirements in a # Safety section.

Can we have safe indexing that works like https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_add ?

@WaffleLapkin
Copy link
Member Author

I don't see why the safety requirement of get_unchecked involves utf8 boundaries. I would expect only the same safety requirements as https://doc.rust-lang.org/std/primitive.pointer.html#method.add.

Hm. So *const str doesn't have utf8 guarantees, and it's only unsafe to create non-utf8 &str?

@WaffleLapkin
Copy link
Member Author

So, to summarize the state of this PR. ​

  1. len, as_ptr, as_mut_ptr, as_non_null_ptr and get_unchecked are considered ok, I just need to fill tracking issues.
  2. str_from_raw_parts{_mut} is more controversial, since it may be confusing to have a function *const u8, usize -> *const str, but not the -> &str. There are a couple ways to 'fix' this, e.g.:
    • Add a ptr::str_from_utf8(*const [u8]) -> *const str function instead. This way we reuse slice ptr functions (like ptr::slice_from_raw_parts) and split safety into 2 parts: ptr validity and utf8 validity.
    • Add a -> &str counterpart too, like str::from_raw_parts (similar to slice::from_raw_parts)

I'm waiting for the decision on str_from_raw_parts{_mut} to fix everything and one go.

@bors
Copy link
Contributor

bors commented Jul 28, 2021

☔ The latest upstream changes (presumably #85769) made this pull request unmergeable. Please resolve the merge conflicts.

@m-ou-se m-ou-se added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 28, 2021
@JohnCSimon
Copy link
Member

Triage: Still has conflicts

@bors
Copy link
Contributor

bors commented Aug 18, 2021

☔ The latest upstream changes (presumably #86808) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Copy link
Contributor

bors commented Aug 25, 2021

☔ The latest upstream changes (presumably #87875) made this pull request unmergeable. Please resolve the merge conflicts.

The functions are under feature gate `str_from_raw_parts` and are similar to
`slice_from_raw_parts`, `slice_from_raw_parts_mut`.
@rust-log-analyzer

This comment has been minimized.

These items allow to make inherent impls for `*const str` and `*mut str`.
This patch adds the following methods to `*const str` and `*mut str`:
- `len`
- `as_ptr` (`as_mut_ptr`)
- `get_unchecked` (`get_unchecked_mut`)

Similar methods have already existed for raw slices.
This patch adds the following methods to `NonNull<str>`:
- `str_from_raw_parts`
- `len`
- `as_non_null_ptr`
- `as_mut_ptr`
- `get_unchecked_mut`

Similar methods have already existed for raw slices, raw strings and nonnull
raw strings.
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 20, 2021
@camelid camelid added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 8, 2021
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 24, 2021
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 16, 2021
@bors
Copy link
Contributor

bors commented Nov 25, 2021

☔ The latest upstream changes (presumably #91203) made this pull request unmergeable. Please resolve the merge conflicts.

@WaffleLapkin
Copy link
Member Author

Closing in favour of #91216 and #91218

@WaffleLapkin WaffleLapkin deleted the str_ptr_len_get branch December 21, 2021 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.