Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement feature string_from_utf8_lossy_owned for lossy conversion from Vec<u8> to String methods #129439

Merged
merged 1 commit into from
Sep 15, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
74 changes: 74 additions & 0 deletions library/alloc/src/string.rs
Original file line number Diff line number Diff line change
Expand Up @@ -662,6 +662,56 @@ impl String {
Cow::Owned(res)
}

/// Converts a [`Vec<u8>`] to a `String`, substituting invalid UTF-8
/// sequences with replacement characters.
///
/// See [`from_utf8_lossy`] for more details.
///
/// [`from_utf8_lossy`]: String::from_utf8_lossy
///
/// Note that this function does not guarantee reuse of the original `Vec`
/// allocation.
///
/// # Examples
///
/// Basic usage:
///
/// ```
/// #![feature(string_from_utf8_lossy_owned)]
/// // some bytes, in a vector
/// let sparkle_heart = vec![240, 159, 146, 150];
///
/// let sparkle_heart = String::from_utf8_lossy_owned(sparkle_heart);
///
/// assert_eq!(String::from("💖"), sparkle_heart);
/// ```
///
/// Incorrect bytes:
///
/// ```
/// #![feature(string_from_utf8_lossy_owned)]
/// // some invalid bytes
/// let input: Vec<u8> = b"Hello \xF0\x90\x80World".into();
/// let output = String::from_utf8_lossy_owned(input);
///
/// assert_eq!(String::from("Hello �World"), output);
/// ```
#[must_use]
#[cfg(not(no_global_oom_handling))]
#[unstable(feature = "string_from_utf8_lossy_owned", issue = "129436")]
pub fn from_utf8_lossy_owned(v: Vec<u8>) -> String {
if let Cow::Owned(string) = String::from_utf8_lossy(&v) {
string
} else {
// SAFETY: `String::from_utf8_lossy`'s contract ensures that if
// it returns a `Cow::Borrowed`, it is a valid UTF-8 string.
// Otherwise, it returns a new allocation of an owned `String`, with
// replacement characters for invalid sequences, which is returned
// above.
unsafe { String::from_utf8_unchecked(v) }
}
}

/// Decode a UTF-16–encoded vector `v` into a `String`, returning [`Err`]
/// if `v` contains any invalid data.
///
Expand Down Expand Up @@ -2012,6 +2062,30 @@ impl FromUtf8Error {
&self.bytes[..]
}

/// Converts the bytes into a `String` lossily, substituting invalid UTF-8
/// sequences with replacement characters.
///
/// See [`String::from_utf8_lossy`] for more details on replacement of
/// invalid sequences, and [`String::from_utf8_lossy_owned`] for the
/// `String` function which corresponds to this function.
///
/// # Examples
///
/// ```
/// #![feature(string_from_utf8_lossy_owned)]
/// // some invalid bytes
/// let input: Vec<u8> = b"Hello \xF0\x90\x80World".into();
/// let output = String::from_utf8(input).unwrap_or_else(|e| e.into_utf8_lossy());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this example isn't great, since it's just String::from_utf8_lossy_owned(input) but with more characters. I can't think of any better one though, so it's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

After some thinking, something more illustrative could be matching on the result of String::from_utf8

// some invalid bytes
let input: Vec<u8> = b"Hello \xF0\x90\x80World".into();

let output = match String::from_utf8(input) {
    Ok(s) => s,
    Err(e) => {
        // Log invalid UTF-8 error
        // ...
        
        // Lossily convert the bytes to a `String`
        e.into_utf8_lossy()
    }
};

assert_eq!(String::from("Hello �World"), output);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds better

///
/// assert_eq!(String::from("Hello �World"), output);
/// ```
#[must_use]
#[cfg(not(no_global_oom_handling))]
#[unstable(feature = "string_from_utf8_lossy_owned", issue = "129436")]
pub fn into_utf8_lossy(self) -> String {
String::from_utf8_lossy_owned(self.bytes)
}

/// Returns the bytes that were attempted to convert to a `String`.
///
/// This method is carefully constructed to avoid allocation. It will
Expand Down
Loading