Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement feature string_from_utf8_lossy_owned for lossy conversion from Vec<u8> to String methods #129439

Merged
merged 1 commit into from
Sep 15, 2024

Conversation

okaneco
Copy link
Contributor

@okaneco okaneco commented Aug 23, 2024

Accepted ACP: rust-lang/libs-team#116
Tracking issue: #129436

Implement feature for lossily converting from Vec<u8> to String

  • Add String::from_utf8_lossy_owned
  • Add FromUtf8Error::into_utf8_lossy

Related to #64727, but unsure whether to mark it "fixed" by this PR.
That issue partly asks for in-place replacement of the original allocation. We fulfill the other half of that request with these functions.

closes #64727

@rustbot
Copy link
Collaborator

rustbot commented Aug 23, 2024

r? @Noratrieb

rustbot has assigned @Noratrieb.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Aug 23, 2024
@rust-log-analyzer

This comment has been minimized.

Implement feature for lossily converting from `Vec<u8>` to `String`
- Add `String::from_utf8_lossy_owned`
- Add `FromUtf8Error::into_utf8_lossy`
/// #![feature(string_from_utf8_lossy_owned)]
/// // some invalid bytes
/// let input: Vec<u8> = b"Hello \xF0\x90\x80World".into();
/// let output = String::from_utf8(input).unwrap_or_else(|e| e.into_utf8_lossy());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this example isn't great, since it's just String::from_utf8_lossy_owned(input) but with more characters. I can't think of any better one though, so it's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

After some thinking, something more illustrative could be matching on the result of String::from_utf8

// some invalid bytes
let input: Vec<u8> = b"Hello \xF0\x90\x80World".into();

let output = match String::from_utf8(input) {
    Ok(s) => s,
    Err(e) => {
        // Log invalid UTF-8 error
        // ...
        
        // Lossily convert the bytes to a `String`
        e.into_utf8_lossy()
    }
};

assert_eq!(String::from("Hello �World"), output);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds better

@Noratrieb
Copy link
Member

sorry for the late review, but thanks for the patience and implementation!
@bors r+ rollup

@bors
Copy link
Contributor

bors commented Sep 15, 2024

📌 Commit 65abcc2 has been approved by Noratrieb

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 15, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 15, 2024
…iaskrgr

Rollup of 5 pull requests

Successful merges:

 - rust-lang#129439 (Implement feature `string_from_utf8_lossy_owned` for lossy conversion from `Vec<u8>` to `String` methods)
 - rust-lang#129828 (miri: treat non-memory local variables properly for data race detection)
 - rust-lang#130110 (make dist vendoring configurable)
 - rust-lang#130293 (Fix lint levels not getting overridden by attrs on `Stmt` nodes)
 - rust-lang#130342 (interpret, miri: fix dealing with overflow during slice indexing and allocation)

Failed merges:

 - rust-lang#130394 (const: don't ICE when encountering a mutable ref to immutable memory)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit df3cf91 into rust-lang:master Sep 15, 2024
6 checks passed
@rustbot rustbot added this to the 1.83.0 milestone Sep 15, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Sep 15, 2024
Rollup merge of rust-lang#129439 - okaneco:vec_string_lossy, r=Noratrieb

Implement feature `string_from_utf8_lossy_owned` for lossy conversion from `Vec<u8>` to `String` methods

Accepted ACP: rust-lang/libs-team#116
Tracking issue: rust-lang#129436

Implement feature for lossily converting from `Vec<u8>` to `String`
- Add `String::from_utf8_lossy_owned`
- Add `FromUtf8Error::into_utf8_lossy`

---
Related to rust-lang#64727, but unsure whether to mark it "fixed" by this PR.
That issue partly asks for in-place replacement of the original allocation. We fulfill the other half of that request with these functions.

closes rust-lang#64727
@okaneco okaneco deleted the vec_string_lossy branch September 15, 2024 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request: Lossy way to move Vec<u8> into a String
5 participants