Tracking issue: UTF-8 decoder in libcore #33906
Comments
add core::char::DecodeUtf8 See [issue](#33906)
I’ve submitted #35947 to make this emit errors as specified in Unicode, like
I think this is the tracking issue for this functionality? `DecodeUtf8`, like `DecodeUtf16`, should have a way to recover invalid byte sequences encountered.
`std::str::next_code_point` being public is bad; it assumes valid UTF-8 input and libcore needs the agility to use unsafe code in this function if it turns out to be beneficial.
The initial message of this issue is somewhat misleading now that this is the tracking issue for the It is not anymore about
I’m inclined to not stabilize this. Now that
If you want to build a It’s tempting to add new APIs to libcore for something like the example above, but there’s a lot of possible variation: returning an https://docs.rs/utf-8/ tries to support all of these use cases (still on top of @strake what do you think?
… r=alexcrichton Add an example of lossy decoding to str::Utf8Error docs CC rust-lang#33906
The libs team discussed this and the consensus was to deprecate this feature. The use case motivating it can be handled by using @rfcbot fcp close
Team member @SimonSapin has proposed to close this. The next step is review by the rest of the tagged teams: No concerns currently listed. Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me.
🔔 This is now entering its final comment period, as per the review above. 🔔
@SimonSapin No objection here; this should be the common use case, and for cases where one truly wants to operate on a single byte at a time from an iterator, the code need not be in libcore.
The final comment period is now complete.
Deprecate Read::chars and char::decode_utf8 Per FCP: * rust-lang#27802 (comment) * rust-lang#33906 (comment)
Deprecated in #49970
Update (@SimonSapin): this is now the tracking issue for these items in both `core::char` and `std::char`:

* `decode_utf8()`, which takes an iterable of `u8` and returns `DecodeUtf8`
* `DecodeUtf8`, which implements `Iterator<Item=Result<char, InvalidSequence>>`
* `InvalidSequence`, which is opaque

Original issue:
In libcore we have a facility to encode a character to UTF-8, i.e. `char::EncodeUtf8`, but no facility to decode a character from potentially-invalid UTF-8, and return 0xFFFD if it reads an invalid sequence, which seems a surprising omission to me as a libcore user, given in libstd we have `string::String::from_utf8_lossy`.

These options came to mind:

* `str::next_code_point_lossy` or so which behaves as `str::next_code_point` but checks whether its input is valid and returns 0xFFFD if not
* `DecodeUtf8` which one can make from an arbitrary iterator of bytes, which decodes them
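
For readers landing here after the deprecation, the lossy behaviour requested above (substituting U+FFFD for each invalid sequence) can be sketched on top of `str::from_utf8` and the `Utf8Error` methods `valid_up_to` and `error_len`. This is only an illustration, not the API tracked by this issue: the name `decode_lossy` is hypothetical, and it assumes the whole input is available as a byte slice rather than as an arbitrary iterator of bytes.

```rust
/// Hypothetical helper (not part of libcore or libstd): decodes `input` as
/// UTF-8, replacing every invalid sequence with U+FFFD.
fn decode_lossy(mut input: &[u8]) -> String {
    let mut out = String::new();
    loop {
        match std::str::from_utf8(input) {
            Ok(valid) => {
                // The rest of the input is well-formed; we are done.
                out.push_str(valid);
                return out;
            }
            Err(error) => {
                // Everything before `valid_up_to()` is known to be valid.
                let (valid, rest) = input.split_at(error.valid_up_to());
                out.push_str(std::str::from_utf8(valid).expect("prefix was validated"));
                out.push('\u{FFFD}');
                match error.error_len() {
                    // Skip the invalid sequence and keep decoding after it.
                    Some(len) => input = &rest[len..],
                    // `None` means the input ended in the middle of a sequence.
                    None => return out,
                }
            }
        }
    }
}
```

Calling `decode_lossy(b"Hello \xF0\x90\x80World")` yields `"Hello \u{FFFD}World"`, since the three bytes `F0 90 80` form one truncated four-byte sequence. Re-validating the prefix with `from_utf8` keeps the sketch free of `unsafe` at the cost of a second pass over those bytes.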