Add a display_lossy() to write a JsString lossily #4023

Merged: 2 commits, Oct 25, 2024
48 changes: 48 additions & 0 deletions core/string/src/display.rs
@@ -34,11 +34,59 @@ impl<'a> From<JsStr<'a>> for JsStrDisplayEscaped<'a> {
    }
}

/// Display implementation for [`crate::JsString`] that replaces unpaired
/// surrogates with the Unicode replacement character (`U+FFFD`).
#[derive(Debug)]
pub struct JsStrDisplayLossy<'a> {
    inner: JsStr<'a>,
}

impl fmt::Display for JsStrDisplayLossy<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // No need to optimize latin1.
        self.inner
            .code_points_lossy()
            .try_for_each(|c| f.write_char(c))
    }
}

impl<'a> From<JsStr<'a>> for JsStrDisplayLossy<'a> {
    fn from(inner: JsStr<'a>) -> Self {
        Self { inner }
    }
}

#[test]
fn latin1() {
    // 0xE9 is `é` in ISO-8859-1 (see https://www.ascii-code.com/ISO-8859-1).
    let s = JsStr::latin1(b"Hello \xE9 world!");

    let rust_str = format!("{}", JsStrDisplayEscaped { inner: s });
    assert_eq!(rust_str, "Hello é world!");

    let rust_str = format!("{}", JsStrDisplayLossy { inner: s });
    assert_eq!(rust_str, "Hello é world!");
}

#[test]
fn emoji() {
    // 0x1F600 is `😀` (see https://www.fileformat.info/info/unicode/char/1f600/index.htm).
    let s = JsStr::utf16(&[0xD83D, 0xDE00]);

    let rust_str = format!("{}", JsStrDisplayEscaped { inner: s });
    assert_eq!(rust_str, "😀");

    let rust_str = format!("{}", JsStrDisplayLossy { inner: s });
    assert_eq!(rust_str, "😀");
}

#[test]
fn unpaired_surrogates() {
    // 0xD800 is an unpaired surrogate (see https://www.fileformat.info/info/unicode/char/d800/index.htm).
    let s = JsStr::utf16(&[0xD800]);

    let rust_str = format!("{}", JsStrDisplayEscaped { inner: s });
    assert_eq!(rust_str, "\\uD800");

    let rust_str = format!("{}", JsStrDisplayLossy { inner: s });
    assert_eq!(rust_str, "�");
}
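
For readers without the Boa types at hand, the adapter pattern in this hunk can be sketched with the standard library alone. This is a minimal sketch, not the crate's code: `Utf16Lossy` and `render` are hypothetical names, and a plain `&[u16]` slice stands in for `JsStr`.

```rust
use std::fmt::{self, Write};

/// Hypothetical stand-in for `JsStrDisplayLossy`: wraps raw UTF-16 code units.
#[derive(Debug)]
struct Utf16Lossy<'a> {
    inner: &'a [u16],
}

impl fmt::Display for Utf16Lossy<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Decode unit by unit; an unpaired surrogate becomes U+FFFD.
        char::decode_utf16(self.inner.iter().copied())
            .map(|res| res.unwrap_or(char::REPLACEMENT_CHARACTER))
            .try_for_each(|c| f.write_char(c))
    }
}

/// Hypothetical helper: renders code units through the adapter.
fn render(units: &[u16]) -> String {
    format!("{}", Utf16Lossy { inner: units })
}
```

With these definitions, `render(&[0xD800])` yields the single replacement character, which is the same expectation the `unpaired_surrogates` test encodes for the lossy adapter.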
12 changes: 10 additions & 2 deletions core/string/src/lib.rs
@@ -26,7 +26,7 @@ mod tagged;
mod tests;

use self::{iter::Windows, str::JsSliceIndex};
-use crate::display::JsStrDisplayEscaped;
+use crate::display::{JsStrDisplayEscaped, JsStrDisplayLossy};
use crate::tagged::{Tagged, UnwrappedTagged};
#[doc(inline)]
pub use crate::{
@@ -960,14 +960,22 @@ impl JsString {
        }
    }

-    /// Gets a displayable escaped string. This may be faster and has less
+    /// Gets a displayable escaped string. This may be faster and has fewer
    /// allocations than `format!("{}", str.to_string_escaped())` when
    /// displaying.
    #[inline]
    #[must_use]
    pub fn display_escaped(&self) -> JsStrDisplayEscaped<'_> {
        JsStrDisplayEscaped::from(self.as_str())
    }

    /// Gets a displayable lossy string. This may be faster and has fewer
    /// allocations than `format!("{}", str.to_string_lossy())` when displaying.
    #[inline]
    #[must_use]
    pub fn display_lossy(&self) -> JsStrDisplayLossy<'_> {
        JsStrDisplayLossy::from(self.as_str())
    }
}

impl Clone for JsString {
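
The doc comment on `display_lossy` claims fewer allocations than going through `to_string_lossy()`. The sketch below illustrates the general idea with standard-library types only. It assumes a plain `&[u16]` slice in place of `JsString`, and `write_lossy` and `build_both` are hypothetical helpers, not part of the crate.

```rust
use std::fmt::{self, Write};

/// Hypothetical helper: streams lossily decoded chars into any `fmt::Write` sink.
fn write_lossy<W: Write>(out: &mut W, units: &[u16]) -> fmt::Result {
    char::decode_utf16(units.iter().copied())
        .map(|res| res.unwrap_or(char::REPLACEMENT_CHARACTER))
        .try_for_each(|c| out.write_char(c))
}

/// Builds the same message both ways so the outputs can be compared.
fn build_both(units: &[u16]) -> (String, String) {
    // Allocating path: an intermediate `String` is created, then copied over.
    let mut via_string = String::from("value: ");
    via_string.push_str(&String::from_utf16_lossy(units));

    // Streaming path: chars are appended directly to the existing buffer.
    let mut via_stream = String::from("value: ");
    write_lossy(&mut via_stream, units).expect("writing to a String cannot fail");

    (via_string, via_stream)
}
```

Both paths produce identical text. The streaming path skips the temporary `String`, which is the saving a `Display` adapter offers when the result is only going to be written somewhere.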
8 changes: 8 additions & 0 deletions core/string/src/str.rs
@@ -235,6 +235,14 @@ impl<'a> JsStr<'a> {
        m >= n && needle == self.get(m - n..).expect("already checked size")
    }

    /// Gets an iterator of all the Unicode codepoints of a [`JsStr`], replacing
    /// unpaired surrogates with the replacement character. This is faster than
    /// using [`Self::code_points`].
    #[inline]
    pub(crate) fn code_points_lossy(self) -> impl Iterator<Item = char> + 'a {
        char::decode_utf16(self.iter()).map(|res| res.unwrap_or('\u{FFFD}'))
    }

    /// Gets an iterator of all the Unicode codepoints of a [`JsStr`].
    /// This is not optimized for Latin1 strings.
    #[inline]
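
To make the difference between the lossy and escaped outputs concrete, here is a small standalone sketch built on `char::decode_utf16`. It operates on a plain `&[u16]` slice rather than `JsStr`. The free function `code_points_lossy` mirrors the method added in this hunk, while `escape_unpaired` is a hypothetical counterpart that only approximates what the escaped display does for unpaired surrogates.

```rust
/// Analogue of the new `code_points_lossy`, over a plain `&[u16]` (assumption).
fn code_points_lossy(units: &[u16]) -> impl Iterator<Item = char> + '_ {
    char::decode_utf16(units.iter().copied())
        .map(|res| res.unwrap_or(char::REPLACEMENT_CHARACTER))
}

/// Hypothetical escaping counterpart: keeps an unpaired surrogate visible
/// as a `\uXXXX` escape instead of replacing it.
fn escape_unpaired(units: &[u16]) -> String {
    let mut out = String::new();
    for res in char::decode_utf16(units.iter().copied()) {
        match res {
            Ok(c) => out.push(c),
            // `unpaired_surrogate()` returns the offending code unit.
            Err(e) => out.push_str(&format!("\\u{:04X}", e.unpaired_surrogate())),
        }
    }
    out
}
```

The lossy form discards the value of the unpaired surrogate, so it suits human-facing output; the escaped form keeps that value recoverable from the text.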