Unreadable io::Error debug strings #34318

Closed
liigo opened this issue Jun 17, 2016 · 8 comments
Labels
relnotes Marks issues that should be documented in the release notes of the next release.

Comments

@liigo
Contributor

liigo commented Jun 17, 2016

While trying to open a non-existing file (on Windows 10, Chinese):

use std::fs::File;
fn main() {
    File::open("no-such-file.rs").unwrap();
}

It panicked with a message:

thread '<main>' panicked at 'called `Result::unwrap()` on an `Err` value: Error { repr: Os { code: 2, message: "\u{7cfb}\u{7edf}\u{627e}\u{4e0d}\u{5230}\u{6307}\u{5b9a}\u{7684}\u{6587}\u{4ef6}\u{3002}" } }', ../src/libcore\result.rs:746
note: Run with `RUST_BACKTRACE=1` for a backtrace.

The OS-native error message here is unreadable to a human.

The message ("\u{7cfb}\u{7edf}\u{627e}\u{4e0d}\u{5230}...") is the escaped form of the Chinese OS message "系统找不到指定的文件。" (the equivalent of "No such file or directory"). The escaping is done by impl fmt::Debug for str.

I can read Chinese, which is my mother tongue, but I can't read \u{7cfb}....

Possible solutions:

  • Don't use the OS error string in the local language (Chinese here); use English instead.
  • Change impl fmt::Debug for str so that it does not escape most Unicode chars. (breaking-change?)

fn main() {
    let s = "Hello δεΣ❤ 日月";
    println!("{}", s);   // prints "Hello δεΣ❤ 日月"
    println!("{:?}", s); // prints "Hello \u{3b4}\u{3b5}\u{3a3}\u{2764} \u{65e5}\u{6708}"
}

Run it on play.rust-lang.org

Is impl fmt::Debug for str really friendly enough for debugging purposes? Could we change its implementation to behave more like impl fmt::Display for str?

@retep998
Member

I think printing escapes is very useful for debugging strings, since consoles are often incapable of displaying the character (the Windows console is notorious for this), or the problem is something like the wrong kind of whitespace slipping in, which can be really hard to notice otherwise. I feel any solution for this issue should be targeted at Debug for io::Error specifically and not strings in general.
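
A minimal sketch of how the escaping can be sidestepped for io::Error without touching Debug for str, assuming the caller reports the error through Display (`{}`) rather than through `unwrap()`, which formats with Debug. The helper name `user_message` is hypothetical, and the custom error below only stands in for the OS error from the report:

```rust
use std::io;

/// Human-readable rendering of an I/O error.
/// Uses `Display`, which prints the message text without escaping it.
/// (`user_message` is a hypothetical helper, not a std API.)
fn user_message(err: &io::Error) -> String {
    format!("{}", err)
}

fn main() {
    // Stand-in for the localized OS error shown in the report.
    let err = io::Error::new(io::ErrorKind::NotFound, "系统找不到指定的文件。");

    // Display keeps the native-language message readable.
    println!("display: {}", user_message(&err));

    // Debug shows the internal representation; how it treats non-ASCII
    // text depends on the `Debug` impl of `str` in the toolchain used.
    println!("debug:   {:?}", err);
}
```

This only changes how the caller prints the error; the `{:?}` output produced by `unwrap()` is unaffected.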

@bluss
Member

bluss commented Jun 17, 2016

I'd prefer to output printable characters as they are, escaping is inconvenient for everything that's not in English.

@liigo
Contributor Author

liigo commented Jun 19, 2016

since often consoles will be either incapable of displaying the character (the Windows console is notorious for this)

Which modern console doesn't support printing Unicode strings? Do you mean a very old one? (The Windows 10 console does its job perfectly.)

@retep998
Copy link
Member

@liigo While the Windows console is capable of storing arbitrary Unicode text, its Unicode text rendering abilities are far more limited. It only displays Unicode glyphs when the font supports them (it doesn't use font fallback), it doesn't work for anything outside the BMP, and it doesn't support combining characters. It's a simple wchar -> glyph mapping. Here is your Chinese error string in the Windows 10 console:

Granted, I can easily copy the text from the console and paste it into another program to view it. That stops working if I choose a raster font for the console: in that case the Windows console limits itself to u8 instead of u16, completely ruining Unicode and the ability to copy and paste Unicode text, and the only characters you can rely on are ASCII.

@liigo
Contributor Author

liigo commented Jun 19, 2016

Yes. It's the same when you println! something. I don't think we should make println! escape almost all Unicode characters. If (old) consoles can't display Unicode properly, it is the consoles that should change.

@liigo
Contributor Author

liigo commented Jun 19, 2016

Here is your Chinese error string in the Windows 10 console:

For this specific issue, the OS error string is returned at runtime by the OS in the local language, so I'm sure the OS itself fully supports displaying it properly.

liigo added a commit to liigo/rust that referenced this issue Jun 23, 2016
tbu- added a commit to tbu-/rust that referenced this issue Jul 25, 2016
Use the same procedure as Python to determine whether a character is
printable, described in [PEP 3138]. In particular, this means that the
following character classes are escaped:

- Cc (Other, Control)
- Cf (Other, Format)
- Cs (Other, Surrogate), even though they can't appear in Rust strings
- Co (Other, Private Use)
- Cn (Other, Not Assigned)
- Zl (Separator, Line)
- Zp (Separator, Paragraph)
- Zs (Separator, Space), except for the ASCII space `' '` (`0x20`)

This allows for user-friendly inspection of strings that are not
English (e.g. compare `"\u{e9}\u{e8}\u{ea}"` to `"éèê"`).

Fixes rust-lang#34318.

[PEP 3138]: https://www.python.org/dev/peps/pep-3138/
bors added a commit that referenced this issue Jul 28, 2016
Escape fewer Unicode codepoints in `Debug` impl of `str`

Use the same procedure as Python to determine whether a character is
printable, described in [PEP 3138]. In particular, this means that the
following character classes are escaped:

- Cc (Other, Control)
- Cf (Other, Format)
- Cs (Other, Surrogate), even though they can't appear in Rust strings
- Co (Other, Private Use)
- Cn (Other, Not Assigned)
- Zl (Separator, Line)
- Zp (Separator, Paragraph)
- Zs (Separator, Space), except for the ASCII space `' '` `0x20`

This allows for user-friendly inspection of strings that are not
English (e.g. compare `"\u{e9}\u{e8}\u{ea}"` to `"éèê"`).

Fixes #34318.
CC #34422.

[PEP 3138]: https://www.python.org/dev/peps/pep-3138/
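
The effect of the rule described above can be sketched as follows, assuming a toolchain that includes this change. The helper `debug_repr` is a hypothetical name introduced only for illustration; the sample characters were picked from the listed classes (tab is Cc, U+200B is Cf, U+00A0 is Zs):

```rust
/// Returns the `{:?}` rendering of a string.
/// (`debug_repr` is a hypothetical helper, not a std API.)
fn debug_repr(s: &str) -> String {
    format!("{:?}", s)
}

fn main() {
    // Printable non-ASCII letters are shown as-is.
    println!("{}", debug_repr("\u{e9}\u{e8}\u{ea}"));

    // A control character (Cc) stays escaped.
    println!("{}", debug_repr("a\tb"));

    // A format character (Cf), here ZERO WIDTH SPACE, stays escaped.
    println!("{}", debug_repr("a\u{200b}b"));

    // A non-ASCII space separator (Zs), here NO-BREAK SPACE, stays escaped.
    println!("{}", debug_repr("a\u{a0}b"));

    // The ASCII space is the one Zs character that is not escaped.
    println!("{}", debug_repr("a b"));
}
```

Invisible or easily confused characters remain visible in debug output, while ordinary text in any script stays readable.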
@liigo
Contributor Author

liigo commented Nov 17, 2016

This issue still exists in nightly. Could you reopen it? @brson @alexcrichton

@tbu- (the author of #34485)
cc #35068


pub fn main() {
    println!("{:?}", "In Chinese: 文件不存在");
    // still prints: "In Chinese: \u{6587}\u{4ef6}\u{4e0d}\u{5b58}\u{5728}"
    // expected: "In Chinese: 文件不存在"
}

@alexcrichton alexcrichton reopened this Nov 17, 2016
tbu- added a commit to tbu-/rust that referenced this issue Nov 18, 2016
The problem occurred due to lines like

```
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
```

in `UnicodeData.txt`, which the script previously interpreted as two
characters, although it represents the whole range.

Fixes rust-lang#34318.
bors added a commit that referenced this issue Nov 20, 2016
Fix `fmt::Debug` for strings, e.g. for Chinese characters

The problem occurred due to lines like

```
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
```

in `UnicodeData.txt`, which the script previously interpreted as two
characters, although it represents the whole range.

Fixes #34318.
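
The range convention behind this fix can be sketched as below. This is an illustration only, not the script that was actually changed: the functions `parse_ranges` and `category_of` are hypothetical names, and the parser assumes the semicolon-separated layout shown in the lines quoted above (code point, name, general category, ...).

```rust
use std::ops::RangeInclusive;

/// Parse `UnicodeData.txt`-style lines into (code point range, category) pairs.
/// A `<..., First>` line opens a range that the next `<..., Last>` line closes;
/// every other line describes a single code point.
/// (Hypothetical helper written for illustration.)
fn parse_ranges(data: &str) -> Vec<(RangeInclusive<u32>, String)> {
    let mut out = Vec::new();
    let mut range_start: Option<u32> = None;
    for line in data.lines() {
        let fields: Vec<&str> = line.split(';').collect();
        if fields.len() < 3 {
            continue; // skip blank or malformed lines
        }
        let code = match u32::from_str_radix(fields[0], 16) {
            Ok(c) => c,
            Err(_) => continue,
        };
        let name = fields[1];
        let category = fields[2].to_string();
        if name.ends_with(", First>") {
            // Remember where the range starts; nothing is emitted yet.
            range_start = Some(code);
        } else if name.ends_with(", Last>") {
            // Close the pending range and emit it as a whole.
            if let Some(start) = range_start.take() {
                out.push((start..=code, category));
            }
        } else {
            out.push((code..=code, category));
        }
    }
    out
}

/// Look up the general category of a character in the parsed table.
fn category_of(table: &[(RangeInclusive<u32>, String)], c: char) -> Option<&str> {
    let cp = c as u32;
    table
        .iter()
        .find(|(range, _)| range.contains(&cp))
        .map(|(_, cat)| cat.as_str())
}

fn main() {
    let data = "3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;\n\
                4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;\n\
                0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;;";
    let table = parse_ranges(data);
    // U+3500 sits between the First and Last lines, so it is found in the range.
    println!("{:?}", category_of(&table, '\u{3500}'));
}
```

If the two marker lines are read as two separate characters instead, every code point between them has no entry in the table and looks unassigned, which is why such characters were still being escaped.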
@bluss bluss added the relnotes Marks issues that should be documented in the release notes of the next release. label Nov 20, 2016
@liigo
Contributor Author

liigo commented Nov 23, 2016

Fixed in nightly by #37855. Many thanks to @tbu- !
