Julia doesn't like Pizza #3721
Comments
I nominate for most interesting issue subject. In other news, this looks like something to take up with
We don't allow negative lengths, so we clip to 0, which seems pretty reasonable to me |
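A minimal sketch of the clipping described above, in Python rather than the actual Julia or C source. `raw_width` and `OLD_TABLE` are hypothetical stand-ins for a C-style `wcwidth()` whose tables predate the pizza code point; they are not real libc data.

```python
PIZZA = "\U0001F355"  # SLICE OF PIZZA

def raw_width(ch, table):
    """Hypothetical stand-in for C wcwidth(): -1 means 'not in my tables'."""
    return table.get(ch, -1)

def clipped_width(ch, table):
    """Clip negative widths to 0 so a string length never goes negative."""
    return max(raw_width(ch, table), 0)

def string_width(s, table):
    """Sum of the clipped per-character widths."""
    return sum(clipped_width(ch, table) for ch in s)

# Illustrative table (an assumption) that knows nothing about U+1F355.
OLD_TABLE = {"a": 1, "b": 1, "中": 2}

print(string_width("ab", OLD_TABLE))               # both characters are known
print(string_width("a" + PIZZA + "b", OLD_TABLE))  # the pizza contributes 0 columns
```

With a table like this, the unknown character is counted as zero columns, so anything laid out after it on the line is misaligned if the terminal actually draws it one or two columns wide.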
This prints just fine on my terminal and looks to have a width of two. Not sure how we should handle this given that the C function is wrong. |
Unsurprisingly, Apple seems to be the first to update their Unicode tables. I think our only options here are to rely on whatever libc is available, or use our own Unicode tables. Not sure we want to get into all that. |
Is it possible to test some characters during build time and emit runtime warnings if |
Eh, what's the point? That's just a warning that people are going to ignore or re-open this issue. If the system libc has the wrong character width for something, then you get mangled garbage. Get a better OS. |
The fact that this even exists as a character is a clear sign that unicode allows too many bits :-). |
What's the status here? @Keno? |
Looks like somebody submitted a patch to glibc yesterday to update their unicode data: |
Did the bump from libutf8proc to libmojibake solve this? #7917. |
I don't think the char width problem has been solved yet. @jiahao went through all the codepoints and computed the correct widths, but this information has not made it into libmojibake. |
ping @jiahao |
The last time I discussed this with @stevengj, we weren't entirely settled on whether the new A correct |
I really think it makes sense for it to be in libmojibake. It's in line with the other functionality in there, and won't bloat the library by a large percentage. |
I don't actually care much either way, but I'm fine with putting it in libmojibake. |
The advantage of putting it in Julia (replacing |
Ok, then let's drop it in as our wcwidth.c to fix this issue, and possibly move it to libmojibake later. |
utf8proc now includes an up-to-date |
We should probably turn the name back to utf8proc here. I'd also slightly prefer going back to using a tarball for it (once 1.2.0 is tagged) rather than a submodule. |
+1 |
I slightly prefer submodules, since the tarballs tend to leave a bunch of old versions littering the |
Submodules tend to confuse newcomers when versions are upgraded, introducing confusing diffs after they |
@stevengj iTerm2 does need to interoperate with anything you can ssh, telnet, etc. to, not to mention Julia, so I'm open to giving users a way to opt in to a more sensible wcwidth(). I don't use wcwidth() on the client so I could use utf8proc_charwidth in the right circumstances. Since AFAIK only Julia departs from the standard, there'd need to be a new escape sequence to tell the terminal emulator to switch character-width lookup tables. OTOH, since Julia is the black sheep in this regard, it probably makes the most sense for Julia to print a space after characters that it treats as wide but wcwidth does not, and to deal with cursor movement across them correctly, etc. That'll work with every terminal out there. If a window gets resized it won't wrap correctly, though. Terminal and iTerm2 will both refuse to "break" a fullwidth character into two half-width pieces, choosing instead to move the whole thing to the start of the next line, but that's a small price to pay. |
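The padding idea in the comment above could look roughly like this. It is a sketch only: `pad_for_terminal`, `our_width`, and `terminal_width` are hypothetical names, and the two width functions are toy stand-ins for a `utf8proc_charwidth`-style table and the terminal's `wcwidth()`.

```python
PIZZA = "\U0001F355"  # SLICE OF PIZZA

def pad_for_terminal(text, our_width, terminal_width):
    """Append spaces after characters that we count as wider than the terminal does."""
    out = []
    for ch in text:
        out.append(ch)
        missing = our_width(ch) - terminal_width(ch)
        if missing > 0:
            out.append(" " * missing)  # fill the columns the terminal will not advance
    return "".join(out)

# Toy width functions (assumptions for illustration only).
def our_width(ch):
    return 2 if ch == PIZZA else 1

def terminal_width(ch):
    return 1  # a terminal that treats every character as narrow

print(repr(pad_for_terminal("a" + PIZZA + "b", our_width, terminal_width)))
```

The cursor then ends up in the same column under either width table, at the cost of the wrapping caveat mentioned in the comment.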
@gnachman If I print a space next to a character, is there a chance the space will get drawn on top of it during a redraw? I think I've seen that behavior in my experiments. |
@Keno Yes, that can happen. I'm working on a fix to that issue in my refactor_drawing branch. Feel free to try it if you're feeling brave :). I expect to merge it into master in a week or two. Terminal.app doesn't have that issue, so that approach is safe to use. |
It's not our preference to depart from standards here. Just looking at that glyph, clearly somebody thinks it is double-width. As @stevengj said there is no clear standard. |
Note that UAX#11 provides a clear standard for a subset of Unicode, and |
@gnachman The rationale and details of the analyses used to justify Julia's implementation are explained in JuliaStrings/utf8proc#2 and JuliaStrings/utf8proc#27 and in this notebook, which amongst other things details the exact discrepancies between my system As @JeffBezanson and @stevengj have already stated, there is no standard governing character widths, and so it is not possible to characterize Julia as "departing from the standard". On the contrary, it appears that not enough thought has gone into any other implementation for the purpose of determining character widths. To illustrate our reasoning, consider the pizza character U+1F355. The relevant entry in
which assigns it the "neutral" category (not "narrow", which is coded as "Na"). Thus it falls into the nebulous category where UAX 11 has essentially nothing to say because the character doesn't exist in legacy East Asian encodings. (UAX 11 even says in its Scope not to consider it an authoritative source on character widths, but rather that
) In the absence of a clear standard, the best I could come up with is to look at a font that actually bothered to provide a glyph for that code point, hence settling on Unifont, which provides this glyph: Note that the character width assigned by inspecting the advance width from Unifont agrees with the eyeball comparison of the reference glyph in the Unicode character charts (pdf). Superimposed for reference is a square box. I do not see any reason why this should be 'narrow' instead of 'fullwidth'. |
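For readers who want to inspect the East Asian Width categories discussed above, Python's standard `unicodedata` module exposes them. The sketch below is not Julia's or utf8proc's implementation; `width_from_eaw` and its `neutral`/`ambiguous` parameters are hypothetical, and they make explicit that "N" and "A" need a policy choice that UAX #11 does not supply. It also ignores combining and control characters. The category reported for U+1F355 depends on the Unicode data bundled with the Python build, so it may differ from the entry quoted in this thread.

```python
import unicodedata

def width_from_eaw(ch, neutral=1, ambiguous=1):
    """Map an East Asian Width category to a column count.

    F (fullwidth) and W (wide) take 2 columns; Na (narrow) and H (halfwidth)
    take 1. N (neutral) and A (ambiguous) use the caller's policy.
    """
    cat = unicodedata.east_asian_width(ch)
    if cat in ("F", "W"):
        return 2
    if cat in ("Na", "H"):
        return 1
    if cat == "A":
        return ambiguous
    return neutral  # cat == "N"

# Category for the pizza code point under this Python's Unicode data.
print(unicodedata.unidata_version, unicodedata.east_asian_width("\U0001F355"))
```

Treating "N" as narrow (`neutral=1`) is the common default the thread mentions; the glyph-based approach described above amounts to overriding that default for code points whose reference glyph is drawn wide.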
@jiahao, I wasn't criticizing your work. The informal agreement between client and server, which as you note is underspecified, is what is rickety. Your work is really valuable--I wish it (or something like it) were widely adopted. I had believed that EastAsianWidth.txt was "the standard", but I'm persuaded that there isn't really one at all. AFAIK most apps treat N as narrow, but it leads to the problems described in this bug. |
It sounds like this should be reported upstream/more widely, if it hasn't been already. |
Unfortunately, it seems like the only upstream that can really fix this is libc, in order to fix |
Julia has the right (most up-to-date) char widths, so it's up to users to demand that their terminal emulators display them properly. Most likely that will happen gradually, with various companies (#7267) lagging more or less behind the standards committee. |
iTerm2 PR: gnachman/iTerm2#294 |