Feature Request: Finish full unicode support (M:N cell rendering, ZWJ?) #1472
Comments
This is now the master issue for all good rendering efforts.
I apologize for not providing this detail previously, but I have just checked all three modes available on my laptop - PowerShell, CMD and Ubuntu WSL1 - it's not available in any of them.
It's alright. We know that we're not done in this space and when we sat in triage, we couldn't believe we hadn't filed the issue yet. Congrats, yours is now the master issue tracking that we still have work to do to live up to our Unicode promise.
The table that we refer to in `CodepointWidthDetector.cpp` to determine whether a codepoint should be rendered as Wide or Narrow was based on EastAsianWidth[1]. If a codepoint isn't included in this table, it is considered Narrow. Many emojis aren't specified in the EAW list, so this PR supplements our table with emoji codepoints from emoji-data[2] in order to render most, if not all, emojis as full-width.

There are certain codepoints I've added to the comments (in case we want to add them officially to the table in the future) that Microsoft decided to give an emoji presentation even though they are specified as Narrow/Ambiguous in the EAW list and are _not_ specified in the Unicode emoji list. These include all of the Mahjong Tiles block, different direction pencils (✎✐), different pointing index fingers (☜, ☞) among others. I have no idea if I've captured all of them, as I don't know of an easy way to detect which are Microsoft specific emojis.

## Validation Steps Performed

I have looked at so many emojis that I dream emoji. These screenshots don't cover _all_ emoji but I've tried to grab a couple from all across the codepoint ranges:

Before: ![before](https://user-images.githubusercontent.com/57155886/81445092-2051a980-912d-11ea-9739-c9f588da407d.png)

After: ![after](https://user-images.githubusercontent.com/57155886/81445107-2778b780-912d-11ea-9615-676c2150e798.png)

[1] http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
[2] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt

Closes #900

(cherry picked from commit 7ae3433)
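For readers who want a concrete picture of the lookup described in the PR text above, here is a minimal sketch of a range table where anything not listed falls back to Narrow. It is not the actual contents or layout of `CodepointWidthDetector.cpp`: the names `WidthRange`, `kSampleTable` and `lookupWidth` are hypothetical, and the four ranges are only illustrative samples (three Wide ranges from EastAsianWidth plus one supplementary entry, U+1F321, standing in for the emoji-data additions).

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

enum class CodepointWidth : std::uint8_t { Narrow, Wide };

// Hypothetical table entry: an inclusive codepoint range and its width.
struct WidthRange {
    char32_t first;
    char32_t last;
    CodepointWidth width;
};

// Illustrative sample only, NOT the real table. Must stay sorted and
// non-overlapping so that binary search works.
static const WidthRange kSampleTable[] = {
    {0x1100, 0x115F, CodepointWidth::Wide},   // Hangul Jamo (Wide in EastAsianWidth)
    {0x4E00, 0x9FFF, CodepointWidth::Wide},   // CJK Unified Ideographs
    {0x1F321, 0x1F321, CodepointWidth::Wide}, // assumed supplementary emoji entry
    {0x1F600, 0x1F64F, CodepointWidth::Wide}, // Emoticons
};

// Binary search for the first range whose end is >= cp, then check that
// cp is actually inside it. Codepoints not covered by any range are Narrow.
inline CodepointWidth lookupWidth(char32_t cp)
{
    const WidthRange* it = std::lower_bound(
        std::begin(kSampleTable), std::end(kSampleTable), cp,
        [](const WidthRange& range, char32_t value) { return range.last < value; });
    if (it != std::end(kSampleTable) && it->first <= cp) {
        return it->width;
    }
    return CodepointWidth::Narrow;
}
```

With this sample table, `lookupWidth(0x1F600)` reports Wide, while U+270E (one of the pencils mentioned above) reports Narrow because it is deliberately left out, mirroring the "comments only" treatment described in the PR.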
Maybe it's related to issue #6615.
Has there been any progress on this?
That would be why this workitem is still open 😄
@DHowett whoops, I didn't fully flesh out that info, I should have made it a sub-issue, sorry for the bump notification, it wasn't my intent :)
First, this adds `GraphemeTableGen` which
* parses `ucd.nounihan.grouped.xml`
* computes the cluster break property for each codepoint
* computes the East Asian Width property for each codepoint
* compresses everything into a 4-stage trie
* computes a LUT of cluster break rules between 2 codepoints
* and serializes everything to C++ tables and helper functions

Next, this adds `GraphemeTestTableGen` which
* parses `GraphemeBreakTest.txt`
* splits each test into graphemes and break opportunities
* and serializes everything to a C++ table for use as unit tests

`CodepointWidthDetector.cpp` was rewritten from scratch to
* use an iterator struct (`GraphemeState`) to maintain state
* accumulate codepoints until a break opportunity arises
* accumulate the total width of a grapheme
* support 3 different measurement modes: Grapheme clusters, `wcswidth`-style, and a mode identical to the old conhost

With this in place the following changes were made:
* `ROW::WriteHelper::_replaceTextUnicode` now uses the new grapheme cluster text iterators
* The same function was modified to join new text with existing contents of the current cell if they join to form a cluster
* Otherwise, a ton of places were modified to funnel the selection of the measurement mode over from WT's settings to ConPTY

This is part of #1472

## Validation Steps Performed
* So many tests ✅
* https://github.com/apparebit/demicode works fantastic ✅
* UTF8-torture-test.txt works fantastic ✅
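To illustrate the "accumulate codepoints until a break opportunity arises" idea from the description above, here is a rough sketch. It only borrows the name `GraphemeState`; the fields, `classify`, `codepointWidth` and `nextGrapheme` are hypothetical and not taken from the real code. It makes several simplifying assumptions: only two cluster rules are modeled (never break before Extend or ZWJ, and never break between a ZWJ and a pictographic when the ZWJ follows a pictographic), the classification uses a few hard-coded ranges instead of a generated trie, and the cluster width is taken as the widest member rather than whatever the real implementation accumulates.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

enum class BreakClass { Other, Extend, ZWJ, ExtPict };

// Hypothetical iterator state: where the current cluster starts,
// how many codepoints it spans, and how many cells it occupies.
struct GraphemeState {
    std::size_t begin = 0;
    std::size_t length = 0;
    int width = 0;
};

// Coarse stand-in for the generated lookup tables (assumption).
inline BreakClass classify(char32_t cp)
{
    if (cp == 0x200D) return BreakClass::ZWJ;
    if ((cp >= 0x0300 && cp <= 0x036F) || cp == 0xFE0F) return BreakClass::Extend;
    if ((cp >= 0x1F400 && cp <= 0x1F4FF) || (cp >= 0x1F600 && cp <= 0x1F64F)) return BreakClass::ExtPict;
    return BreakClass::Other;
}

// Assumed widths: joiners and marks take no cells, pictographs and CJK take two.
inline int codepointWidth(char32_t cp)
{
    switch (classify(cp)) {
    case BreakClass::Extend:
    case BreakClass::ZWJ:
        return 0;
    case BreakClass::ExtPict:
        return 2;
    default:
        return (cp >= 0x4E00 && cp <= 0x9FFF) ? 2 : 1;
    }
}

// Advances `state` to the next cluster in `text`; returns false at the end.
inline bool nextGrapheme(const std::u32string& text, GraphemeState& state)
{
    state.begin += state.length;
    state.length = 0;
    state.width = 0;
    if (state.begin >= text.size()) return false;

    std::size_t i = state.begin;
    BreakClass prev = classify(text[i]);
    bool pictRun = (prev == BreakClass::ExtPict); // cluster ends in ExtPict Extend*
    bool zwjAfterPict = false;                    // last ZWJ directly followed such a run
    state.width = codepointWidth(text[i]);

    for (++i; i < text.size(); ++i) {
        const BreakClass cur = classify(text[i]);
        const bool joins =
            cur == BreakClass::Extend || cur == BreakClass::ZWJ ||
            (cur == BreakClass::ExtPict && prev == BreakClass::ZWJ && zwjAfterPict);
        if (!joins) break; // break opportunity: the cluster ends here

        if (cur == BreakClass::ExtPict) {
            pictRun = true;
        } else if (cur == BreakClass::ZWJ) {
            zwjAfterPict = pictRun;
            pictRun = false;
        }
        state.width = std::max(state.width, codepointWidth(text[i]));
        prev = cur;
    }
    state.length = i - state.begin;
    return true;
}
```

Calling `nextGrapheme` in a loop over `U"e\u0301"` followed by a ZWJ family sequence yields one narrow cluster for the accented letter and one wide cluster for the whole emoji sequence, which is the behavior the M:N cell rendering in this issue depends on.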
Similar to #190, this issue can now also be closed. #16916 added support for ZWJ and thus this work is now complete. There are some smaller issues left to clean up, but I expect them to be a rare encounter. As such I'll close this issue for now.

This will ship in Windows Terminal 1.22 this year. If you want to try it out right now, please feel free to download our Canary (nightly) build here: #16121

Please note that PowerShell does not have support for complex Unicode yet, but that's expected to change in the foreseeable future (no exact date yet).
Summary of the new feature/enhancement
When the Terminal came out, there was a mention of Unicode support, but I can see that it's still not there. There is no support for Georgian script yet.