Unicode handling in xterm.js #1709

jerch · 2018-09-25T16:05:34Z

Coming from #1707, it seems that correct Unicode handling is becoming more and more of an issue for people due to emojis. Since we all love emojis, this should get fixed ASAP 😄

Proposal:
Create a provider for different Unicode versions that is capable of hiding the version-specific data and implementations behind a nice API. Currently we only need version-dependent implementations for wcwidth, so a rough sketch could look like this:

interface IUnicodeProvider {
  supportedVersions(): string[];
  getVersion(): string;
  setVersion(version?: string): void;  // version optional for fallback behavior
  wcwidth(ucs: number): number;
  getStringCellWidth(s: string): number;
  // ... more to come with support of other unicode features
}

Ideally the provider is self-contained, so the terminal only needs to deal with the interface methods and update the version/locale when needed. The provider would have to deal with the low-level stuff to provide the correct data sets, so that the methods just work as expected for a supported version.
Within the provider we can then decide whether the data is provided statically in the code base or is created on the fly. The first will have quite an impact on xterm.js' size; the second will raise async questions (remember, most of the core parts are synchronous atm). The whole Unicode stuff could also be bundled into some addon-like feature for version XY.
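To make the "statically provided data" variant more concrete, here is a minimal sketch of a synchronous provider. Everything beyond the interface methods is an assumption for illustration: the class name `StaticUnicodeProvider`, the version names `legacy` and `new`, and the toy width rules, which are not real Unicode tables.

```typescript
// Hypothetical sketch, not the xterm.js implementation.
interface IUnicodeProvider {
  supportedVersions(): string[];
  getVersion(): string;
  setVersion(version?: string): void;
  wcwidth(ucs: number): number;
  getStringCellWidth(s: string): number;
}

type WidthFn = (ucs: number) => number;

// Toy width rules bundled statically; only enough data to show the switch.
const TOY_TABLES: { [version: string]: WidthFn } = {
  // "legacy": only Hangul Jamo (U+1100..U+115F) is wide
  legacy: (ucs) => (ucs >= 0x1100 && ucs <= 0x115f ? 2 : 1),
  // "new": additionally treats the Emoticons block (U+1F600..U+1F64F) as wide
  new: (ucs) =>
    (ucs >= 0x1100 && ucs <= 0x115f) || (ucs >= 0x1f600 && ucs <= 0x1f64f) ? 2 : 1,
};

const FALLBACK_VERSION = "legacy";

class StaticUnicodeProvider implements IUnicodeProvider {
  private version: string = FALLBACK_VERSION;

  supportedVersions(): string[] {
    return Object.keys(TOY_TABLES);
  }

  getVersion(): string {
    return this.version;
  }

  // An unknown or missing version falls back to the legacy rules.
  setVersion(version?: string): void {
    const known =
      version !== undefined &&
      Object.prototype.hasOwnProperty.call(TOY_TABLES, version);
    this.version = known ? (version as string) : FALLBACK_VERSION;
  }

  wcwidth(ucs: number): number {
    return TOY_TABLES[this.version](ucs);
  }

  // Sums cell widths per code point, joining UTF-16 surrogate pairs first.
  getStringCellWidth(s: string): number {
    let width = 0;
    for (let i = 0; i < s.length; ++i) {
      let code = s.charCodeAt(i);
      if (code >= 0xd800 && code <= 0xdbff && i + 1 < s.length) {
        const low = s.charCodeAt(i + 1);
        if (low >= 0xdc00 && low <= 0xdfff) {
          code = (code - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
          ++i;
        }
      }
      width += this.wcwidth(code);
    }
    return width;
  }
}
```

Because all data lives in the bundle, every method stays synchronous; the cost is that each additional version adds its tables to the package size.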

Up for discussion.
/cc @Tyriar, @bgw, @mofux, @dnfield

dnfield · 2018-09-25T16:43:40Z

I think something like this might be more approachable:

interface IUnicodeProvider {
  getVersion(): string;
  wcwidth(ucs: number): number;
  getStringCellWidth(s: string): number;
  // ... more to come with support of other unicode features
}

Something like a UnicodeProviderFactory.v11 would be a bit nicer at coding time, but this makes sense to me either way.
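One possible reading of the factory idea is sketched below. Only `IUnicodeProvider` and the `UnicodeProviderFactory.v11` name come from this thread; the `legacy` member, the `makeProvider` helper, and the toy width predicates are assumptions made up for illustration.

```typescript
// Hypothetical sketch: one immutable provider per Unicode version.
interface IUnicodeProvider {
  getVersion(): string;
  wcwidth(ucs: number): number;
  getStringCellWidth(s: string): number;
}

// Builds a provider around a single "is this code point wide?" predicate.
function makeProvider(
  version: string,
  isWide: (ucs: number) => boolean
): IUnicodeProvider {
  const wcwidth = (ucs: number): number => (isWide(ucs) ? 2 : 1);
  return {
    getVersion: () => version,
    wcwidth,
    getStringCellWidth: (s: string): number => {
      let width = 0;
      for (let i = 0; i < s.length; ++i) {
        let code = s.charCodeAt(i);
        // Join a UTF-16 surrogate pair into one code point.
        if (code >= 0xd800 && code <= 0xdbff && i + 1 < s.length) {
          const low = s.charCodeAt(i + 1);
          if (low >= 0xdc00 && low <= 0xdfff) {
            code = (code - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
            ++i;
          }
        }
        width += wcwidth(code);
      }
      return width;
    },
  };
}

// Toy predicates, not real Unicode tables.
const isHangulJamo = (ucs: number): boolean => ucs >= 0x1100 && ucs <= 0x115f;
const isEmoticon = (ucs: number): boolean => ucs >= 0x1f600 && ucs <= 0x1f64f;

const UnicodeProviderFactory = {
  legacy: makeProvider("legacy", isHangulJamo),
  v11: makeProvider("11", (ucs) => isHangulJamo(ucs) || isEmoticon(ucs)),
};
```

With this shape the version is picked by name at coding time (`UnicodeProviderFactory.v11`), and a typo in the version name becomes a compile error rather than a runtime fallback.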

jerch · 2018-09-25T17:08:25Z

@dnfield Yeah either way will work. Not sure though if we will need the type info at the version level.

My idea was to create an interface that can transparently do the version switches at runtime like this:

// terminal ctor - create the provider
this.unicodeProvider = new UnicodeProvider();
// ...
// some code that knows whether to switch unicode versions
this.unicodeProvider.setVersion(xy);
// ...
// some unicode consumer - does not care about versions at all, just gets the right method
this.unicodeProvider.wcwidth(codepoint);

This way this.unicodeProvider can be carried around without the need to reattach it after a version change or to use a costly property on the terminal instance.
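A small sketch of that idea: a consumer receives the provider once and keeps seeing the right behavior after a version switch. The `UnicodeProvider` body, the `Consumer` class, and the version names here are hypothetical, and the width rules are toy data.

```typescript
// Hypothetical sketch of runtime version switching behind a stable reference.
class UnicodeProvider {
  // Toy width rules, not real Unicode tables.
  private readonly tables: { [version: string]: (ucs: number) => number } = {
    legacy: () => 1,
    new: (ucs) => (ucs >= 0x1f600 && ucs <= 0x1f64f ? 2 : 1),
  };
  private active: (ucs: number) => number = this.tables["legacy"];

  // Swaps the active implementation; unknown versions fall back to legacy.
  setVersion(version?: string): void {
    const known =
      version !== undefined &&
      Object.prototype.hasOwnProperty.call(this.tables, version);
    this.active = this.tables[known ? (version as string) : "legacy"];
  }

  // Always delegates to whatever implementation is active right now.
  wcwidth(ucs: number): number {
    return this.active(ucs);
  }
}

// A consumer that stores the provider itself, never a specific version.
class Consumer {
  private readonly unicode: UnicodeProvider;

  constructor(unicode: UnicodeProvider) {
    this.unicode = unicode;
  }

  cellsFor(ucs: number): number {
    return this.unicode.wcwidth(ucs);
  }
}

const provider = new UnicodeProvider();
const consumer = new Consumer(provider);
const before = consumer.cellsFor(0x1f600); // legacy rules
provider.setVersion("new");
const after = consumer.cellsFor(0x1f600); // new rules, same consumer instance
```

Note that this only holds if consumers keep the provider object; a consumer that caches a specific width function would still need to be reattached after a switch.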

jerch · 2018-09-28T12:49:50Z

What I get from the discussion in #1707:

We want to deliver two wcwidth table versions for now: the old one and the new one created by @dnfield.
Ambiguous chars are not worth the trouble, as @gnachman pointed out. Most apps handle them as half-width, so we can do the same (already done in the legacy table, needs to be tested with the new table).
Create a new global option for the Unicode version. The option would have to be set by integrators or offered to users for runtime changes.
Postpone the creation of a new escape sequence to set the Unicode version, as an interface to register non-standard sequences is not established yet.
No magic Unicode version guesser for now. Once we do such a tool in the future, it would be outside of xterm.js anyway (maybe it could live in the org as a separate package).
In the future we might need to come up with Unicode addons to keep the package size of xterm.js small.
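The first three points could be sketched roughly as follows: two range tables, a binary search lookup, and an option that selects the table. The option name `unicodeVersion`, its values, and the table contents are assumptions for illustration; the ranges are a tiny toy subset, not the actual tables from #1707, and zero-width or combining characters are ignored here.

```typescript
// Hypothetical sketch: table-driven wcwidth selected by a global option.
type Range = [number, number]; // inclusive [first, last], sorted ascending

// Toy "old" table: a few wide ranges only.
const WIDE_LEGACY: Range[] = [
  [0x1100, 0x115f], // Hangul Jamo
  [0xac00, 0xd7a3], // Hangul Syllables
];

// Toy "new" table: same ranges plus the Emoticons block.
const WIDE_NEW: Range[] = [
  [0x1100, 0x115f],
  [0xac00, 0xd7a3],
  [0x1f600, 0x1f64f], // Emoticons
];

interface IUnicodeOptions {
  unicodeVersion: "legacy" | "new"; // assumed option name
}

// Binary search over sorted, non-overlapping ranges.
function inRanges(ucs: number, ranges: Range[]): boolean {
  let lo = 0;
  let hi = ranges.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (ucs < ranges[mid][0]) {
      hi = mid - 1;
    } else if (ucs > ranges[mid][1]) {
      lo = mid + 1;
    } else {
      return true;
    }
  }
  return false;
}

// Ambiguous-width characters are simply absent from the wide tables,
// so they fall through to half-width (1 cell), as proposed above.
function wcwidth(ucs: number, options: IUnicodeOptions): number {
  const table = options.unicodeVersion === "new" ? WIDE_NEW : WIDE_LEGACY;
  return inRanges(ucs, table) ? 2 : 1;
}
```

For example, U+00A1 (¡), which has East Asian Width "Ambiguous", gets one cell under both settings, while U+1F600 gets one cell with the toy legacy table and two with the toy new table.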

Any takers to get that into TS code?

jerch · 2018-09-28T20:37:33Z

Did a first possible incarnation in #1714. Copied the new table over from #1707, hope that's OK (@dnfield).

dnfield · 2018-09-28T21:20:14Z

No problem!

Tyriar · 2019-05-10T15:46:00Z

#1714 is a good reference for this, but the plan is to ship several addons after the new addon model (#1128) is in, then allow the embedder to choose the right version.

mikegwhit · 2019-10-02T03:21:08Z

Please fix ASAP; updating Windows recently seemed to break this for me. I'm attempting to support a Node.js library that emoji-fies some aspects of logging for easier readability (it sounds much goofier than it is).

jerch · 2019-12-06T13:50:17Z

New attempt in #2568, hopefully we can get this rolled out with the next release.

Tyriar · 2020-02-03T19:20:00Z

@jerch can we call this closed with #2568 being merged?

jerch · 2020-02-03T19:21:57Z

@Tyriar Yepp, there is also a follow-up already 😸 --> #2668

jerch added the type/proposal label Sep 25, 2018

This was referenced Sep 26, 2018

Update wcwidth/CharWidth.ts #1707

Closed

Selection with search and unicode #1686

Closed

jerch mentioned this issue Sep 28, 2018

multiple unicode version support #1714

Closed

Tyriar mentioned this issue Jan 7, 2019

Unicode and emojis is terminal are sometimes the wrong width microsoft/vscode#66125

Closed

This was referenced Feb 5, 2019

VSCode ZSH Glitch when pressing tab microsoft/vscode#67904

Closed

weird cursor-word space issue with zsh, oh-my-zsh microsoft/vscode#67789

Closed

This was referenced Mar 1, 2019

wrong cursor position when using cmder in vscode microsoft/vscode#69582

Closed

The cursor of integrated terminal has weird spacing when using powerline in wsl microsoft/vscode#69263

Closed

Tyriar mentioned this issue May 31, 2019

Terminal: Emoji layout is not correct on Windows 10 1903 microsoft/vscode#74314

Closed

Tyriar mentioned this issue Jun 7, 2019

Terminal renderers' line wrapping breaks when using git-bash-prompt microsoft/vscode#75028

Closed

Tyriar mentioned this issue Jun 27, 2019

Terminal thinks single-width characters are double-width microsoft/vscode#75964

Closed

Tyriar mentioned this issue Jul 8, 2019

Emojis don't work under ConPTY microsoft/vscode#76842

Closed

minonl mentioned this issue Jul 9, 2019

Emoji gets glitched when pasted into the terminal vercel/hyper#3615

Closed

Tyriar mentioned this issue Jul 27, 2019

Terminal fails on fancy themes and unicode items microsoft/vscode-remote-release#1002

Closed

minonl mentioned this issue Aug 4, 2019

Consider making ConPTY and Windows Terminal treat all ambiguous-width characters as 1 cell instead of asking the font microsoft/terminal#2066

Closed

This was referenced Aug 10, 2019

Splitted terminal resize display error microsoft/vscode#77579

Closed

Whitespace Issues with prompt, on VSCode's integrated terminal microsoft/vscode#79175

Closed

Problem with emojis/unicode (assumed double-width?) #1059

Closed

Tyriar mentioned this issue Oct 7, 2019

revamp wcwidth #1467

Closed

Tyriar added the area/parser and type/enhancement labels and removed the type/proposal label Oct 7, 2019

Tyriar mentioned this issue Oct 8, 2019

Emojis don't take enough space, some stay floating. microsoft/vscode#51385

Closed

Tyriar mentioned this issue Oct 17, 2019

Extra characters appearing in integrated terminal with completion microsoft/vscode#82751

Closed

felixse mentioned this issue Oct 20, 2019

Unicode characters not properly displayed felixse/FluentTerminal#553

Open

This was referenced Oct 27, 2019

Support ligatures in terminal microsoft/vscode#34103

Closed

emoji text isn't rendered correctly near fixed width chars #2523

Closed

Tyriar mentioned this issue Nov 7, 2019

Characters overlaps emoji microsoft/vscode#84098

Closed

jerch self-assigned this Nov 15, 2019

This was referenced Nov 19, 2019

Icons are not properly rendered in embedded terminal microsoft/vscode#85114

Closed

Spaces after emojis are ignored in vs code terminal microsoft/vscode#85024

Closed

torch2424 mentioned this issue Nov 22, 2019

Support Asian Scripts wasmerio/webassembly.sh#62

Closed

jerch added this to the 4.4.0 milestone Dec 6, 2019

matchai mentioned this issue Dec 23, 2019

package version glitched starship/starship#781

Closed

Tyriar mentioned this issue Jan 3, 2020

Mulit code point emoji doesn't render correctly #2660

Closed

jerch closed this as completed Feb 3, 2020

jerch reopened this Feb 3, 2020

jerch mentioned this issue Feb 3, 2020

wcwidth rules for unicode 11 #2568

Merged

jerch closed this as completed Feb 3, 2020

matchai mentioned this issue Feb 28, 2020

Problems using starship with powershell on vscode terminal starship/starship#968

Closed

Tyriar mentioned this issue Mar 22, 2021

Some emojis not displaying correctly in VSCode macOS microsoft/vscode#118905

Open
