Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode handling in xterm.js #1709

Closed
jerch opened this issue Sep 25, 2018 · 10 comments · Fixed by #2568
Closed

Unicode handling in xterm.js #1709

jerch opened this issue Sep 25, 2018 · 10 comments · Fixed by #2568
Assignees
Labels
area/parser type/enhancement Features or improvements to existing features
Milestone

Comments

@jerch
Copy link
Member

jerch commented Sep 25, 2018

Coming from #1707 it seems the correct unicode handling is more and more an issue for people due to emojis. Since we all love emojis this should get fixed ASAP 😄

Proposal:
Create a provider for different unicode versions, that is capable of hiding the version specific data and implementations behind a nice API. Currently we only need version dependent implementations for wcwidth, so a rough sketchup could look like this:

interface IUnicodeProvider {
  supportedVersions(): string[];
  getVersion(): string;
  setVersion(version?: string);  // version optional for fallback behavior
  wcwidth(ucs: number): number;
  getStringCellWidth(s: string): number;
  ... // more to come with support of other unicode features
}

Ideally the provider is self containing, thus the terminal just needs to deal with the interface methods and updates the version/locale when needed. The provider would have to deal with the low level stuff to provide the correct data sets so the methods just work as expected for a supported version.
Within the provider we then can decide whether the data is provided statically in the code base or even tries to create the data on the fly. First will have quite an impact on xterm.js' size, the second will raise async questions (remember - most of the core parts are synchronous atm). The whole unicode stuff could also be bundled into some addon like feature for version XY.

Up for discussion.
/cc @Tyriar, @bgw, @mofux, @dnfield

@jerch jerch added the type/proposal A proposal that needs some discussion before proceeding label Sep 25, 2018
@dnfield
Copy link

dnfield commented Sep 25, 2018

I think something like this might be more approachable:

interface IUnicodeProvider {
  getVersion(): string;
  wcwidth(ucs: number): number;
  getStringCellWidth(s: string): number;
  ... // more to come with support of other unicode features
}

With something like a UnicodeProviderFactory.v11 would be a bit nicer at coding time, but this makes sense to me either way.

@jerch
Copy link
Member Author

jerch commented Sep 25, 2018

@dnfield Yeah either way will work. Not sure though if we will need the type info at the version level.

My idea was to create an interface, that can transparently do the version switches at runtime like this:

// terminal ctor - create the provider
this.unicodeProvider = new UnicodeProvider();
...
// some code that knows whether to switch unicode versions
this.unicodeProvider.setVersion(xy);
...
// some unicode consumer - does not care about versions at all, just gets the right method
this.unicodeProvider.wcwidth(...)

This way this.unicodeProvider can be carried around without the need to reattach after a version change or using a costly property on the terminal instance.

@jerch
Copy link
Member Author

jerch commented Sep 28, 2018

What I get from the discussion in #1707:

  • We want to deliver two wcwidth table versions for now, the old one and the new one created by @dnfield.
  • Ambiguous chars are not worth the trouble, as @gnachman pointed out. They are handled halfwidth by most apps, so we can do the same (already done in the legacy table, needs to be tested with the new table).
  • Create a new global option for the unicode version. The option would have to be set by integrators or offered to users for runtime changes.
  • Postpone the creation of a new escape sequence to set the unicode version, as an interface to register non standard sequences is not established yet.
  • No magic unicode version guesser for now. Once we do such a tool in the future, it would be outside of xterm.js anyway (maybe it could live in the org as a separate package).
  • In the future we might need to come up with unicode addons to keep the package size of xterm,js small.

Any takers to get that into TS code?

@jerch
Copy link
Member Author

jerch commented Sep 28, 2018

Did a first possible incarnation in #1714. Copied the new table over from #1707, hope thats ok (@dnfield).

@dnfield
Copy link

dnfield commented Sep 28, 2018

No problem!

@Tyriar
Copy link
Member

Tyriar commented May 10, 2019

#1714 is a good reference for this, but the plan is to ship several addons after the new addon model (#1128) is in, then allow the embedder to choose the right version.

@mikegwhit
Copy link

Please fix ASAP, updating Windows recently seemed to break this for me. I'm attempting to support a Node.js library that emoji'fies some aspects of logging for easier readability (it sounds much goofier than it is).

@Tyriar Tyriar mentioned this issue Oct 7, 2019
@Tyriar Tyriar added area/parser type/enhancement Features or improvements to existing features and removed type/proposal A proposal that needs some discussion before proceeding labels Oct 7, 2019
@jerch
Copy link
Member Author

jerch commented Dec 6, 2019

New attempt in #2568, hopefully we can get this rolled out with next release.

@Tyriar
Copy link
Member

Tyriar commented Feb 3, 2020

@jerch can we call this closed with #2568 being merged?

@jerch
Copy link
Member Author

jerch commented Feb 3, 2020

@Tyriar Yepp, there is also a follow up already 😸 --> #2668

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/parser type/enhancement Features or improvements to existing features
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants