new parser #1399

Merged
merged 52 commits into from
May 21, 2018

Member

jerch commented Apr 21, 2018

WIP - don't merge...

  • basic transition to an ANSI-compliant parser done
  • speed gain ~30%

TODO:

  • tests for glue class ParserTerminal
  • refactoring with API for hooking custom escape codes and methods
  • maybe merge ParserTerminal with InputHandler
  • better decoupling of the higher level stuff

Member Author

jerch commented Apr 22, 2018

This PR is up for a first technical and conceptual review:

About EscapeSequenceParser:

  • implements the reference from https://vt100.net/emu/dec_ansi_parser as FSM and should therefore support all legacy escape sequences
  • newer sequences like CSI with parameter element colon notation are not supported atm (were not part of the old DEC machines), can be added with some effort
  • custom transition maps via parser argument
  • custom transitions on the fly possible via TransitionTable.add
  • FSM actions are hardcoded as switch statement in the parse method for performance reasons
  • hot parse loop avoids memory allocations as much as possible (no string copies)
  • supports error propagation (back propagation not implemented yet)
  • fully tested
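
As a rough illustration of the table-driven approach described in the list above, here is a minimal sketch. It is not the PR's EscapeSequenceParser: the three states, five actions, the byte layout of the table and all names (`add`, `addRange`, `parse`) are assumptions chosen to keep the example short. It only shows how a `Uint8Array` lookup plus a hardcoded `switch` can drive the loop, and how print runs can be emitted as index ranges instead of per-character string copies.

```typescript
// Hypothetical, heavily reduced state and action sets; the real tables are far larger.
const GROUND = 0, ESCAPE = 1, CSI_PARAM = 2;
const IGNORE = 0, PRINT = 1, CLEAR = 2, PARAM = 3, CSI_DISPATCH = 4;

// One byte per (state, char code) pair: high nibble = action, low nibble = next state.
// Untouched entries stay 0, which means "ignore and go to GROUND".
const table = new Uint8Array(3 * 256);

function add(code: number, state: number, action: number, next: number): void {
  table[(state << 8) | code] = (action << 4) | next;
}

function addRange(from: number, to: number, state: number, action: number, next: number): void {
  for (let code = from; code <= to; code++) add(code, state, action, next);
}

addRange(0x20, 0x7e, GROUND, PRINT, GROUND);            // printable ASCII
add(0x1b, GROUND, CLEAR, ESCAPE);                       // ESC
add(0x5b, ESCAPE, IGNORE, CSI_PARAM);                   // '[' starts a CSI sequence
addRange(0x30, 0x39, CSI_PARAM, PARAM, CSI_PARAM);      // digits
add(0x3b, CSI_PARAM, PARAM, CSI_PARAM);                 // ';' separates parameters
addRange(0x40, 0x7e, CSI_PARAM, CSI_DISPATCH, GROUND);  // final byte

function parse(data: string): string[] {
  const events: string[] = [];
  let state: number = GROUND;
  let printStart = -1; // start index of the pending print run, -1 if none
  let params: number[] = [0];
  for (let i = 0; i < data.length; i++) {
    const code = data.charCodeAt(i);
    // Codes above 0xff fall back to "ignore, go to GROUND" in this sketch.
    const transition = code < 0x100 ? table[(state << 8) | code] : 0;
    const action = transition >> 4;
    // Flush the pending print run as one slice instead of copying char by char.
    if (action !== PRINT && printStart !== -1) {
      events.push('print ' + data.substring(printStart, i));
      printStart = -1;
    }
    switch (action) {
      case PRINT:
        if (printStart === -1) printStart = i;
        break;
      case CLEAR:
        params = [0];
        break;
      case PARAM:
        if (code === 0x3b) params.push(0);
        else params[params.length - 1] = params[params.length - 1] * 10 + (code - 0x30);
        break;
      case CSI_DISPATCH:
        events.push('csi ' + String.fromCharCode(code) + ' [' + params.join(',') + ']');
        break;
    }
    state = transition & 15;
  }
  if (printStart !== -1) events.push('print ' + data.substring(printStart));
  return events;
}
```

Adding a transition at runtime is just another `add` call on the table, which is the idea behind the "custom transitions on the fly" bullet.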

Interface to xterm.js:

  • realized by class ParserTerminal implementing the action callbacks needed by EscapeSequenceParser
  • no magic here - it was straightforward to integrate with existing InputHandler methods
  • good place to allow custom sequence handler hooks
  • DCS not yet implemented
  • no tests yet

Results so far:

  • passes all tests, no awkward behavior during live tests
  • 10% speed gain by switching to the new parser
  • performance tests revealed InputHandler.addChar as main bottleneck (localizing _terminal.buffer gave 20% performance boost, seems the getters to the buffer implementation are very expensive - Should this be tracked in a new issue?)

There is still a lot of refactoring needed, and several questions popped up during coding (also see code comments):

  • The parser currently uses the global transition table by default. Should this be copied over to new instances instead to allow hot custom state patches without altering other terminals?
  • Should all escape sequence types be customizable or only OSC and DCS?
  • Should ParserTerminal be merged with InputHandler? If so this could be the base for a screen/input decoupled terminal class since it could handle all the escape sequence related stuff.
  • There are still many references to the _terminal object. Are there any plans yet to decouple this further?
  • The execute action does not define any method for XON or XOFF (copied over). Is this done by a different layer? Or does xterm.js currently not support XON/XOFF from slave side?
  • The execute action falls back to addChar atm (again copied over). I think this needs to be revised since it might lead to weird output sometimes of characters that should not be printed at all.
  • in ESC action: Does xterm.js support single G2/G3 shifts?
  • in ESC action: I did not understand the code for ESC / - isn't this code unreachable?
  • How to deal with custom escape sequence handlers? Registering/deregistering could be done in a typical event like fashion or by a more restrictive coupling to some API methods. Not sure what is better here, if the class would just emit escape sequence events people might get greedy with this and that, also performance will suffer. It would be more versatile though.

This is a lot at once; maybe someone can find some time to address some of the questions and basic layout ideas. @Tyriar @parisk @mofux 😄


actionCSI(collected: string, params: number[], flag: string): void {
this._terminal.prefix = collected;
switch (flag) {
Member

Is the plan eventually to move all these back to a map? I moved these from a switch to a map to reduce the number of conditionals at the cost of a function pointer for each option.

Also would something like this be better?

interface IEscapeSequenceParser {
  registerCsiHandler(char: string, callback: (params: x) => y): void;
}

Member Author

Yep, I'm going to find out how to jump at the lowest cost. I think emitting the charCode from EscapeSequenceParser and placing the function pointers directly into an object will be the fastest (at least in Chrome, numerical properties avoid the key hashing). As soon as string-typed properties are involved, a switch is faster up to ~20 entries or so.
About the interface: since I am not that good with TypeScript declarations, any more TypeScript-like suggestion is welcome.

Member

What I was getting at with the interface was a different way of thinking about the problem; instead of having ParserTerminal implement actionCSI, what if ParserTerminal/InputHandler called:

parser.registerCsiHandler('@', this._inputHandler.insertChars);
parser.registerCsiHandler('A', this._inputHandler.cursorUp);

(as opposed to)

            case '@': return this._inputHandler.insertChars(params);
            case 'A': return this._inputHandler.cursorUp(params);

Member Author

Yes, that's basically the decorator pattern I had in mind for the DCS part; it should work for the others as well. 👍

Member Author

Hmm, do we need support for multiple handlers for a single action? This would allow stacking functionality for addons on top of the basic functionality. Not sure if maintaining an array of function pointers will hurt performance-wise.

Contributor

Maybe starting with support for a single handler is enough for the moment. With the suggested API it wouldn't be complicated to add support for multiple handlers later on if a use-case comes up for it. I'd think that emitting more higher-level events from the input handler (like "linefeed" or "cursorMove") is more appealing for most use-cases.
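
To make the registration idea from this thread concrete, here is a minimal sketch of a single-handler-per-final-byte registry with a fallback. It is not the API that was merged: `CsiDispatcher`, `deregisterCsiHandler`, `setCsiFallback`, `dispatchCsi` and `FakeInputHandler` are hypothetical names, and only `registerCsiHandler` comes from the suggestion above. The handlers live in a prototype-less object keyed by char code, following the numeric-key remark earlier in the thread. One detail worth noting: passing `this._inputHandler.cursorUp` directly would hand over an unbound method, so the sketch wraps the calls in arrow functions.

```typescript
type CsiHandler = (params: number[], collect: string) => void;
type CsiFallback = (collect: string, params: number[], flag: number) => void;

class CsiDispatcher {
  // Prototype-less object keyed by the char code of the final byte.
  private handlers: { [flag: number]: CsiHandler } = Object.create(null);
  private fallback: CsiFallback = () => { /* ignore unknown sequences */ };

  registerCsiHandler(flag: string, callback: CsiHandler): void {
    this.handlers[flag.charCodeAt(0)] = callback;
  }

  deregisterCsiHandler(flag: string): void {
    delete this.handlers[flag.charCodeAt(0)];
  }

  setCsiFallback(callback: CsiFallback): void {
    this.fallback = callback;
  }

  // Would be called by the parser's CSI dispatch action.
  dispatchCsi(collect: string, params: number[], flag: number): void {
    const handler = this.handlers[flag]; // single lookup, kept in a local
    if (handler) handler(params, collect);
    else this.fallback(collect, params, flag);
  }
}

// Stand-in for InputHandler, only recording what was called.
class FakeInputHandler {
  public log: string[] = [];
  cursorUp(params: number[]): void { this.log.push('cursorUp ' + params[0]); }
  insertChars(params: number[]): void { this.log.push('insertChars ' + params[0]); }
}

const inputHandler = new FakeInputHandler();
const dispatcher = new CsiDispatcher();
// Arrow functions keep `this` bound to the handler instance.
dispatcher.registerCsiHandler('@', (params) => inputHandler.insertChars(params));
dispatcher.registerCsiHandler('A', (params) => inputHandler.cursorUp(params));
dispatcher.setCsiFallback((_collect, _params, flag) => {
  inputHandler.log.push('unhandled CSI ' + String.fromCharCode(flag));
});
```

Supporting several handlers per action later would mean storing an array per key instead of a single function, which leaves the registration signature unchanged.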

inst_P?: (dcs: string) => void;
inst_U?: () => void;
inst_E?: () => void; // TODO: real signature
actionPrint?: (data: string, start: number, end: number) => void;
Member

You should make this mandatory (remove ?), otherwise you would need to check for its existence before calling every time

Member Author

Kinda obsolete with the new jump tables.


// default transition table points to global object
// Q: Copy table to allow custom sequences w/o changing global object?
export class EscapeSequenceParser {
public initialState: number;
public currentState: number;
public transitions: TransitionTable;
public osc: string;
public params: number[];
public collected: string;
public term: any;
Member

Is any just temporary?

Member Author

Yes, term got removed.

this.transitions = new TransitionTable(4095);
this.transitions.table.set(TRANSITION_TABLE.table);
constructor(
terminal?: IParserTerminal | any,
Member

This could be:

constructor(
  public terminal: IParserTerminal
  ...
)

if (code > 0x9f) {
switch (currentState) {
case 0: // GROUND -> add char to print string
case STATE.GROUND: // add char to print string
Member

Avoid using whitespace for alignment, align with the : or put on a new line.

code: code, // actual character code
state: currentState, // current state
print: printed, // print buffer start index
dcs: dcs, // dcs buffer start index
Member

If key and value are the same you can just use one:

{
  pos: i,
  code,
  state: currentState,
  print,
  dcs,
  ...
}

Member Author

Ah nice shortcut.


// FSM actions
export const enum ACTION {
ignore = 0,
Member

Let's use this naming convention for enums:

xterm.js/src/Parser.ts

Lines 153 to 162 in 716a8d5

export enum ParserState {
NORMAL = 0,
ESCAPED = 1,
CSI_PARAM = 2,
CSI = 3,
OSC = 4,
CHARSET = 5,
DCS = 6,
IGNORE = 7
}

Member Author

Sure thing. Oh, and the enums must be const, otherwise TypeScript will place JS identifiers into the parse method with worse performance.

Member

Tyriar commented Apr 24, 2018

Oh wow, that's a huge difference! We must constify all the enums!

enum a {
	b,
	c,
	d
}

const enum A2 {
	B2,
	B3,
	B4
}

console.log(a.b);
console.log(A2.B2);

Compiles to:

var a;
(function (a) {
    a[a["b"] = 0] = "b";
    a[a["c"] = 1] = "c";
    a[a["d"] = 2] = "d";
})(a || (a = {}));
console.log(a.b);
console.log(0 /* B2 */);

Member

It's probably not enough of an argument against it, but if we ever wanted to use babel to compile the typescript source directly, it's unable to handle const enums, because they require type information.

Member Author

I am not sure either if you want enums to be const everywhere. I need this feature here to get the plain number at JS level avoiding costly deref at runtime, still I wish I had access to the enum definition (e.g. have the states named for hot patches) and already thought about workarounds (with no handy solution yet lol).

this.transitions.table.set(TRANSITION_TABLE.table);
constructor(
terminal?: IParserTerminal | any,
transitions: TransitionTable = VT500_TRANSITION_TABLE)
Member

Again you can add public before transitions and then remove the member above.


// default transition table points to global object
// Q: Copy table to allow custom sequences w/o changing global object?
export class EscapeSequenceParser {
Member

Is this going to hide behind an interface and use the readonly trick for properties that shouldn't be edited outside?

Member Author

Not sure on this one yet: the transition table takes 5kB as Uint8Array and much more as fallback Array implementation (> 40kB, not done yet). I tend to make the transition table private and give the parser a copy on write method for custom states to save memory for multiple terminals.
Not sure either whether to hide EscapeSequenceParser and give ParserTerminal/InputHandler the main control API.
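
The copy-on-write idea mentioned here could look roughly like the following. This is only a sketch under assumptions: `SHARED_TABLE` stands in for the global default table, and `CowTable` with its `get`, `set` and `isShared` methods is a hypothetical wrapper, not code from the PR. Every parser instance reads from the shared `Uint8Array` until its first custom write, at which point it takes a private copy.

```typescript
// Stand-in for the global default transition table (size chosen arbitrarily here).
const SHARED_TABLE = new Uint8Array(4096);

class CowTable {
  private table: Uint8Array = SHARED_TABLE;
  private owned = false;

  get(index: number): number {
    return this.table[index];
  }

  set(index: number, value: number): void {
    if (!this.owned) {
      // First write: copy the shared table so other terminals are not affected.
      this.table = new Uint8Array(this.table);
      this.owned = true;
    }
    this.table[index] = value;
  }

  isShared(): boolean {
    return !this.owned;
  }
}

const terminalA = new CowTable();
const terminalB = new CowTable();
terminalA.set(5, 42); // terminalA now owns a private copy, terminalB still shares
```

With this layout, terminals that never patch their transitions cost no extra table memory, and only patched instances pay for a copy.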

- quick'n dirty merge of classes - ParserTerminal is subclass of InputHandler
- basic fallback handlers implemented
- test regressions due to multiple changes in EscapeSequenceParser
- runtime down to 2.3s
- TODO: fix tests, implement DCS functions, full merge of classes
Member Author

jerch commented Apr 29, 2018

The function lookup tables are in place. The lookup tables are objects without a prototype to cut property lookups. I am not yet happy with the lookup code itself:

ident = collect + String.fromCharCode(code);
if (this._escHandlers[ident]) this._escHandlers[ident](params, collect);
else this._escHandlerFb(collect, code);

There are still some conceptual issues:

  • Current function map does not allow "half defined" lookups, either a function for a sequence is set or the fallback function will be called (that's a major drawback compared to switch, where you can easily define fall through states).
  • The function registration does not follow a uniform scheme: for CSI a function is registered per flag, while DCS and ESC use both collect and flag. The flag-only method used for CSI is much faster, but it is not applicable to ESC and DCS, where the sequences are messier and the methods for a given flag have little in common across different collect values.

and minor performance issues:

  • DCS and ESC keys are strings (10x slower than numerical keys) - How to avoid?
  • double lookup: if (<lookup>) <lookup>() though I think this gets optimized by the engine and is no biggie

Still it performs better than the switch version (~5% gain), compared to the parser in master this version is ~2.5x faster (320ms vs. 120ms for ls -lR /usr/lib).

I also merged ParserTerminal with InputHandler and moved addChar into the print method which gave some further performance boost (ls -lR /usr/lib runs at 2.3 - 2.5s now). I think in print there is still some room for optimization since the gc kicks in there with ~10% of the total JS runtime (~250 ms). This might be related to the Array creations there, no clue yet.
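
One possible answer to the string-key question above, offered purely as a sketch and not as what the PR ended up doing: pack the collected intermediates and the final byte into a single integer and use that as a numeric key in a prototype-less object. All names (`escHandlers`, `escIdent`, `registerEscHandler`, `dispatchEsc`, `escLog`) are hypothetical, and the packing assumes at most three intermediate characters, all below 0x80. The lookup result is kept in a local, which also sidesteps the double lookup. The "half defined" lookup drawback is not addressed by this sketch.

```typescript
type EscHandler = () => void;

// Prototype-less object with numeric keys.
const escHandlers: { [ident: number]: EscHandler } = Object.create(null);
const escLog: string[] = [];

// Packs the intermediates and the final byte into one integer, 8 bits per char.
// Assumption: at most three intermediate chars, all below 0x80.
function escIdent(collect: string, flag: number): number {
  let ident = 0;
  for (let i = 0; i < collect.length; i++) ident = (ident << 8) | collect.charCodeAt(i);
  return (ident << 8) | flag;
}

function registerEscHandler(collect: string, flag: string, handler: EscHandler): void {
  escHandlers[escIdent(collect, flag.charCodeAt(0))] = handler;
}

function dispatchEsc(collect: string, flag: number): void {
  const handler = escHandlers[escIdent(collect, flag)]; // one lookup, kept in a local
  if (handler) handler();
  else escLog.push('fallback ' + collect + String.fromCharCode(flag));
}

// ESC 7 and ESC ( B as example registrations; the log labels are illustrative only.
registerEscHandler('', '7', () => { escLog.push('saveCursor'); });
registerEscHandler('(', 'B', () => { escLog.push('charset G0'); });
```

Whether this actually beats string keys would need measuring in the target engines; the 10x figure above is the author's observation, not something this sketch verifies.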

Member Author

jerch commented Apr 30, 2018

Made some progress towards a nicer integration. The current version does all callback registering directly in InputHandler. I decided to remove ParserTerminal since it mostly passed InputHandler functionality to the parser with almost no own business logic.
Still I don't quite understand the separation level of InputHandler vs. Terminal: some escape sequence actions directly access terminal internals while others do this via an InputHandler method. Maybe the direct accesses could also be managed by InputHandler?

Member

Tyriar commented May 15, 2018

> I can rewrite the OSC API. Upside: a more uniform look and feel of the API plus a ~5x speed gain for OSC. Downside: the rewrite and additional tests. Since it would involve API changes, it might be better to do it before this PR lands. What do you think?

Let's definitely defer any more changes like this, I want to merge this ASAP to prevent it from becoming stale.

Member

Tyriar commented May 15, 2018

> C0 control codes without an explicit handler: what to do with those? To reproduce the old behavior they should call InputHandler.print. I still think printing them is faulty behavior, but maybe we should do so to stay in line with the old parser and address it later?

Let's address it later after we're sure there are no major issues with the changes. Shall we create a follow up issue?

> C1 control codes: enable or disable? They were not part of xterm.js before, so should they be disabled to reproduce the old behavior? By the way, the old logic would also just print those.

Again let's defer this.

> DECRQSS and the valid/invalid flag: I don't quite understand the old code for this, but it seems the response always contained 0 as the validation indicator. The problem here is that DEC specifies 0 as valid while the xterm spec treats it as invalid (they flipped the meaning). I have no clue which one is right, but I would guess that most modern apps follow the xterm meaning.

Looks like this was fixed upstream after the fork chjj/term.js@36c1e5c

Member Author

jerch commented May 15, 2018

> Let's address it later after we're sure there are no major issues with the changes. Shall we create a follow up issue?

Good idea for easier tracking. There is at least one issue that might be affected by this (can't find it at the moment; it was something about a terminal game with weird control code output).

> Looks like this was fixed upstream after the fork chjj/term.js@36c1e5c

Yup, it follows the xterm meaning there too. This also contains the DCS sequence for custom keys and a tmux passthrough thing. Might be worth a closer look later on.

Member

Tyriar commented May 21, 2018

C0 explicit handler: #1461
Enable C1: #1462

* DECRQSS (https://vt100.net/docs/vt510-rm/DECRQSS.html)
* Request Status String (DECRQSS), VT420 and up.
* Response: DECRPSS (https://vt100.net/docs/vt510-rm/DECRPSS.html)
* FIXME: xterm and DEC flip P0 and P1 to indicate valid requests - which one to go with?
Member

Let's fix this before merge

Member Author

It returns atm 1 for a valid and 0 for an invalid request (xterm style). Should this be flipped? If not I can simply remove the comment.

Member

Ah you're right it is already fixed 😄 let's remove the FIXME

switch (this._data) {
// valid: DCS 1 $ r Pt ST (xterm)
case '"q': // DECSCA
return this._terminal.send(C0.ESC + 'P1' + '$r' + '0"q' + C0.ESC + '\\');
Member

These long strings are more readable imo using string interpolation:

`${C0.ESC}P1$r0"q${C0.ESC}\\`

Any reason for not going this way?

Member Author

jerch commented May 21, 2018

Will that work with 'ES5' as target? Thought this is ES6 style.

Member

TS compiles it down. This in Renderer.ts:

this._terminal.screenElement.style.width = `${this.dimensions.canvasWidth}px`;

Becomes:

this._terminal.screenElement.style.width = this.dimensions.canvasWidth + "px";

Member Author

Oh sweet, gonna change it.
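
For reference, a small sketch of how the DECRPSS reply from the snippet above reads with a template literal. The helper `decrpss` and the local `ESC` constant are hypothetical (the PR uses `C0.ESC` and sends the string via `this._terminal.send`). It follows the convention stated in this thread, where 1 marks a valid and 0 an invalid request (xterm style). Inside a template literal only `${` starts an interpolation, so the literal `$r` needs no escaping.

```typescript
const ESC = '\x1b';

// Builds a DECRPSS reply: DCS Ps $ r Pt ST, with Ps = 1 for a valid request (xterm style).
function decrpss(valid: boolean, pt: string): string {
  return `${ESC}P${valid ? 1 : 0}$r${pt}${ESC}\\`;
}

// The concatenated form from the diff and the interpolated form produce the same string.
const concatenated = ESC + 'P1' + '$r' + '0"q' + ESC + '\\';
const interpolated = decrpss(true, '0"q');
```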

Member

Tyriar commented May 21, 2018

@jerch I'll merge once I get another go ahead from you 😃

I can pull this into vscode insiders in a few days, I just did a big update with heaps of rendering changes so want to give that some bake time first.

Member Author

jerch commented May 21, 2018

Guess I am done with this PR.

Note that I already have some perf optimizations pending for InputHandler.print, I can get them PR'ed as soon as this PR has landed.

Member

Tyriar commented May 21, 2018

🚀 😨

@Tyriar Tyriar merged commit 099c7d8 into xtermjs:master May 21, 2018
Member Author

jerch commented May 21, 2018

😓

Labels
reference A closed issue/pr that is expected to be useful later as a reference
4 participants