
Flow control/back pressure #2077

Closed
jerch opened this issue May 11, 2019 · 8 comments
Labels: type/documentation (Regarding website, API documentation, README, etc.)


jerch commented May 11, 2019

The time-based input limiting contains a buffer that only gets drained once all chunks have been handled.
This way fast producers can easily exhaust memory and crash the terminal / browser engine.

Relevant code:

xterm.js/src/Terminal.ts, lines 1441 to 1447 in af81acc:

if (this.writeBuffer.length > bufferOffset) {
// Allow renderer to catch up before processing the next batch
setTimeout(() => this._innerWrite(bufferOffset), 0);
} else {
this._writeInProgress = false;
this.writeBuffer = [];
}

Partial fix: Trim/slice already handled chunks from the buffer.

Even with this partial fix a long running producer that is faster than the terminal can handle incoming chunks will grow the buffer up to a possible crash. For a full fix we have to implement a reliable back pressure mechanism.
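The partial fix can be sketched as follows. This is not the actual xterm.js patch, just a minimal illustration of the idea: already-handled chunks are sliced off after each batch, so memory is reclaimed even while a producer keeps writing (the class and method names here are made up for illustration).

```typescript
// Illustrative sketch of "trim handled chunks from the buffer".
class WriteQueue {
  private _buffer: string[] = [];

  write(chunk: string): void {
    this._buffer.push(chunk);
  }

  // Process up to `batchSize` chunks, then slice them off so the
  // handled data does not stay referenced until the buffer is empty.
  // Returns the remaining backlog size.
  processBatch(batchSize: number, handle: (chunk: string) => void): number {
    const n = Math.min(batchSize, this._buffer.length);
    for (let i = 0; i < n; i++) {
      handle(this._buffer[i]);
    }
    this._buffer = this._buffer.slice(n); // trim already handled chunks
    return this._buffer.length;
  }
}
```

As the issue notes, this only bounds the waste from already-handled chunks; a producer that consistently outpaces the terminal still grows the backlog without real back pressure.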

@jerch jerch added type/bug Something is misbehaving important labels May 11, 2019
@jerch jerch self-assigned this May 11, 2019

jerch commented May 17, 2019

Here is what's going on with a very fast producer like yes in the demo:

[screenshot: console output with writeBuffer lengths growing rapidly]

The numbers in front are the actual length of writeBuffer, which grows very fast by chunks of 50-100kB of data. The chunks arrive every 5ms (hardcoded in server.js), but the terminal can only process one chunk in >15ms (due to lineFeed being very expensive).

Theoretical calculation:

  • 1000ms / 5ms = 200 chunks/s with 50kB ==> ~10 MB/s
  • 2/3 are not processed in time ==> ~7 MB/s buffered up
    ==> after 1 min we have already 0.42 GB allocated for writeBuffer
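The back-of-the-envelope calculation above, spelled out (all inputs come from the comment: 50 kB chunks every 5 ms, roughly 2/3 left unprocessed):

```typescript
// Theoretical growth of writeBuffer with the demo's hardcoded 5 ms interval.
const chunksPerSecond = 1000 / 5;                    // 200 chunks/s
const bytesPerSecond = chunksPerSecond * 50_000;     // ~10 MB/s produced
const bufferedPerSecond = bytesPerSecond * (2 / 3);  // ~7 MB/s piling up
const afterOneMinute = bufferedPerSecond * 60;       // ~0.4 GB buffered
```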

The reality looks a bit different:
Due to websocket latency the chunks arrive slower than every 5ms in the browser. Instead they get stored in server.js. After running yes for 3 min I see these numbers in top:

  • node: 1.5 GB
  • chrome: 750 MB

Note that this is constantly growing; I did not wait for a process to run into OOM or get killed by the kernel.

With useFlowControl enabled (XON/XOFF actually works under linux), I see these stable numbers after 3 min:

  • node: 300 - 400 MB
  • chrome: 180 - 200 MB

Conclusion: Systems that don't offer XON/XOFF as a flow control mechanism are likely to run into OOM for remote xterm.js usage with fast producers (on both server and client side).


jerch commented May 17, 2019

Proposal:
Make back pressure possible. There are various ways to achieve this; nearly all of them need the terminal to track the current writeBuffer size. The simplest solution would be a watermark limit, with Terminal.write returning its state (true - we are still below the high watermark, keep sending data; false - pause, any later data is not guaranteed to end up in the terminal). Once the buffer is drained (or falls below some low watermark), a 'drain' event could notify the producer to resume the data stream.

This basically resembles the old stream API of nodejs:

pty.on('data', data => {
  if (!terminal.write(data)) {
    pty.pause();
  }
});
terminal.on('drain', () => pty.resume());

With websockets in between it gets more complicated - we first have to find a way to send the PAUSE/RESUME events through to the backend with low performance impact.
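One way the PAUSE/RESUME idea could be carried over a websocket is sketched below. Everything here is hypothetical: the message strings, the watermark values, and the class are illustrative only, not an xterm.js API. The client tracks how much data is queued but not yet processed and tells the backend to pause/resume the pty accordingly.

```typescript
// Hypothetical client-side flow controller for a websocket transport.
const HIGH_WATERMARK = 131072; // pause above ~128 kB pending (illustrative)
const LOW_WATERMARK = 16384;   // resume below ~16 kB pending (illustrative)

class FlowController {
  private _pending = 0;
  private _paused = false;

  constructor(private _send: (msg: string) => void) {}

  // Called when a chunk arrives from the socket and is queued for the terminal.
  onData(chunk: string): void {
    this._pending += chunk.length;
    if (!this._paused && this._pending > HIGH_WATERMARK) {
      this._paused = true;
      this._send('PAUSE'); // backend would call pty.pause()
    }
  }

  // Called when the terminal has finished processing `length` bytes.
  onProcessed(length: number): void {
    this._pending = Math.max(0, this._pending - length);
    if (this._paused && this._pending < LOW_WATERMARK) {
      this._paused = false;
      this._send('RESUME'); // backend would call pty.resume()
    }
  }
}
```

The low/high watermark gap gives hysteresis, so the stream does not flap between paused and resumed on every chunk.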

Additional note:
The OOM problem probably does not exist on Electron setups that run the pty and the xterm.js terminal in one JS context, due to forced serialization in a single event loop. Here back pressure is kinda "cheated": each loop run takes longer and longer, so the pty OS buffer will effectively pause the slave program. Since this still hurts overall performance by slowing down the event loop, a real back pressure mechanism that can step in earlier might be a benefit.


Tyriar commented May 17, 2019

Maybe we should split this issue into 2 separate ones as I think there are 2 distinct problems:

  1. writeBuffer keeps growing until OOM is hit unless it's completely empty (should be a pretty easy fix)
  2. Back pressure doesn't work when using web sockets. This one's a little more involved; I see the solution to this living inside the attach addon, but it depends on what's needed I guess. If we need to expose events or some other API on Terminal then we can consider that.


jerch commented May 21, 2019

@Tyriar Yes, it's basically two issues in one report.
I'll fix the first one first, as it's quite easy to lower the memory burden.

The second one needs more thought about the right way to integrate it. To me it seems all use cases are affected except one - Electron with pty and xterm.js within one event loop.

@Tyriar Tyriar changed the title input memory exhaustion / flow control Flow control/back pressure May 21, 2019
@Tyriar Tyriar added area/performance type/enhancement Features or improvements to existing features and removed important type/bug Something is misbehaving labels May 21, 2019

Tyriar commented May 21, 2019

Moved the bug part to #2108

@Tyriar Tyriar added type/documentation Regarding website, API documentation, README, etc. and removed area/performance type/enhancement Features or improvements to existing features labels Oct 7, 2019

Tyriar commented Oct 7, 2019

This will be mainly a documentation task now I think?


jerch commented Oct 24, 2019

Left to do:

  • document a simple but effective flow control mechanism on the website
  • example implementation for the demo + tests


jerch commented Nov 7, 2019

documentation done in xtermjs/xtermjs.org#103
