Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dedicated "paint" interface for VtRenderer #10596

Closed
skyline75489 opened this issue Jul 9, 2021 · 11 comments
Closed

Dedicated "paint" interface for VtRenderer #10596

skyline75489 opened this issue Jul 9, 2021 · 11 comments
Labels
Area-VT Virtual Terminal sequence support Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Product-Conpty For console issues specifically related to conpty

Comments

@skyline75489
Copy link
Collaborator

Description

As shown in #10563, ConPTY suffers from significant performance drop with too much scrolling request. A typical repro is:

time bash -c 'yes | head -n1000000'

This is a typical trace of OpenConsole.exe under the above workload:

image

Discussion

The main issue that caused this, I think, is that VtRenderer suffers from using the same interface as other graphic renderer. To break it down:

  1. Carriage return & linefeed can trigger graphic renderer to circling the buffer & force repaint. But for VtRenderer, it dosen't really need to do too much, other than print the newly added line. It doesn't really need to flush the buffer (EndPaint as shown above), because the actual text added to the buffer is only a single line (120 chars in a typical display).
  2. The _PaintBufferOutputHelper method insider renderer is obviously graphic-oriented. After all these years of development, I don't think it suits the need for VtRenderer anymore. An example is Reduce string operation cost in VtEngine::PaintBufferLine #10567, in which I try to reduce the cost of reassemble the line buffer. The reason why we need to reassemble the clusters, is because they are produced by splitting up the original buffer in _PaintBufferOutputHelper, which is heavily required by graphic renderers. If we have a dedicated interface, unnecessary operations like these can be eliminated.
  3. In VtEngine & XtermEngine, the actual number of methods that're both overridden and practically useful is rather small, comparing to the large amount of base methods in IRenderEngine. This indicates that the fundamental goal of VtEngine isn't really the same as the graphic renderers.

Proposed solution

I'm proposing a three-phase refactoring process for VtEngine to move away from IRenderEngine:

  1. Separate the methods in VtEngine that are heavily burdened by the current design, such as _PaintBufferOutputHelper. We add them as base methods to IRenderEngine. VtEngine is still a child class of IRenderEngine, but we add dedicated IO methods in it and test the performance gain.
  2. Separate the rest the methods in VtEngine same way as above.
  3. When we finished the first two steps, we should have a clear path towards adding a dedicated interface just for VtEngine, probably name it IVtEngine. Then we can create IVtEngine from IRenderEngine, and let IRenderEngine be what's left in it.

CC @DHowett @miniksa @zadjii-msft @lhecker @j4james for discussion.

(How can I just At everyone who's collaborator of this project. That would be helpful.)

@skyline75489 skyline75489 added the Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. label Jul 9, 2021
@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Jul 9, 2021
@skyline75489 skyline75489 added Area-VT Virtual Terminal sequence support Product-Conpty For console issues specifically related to conpty labels Jul 9, 2021
@ghost ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Jul 9, 2021
@skyline75489
Copy link
Collaborator Author

skyline75489 commented Jul 9, 2021

Ideally I would really want this to be a smooth transition just as my previous DxFontRenderData related work. But for this particular issue, the difficulty is obvious. Personally I'm not very familiar with legacy conhost code, which is where I think most of the obstacles reside. I could really use some help from you guys on this. I hate to break legacy code and cause troubles for the team.

I'm expecting 50x-100x increment in throughput out of this task, which should be easily measurable after the first step is finished.

@miniksa
Copy link
Member

miniksa commented Jul 9, 2021

In your trace... can you expand EndPaint? Is it the WriteFile call that is making it take so long?

I don't necessarily want to start from the conclusion that we need to throw the whole architecture out when it could just be that we need to decouple the blocking write action from the rest of the processing that generates text.

@lhecker
Copy link
Member

lhecker commented Jul 9, 2021

An interim fix would indeed be to add a buffer like the til::spsc queue in between the renderer and WriteFile call.

In the long term it might be worthwhile though to see if VT handling without the renderer lends itself to a simpler implementation, as we wouldn't be constrained to any drawing-oriented logic anymore.
For instance printing a newline at the moment necessarily triggers a VT draw (as can be seen above), which is quite heavy, requires all sorts of class instantiations and flushes the output. Preventing this means adding more special case logic. Inversely the drawing-oriented renderer code is bound to also support the VT renderer and its quirks. We basically have to fit a single shoe to all feet at the moment.
If we didn't use the renderer infrastructure for VT, we could potentially reduce code complexity and quite likely improve performance rather effectively.

Basically what I'm saying is: If we add a buffer and decouple the renderer and WriteFile call, isn't this a "band aid fix", because the architecture doesn't support what we want to do? Solving this from the ground up with our 2021 knowledge how we need to do things, might allow us to drastically simplify our code.

@DHowett
Copy link
Member

DHowett commented Jul 9, 2021

In the long term it might be worthwhile though to see if VT handling without the renderer lends itself to a simpler implementation, as we wouldn't be constrained to any drawing-oriented logic anymore.

The Vt Rendering engine rather concerns itself with turning a console buffer--which can be written by applications using any of the Win32 console APIs--into a stream of VT that represents that buffer. Yes, we can probably add a path to each API implementation that does something different if conhost is in pty mode, but I don't expect that it's going to provide the correctness (or peri-correctness) that snapping the buffer and presenting it as though it is a screen does.

conhost has to soak up a lot of weird stuff like writing an attribute (only!) over a range of characters (for which there's no equivalent VT sequence), copying a region from position A to position B, etc.

@miniksa
Copy link
Member

miniksa commented Jul 9, 2021

I just always thought the long term VT handling without the renderer framework was VT pass through mode... not rearchitecting the VT renderer.

This is a discussion though so I'm not ruling anything out.

@skyline75489
Copy link
Collaborator Author

skyline75489 commented Jul 9, 2021 via email

@miniksa
Copy link
Member

miniksa commented Jul 9, 2021

There are other deficiencies in the architecture which we may be able to capture in a revision with something like flags or stages that each individual engine can opt into or out of to reduce the amount of work that is done in the parent while still maintaining the advantages of a unified interface.

For instance, the DirectX pipeline is having performance troubles as well because PaintBufferLineHelper divides up any time the colors change when it could happily accept a run with multiple color specifications instead of hopping in and out and in and out for every color change.

I just think we might be able to pull those requirements together with the ones for VT and made an adaptation that has the best of all worlds.

I do agree with you that the base was written 100% with GDI in mind as the only target. And it was just convenient to us that the BGFX and WDDM and then the VT and DX ones worked on the same interface, even if suboptimally. It could be time to refine it taking all of their unique considerations in mind.

@j4james
Copy link
Collaborator

j4james commented Jul 9, 2021

conhost has to soak up a lot of weird stuff like writing an attribute (only!) over a range of characters (for which there's no equivalent VT sequence), copying a region from position A to position B, etc.

Well there's DECCARA (Change Attributes in Rectangular Area), and DECCRA (Copy Rectangular Area), which more or less cover those operations. And while I haven't done an audit of all the console APIs, I'm optimistic that we should be able to reproduce most of them reasonably well with existing VT sequences.

It's certainly my long term plan to try and completely replace the current conpty layer with a pure pass-through implementation, where the legacy console APIs are translated directly to VT sequences without an intermediate buffer (there may still need to be a buffer for the buffer reading APIs, but the conpty layer would ideally not be rendering from that buffer).

Whether I can ever actually achieve that plan is another question, and of course you may not want to accept a PR of that sort, but that's the direction I'm heading (very very slowly I'll admit).

So from my point of view, the VtRenderer is a stop-gap solution which I hope we can drop one day. But realistically that may never happen, so I'm totally in favour of any improvements to the performance in the meantime.

@skyline75489
Copy link
Collaborator Author

skyline75489 commented Jul 10, 2021 via email

@skyline75489
Copy link
Collaborator Author

skyline75489 commented Jul 11, 2021

Structurally I think #10610 should be considered the prior task. After #10610 is done, I can add the optimization described in this PR without messing around the IRenderEngine interface any more. My current solution is very ugly.

For the record, what I did in my experimental branch is adding another method in IRenderEngine for renderer.cpp to determine which engine is used:

For example in VtEngine:

[[nodiscard]] RenderEngineKind Kind() noexcept
{
    return RenderEngineKind::VirturlTerminal;
}

Then in renderer we can do this:

if (pEngine->Kind() == RenderEngineKind::VirturlTerminal) {
    // ...
}

@skyline75489
Copy link
Collaborator Author

Closed in favor of #10610 & #10615

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-VT Virtual Terminal sequence support Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Product-Conpty For console issues specifically related to conpty
Projects
None yet
Development

No branches or pull requests

5 participants