Dedicated "paint" interface for VtRenderer #10596

skyline75489 · 2021-07-09T10:55:32Z

Description

As shown in #10563, ConPTY suffers from significant performance drop with too much scrolling request. A typical repro is:

time bash -c 'yes | head -n1000000'

This is a typical trace of OpenConsole.exe under the above workload:

Discussion

The main issue that caused this, I think, is that VtRenderer suffers from using the same interface as other graphic renderer. To break it down:

Carriage return & linefeed can trigger graphic renderer to circling the buffer & force repaint. But for VtRenderer, it dosen't really need to do too much, other than print the newly added line. It doesn't really need to flush the buffer (EndPaint as shown above), because the actual text added to the buffer is only a single line (120 chars in a typical display).
The _PaintBufferOutputHelper method insider renderer is obviously graphic-oriented. After all these years of development, I don't think it suits the need for VtRenderer anymore. An example is Reduce string operation cost in VtEngine::PaintBufferLine #10567, in which I try to reduce the cost of reassemble the line buffer. The reason why we need to reassemble the clusters, is because they are produced by splitting up the original buffer in _PaintBufferOutputHelper, which is heavily required by graphic renderers. If we have a dedicated interface, unnecessary operations like these can be eliminated.
In VtEngine & XtermEngine, the actual number of methods that're both overridden and practically useful is rather small, comparing to the large amount of base methods in IRenderEngine. This indicates that the fundamental goal of VtEngine isn't really the same as the graphic renderers.

Proposed solution

I'm proposing a three-phase refactoring process for VtEngine to move away from IRenderEngine:

Separate the methods in VtEngine that are heavily burdened by the current design, such as _PaintBufferOutputHelper. We add them as base methods to IRenderEngine. VtEngine is still a child class of IRenderEngine, but we add dedicated IO methods in it and test the performance gain.
Separate the rest the methods in VtEngine same way as above.
When we finished the first two steps, we should have a clear path towards adding a dedicated interface just for VtEngine, probably name it IVtEngine. Then we can create IVtEngine from IRenderEngine, and let IRenderEngine be what's left in it.

CC @DHowett @miniksa @zadjii-msft @lhecker @j4james for discussion.

(How can I just At everyone who's collaborator of this project. That would be helpful.)

The text was updated successfully, but these errors were encountered:

skyline75489 · 2021-07-09T11:47:55Z

Ideally I would really want this to be a smooth transition just as my previous DxFontRenderData related work. But for this particular issue, the difficulty is obvious. Personally I'm not very familiar with legacy conhost code, which is where I think most of the obstacles reside. I could really use some help from you guys on this. I hate to break legacy code and cause troubles for the team.

I'm expecting 50x-100x increment in throughput out of this task, which should be easily measurable after the first step is finished.

miniksa · 2021-07-09T17:41:06Z

In your trace... can you expand EndPaint? Is it the WriteFile call that is making it take so long?

I don't necessarily want to start from the conclusion that we need to throw the whole architecture out when it could just be that we need to decouple the blocking write action from the rest of the processing that generates text.

lhecker · 2021-07-09T17:58:54Z

An interim fix would indeed be to add a buffer like the til::spsc queue in between the renderer and WriteFile call.

In the long term it might be worthwhile though to see if VT handling without the renderer lends itself to a simpler implementation, as we wouldn't be constrained to any drawing-oriented logic anymore.
For instance printing a newline at the moment necessarily triggers a VT draw (as can be seen above), which is quite heavy, requires all sorts of class instantiations and flushes the output. Preventing this means adding more special case logic. Inversely the drawing-oriented renderer code is bound to also support the VT renderer and its quirks. We basically have to fit a single shoe to all feet at the moment.
If we didn't use the renderer infrastructure for VT, we could potentially reduce code complexity and quite likely improve performance rather effectively.

Basically what I'm saying is: If we add a buffer and decouple the renderer and WriteFile call, isn't this a "band aid fix", because the architecture doesn't support what we want to do? Solving this from the ground up with our 2021 knowledge how we need to do things, might allow us to drastically simplify our code.

DHowett · 2021-07-09T18:08:27Z

In the long term it might be worthwhile though to see if VT handling without the renderer lends itself to a simpler implementation, as we wouldn't be constrained to any drawing-oriented logic anymore.

The Vt Rendering engine rather concerns itself with turning a console buffer--which can be written by applications using any of the Win32 console APIs--into a stream of VT that represents that buffer. Yes, we can probably add a path to each API implementation that does something different if conhost is in pty mode, but I don't expect that it's going to provide the correctness (or peri-correctness) that snapping the buffer and presenting it as though it is a screen does.

conhost has to soak up a lot of weird stuff like writing an attribute (only!) over a range of characters (for which there's no equivalent VT sequence), copying a region from position A to position B, etc.

miniksa · 2021-07-09T18:09:31Z

I just always thought the long term VT handling without the renderer framework was VT pass through mode... not rearchitecting the VT renderer.

This is a discussion though so I'm not ruling anything out.

skyline75489 · 2021-07-09T19:54:25Z

To Dustin: I like the idea of VtEngine is subclass of IRenderEngine because I think it’s cool and also reasonable. The thing is, there’s also cons in this approach, which is what I listed above. I’m OK with still considering IVtEngine a similar interface as IRenderEngine, because this is the way to minimize the changeset and preserve correctness. I would love to see special handling for IVtEngine in renderer.cpp though, because it provides us with chances for all those optimization. To Michael: yes WriteFile is one of the things that take long time. But the thing l hate the most (and takes double the time than WritrFile) is in PaintBufferLineHelper. For the example in the issue, if you print 1000000 new line, with 120 chars in a line, you having to call a bunch of code for 120000000 times in PaintBufferLineHelper, some of them are about fetching patternId, which provides no functionality in VtEngine. This is why I feel “there’s something wrong arch in the architecture”. I also consider ConPTY passthrough as the eventual plan. This proposal here is more of a low-hanging fruit comparing to ConPTY passthrough, which requires zero modification in the client side(for example Alacritty). To Leonard: I was asking Michael about the spsc just yesterday. Michael has his own idea about that so I can’t speak for him. To everyone: no I am not losing sleep because of the controversy. I am only having trouble with a mosquito or two that I really have to kill now. 获取 Outlook for iOS<https://aka.ms/o0ukef>

miniksa · 2021-07-09T20:13:50Z

There are other deficiencies in the architecture which we may be able to capture in a revision with something like flags or stages that each individual engine can opt into or out of to reduce the amount of work that is done in the parent while still maintaining the advantages of a unified interface.

For instance, the DirectX pipeline is having performance troubles as well because PaintBufferLineHelper divides up any time the colors change when it could happily accept a run with multiple color specifications instead of hopping in and out and in and out for every color change.

I just think we might be able to pull those requirements together with the ones for VT and made an adaptation that has the best of all worlds.

I do agree with you that the base was written 100% with GDI in mind as the only target. And it was just convenient to us that the BGFX and WDDM and then the VT and DX ones worked on the same interface, even if suboptimally. It could be time to refine it taking all of their unique considerations in mind.

j4james · 2021-07-09T22:38:08Z

conhost has to soak up a lot of weird stuff like writing an attribute (only!) over a range of characters (for which there's no equivalent VT sequence), copying a region from position A to position B, etc.

Well there's DECCARA (Change Attributes in Rectangular Area), and DECCRA (Copy Rectangular Area), which more or less cover those operations. And while I haven't done an audit of all the console APIs, I'm optimistic that we should be able to reproduce most of them reasonably well with existing VT sequences.

It's certainly my long term plan to try and completely replace the current conpty layer with a pure pass-through implementation, where the legacy console APIs are translated directly to VT sequences without an intermediate buffer (there may still need to be a buffer for the buffer reading APIs, but the conpty layer would ideally not be rendering from that buffer).

Whether I can ever actually achieve that plan is another question, and of course you may not want to accept a PR of that sort, but that's the direction I'm heading (very very slowly I'll admit).

So from my point of view, the VtRenderer is a stop-gap solution which I hope we can drop one day. But realistically that may never happen, so I'm totally in favour of any improvements to the performance in the meantime.

skyline75489 · 2021-07-10T11:41:44Z

I managed to drop the time needed 1000000 “yes” from 8.6s down to 4.5s, with the method I described (delay the Flush and add optimized path for VtEngine circling). With a bit more fine-tuning and aggressive optimization, I can see a final 3.3s result. I am honestly disappointed by this because I have to find the 47x boost somewhere else. 获取 Outlook for iOS<https://aka.ms/o0ukef>

skyline75489 · 2021-07-11T02:17:18Z

Structurally I think #10610 should be considered the prior task. After #10610 is done, I can add the optimization described in this PR without messing around the IRenderEngine interface any more. My current solution is very ugly.

For the record, what I did in my experimental branch is adding another method in IRenderEngine for renderer.cpp to determine which engine is used:

For example in VtEngine:

[[nodiscard]] RenderEngineKind Kind() noexcept
{
    return RenderEngineKind::VirturlTerminal;
}

Then in renderer we can do this:

if (pEngine->Kind() == RenderEngineKind::VirturlTerminal) {
    // ...
}

skyline75489 · 2021-07-18T11:35:41Z

Closed in favor of #10610 & #10615

skyline75489 added the Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. label Jul 9, 2021

ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Jul 9, 2021

skyline75489 added Area-VT Virtual Terminal sequence support Product-Conpty For console issues specifically related to conpty labels Jul 9, 2021

ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Jul 9, 2021

This was referenced Jul 9, 2021

[Performance] Replace std::vector with more lightweight structure in RenderData::GetPatternId #10564

Open

Use memcmp for TextAttribute & TextColor comparison #10566

Merged

skyline75489 mentioned this issue Jul 11, 2021

Reduce virtual function usage to mitigate performance penalty due to CFG #10610

Open

skyline75489 closed this as completed Jul 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dedicated "paint" interface for VtRenderer #10596

Dedicated "paint" interface for VtRenderer #10596

skyline75489 commented Jul 9, 2021

skyline75489 commented Jul 9, 2021 •

edited

Loading

miniksa commented Jul 9, 2021

lhecker commented Jul 9, 2021 •

edited

Loading

DHowett commented Jul 9, 2021

miniksa commented Jul 9, 2021

skyline75489 commented Jul 9, 2021 via email •

edited by ghost

Loading

miniksa commented Jul 9, 2021

j4james commented Jul 9, 2021

skyline75489 commented Jul 10, 2021 via email •

edited by ghost

Loading

skyline75489 commented Jul 11, 2021 •

edited

Loading

skyline75489 commented Jul 18, 2021

Dedicated "paint" interface for VtRenderer #10596

Dedicated "paint" interface for VtRenderer #10596

Comments

skyline75489 commented Jul 9, 2021

Description

Discussion

Proposed solution

skyline75489 commented Jul 9, 2021 • edited Loading

miniksa commented Jul 9, 2021

lhecker commented Jul 9, 2021 • edited Loading

DHowett commented Jul 9, 2021

miniksa commented Jul 9, 2021

skyline75489 commented Jul 9, 2021 via email • edited by ghost Loading

miniksa commented Jul 9, 2021

j4james commented Jul 9, 2021

skyline75489 commented Jul 10, 2021 via email • edited by ghost Loading

skyline75489 commented Jul 11, 2021 • edited Loading

skyline75489 commented Jul 18, 2021

skyline75489 commented Jul 9, 2021 •

edited

Loading

lhecker commented Jul 9, 2021 •

edited

Loading

skyline75489 commented Jul 9, 2021 via email •

edited by ghost

Loading

skyline75489 commented Jul 10, 2021 via email •

edited by ghost

Loading

skyline75489 commented Jul 11, 2021 •

edited

Loading