Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix potential cursor redrawing crash #2965

Merged
merged 2 commits into from
Oct 18, 2019

Conversation

skyline75489
Copy link
Collaborator

@skyline75489 skyline75489 commented Sep 29, 2019

Summary of the Pull Request

Fixes crash introduced by #2932

References

PR Checklist

Detailed Description of the Pull Request / Additional comments

See #2932

@skyline75489
Copy link
Collaborator Author

I've been tweaking around with #2932 and my other related PRs. For me if we don't include this fix, we better revert #2932 in case of very frequent crashes.

@skyline75489
Copy link
Collaborator Author

I found #2924 is also related to this. Feel free to cherry-pick or close this.

@DHowett-MSFT
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@DHowett-MSFT
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@DHowett-MSFT
Copy link
Contributor

You will probably need to merge in master to get the CI pipeline to work again. Sorry about that!

@skyline75489 skyline75489 force-pushed the fix/cursorRedrawCrash branch 2 times, most recently from 38f259a to 405d183 Compare October 2, 2019 12:26
@skyline75489
Copy link
Collaborator Author

@miniksa This is also a very important PR. If we don’t include this one, we better just revert #2932 , as I’ve experienced severe crashes. To reproduce, just do cmatrix.

@DHowett-MSFT
Copy link
Contributor

I think the real problem here is that some writer isn't locking the console buffer before writing, or some reader isn't locking the console before forcing the cursor to redraw. It looks like Cursor is updated in a great number of places, and it'll be hard to audit callers in conhost (to find good examples of where locking is required around cursor manipulation.)

@skyline75489
Copy link
Collaborator Author

I was thinking the same thing. Furthermore, an obvious design issue is that cursor redrawing is related to the entire text buffer's refreshing. #2932 unintentionally made cursor redrawing slower and make this crush super severe.

@miniksa
Copy link
Member

miniksa commented Oct 11, 2019

Yeah, this isn't right. This is just adjusting the parameters of the problem. We need to identify the locking issue for real.

The discussion about this is now crossing like 4 different PRs and issues. @DHowett-MSFT, is there a follow on task to investigate where the locking has gone wrong here (I saw that you reverted #2932).

@skyline75489 skyline75489 force-pushed the fix/cursorRedrawCrash branch from 405d183 to 7b1a9e1 Compare October 12, 2019 08:10
@skyline75489
Copy link
Collaborator Author

I find that the read lock in SetCursorVisible indeed fixes the crash with no obvious performance panalty.

CC @DHowett-MSFT @miniksa

@skyline75489
Copy link
Collaborator Author

skyline75489 commented Oct 15, 2019

Again theres' a catch ...

Adding lock in SetCursorVisible fixes the crash (use after free) from TermControl::_KeyDownHandler, but the crash (invalid args) from Renderer::PaintFrame() is still there, which #2986 describes.

My original comment in #2932 described these two crashes. And I thought #2924 fixed the invalid args. So I only focused on the use-after-free in this PR. It turns out that the invalid-args crash is still there.

@skyline75489
Copy link
Collaborator Author

skyline75489 commented Oct 16, 2019

OK, here's what I got so far. There're mainly three threads that's competing for _buffer access, which are "Main Thread", "Render Thread", and "Output Thread". "Main Thread" and "Output Thread" compete for the write lock. "Render Thread" competes for the read lock.

With #2932, there're race conditions which is not handled correctly

  1. Crashes involves moving cursor are caused by SetCursorVisible which is called from TermControl::_KeyDownHandler (and TermControl::_BlinkCursor). The access is from main thread. But there is not read lock for this. Adding a read lock fixes this.
  2. Crashes involving moving cursor are also caused by _PaintCursor which is called from Renderer::PaintFrame(). The access is from render thread. When moving cursor quickly, the render thread are likely to get the upperhand and render cursor with old and invalid data, causing invalid args exception. This is especially severe since Patch fix for #1360 until WriteStream (#780) can be implemented. #2924 is reverted, making it very easy for pos.X to be 120.
  3. Crashes involving window resizing are caused by _PaintCursor which is called from Renderer::PaintFrame(). The access is from render thread. The root cause is the lock in main thread, which resides in TermControl::_SwapChainSizeChanged. This is this first function that gets called when resizing happenes. It acquires the lock and modified the buffer using TextBuffer::ResizeTraditional (in Terminal::UserResize). So far so good. But when it releases the lock, it is uncertain that whether render thread or output thread is a winner for the lock. If the output thread wins and acquires the lock, it will modify the buffer to adopt new window size and then notify render thread to paint frame. This route is good. But if it's the other way around then we have a problem. Suppose we have a window size 120 x 80 and cursor is at (21, 5). And we set the new window width to something less than 21. Remember that at this very moment, the buffer still holds the old data which has the cursor position of (21, 5), and the buffer is already modified by TextBuffer::ResizeTraditional. Trying to access the cursor position in buffer will result in invalid-args runtime exception.

This second and third case is obviously more complicated and painful. I may be wrong but I don't think there's easy fix for this.

@skyline75489
Copy link
Collaborator Author

Ideally, the execution order of render thread and output thread should be in our control, as in #3075, quote:

The terminal should read and parse as much data as possible, only stopping for updating its UI according to the monitor refresh rate, typically 60 times per second (or maybe at a hardwired 60Hz if refresh rate can't be detrmined or the concept doesn't exist – I don't know how it goes in VMware). If updating the UI takes so much time that there's hardly any time left for processing incoming data, it should start dropping frames.

The basic infrastructure is exactly like it. The render thread has a sleep interval that's limited to s_FrameLimitMilliseconds. But the output thread spend so much time doing layout work in _outputHandlers(in ConhostConnection::_OutputThread, which is essentially Terminal::Write). While the output thread is busy doing text layout, the render thread can easily be awaken from sleep and start rendering, adding extra burden to both CPU and GPU. This is the root cause of the slowness described in #3075, and also the root cause of these messy crashes.

I crafted some PR that's intended to reduce the CPU usage in output thread. But apparently it's not enough. Maybe the ultimate solution would be adding another thread for layout, freeing output thread from all those heavy work. I don't have a clear thought on this yet.

@skyline75489 skyline75489 force-pushed the fix/cursorRedrawCrash branch from 7b1a9e1 to 30956d7 Compare October 17, 2019 09:39
@skyline75489
Copy link
Collaborator Author

skyline75489 commented Oct 17, 2019

There you have it. Now it fixes the crashes. But truly this is a ugly solution. I wish there could be a more elegant way.

@@ -533,6 +553,8 @@ void Terminal::_InitializeColorTable()
// - isVisible: whether the cursor should be visible
void Terminal::SetCursorVisible(const bool isVisible) noexcept
{
auto lock = LockForReading();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that you're setting something about the buffer, this would be a LockForWriting(), not reading.

@@ -425,6 +440,11 @@ void Terminal::_WriteBuffer(const std::wstring_view& stringView)
}
}

if (proposedCursorPosition.X > bufferSize.RightInclusive())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intentionally put this back? It looks like @DHowett-MSFT just removed this in #3212

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line the caused regression was proposedCursorPosition.Y++;. I only took the first line and reset position.X to a legal value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unfortunately caused #3277.

auto cursorPosition = cursor.GetPosition();
if (cursorPosition.X > viewportSize.X)
{
cursorPosition.X = viewportSize.X - 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused as to why the X dimension is reset to Size.X-1 and the Y direction is reset to Size.Y exactly.

I would think that the rectangle is either inclusive in both dimensions, or exclusive on both directions.

Additionally, the std::min/std::max functions are great for situations like this as a short hand way of writing this.

Copy link
Collaborator Author

@skyline75489 skyline75489 Oct 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime failure is caused by

THROW_HR_IF(E_INVALIDARG, !limits.IsInBounds(pos));

The limits.IsInBounds check is using pos.X < RightExclusive() so I have to use Size.X - 1 to pass it. Practially I've not seen crashes caused by Size.Y. But you are right, It should be unifed.

However, I don't really know if viewportSize.X - 1 is the right thing to do. I did it just to pass the runtime assertion

@miniksa miniksa added the Severity-Blocking We won't ship a release like this! No-siree. label Oct 17, 2019
@miniksa
Copy link
Member

miniksa commented Oct 17, 2019

Tagging as blocking. We'd like to either refine this and bring it in before we cut 0.6 or roll back #2932 and then perhaps release a patch release to 0.6 bringing it back later next week.

@skyline75489
Copy link
Collaborator Author

The thread dance is too complicated to fix. So I just tried limiting the cursor position.

If we decided to rollback #2932 we'd better still keep an eye on the thread problem, which may be the cause of many other crashes like this.

cursorPosition.X = std::min(cursorPosition.X, static_cast<SHORT>(viewportSize.X - 1));
cursorPosition.Y = std::min(cursorPosition.Y, static_cast<SHORT>(viewportSize.Y - 1));

cursor.StartDeferDrawing();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks strange but it's necessary. Because SetPosition will call RedrawCursor, and eventually find its way to Terminal::IsCursorDoubleWidth(), causing the crash. So I had to disable redrawing first, correct the position, and enable redrawing. IMO the SetPosition (and related method) are too aggresive doing redrawing, which is not a good design.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right. We should likely fix the design on the redrawing and come up with something better. We can take this for now. I know it's ugly and complicated to fix the right way, but I'm glad you found something workable.

Copy link
Member

@miniksa miniksa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this. Well... mostly. I'm overall as unhappy as @skyline75489 at how messed up this whole situation is. But as long as this alleviates some of the crashing for now... it's what we need to get 0.6 out and we should revisit the cursor drawing and some of the threading locks before 1.0, hopefully.

@miniksa
Copy link
Member

miniksa commented Oct 18, 2019

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@miniksa
Copy link
Member

miniksa commented Oct 18, 2019

@DHowett-MSFT brings up the point that this might make some sort of reverse line wrap bug with the X=0, but I'm willing to take it right now to stop crashing and cut a 0.6 candidate.

@miniksa miniksa added this to the Terminal-1910 milestone Oct 18, 2019
@miniksa miniksa added the Needs-Second It's a PR that needs another sign-off label Oct 18, 2019
@carlos-zamora carlos-zamora merged commit 926a2e3 into microsoft:master Oct 18, 2019
@skyline75489 skyline75489 deleted the fix/cursorRedrawCrash branch October 20, 2019 05:17
@ghost ghost removed the Needs-Second It's a PR that needs another sign-off label Oct 22, 2019
DHowett-MSFT pushed a commit that referenced this pull request Oct 22, 2019
DHowett-MSFT pushed a commit that referenced this pull request Oct 22, 2019
Revert "Fix cursor redrawing crash (#2965)"
This reverts commit 926a2e3.

Revert "Fix double width cursor for CJK characters (#2932)"
This reverts commit eafa884.

Fixes #3277.
Fully reverts #2965, #2932.
miniksa pushed a commit that referenced this pull request Oct 22, 2019
Revert "Fix cursor redrawing crash (#2965)"
This reverts commit 926a2e3.

Revert "Fix double width cursor for CJK characters (#2932)"
This reverts commit eafa884.

Fixes #3277.
Fully reverts #2965, #2932.
ghost pushed a commit that referenced this pull request Apr 15, 2020
# Summary of the Pull Request
This PR will allow the cursor to be double width when on top of a double width character. This required changing `IsCursorDoubleWidth` to check whether the glyph the cursor's on top of is double width. This code is exactly the same as the original PR that addressed this issue in #2932. That one got reverted at some point due to the crashes related to it, but due to a combination of Terminal having come further since that PR and other changes to address use-after-frees, some of the crashes may/may not be relevant now. The ones that seemed to be relevant/repro-able, I attempt to address in this PR.

The `IsCursorDoubleWidth` check would fail during the `TextBuffer::Reflow` call inside of `Terminal::UserResize` occasionally, particularly when `newCursor.EndDeferDrawing()` is called. This is because when we tell the newCursor to `EndDefer`, the renderer will attempt to redraw the cursor. As part of this redraw, it'll ask if `IsCursorDoubleWidth`, and if the renderer managed to ask this before `UserResize` swapped out the old buffer with the new one from `Reflow`, the renderer will be asking the old buffer if its out-of-bounds cursor is double width. This was pretty easily repro'd using `cmatrix -u0` and resizing the window like a madman.

As a solution, I've moved the Start/End DeferDrawing calls out of `Reflow` and into `UserResize`. This way, I can "clamp" the portion of the code where the newBuffer is getting created and reflowed and swapped into the Terminal buffer, and only allow the renderer to draw once the swap is done. This also means that ConHost's `ResizeWithReflow` needed to change slightly.

In addition, I've added a WriteLock to `SetCursorOn`. It was mentioned as a fix for a crash in #2965 (although I can't repro), and I also figured it would be good to try to emulate where ConHost locks with regards to Cursor operations, and this seemed to be one that we were missing.

## PR Checklist
* [x] Closes #2713
* [x] CLA signed
* [x] Tests added/passed

## Validation Steps Performed
Manual validation that the cursor is indeed chonky, added a test case to check that we are correctly saying that the cursor is double width (not too sure if I put it in the right place). Also open to other test case ideas and thoughts on what else I should be careful for since I am quite nervous about what other crashes might occur.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Severity-Blocking We won't ship a release like this! No-siree.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash in TextBufferCellIterator::TextBufferCellIterator() when moving cursor
4 participants