Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize hot path in textBufferCellIterator #10621

Merged
10 commits merged into from
Jul 27, 2021
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 16 additions & 1 deletion src/buffer/out/OutputCellView.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ Module Name:
- OutputCellView.hpp
Abstract:
- Read-only view into a single cell of data that someone is attempting to write into the output buffer.
- Read view into a single cell of data that someone is attempting to write into the output buffer.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's.. not read-only any more? You can write through it?

Copy link
Collaborator Author

@skyline75489 skyline75489 Jul 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, you can now UpdateSomething in it, which disqualified the “read-only” part, right?

Update: ah I now see what you mean. No, you can't write through it to the real buffer. It's still only a "view" of the buffer. But now you can update properties in it, so I don't think it's "read-only" anymore.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I've tried various ways to keep the original _GenerateView (and preserve the read-only-ness in OutputCellView), but I failed to find a way to achieve the same level of performance as the current implementation. I've discussed with @lhecker about this, and he seems to concur with the solution.

- This is done for performance reasons (avoid heap allocs and copies).
Author:
Expand Down Expand Up @@ -36,6 +36,21 @@ class OutputCellView
TextAttribute TextAttr() const noexcept;
TextAttributeBehavior TextAttrBehavior() const noexcept;

void UpdateText(const std::wstring_view& view) noexcept
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand why calling each of these bits separately is more performant than replacing them all at once via a new-construction. It's also sort of confusing why Behavior is left out, but the other 3 properties can be changed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like exposing these methods, either. I love the idea of read-only view. Here's my understanding of why this helps.

Before this PR, the code looks like this:

const auto diff = gsl::narrow_cast<ptrdiff_t>(newPos.X) - gsl::narrow_cast<ptrdiff_t>(_pos.X);_
_attrIter += diff;
// Here we know _attrIter is hot and in cache.
//
// ...Some code here.
_view = OutputCellView(_pRow->GetCharRow().GlyphAt(_pos.X),
                           _pRow->GetCharRow().DbcsAttrAt(_pos.X),
                           // After all the _pRow operations, _attrIter may not be in cache anymore.
                           *_attrIter,
                           TextAttributeBehavior::Stored);

After this PR:

const auto diff = gsl::narrow_cast<ptrdiff_t>(newX) - gsl::narrow_cast<ptrdiff_t>(oldX);
_attrIter += diff;
// We know for sure _attrIter is hot. And after this point, it can be safely discarded.
 _view.UpdateTextAttribute(*_attrIter);

This is why I think the fine-grained UpdateSomething helps the performance.

I'd love to see how @lhecker see this in the assembly view, which would be more accurate. I for one can't read assembly that well yet.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why Behavior is left out, but the other 3 properties can be changed.

Well, it's because I don't need Behavior in this PR 😅

{
_view = view;
};

void UpdateDbcsAttribute(const DbcsAttribute& dbcsAttr) noexcept
{
_dbcsAttr = dbcsAttr;
}

void UpdateTextAttribute(const TextAttribute& textAttr) noexcept
{
_textAttr = textAttr;
}

bool operator==(const OutputCellView& view) const noexcept;
bool operator!=(const OutputCellView& view) const noexcept;

Expand Down
117 changes: 85 additions & 32 deletions src/buffer/out/textBufferCellIterator.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -95,18 +95,66 @@ bool TextBufferCellIterator::operator!=(const TextBufferCellIterator& it) const
TextBufferCellIterator& TextBufferCellIterator::operator+=(const ptrdiff_t& movement)
{
ptrdiff_t move = movement;
auto newPos = _pos;
while (move > 0 && !_exceeded)
if (move < 0)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's say we are doing the "1 million yes" benchmark with a typical 120x80 windows. The += operator will execute 120 x 1000000 = 120000000 times. In this particular method, I think it's safe to say that every single instruction counts.

First, this line here is the early branching to avoid the very rare case (move < 0) and leave it to -= operator. Next I replace the IncrementInBounds with an simplified, cache-friendly version, without compromising functionally. And finally I replace the _GenerateView with fine-grained Update methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if _GenerateView is that bad... I feel like we should replace all usages of it, not just this one += usage of it.

Same goes for IncrementInBounds and DecrementInbounds.

I'm OK with it being more complex inside this operator specifically as long as it's well documented why things are ordered and computed in the way they are (like specifically calling out which items below are taking advantage of cache improvements so no one tries to refactor it out to be "easier to read" but significantly worse performance in the future.)

Copy link
Collaborator Author

@skyline75489 skyline75489 Jul 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the "early optimization is the root of all evil" cliche, I'd like to point out that _GenerateView, IncrementInBounds and DecrementInbounds are not actually expensive methods in terms of CPU costs.

Take a look at my "before" screenshot and you will see WalkInBoundsCircular (which is essentially what IncrementInBounds calls) is only 1.35% of the CPU time. And it's in fact accurate, because that method itself isn't guilty for the performance drop. It's the unfortune combination of a whole lot of other methods in the hot paths that ruins the icache (instruction cache) or dcache (data cache) at some particular point.

Another example: I've actually tried to remove the code inside this method step by step. At some point, even a single line like _pos = newPos can show up >7% in the trace, which is obviously not a really expensive operation, but triggers the cache miss.

In conclusion, it takes accurate benchmarking to "port" this optimization somewhere else. And the optimization may very well be NOT necessary in other code paths.

I got the inspiration from @lhecker and he might want to add his opinions here, too. I'll try to add more comments as documentation to prevent "easier to read" refactoring in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great justification. I'm fine with you not optimizing the other ones. Just call out some of this in the comments in the code and I'm sold.

{
return (*this) -= (-move);
}

auto oldX = _pos.X;
auto newX = oldX;
auto newY = _pos.Y;
bool yChanged = false;
const auto boundsRightInclusive = _bounds.RightInclusive();
const auto boundsLeft = _bounds.Left();
const auto boundsBottomInclusive = _bounds.BottomInclusive();
const auto boundsTop = _bounds.Top();
while (move > 0)
{
_exceeded = !_bounds.IncrementInBounds(newPos);
if (newX == boundsRightInclusive)
{
newX = boundsLeft;
newY++;
yChanged = true;
if (newY > boundsBottomInclusive)
{
newY = boundsTop;
_exceeded = true;
break;
}
}
else
{
newX++;
_exceeded = false;
}
move--;
}
while (move < 0 && !_exceeded)

if (_exceeded)
{
_exceeded = !_bounds.DecrementInBounds(newPos);
move++;
return (*this);
}

if (yChanged)
{
_pRow = s_GetRow(_buffer, { newX, newY });
_attrIter = _pRow->GetAttrRow().cbegin() + newX;
_pos.X = newX;
_pos.Y = newY;
_GenerateView();
}
else
{
const auto diff = gsl::narrow_cast<ptrdiff_t>(newX) - gsl::narrow_cast<ptrdiff_t>(oldX);
_attrIter += diff;
_view.UpdateTextAttribute(*_attrIter);

const CharRow& charRow = _pRow->GetCharRow();
_view.UpdateText(charRow.GlyphAt(newX));
_view.UpdateDbcsAttribute(charRow.DbcsAttrAt(newX));
_pos.X = newX;
}
_SetPos(newPos);

return (*this);
}

Expand All @@ -118,7 +166,36 @@ TextBufferCellIterator& TextBufferCellIterator::operator+=(const ptrdiff_t& move
// - Reference to self after movement.
TextBufferCellIterator& TextBufferCellIterator::operator-=(const ptrdiff_t& movement)
{
return this->operator+=(-movement);
ptrdiff_t move = movement;
if (move < 0)
{
return (*this) += (-move);
}

auto newPos = _pos;
while (move > 0 && !_exceeded)
{
_exceeded = !_bounds.DecrementInBounds(newPos);
move--;
}

if (newPos.Y != _pos.Y)
{
_pRow = s_GetRow(_buffer, newPos);
_attrIter = _pRow->GetAttrRow().cbegin();
_pos.X = 0;
}

if (newPos.X != _pos.X)
{
const auto diff = gsl::narrow_cast<ptrdiff_t>(newPos.X) - gsl::narrow_cast<ptrdiff_t>(_pos.X);
_attrIter += diff;
}

_pos = newPos;

_GenerateView();
return (*this);
}

// Routine Description:
Expand Down Expand Up @@ -197,30 +274,6 @@ ptrdiff_t TextBufferCellIterator::operator-(const TextBufferCellIterator& it)
return _bounds.CompareInBounds(_pos, it._pos);
}

// Routine Description:
// - Sets the coordinate position that this iterator will inspect within the text buffer on dereference.
// Arguments:
// - newPos - The new coordinate position.
void TextBufferCellIterator::_SetPos(const COORD newPos)
{
if (newPos.Y != _pos.Y)
{
_pRow = s_GetRow(_buffer, newPos);
_attrIter = _pRow->GetAttrRow().cbegin();
_pos.X = 0;
}

if (newPos.X != _pos.X)
{
const auto diff = gsl::narrow_cast<ptrdiff_t>(newPos.X) - gsl::narrow_cast<ptrdiff_t>(_pos.X);
_attrIter += diff;
}

_pos = newPos;

_GenerateView();
}

// Routine Description:
// - Shortcut for pulling the row out of the text buffer embedded in the screen information.
// We'll hold and cache this to improve performance over looking it up every time.
Expand Down
1 change: 0 additions & 1 deletion src/buffer/out/textBufferCellIterator.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,6 @@ class TextBufferCellIterator
COORD Pos() const noexcept;

protected:
void _SetPos(const COORD newPos);
void _GenerateView();
static const ROW* s_GetRow(const TextBuffer& buffer, const COORD pos);

Expand Down