-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use memcmp for TextAttribute & TextColor comparison #10566
Conversation
This feels a lot like a micro-optimization... doing one comparison instead of four and growing the non-default case to five comparisons doesn't seem like it's going to win us any races. Do you have profiling data for this change? If this really is a micro-optimization we want to take on, I'd rather it be using
/cc @lhecker |
(under ARM64 is optimizes down to an actual call to |
+1 for using memcmp on primitive types. |
This comment has been minimized.
This comment has been minimized.
If these 10 comparisons are a significant burden via our compiler, I'm blown away. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
If you were in fact using |
Yeah maybe. I am very new to godbolt and the assembly word. Please don’t hesitate to point out any mistakes I made.
获取 Outlook for iOS<https://aka.ms/o0ukef>
|
For the record, I'm also +1 on using |
This comment has been minimized.
This comment has been minimized.
0b6b525
to
e1c5726
Compare
@@ -8,7 +8,7 @@ Licensed under the MIT license. | |||
#define BG_ATTRS (BACKGROUND_BLUE | BACKGROUND_GREEN | BACKGROUND_RED | BACKGROUND_INTENSITY) | |||
#define META_ATTRS (COMMON_LVB_LEADING_BYTE | COMMON_LVB_TRAILING_BYTE | COMMON_LVB_GRID_HORIZONTAL | COMMON_LVB_GRID_LVERTICAL | COMMON_LVB_GRID_RVERTICAL | COMMON_LVB_REVERSE_VIDEO | COMMON_LVB_UNDERSCORE) | |||
|
|||
enum class ExtendedAttributes : BYTE | |||
enum class ExtendedAttributes : uint16_t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to make std::has_unique_object_representations_v<TextAttribute>
happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to keep this as BYTE
& add a uint8_t zeropad
to TextAttribute
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea! If @skyline75489 wants to he could still change it. 🙂
Personally I'd be fine merging it either way though.
OK, the actual performance gain from this PR is very limited. But I still like some of the tricks in it. So feel free to merge this or leave it. I'm moving to #10596 from now on. |
I dunno if I love the tricks given that it doesn't net us much gain. It's cool that we're using C++ fixtures to ensure that we're not doing something ~ ~ weird ~ ~, but... perhaps our optimization talent is better-spent somewhere else? |
@DHowett I was wrong the performance gain from this PR being very limited. With the initial work in #10596 done on my experimental branch, this PR leads to ~10%+ in throughput (3.6s -> 3.3s). And even after the fix the comparison is still somewhat expensive. Let's just leave the PR here and I can come back to this later. |
@lhecker do you want this? I was going to close this, |
@skyline75489 I'm personally still in favor of this PR, but I believe it has to wait for 1.12. |
OK well that's interesting. In any case, looking at the original operator == implementation, the fields are not compared in "natural" order and since && is short-circuiting, it's vaguely possible that the optimizer can't see past the weird order of field access. For some reason the project isn't building right on my machine which is a separate issue I'm working through but you might try just rewriting operator== as... constexpr bool operator==(const TextAttribute& a, const TextAttribute& b) noexcept (basically comparison of _hyperlinkId was at the end instead of second as per the layout) Independently, it seems like you have a basic correctness problem with the sizeof() based memcmp solution. The sizeof(TextAttribute) is going to be 16 but as far as I can tell only 13 bytes are filled in by explicit field references (unless UNIT_TESTING is defined but still then there's a big alignment hole, the contents of which are uninitialized). Fortunately since it's final there's no vtable so you could play the memset(this, 0, sizeof(*this)) game in the constructor but this is really going down a worse path. I think you should coerce the compiler to do the right thing without any of these games. (This could break whatever the stuff is under UNIT_TESTING, if it has vtables, the memset() technique famously will screw that up badly.) I'm trying to find the definitions of TextColor and ExtendedAttributes to understand them better. TextColor seems nontrivial (given what I saw in the TextAttribute constructor) and may be part of the problem. |
Good idea! I tried that and it didn't work unfortunately. I also tried whether the spaceship operator I've put all of them together into a single comparison: https://godbolt.org/z/5qdKxPq4f
The struct has an alignment of 2 and this PR changes the last member to be 2 bytes large. That way the struct should have a
TextColor can be found here: https://github.com/microsoft/terminal/blob/main/src/buffer/out/TextColor.h
I was wondering the same... If I remove the |
Ah btw: |
Thanks I'll figure out my build problems and tinker with this and have some more concrete recommendations. I used to be quite adept at getting the msvc compiler to generate good code in cases like this, I'm getting weird access denied errors in the build. Is there some forum for asking about problems like this? (I'll hold back from obvious spitballing of possible problems since this is not the right context.) |
@EmJayGee Feel free to create an issue or a discussion right here on GitHub, with screenshots or text outputs of the issue you're facing. I'm sure I'm speaking for everyone on the team when I say that we'd be happy to help you set up the project. 🙂 |
What gives you the impression that the struct/class has an alignment (packing?) of 2? I don't see any packing/alignment indications. I've been away from the code for a while so perhaps the norms have changed here but I would expect 8 without coercion. Small members may be packed within it on smaller boundaries but maybe I lost context somewhere. Is there a projectwide setting or something? That would, in general, be a performance problem since 2-byte alignment, in general, is not performant for most 4 byte and larger objects. If you have these assumptions, I would definitely encode them in compile time asserts. I see comments on the data members which perhaps are in a format that some linter recognizes but nothing beats the compiler doing the verification itself. My C++-fu is about 7 years out of date so I need to begin my training over, time to put on the weight vest and enter the gravity chamber. |
The first member, I've added the following static (compile-time) assertions a while ago, ensuring this doesn't get accidentally changed: terminal/src/buffer/out/TextAttribute.cpp Lines 8 to 13 in f588223
(I've put the assertion in the .cpp file, because the corresponding header gets included in way too many files, leading to unnecessary recompilations on every change. No one had time to clean that up yet unfortunately.) When we run-length-encode our struct {
TextAttribute value;
uint16_t length;
}; elements, which given the |
I don't know what your portability requirements are but on Windows, the heap allocators will always give you cache line aligned allocations so you'll be good. Well I left before the arm64 stuff became mainstream so I don't know what the deal is there but you're good to go on the intel/amd architectures. I can do a PR for moving the static asserts to the headers. I'm trying to get my coding legs back especially in this new world of GIT so I'll see what I can whip together for that next week. Can you point me to where the disassemble the code in situ (binary, what function references the equality operator where we would ideally see an inlined optimized comparison) to see the generated code best? And/or what kind of workload you did to measure the original issue so I can repro it with the current windows perf toolkit tools, see the issue there so I can push the comparison operators around and observe how my changes change things? Thanks, Mike |
I'm not sure I understand what you mean... If you're asking were terminal/src/renderer/base/renderer.cpp Line 828 in f588223
There you should be able to view the disassembled code.
You can use my rainbowbench. It'll draw a rainbow as quickly as possible and tell you the resulting "kcg/s", or in other words how many thousands (k) of colored (c) glyphs (g) it can draw per second (s). Since every character has a unique color, it'll result in every character having a different |
@lhecker, sorry I was off doing other things (learning git, github practicing by checking in a trivial build break fix for retail builds) but I'm back now. Your link to rainbowbench leads to a 404 not found. Can you point me to another? In terms of where to look, I mean like a good function to disassemble to see it inlined. Just going to the equality comparison function in question isn't necessarily particularly interesting since that won't show what the inliner does in context. So _PaintBufferOutputHelper() is the hot function? Do we have any output from the various perf tools to peruse? |
@EmJayGee Sorry, the project was still private. Please try it again! It's a cmake project, but you can also just copy-paste the single main.cpp into your own project. 🙂
I'm not entirely sure I'm following... If you're asking for a tool with which the perf. impact can be seen then Visual Studio's built-in Performance Profiler is often sufficient. |
Thanks, I'll grab it now. I'm having enough problems getting things building and running, I was hoping you had saved graphs or captured ETLs or whatever the modern equivalent are that I could peruse. It seems that if I exit VS, delete bin and obj, restart VS, build and then launch, things maybe work so perhaps I'm on a decent path. |
I finally got to see generated code and I'm surprised to find that in retail builds, the generated equality comparison is done piecemeal and that the operator== invocation is not inlined. Perhaps it's not inlined because it's so big so let me see what I can do to fix the former. Here's the invocation...
and here's the operator == code. It's almost like /Od code which is weird but I did pick the retail config in visual studio.
|
OK so just reordering the comparisons gets a much saner operator== implementation. I think that should be step 1 and possibly step omega depending on measurements.
|
... and the memcmp() technique would still have to be done carefully since there is one byte of padding on the end of the struct since the ExtendedAttributes is a BYTE-derived enum but the size of the overall class is 14. If possible I would just make ExtendedAttributes a WORD-derived enum but that'll probably break too many other things. I'm prepping a separate PR with just this one change in it that we might agree with since it's super low impact and then we can figure out what else we might want to do. |
Okay so I maybe lost the track here a bit between this thread & #12866 - can someone give me a TL;DR? Do we still want this one, or nah? |
This is exactly the way |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shipit
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
@msftbot merge this in 15 minutes |
Hello @carlos-zamora! Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:
If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you". |
🎉 Handy links: |
TextAttribute
andTextColor
are commonly used structures in hot paths.This commit replaces more complex comparisons where each field is compared
independently with a single call to
memcmp
. This compiles down to justa few instructions. This reduces code and binary size and improves
performance for paths were
TextAttribute
s need to be compared.PR Checklist
Validation Steps Performed
Co-authored-by: Leonard Hecker lhecker@microsoft.com