MSVC <bit> #795
Conversation
tests/std/tests/P0553R4_bit_rotating_and_counting_functions/test.cpp
In fact, GCC's arch flags already work this way: later ISA extensions like AVX imply SSE4.2, and thus popcnt in practice. (Technically, I think Intel's documentation only specifies that the AVX1 feature flag implies that the VEX encodings of instructions are available.) Some AMD-only features like XOP and FMA4 have come and gone, and AMD's SSE4a isn't implied by AVX on Intel CPUs. But Intel features are incremental and can be seen as "feature levels", not just feature bitmaps. The coupling between SSE feature levels and bit manipulation is that some software already assumes that SSE4.2 implies popcnt. I'm pretty sure any virtual machine / emulator that indicated SSE4.2 support via CPUID but faulted on popcnt would not be able to run some real-world software, so it's safe for compilers to assume that no such systems exist, or don't need to be supported (i.e. they're not real x86 systems). For the record, I don't know whether Intel documents anywhere that later CPUID feature bits imply earlier ones, especially not the SSE4.2 / popcnt relationship, so we'd be relying on widely established practice here, probably not explicit documentation.
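As an illustration of relying on that assumption, here is a minimal sketch that gates the popcnt intrinsic on the SSE4.2 CPUID bit and keeps a portable fallback; the helper names are hypothetical, not code from this PR:

```cpp
#include <cstdint>
#include <intrin.h> // MSVC: __cpuid, __popcnt

// Hypothetical helper: report SSE4.2 via CPUID.01H:ECX bit 20 and rely on the
// widely assumed implication "SSE4.2 => POPCNT" discussed above.
inline bool assume_popcnt_from_sse42() noexcept {
    int regs[4];
    __cpuid(regs, 1);
    return (regs[2] & (1 << 20)) != 0; // ECX bit 20: SSE4.2
}

// Portable SWAR fallback for CPUs (or VMs) without POPCNT.
inline int popcount_fallback(uint32_t v) noexcept {
    v -= (v >> 1) & 0x55555555u;
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
    return static_cast<int>((((v + (v >> 4)) & 0x0F0F0F0Fu) * 0x01010101u) >> 24);
}

inline int popcount_sketch(uint32_t v) noexcept {
    static const bool use_popcnt = assume_popcnt_from_sse42(); // cached check
    return use_popcnt ? static_cast<int>(__popcnt(v)) : popcount_fallback(v);
}
```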
I think that for tzcnt / lzcnt, the feature check and the handling of zero should be done after the counting instruction. This way it is possible to convert a potentially mispredicted jump into a conditional move (cmov).
Or possibly even just perform a static check, with no runtime check, and have no conditional jumps at all.
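A minimal sketch of that idea, with a "count first, fix up zero afterwards" fallback; the macro `_HAS_TZCNT_STATICALLY` and the function name are hypothetical stand-ins, not code from this PR:

```cpp
#include <cstdint>
#include <intrin.h>    // MSVC: _BitScanForward
#include <immintrin.h> // _tzcnt_u32

inline int countr_zero_sketch(uint32_t value) noexcept {
#ifdef _HAS_TZCNT_STATICALLY
    // Static check only: TZCNT is assumed to be baseline, and TZCNT(0) == 32,
    // so there is no conditional jump at all.
    return static_cast<int>(_tzcnt_u32(value));
#else
    // Count first, then fix up the zero case afterwards; the ternary below can
    // compile to a cmov instead of a potentially mispredicted jump.
    unsigned long index;
    const unsigned char nonzero = _BitScanForward(&index, value);
    return nonzero ? static_cast<int>(index) : 32;
#endif
}
```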
Interesting idea, but note that it's not clear whether it's a good idea to put a cmov on the critical-path dependency chain through TZCNT. It could be worse depending on the use case, especially pre-Broadwell, where cmov is more expensive. Ideally you'd like the compiler to hoist the feature check out of loops. And if value-range optimization can prove that the input value is never zero, the zero-handling could be dropped entirely.
Apparently MSVC is not doing this for bit counting in a loop; all it does is cache the isa_available check.
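For illustration, hoisting the check by hand looks something like the sketch below; `check_popcnt()` is a hypothetical stand-in for whatever query the library performs:

```cpp
#include <cstddef>
#include <cstdint>
#include <intrin.h> // MSVC: __cpuid, __popcnt

inline bool check_popcnt() noexcept {
    int regs[4];
    __cpuid(regs, 1);
    return (regs[2] & (1 << 23)) != 0; // CPUID.01H:ECX bit 23: POPCNT
}

inline uint64_t sum_popcounts(const uint32_t* data, size_t n) noexcept {
    uint64_t total = 0;
    if (check_popcnt()) {               // hoisted: one check for the whole loop
        for (size_t i = 0; i < n; ++i) {
            total += __popcnt(data[i]); // no per-iteration feature branch
        }
    } else {
        for (size_t i = 0; i < n; ++i) {
            uint32_t v = data[i];       // portable SWAR popcount fallback
            v -= (v >> 1) & 0x55555555u;
            v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
            total += (((v + (v >> 4)) & 0x0F0F0F0Fu) * 0x01010101u) >> 24;
        }
    }
    return total;
}
```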
Unfortunately we can't do that with lzcnt/tzcnt, since Ivy Bridge has SSE4.2 but not lzcnt/tzcnt.
I'm not going to flip the condition because I just don't want to deal with it, and we'd need better benchmark tests to make sure it's worth it. As for static checking, that's something I'd like, but it can wait a bit. Dynamic checking is required because most programs are not compiled with non-default arch flags, and I still want that to work.
update bit for msvc
more updates
fixup arm countr and popcount
small comment about intended widening
Co-authored-by: Stephan T. Lavavej <stl@nuwen.net>
Looks perfect!
Thanks for your contribution, even if it did only involve a <bit>.
Enables the <bit> functions on MSVC. _Countr_zero has moved to <limits> to support <numeric>. Runtime detection will use isa_available once the internal PR is up to add the bits needed to isa_available.
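A minimal sketch of the dynamic-check shape described here, with a cached CPUID query standing in for the internal isa_available mechanism; `has_lzcnt` and `countl_zero_sketch` are illustrative names, not this PR's code:

```cpp
#include <cstdint>
#include <intrin.h> // MSVC: __cpuid, __lzcnt, _BitScanReverse

// Hypothetical stand-in for an isa_available-style query:
// LZCNT is reported by CPUID.80000001H:ECX bit 5.
inline bool has_lzcnt() noexcept {
    int regs[4];
    __cpuid(regs, static_cast<int>(0x80000001));
    return (regs[2] & (1 << 5)) != 0;
}

inline int countl_zero_sketch(uint32_t value) noexcept {
    static const bool use_lzcnt = has_lzcnt(); // cached dynamic check
    if (use_lzcnt) {
        return static_cast<int>(__lzcnt(value)); // LZCNT(0) == 32, no zero branch
    }
    unsigned long index; // BSR fallback: result is undefined for 0, so branch
    return _BitScanReverse(&index, value) ? 31 - static_cast<int>(index) : 32;
}
```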