Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Make til::color's COLORREF conversion more optimal (#7619)
Clang (10) has no trouble optimizing the COLORREF conversion operator to a simple 32-bit load with mask (!) even though it's a series of bit shifts across multiple struct members. MSVC (19.24) doesn't make the same optimization decision, and it emits three 8-bit loads and some shifting. In any case, the optimization only applies at -O2 (clang) and above. In this commit, we leverage the spec-legality of using unions for type conversions and the overlap of four uint8_ts and a uint32_t to make the conversion very obvious to both compilers. x86_64 msvc | O0 | O1 | O2 ------------|----|----|-------------------- shifts | 12 | 11 | 11 (fully inlined) union | 5 | 1 | 1 (fully inlined) x86_64 clang | O0 | O1 | O2 + O3 -------------|----|----|-------------------- shifts | 14 | 5 | 1 (fully inlined) union | 9 | 3 | 1 (fully inlined) j4james brought up some concerns about til::color's minor wastefulness in #7578 (comment). This is a clear, simple transformation that saves us a few instructions in a relatively common case, so I'm accepting a micro-optimization even though we don't have data showing this to be a hot spot.
- Loading branch information