VectorX<T>.ConditionalSelect doesn’t get optimized for const masks on non-AVX512 platforms #104001
Labels
area-CodeGen-coreclr
CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
tenet-performance
Performance related issue
Description
I was comparing the disassembly output of VectorT.ConditionalSelect (128-bit in particular) and Sse41.BlendVariable, and found that the first method is optimized into the second only if the mask is the result of a Compare* intrinsic. However, the same optimization is possible when the mask is a constant (e.g. Vector.Create), since the JIT can check its contents during compilation. The reproduction repo is linked under Data below. I've also implemented the optimization in the JIT. Note that it currently does not cover VectorT.ConditionalSelect(Vector.Create(fieldOrVariable)), since I didn't find a way to inspect the values of fields and variables (GT_LCL_FLD and GT_LCL_VAR nodes in the JIT tree).
Configuration
Regression?
Not a regression.
Data
Reproduction, benchmarks, and disassembly: https://github.com/ezhevita/ConditionalSelectReproduce
Analysis
The current implementation only checks for Compare* intrinsics. However, when the vector is a constant, we can also check and prove that the mask is indeed per-element.
Another solution might be a new API method for consumers that restricts the vector to be a per-element mask.
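To illustrate why a per-element mask matters, here is a minimal portable C++ sketch. It is not the JIT's actual code, and all names in it (Bytes16, bitwise_select, byte_blend, is_per_element_mask) are hypothetical. It models a 128-bit vector as 16 bytes, models ConditionalSelect as a bitwise select, and models a byte-granular variable blend that only looks at the top bit of each mask byte (the argument order is chosen for the sketch, not taken from any real API). The check is conservative: it requires every element of the constant mask to be all-ones or all-zeros.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 128-bit vector as raw bytes.
using Bytes16 = std::array<std::uint8_t, 16>;

// Bitwise select: each result bit comes from a where the mask bit is 1, else from b.
Bytes16 bitwise_select(const Bytes16& mask, const Bytes16& a, const Bytes16& b) {
    Bytes16 r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = static_cast<std::uint8_t>((mask[i] & a[i]) | (~mask[i] & b[i]));
    }
    return r;
}

// Byte-granular blend: the top bit of each mask byte picks the whole byte from a or b.
Bytes16 byte_blend(const Bytes16& mask, const Bytes16& a, const Bytes16& b) {
    Bytes16 r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = (mask[i] & 0x80) ? a[i] : b[i];
    }
    return r;
}

// Returns true when every elem_size-byte element of a constant mask is
// either all-ones or all-zeros. elem_size is assumed to divide 16.
bool is_per_element_mask(const Bytes16& mask, std::size_t elem_size) {
    for (std::size_t i = 0; i < mask.size(); i += elem_size) {
        const std::uint8_t first = mask[i];
        if (first != 0x00 && first != 0xFF) {
            return false;
        }
        for (std::size_t j = 1; j < elem_size; ++j) {
            if (mask[i + j] != first) {
                return false;
            }
        }
    }
    return true;
}
```

When is_per_element_mask returns true, every mask byte is 0x00 or 0xFF, so bitwise_select and byte_blend produce the same result and the cheaper blend can be substituted. A mask byte such as 0x0F breaks the equivalence, which is why an arbitrary mask cannot be rewritten this way.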