VectorX<T>.ConditionalSelect doesn’t get optimized for const masks on non-AVX512 platforms #104001

ezhevita · 2024-06-25T23:17:53Z

Description

I was comparing VectorT.ConditionalSelect (128-bit in particular) and Sse41.BlendVariable disassembly output and found out that the first method gets only optimized into the second one only if the mask is the result of Compare* intrinsic. However we can actually optimize it if the mask is constant (e.g. Vector.Create) since we can check its contents in the JIT during compilation. Here is the reproduction repo. I’ve also implemented an optimization in JIT here (worth mentioning that I’ve currently haven’t optimized VectorT.ConditionalSelect(Vector.Create(fieldOrVariable)) since I didn’t find a way to inspect the values of field/variable, GT_LCL_VAR and GT_LCL_FLD in the JIT tree).

Configuration

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3737/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800X3D, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.5.24307.3
  [Host]     : .NET 9.0.0 (9.0.24.30607), X64 RyuJIT AVX2
  DefaultJob : .NET 9.0.0 (9.0.24.30607), X64 RyuJIT AVX2

Regression?

Not a regression.

Data

Reproduction, benchmarks, and disassembly: https://github.com/ezhevita/ConditionalSelectReproduce

Analysis

The current implementation only checks for Compare* intrinsics however we can actually check and prove that the mask is indeed per-element if the vector is constant.
Also another solution might be a new method in the API for the consumers which restricts the vector to be per-element mask.

The text was updated successfully, but these errors were encountered:

dotnet-policy-service · 2024-06-25T23:18:13Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

EgorBo · 2024-06-26T09:33:17Z

Thanks for the idea! I guess we can introduce a general helper to check for SIMD node to be either zeroes or allbits set in its components.

I didn’t find a way to inspect the values of field/variable, GT_LCL_VAR and GT_LCL_FLD in the JIT tree).

Lowering is a bit too conservative place to perform such transformation, it's better to do in e.g. optVNBasedFoldExpr where we can easily look through locals.

PS: similar optimizations can be done around other types of intrinsics, the best example is Vector128.ExtractMostSignificantBits on arm64

Are you interested in contributing it?

ezhevita · 2024-06-26T11:00:48Z

Are you interested in contributing it?

Yes, absolutely! I just would need some more time since it’s the first time I’m working on JIT and I still don’t get a lot of the internals of it

ezhevita · 2024-06-26T16:04:52Z

Update: I’ve realized that we can’t optimize Vector128.ConditionalSelect(Vector128.Create(_field), …) into a const since the array is mutable anyway and can be changed at any point of execution; however we can optimize storing the vector itself, example:

public class TestClass
{
	private static readonly Vector128<byte> _maskVector = Vector128.Create<byte>([255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0, 255, 0]);

	public Vector128<byte> Test()
	{
		return Vector128.ConditionalSelect(_maskVector, Vector128<byte>.One, Vector128<byte>.Zero);
	}
}

ezhevita added the tenet-performance Performance related issue label Jun 25, 2024

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 25, 2024

dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jun 25, 2024

EgorBo added this to the Future milestone Jun 26, 2024

EgorBo removed the untriaged New issue has not been triaged by the area owner label Jun 26, 2024

ezhevita mentioned this issue Jun 27, 2024

Optimize VectorX<T>.ConditionalSelect for constant masks #104092

Merged

ezhevita changed the title ~~VectorT.ConditionalSelect doesn’t get optimized for const masks on non-AVX512 platforms~~ VectorX<T>.ConditionalSelect doesn’t get optimized for const masks on non-AVX512 platforms Jul 1, 2024

tannergooding closed this as completed in #104092 Jul 4, 2024

github-actions bot locked and limited conversation to collaborators Aug 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

VectorX<T>.ConditionalSelect doesn’t get optimized for const masks on non-AVX512 platforms #104001

VectorX<T>.ConditionalSelect doesn’t get optimized for const masks on non-AVX512 platforms #104001

ezhevita commented Jun 25, 2024 •

edited

Loading

dotnet-policy-service bot commented Jun 25, 2024

EgorBo commented Jun 26, 2024

ezhevita commented Jun 26, 2024

ezhevita commented Jun 26, 2024

VectorX<T>.ConditionalSelect doesn’t get optimized for const masks on non-AVX512 platforms #104001

VectorX<T>.ConditionalSelect doesn’t get optimized for const masks on non-AVX512 platforms #104001

Comments

ezhevita commented Jun 25, 2024 • edited Loading

Description

Configuration

Regression?

Data

Analysis

dotnet-policy-service bot commented Jun 25, 2024

EgorBo commented Jun 26, 2024

ezhevita commented Jun 26, 2024

ezhevita commented Jun 26, 2024

ezhevita commented Jun 25, 2024 •

edited

Loading