Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mono] [Arm64] Added SIMD support for vector 2/3/4 methods #98761

Merged
merged 16 commits into from
Mar 15, 2024

Conversation

jkurdek
Copy link
Member

@jkurdek jkurdek commented Feb 21, 2024

Implements #91394.

@jkurdek jkurdek marked this pull request as ready for review February 21, 2024 19:36
@jkurdek
Copy link
Member Author

jkurdek commented Feb 21, 2024

/azp run runtime-extra-platforms

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@fanyang-mono fanyang-mono left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM!

src/mono/mono/mini/mini-llvm.c Outdated Show resolved Hide resolved
src/mono/mono/mini/simd-intrinsics.c Outdated Show resolved Hide resolved
Copy link
Member

@matouskozak matouskozak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, thanks for implementing these. I left a few comments with questions.

Before merging, please take a look if fullAOT llvm compiles correctly these new intrinsic.

src/mono/mono/mini/simd-intrinsics.c Outdated Show resolved Hide resolved
src/mono/mono/mini/simd-intrinsics.c Outdated Show resolved Hide resolved
src/mono/mono/mini/simd-intrinsics.c Show resolved Hide resolved
src/mono/mono/mini/simd-intrinsics.c Show resolved Hide resolved
case SN_Lerp: {
#if defined (TARGET_ARM64)
MonoInst* v1 = args [1];
if (!strcmp ("Quaternion", m_class_get_name (klass))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quaternion.Lerp is not marked as intrinsic in the libraries:

/// <summary>Performs a linear interpolation between two quaternions based on a value that specifies the weighting of the second quaternion.</summary>
/// <param name="quaternion1">The first quaternion.</param>
/// <param name="quaternion2">The second quaternion.</param>
/// <param name="amount">The relative weight of <paramref name="quaternion2" /> in the interpolation.</param>
/// <returns>The interpolated quaternion.</returns>
public static Quaternion Lerp(Quaternion quaternion1, Quaternion quaternion2, float amount)

However, if this implementation generates better codegen we should probably keep it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intrinsified version is around 40% faster on my machine

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the intrinsified version faster here? Is it fundamentally doing something differently from the managed implementation or is there potentially a missing JIT optimization?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or perhaps there is simply missing a change on the managed side and so its using scalar logic rather than any actual vectorization and a better fix is to update the managed impl?


We've typically tried to keep a clear separation between intrinsic functionality and more complex methods.

APIs like operator + or Sqrt are generally mapped to exactly 1 hardware instruction and this is the case for most platforms.

APIs like DotProduct or even Create may be mapped to exactly 1 hardware instruction on some platforms and are fairly "core" to the general throughput considerations of many platforms.

APIs like Quaternion.Lerp or CopyTo are more complex functions which use multiple instructions on all platforms and which may even require branching or masking logic. So, we've typically tried to keep them in managed and have them use the intrinsic APIs instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should align Mono's behavior with CoreCLR, not intrinsifying Quaternion.Lerp or CopyTo for mono either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the intrinsified version faster here? Is it fundamentally doing something differently from the managed implementation or is there potentially a missing JIT optimization?

In general, Mono's mini JIT doesn't have as comprehensive optimizations as CoreCLR's RyuJIT.

Comment on lines 695 to 697
MonoInst *sum = emit_simd_ins (cfg, klass, OP_ARM64_XADDV, arg->dreg, -1);
sum->inst_c0 = INTRINS_AARCH64_ADV_SIMD_FADDV;
sum->inst_c1 = MONO_TYPE_R4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth noting that Arm has typically pushed us away from using FADDV as it does not perform well on some hardware.

Rather instead they had us use a sequence of FADDP (AddPairwise) instructions which tend to have better perf/throughput: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/gentree.cpp#L25190-L25210

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing the information, @tannergooding. @jkurdek Feel free to create an issue to address it in a future PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkurdek jkurdek marked this pull request as ready for review March 15, 2024 09:35
@jkurdek jkurdek merged commit 516f5c4 into dotnet:main Mar 15, 2024
109 of 111 checks passed
@jkurdek jkurdek deleted the vector-additional-methods branch March 15, 2024 11:59
@github-actions github-actions bot locked and limited conversation to collaborators Apr 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants