Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVX512 codegen issues #100404

Closed
TechPizzaDev opened this issue Mar 28, 2024 · 7 comments
Closed

AVX512 codegen issues #100404

TechPizzaDev opened this issue Mar 28, 2024 · 7 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@TechPizzaDev
Copy link
Contributor

TechPizzaDev commented Mar 28, 2024

Description

NET 8:
net8 image

NET 9 (or NET 8 with AVX512F_VL=0):
net9 image

Reproduction Steps

It is terribly difficult to narrow down the issue.

There is a "massive" method at https://github.com/TechPizzaDev/SharpBlaze/blob/f330516bd8e88b607e36b6362e06b90f30472029/SharpBlaze/Linearizer.cs#L459. This method contains a lot of smaller inlined methods, and since most of the small methods are marked with AggressiveInlining, the massive ProcessUncontained method turns out to around 7kB of JIT ASM, and the JIT gives up on inlining at the end which causes helper methods of InlineArray to show up.
As soon as the JIT gives up on inlining (by tweaking MethodImplOptions in hot paths), the issues seemingly start to appear.

This project is a verbatim port of https://github.com/aurimasg/blaze to C#, and the 30k-Paris assets can be found at that original repo.

Expected behavior

Inlining and tiered compilation does not affect behavior. Release and debug builds create the same result.

Actual behavior

Tiered compilation changes the result (most likely because of inlining changing between JIT tiers).

Changing from [MethodImpl(MethodImplOptions.AggressiveInlining)] to [MethodImpl(MethodImplOptions.NoInlining)] reduces the problem in some hot paths (tiered compilation will sometimes stabilize to a correct result, but the result changing at all should not occur):
https://github.com/TechPizzaDev/SharpBlaze/blob/f330516bd8e88b607e36b6362e06b90f30472029/SharpBlaze/Linearizer.cs#L938-L940

Debug builds produce a correct result unlike Release builds.

Regression?

Since AVX512 plays a role, it seems unlikely that .NET 7 is affected (since AVX512 was only exposed in .NET 8).

Known Workarounds

Disable AVX512V_FL with DOTNET_EnableAVX512F_VL="0".

Updating to .NET 9 Preview does make the issues disappear, but looking at the disassembly it could simply be because .NET 9 has better inlining, which could hide the underlying issue.

Configuration

.NET: 8.0.3
OS: Microsoft Windows 11 Home (10.0.22631 version 22631)
CPU: 11th Gen Intel(R) Core(TM) i3-1115G4 @ 3.00GHz
Arch: x64

Other information

I could not narrow the issue down to any specific instruction sequence, and the only obvious difference I could see from a glance between NET8 and NET9 was a few vbroadcasti32x4 instead of vmovups that look like the result of #92017 but this is unlikely the problem.

Here is a zip with some JIT disasm.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 28, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Mar 28, 2024
@tannergooding
Copy link
Member

The issue looks to be rooted in SharpBlaze.Matrix.Map(FloatPoint point) and specifically around m[0] * Vector128.Create(point.X) where AVX512 is emitting:

       vmovsd   xmm1, qword ptr [r8]
       vmulpd   xmm0, xmm0, xmm1

While AVX2 is emitting:

       vmovsd   xmm0, qword ptr [r8]
       vmovddup xmm0, xmm0
       vmulpd   xmm0, xmm0, xmmword ptr [rcx]

This looks to be a disconnect in the containment logic around embedded broadcasts as it is expected AVX-512 would have instead emitted vmulpd xmm0,xmm0,QWORD PTR [r8]{1to2} or at worst preserved the vmovddup as AVX2 did.

@tannergooding
Copy link
Member

CC. @dotnet/avx512-contrib

@tannergooding
Copy link
Member

The following is a minimal repro:

using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

internal class Program
{
    private static void Main(string[] args)
    {
        Console.WriteLine(Map(Vector128<double>.One, new FloatPoint(2.0, 3.0)));
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static Vector128<double> Map(Vector128<double> m0, FloatPoint point)
    {
        return m0 * Vector128.Create(point.X);
    }
}

public struct FloatPoint(double x, double y)
{
    public double X = x;
    public double Y = y;
}

@tannergooding
Copy link
Member

It looks like this was fixed in .NET 9, so potentially just something that hasn't been backported or that isn't in the latest .NET 8 release yet.

TechPizzaDev added a commit to TechPizzaDev/SharpBlaze that referenced this issue Mar 28, 2024
@tannergooding
Copy link
Member

This is #97783, which hasn't been backported.

tannergooding added a commit to tannergooding/runtime that referenced this issue Mar 28, 2024
@JulieLeeMSFT JulieLeeMSFT added this to the 8.0.x milestone Mar 29, 2024
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Mar 29, 2024
tannergooding added a commit that referenced this issue Apr 4, 2024
…cks supporting SIMD scalar loads (#100417)

* Ensure that Sse3.MoveAndDuplicate correctly tracks supporting SIMD scalar loads (#97783)

* Add a regression test for #100404

* Fix the regression test to not be called Main
@tannergooding
Copy link
Member

The fix was backported.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

3 participants