Fixing several of the Sse/Sse2.Compare* intrinsics to account for NaN inputs #34204

tannergooding · 2020-03-27T18:39:28Z

This resolves #34094 by updating several of the Sse/Sse2.Compare* intrinsics to account for NaN inputs when there isn't a direct comparison mode supported by the underlying hardware.

tannergooding · 2020-03-27T18:40:47Z

CC. @CarolEidt, @echesakovMSFT

As called out in #34094 this is a breaking change if one of the inputs was NaN, but it was also a bug and caused a difference in behavior if you were using FloatComparisonMode directly on AVX enabled hardware.

tannergooding · 2020-03-27T18:42:22Z

src/coreclr/src/jit/hwintrinsicxarch.cpp

+
+            if (compSupports(InstructionSet_AVX))
+            {
+                retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, op1, op2, gtNewIconNode(14), NI_AVX_Compare, baseType, simdSize);


On AVX hardware, we can just use the hardware-supported comparison mode. On non-AVX hardware, we have to fall back to doing a different operation with the operands swapped.
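As a rough illustration of the non-AVX fallback, the sketch below models a four-lane packed compare in plain C++ rather than with real intrinsics. The names `Vec4`, `Mask4`, `compareLessThan`, `compareNotLessThanOrEqual`, `compareGreaterThanViaSwap`, and `checkPackedFallback` are hypothetical stand-ins and are not taken from the JIT sources. It assumes the usual IEEE 754 rule that ordered comparisons are false when either input is NaN.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

using Vec4  = std::array<float, 4>;         // four float lanes, like one 128-bit register
using Mask4 = std::array<std::uint32_t, 4>; // per-lane result: all ones or all zeros

// Model of a packed ordered less-than: a lane is all ones only when a[i] < b[i],
// so any lane with a NaN input produces zero.
inline Mask4 compareLessThan(const Vec4& a, const Vec4& b)
{
    Mask4 result{};
    for (std::size_t i = 0; i < result.size(); ++i)
    {
        result[i] = (a[i] < b[i]) ? 0xFFFFFFFFu : 0u;
    }
    return result;
}

// Model of an unordered "not less-than-or-equal": true when either input is NaN.
// It looks like greater-than for ordinary numbers but differs for NaN.
inline Mask4 compareNotLessThanOrEqual(const Vec4& a, const Vec4& b)
{
    Mask4 result{};
    for (std::size_t i = 0; i < result.size(); ++i)
    {
        result[i] = !(a[i] <= b[i]) ? 0xFFFFFFFFu : 0u;
    }
    return result;
}

// Fallback sketch: greater-than expressed as less-than with the operands swapped.
inline Mask4 compareGreaterThanViaSwap(const Vec4& left, const Vec4& right)
{
    return compareLessThan(right, left);
}

// Compares the swapped form with the inverted form on inputs that include NaN.
inline void checkPackedFallback()
{
    const float nan = std::numeric_limits<float>::quiet_NaN();
    const Vec4 left{2.0f, 1.0f, nan, 5.0f};
    const Vec4 right{1.0f, 2.0f, 0.0f, nan};

    // The swapped form stays ordered: lanes with a NaN input are zero.
    assert((compareGreaterThanViaSwap(left, right) == Mask4{0xFFFFFFFFu, 0u, 0u, 0u}));
    // The inverted form reports all ones for the NaN lanes.
    assert((compareNotLessThanOrEqual(left, right) == Mask4{0xFFFFFFFFu, 0u, 0xFFFFFFFFu, 0xFFFFFFFFu}));
}
```

Calling `checkPackedFallback()` from a `main` function runs both checks. The two forms agree on the first two lanes and disagree only where a NaN is present.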

tannergooding · 2020-03-27T18:42:55Z

src/coreclr/src/jit/hwintrinsicxarch.cpp

+                                        nullptr DEBUGARG("Clone op1 for Sse.CompareScalarGreaterThan"));
+
+                retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, op2, op1, NI_SSE_CompareScalarLessThan, baseType, simdSize);
+                retNode = gtNewSimdHWIntrinsicNode(TYP_SIMD16, clonedOp1, retNode,  NI_SSE_MoveScalar, baseType, simdSize);


For the scalar versions, the non-AVX path needs to ensure that CopiesUpperBits is still respected, so we have to do an additional MoveScalar operation.
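The extra MoveScalar step can be pictured with a small model, again in plain C++ rather than real intrinsics. It assumes that a scalar compare writes its mask to lane 0 and copies lanes 1 to 3 from its first operand, and that MoveScalar(upper, value) takes lane 0 from `value` and the remaining lanes from `upper`. All names here (`Bits4`, `asBits`, `asFloat`, `checkScalarFallback`, and the three lower-case operation models) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

using Bits4 = std::array<std::uint32_t, 4>; // raw bit patterns of four float lanes

inline std::uint32_t asBits(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

inline float asFloat(std::uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Model of a scalar less-than: lane 0 holds the mask, lanes 1..3 come from `a`.
inline Bits4 compareScalarLessThan(const Bits4& a, const Bits4& b)
{
    Bits4 result = a;
    result[0] = (asFloat(a[0]) < asFloat(b[0])) ? 0xFFFFFFFFu : 0u;
    return result;
}

// Model of MoveScalar(upper, value): lane 0 from `value`, lanes 1..3 from `upper`.
inline Bits4 moveScalar(const Bits4& upper, const Bits4& value)
{
    Bits4 result = upper;
    result[0] = value[0];
    return result;
}

// Sketch of the non-AVX path: swap the operands, then restore the upper lanes of `left`.
inline Bits4 compareScalarGreaterThan(const Bits4& left, const Bits4& right)
{
    const Bits4 swapped = compareScalarLessThan(right, left); // upper lanes are from `right` here
    return moveScalar(left, swapped);
}

inline void checkScalarFallback()
{
    const Bits4 left{asBits(3.0f), asBits(10.0f), asBits(20.0f), asBits(30.0f)};
    const Bits4 right{asBits(1.0f), asBits(40.0f), asBits(50.0f), asBits(60.0f)};

    // Without the extra move, the upper lanes would come from the wrong operand.
    assert(compareScalarLessThan(right, left)[1] == asBits(40.0f));

    const Bits4 result = compareScalarGreaterThan(left, right);
    assert((result == Bits4{0xFFFFFFFFu, asBits(10.0f), asBits(20.0f), asBits(30.0f)}));

    // A NaN in lane 0 yields a zero mask, and the upper lanes still come from `left`.
    Bits4 nanLeft = left;
    nanLeft[0] = asBits(std::numeric_limits<float>::quiet_NaN());
    assert((compareScalarGreaterThan(nanLeft, right) == Bits4{0u, asBits(10.0f), asBits(20.0f), asBits(30.0f)}));
}
```

Calling `checkScalarFallback()` from a `main` function runs the checks.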

tannergooding · 2020-03-27T18:44:32Z

@BruceForstall, @jeffhandley, @terrajobst; What is the current process for getting sign-off on breaking changes like this?

tannergooding · 2020-03-27T18:48:13Z

Just a note, this is the kind of change that could have been easily handled in managed land (rather than in importation) if we weren't blocked due to the more complex trees that it produces (#956).

terrajobst · 2020-03-27T20:30:05Z

> @BruceForstall, @jeffhandley, @terrajobst; What is the current process for getting sign-off on breaking changes like this?

I don't believe we have a centralized sign-off process for this anymore. @ericstj @PriyaPurkayastha what are your thoughts?

PriyaPurkayastha · 2020-03-30T18:41:34Z

Correct, there is no compat council that reviews or approves breaking changes; review happens during Tactics. The general guidelines given to teams are to determine the driving factors for making the breaking change (for example, is this a customer-reported issue?), what it would cost to make the change in a compatible way (can opt-in/opt-out switches be provided?), and whether additional data should be gathered to understand the impact of the change (e.g. by contacting the .NET Technical Insights team). Other action items are to add or update functional tests for the changed code paths and to document the breaking change using the https://github.com/dotnet/docs/issues/new?template=dotnet-breaking-change.md issue template.
@marklio do you have additional comments/feedback for this proposed breaking change?

marklio · 2020-03-30T19:14:50Z

Things I would be interested in:

* It sounds like whether this is breaking depends slightly on what kind of hardware you’re on. a) Is that correct, and b) do we have data or an intuition about what the split is on the hardware distribution?
* Are there APIs we can look for in the ecosystem to get an idea of the impact of a change?
* How would customers encounter this as a break? Different output from floating point comparisons? Exceptions? How do these behaviors compare to the documentation? (Are we standardizing on documented behavior? Or do the docs not cover this?)
* How does the behavior compare to .NET Framework? (if relevant)

It definitely sounds like the new behavior is “more correct” and that we definitely want it to be the behavior going forward. So, the discussion is likely going to be about how we help customers absorb the change (rather than whether we would “approve” the change).

[Edited to remove the gunk that ended up here due to responding to the github email]

tannergooding · 2020-03-30T20:25:02Z

@marklio, see below

> It sounds like whether this is breaking depends slightly on what kind of hardware you’re on a) is that correct, and b) do we have data or an intuition about what the split is on the hardware distribution?

Not quite. Basically, there exist two instruction sets for the purposes of this discussion: SSE2 (which has been around since 2000 and is a baseline requirement for .NET Core) and AVX (which has been around since 2011 and which, to my knowledge, is available in all Intel/AMD based VMs on Azure).

Today, if using the 8x Sse2.Compare* APIs that do GreaterThan, GreaterThanOrEqual, NotGreaterThan or NotGreaterThanOrEqual comparisons, you get a behavior that doesn't correctly handle NaN (not-a-number) inputs. These APIs are available on any x86 (Intel/AMD) machine running .NET Core 3.0 or later. This is because the correct implementation of these functions must be "emulated", as there doesn't exist direct hardware support, and the overloads were provided for completeness and parity with the equivalent feature in C/C++, Rust, and other languages. Floating-point behaves a bit differently from normal math, especially around NaN inputs, and so the normal inversion rules don't quite apply (that is, NotGreaterThan can't be implemented as LessThanOrEqual). This was missed in review, hence the bug.
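To make the inversion point concrete, here is a minimal scalar sketch in C++. It assumes standard IEEE 754 comparison semantics, and the helper names `notGreaterThan`, `lessThanOrEqual`, and `checkNaNInversion` are hypothetical. A `true` result stands in for an all-ones lane mask.

```cpp
#include <cassert>
#include <limits>

// "Not greater than" is the negation of (a > b), so it is true when either input is NaN.
inline bool notGreaterThan(float a, float b)
{
    return !(a > b);
}

// "Less than or equal" is an ordered comparison, so it is false when either input is NaN.
inline bool lessThanOrEqual(float a, float b)
{
    return a <= b;
}

// The two predicates agree for ordinary numbers and disagree once a NaN is involved.
inline void checkNaNInversion()
{
    const float nan = std::numeric_limits<float>::quiet_NaN();

    assert(notGreaterThan(1.0f, 2.0f) == lessThanOrEqual(1.0f, 2.0f));
    assert(notGreaterThan(2.0f, 1.0f) == lessThanOrEqual(2.0f, 1.0f));
    assert(notGreaterThan(2.0f, 2.0f) == lessThanOrEqual(2.0f, 2.0f));

    assert(notGreaterThan(nan, 1.0f) && !lessThanOrEqual(nan, 1.0f));
    assert(notGreaterThan(1.0f, nan) && !lessThanOrEqual(1.0f, nan));
}
```

Calling `checkNaNInversion()` from a `main` function runs the checks.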

If using the related Avx.Compare* APIs, you get a different behavior that does correctly handle NaN because there is direct hardware support. This is only available on machines from 2011 and later, but is still fairly prominent being nearly 10 years old and with it being available on machines in Azure, AWS, etc.

> Are there APIs we can look for in the ecosystem to get an idea of the impact of a change?

These are new APIs only introduced in .NET Core 3.0 (Sep 2019, already end of support) and available in .NET Core 3.1 (Dec 2019, supported until Dec 2022).

They are also extremely low level/advanced APIs that are designed to be (and were documented as being) essentially a 1-to-1 mapping with certain instructions exposed by the underlying hardware. You can only use certain APIs if your hardware supports it and they are meant to be used in high-performance/unsafe scenarios.

The usages, as such, would likely be limited (new and advanced use-case API) and hard to find. We aren't using them in the framework ourselves.

> How would customers encounter this as a break? Different output from floating point comparisons? Exceptions? How do these behaviors compare to the documentation? (are we standardizing on documented behavior? Or do the docs not cover this)

You would get a different result as part of the comparison if either input contained a NaN floating-point value. As for documented behavior, we document ourselves to be compatible with the native equivalents of the intrinsics and with the underlying hardware instructions.

> How does the behavior compare to .NET Framework? (if relevant)

This is not supported on .NET Framework, it is only available on .NET Core 3.0 and later.

marklio · 2020-03-30T20:58:42Z

Cool. So what scenarios would lead folks to calling the problematic overloads on Sse2 over the AVX ones? For the sake of argument, are those scenarios worth fixing? Should they just be deprecated? Should we handle this with an analyzer/fixer that calls the "working" API?

I assume that fixing the bug in 3.x wouldn't meet the servicing bar?

To be clear, you're under no obligation to convince me of anything. I'm just trying to help build a case for taking the fix and deciding how to help customers through any pain. It seems like anyone who knows what they're doing will expect appropriate behavior from these APIs, and we haven't gotten feedback probably because very few people are using these APIs, and those who are probably aren't pushing NaNs through them. In which case, documenting the breaking change and making it probably makes the most sense.

tannergooding · 2020-03-30T21:16:05Z

> Cool. So what scenarios would lead folks to calling the problematic overloads on Sse2 over the AVX ones? For the sake of argument, are those scenarios worth fixing? Should they just be deprecated? Should we handle this with an analyzer/fixer that calls the "working" API?

The primary reason would be wanting to support downlevel hardware while also accelerating further (such as by operating on 256-bits per iteration, rather than 128-bits) on newer hardware.
We could fix it with an analyzer or by obsoleting the existing API, but that seems like a poor alternative to just fixing it, given how new the API is and that the alternatives aren't as straightforward to use.

That is, the Avx overloads are primarily for 256-bit operations (vs the 128-bit operations that Sse2 provides), and the Avx overload that does operate on 128 bits takes a much more complex enum where the operation mapping isn't as straightforward (CompareGreaterThan(left, right) would be Compare(left, right, FloatComparisonMode.OrderedGreaterThanSignaling)). Likewise, on hardware without Avx support, you have to know that CompareGreaterThan(left, right) needs to be implemented as CompareLessThan(right, left) and that you need an additional MoveScalar(left, result) to ensure the upper bits are still propagated from left for the Scalar variants of the APIs.

> I assume that fixing the bug in 3.x wouldn't meet the servicing bar?

I'm unsure whether or not this would meet the bar and would defer to @jeffhandley and/or @BruceForstall.

marklio · 2020-03-30T21:56:07Z

I definitely wouldn't want to break folks in servicing for this. It was more of a rhetorical question about whether it had been considered. I definitely could have phrased that more clearly.

jeffhandley · 2020-03-31T00:27:38Z

> I definitely wouldn't want to break folks in servicing for this.

So is fixing it only in .NET 5.0+ the best approach here, @marklio? @tannergooding, do you know of any reason to lobby for it to be fixed in 3.x, or would 5.0 be OK with you?

marklio · 2020-03-31T01:57:38Z

> So is fixing it only in .NET 5.0+ the best approach here, @marklio?

Yes, that would be the position I'd take to tactics. Take the fix in 5.0 and document through the breaking change process.

We wouldn't bring this for 3.x because:

* It is a breaking change.
* We don't have any customers hitting this (likely few folks are using it, and they may not be passing NaN).
* We think people using these features are advanced folks who will likely be moving to bleeding edge to get more of these features, and won't mind moving to 5 to get this fix (this is conjecture on my part; do we have any data to back this up? I'll look for some tomorrow).

We want the fix in 5.0 because:

* It is "correct" behavior WRT specification, and moreover represents a consolidation of similar behaviors being added to the product.
* We have time to get feedback, and a feature audience that is likely adopting new versions quickly (again, conjecture).

I'd also watch for signal in previews of anyone playing with 5.0 that encounters this as a break in their code. That might lead us to other mitigations.

BruceForstall · 2020-04-03T23:08:20Z

A minor point is that if we do NOT fix it in servicing, and people start using this relatively new API in their code, we could end up with MORE user code that needs to be updated in the long run.

I wonder (and this is a general query, not just for this case) if there is a way we can (or should) annotate the 3.1 documentation now indicating a post-3.1 breaking change has been made to a particular API, to try and encourage people not to take a dependency on the subsequently-broken behavior.

saucecontrol · 2020-04-03T23:40:10Z

In this case, there's a clean and forward-compatible workaround, so it's not a big deal if it isn't fixed in 3.1. But if it weren't as simple, the argument that the breakage footprint will never be smaller than now is really important.

jeffhandley · 2020-04-07T17:58:09Z

We solidified the decision today, @tannergooding:

* Make the fix in .NET 5.0
* Do not apply the fix to 3.x
* Update documentation to call out the bug and the upcoming breaking change, illustrating a forward-compatible workaround

tannergooding · 2020-04-07T18:03:40Z

I've rebased on top of current master and this should be ready for review. I'll get a change up for the docs repo that calls out the breaking change before merging.

PriyaPurkayastha · 2020-04-07T18:12:02Z

Here's the breaking change issue template to be used https://github.com/dotnet/docs/issues/new?template=dotnet-breaking-change.md

tannergooding · 2020-04-10T18:09:51Z

Had to update to use the new compExactlyDependsOn method.

src/coreclr/src/jit/hwintrinsicxarch.cpp

tannergooding · 2020-04-14T14:50:48Z

Test failures are unrelated and tracked by #34905

CarolEidt

LGTM - I just had a question about whether it could be simplified.

CarolEidt · 2020-04-14T19:39:29Z

src/coreclr/src/jit/hwintrinsicxarch.cpp


    // The Prefetch and StoreFence intrinsics don't take any SIMD operands
    // and have a simdSize of 0
    assert((simdSize == 16) || (simdSize == 0));

    switch (intrinsic)
    {
+        case NI_SSE_CompareGreaterThan:


Since these cases are all basically the same, with different constants, it would seem that you could add a function to determine the constant to use for the AVX intrinsic, based on the constant associated with the SSE intrinsic. Does that make sense or am I missing something?

Yes, we probably could define a getInverseFloatingComparison method or something similar. A similar function could also be useful in lowering as it would allow either operand to be contained.

I added a new function HWIntrinsicInfo::lookupFloatingComparisonForSwappedArgs and cleaned up the importation logic.
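The idea behind such a lookup can be sketched as follows. This is not the JIT's `HWIntrinsicInfo::lookupFloatingComparisonForSwappedArgs`; the enum `ComparisonMode`, the function `lookupModeForSwappedArgs`, and the scalar `evaluate` helper are hypothetical names. The enum is assumed to be a partial mirror of `FloatComparisonMode`, with numeric values that are assumed to match the comparison immediates (14 for greater-than, as in the diff above).

```cpp
#include <cassert>
#include <limits>

// Hypothetical partial mirror of FloatComparisonMode (values are assumptions).
enum class ComparisonMode : int
{
    OrderedLessThanSignaling                = 1,
    OrderedLessThanOrEqualSignaling         = 2,
    UnorderedNotLessThanSignaling           = 5,
    UnorderedNotLessThanOrEqualSignaling    = 6,
    UnorderedNotGreaterThanOrEqualSignaling = 9,
    UnorderedNotGreaterThanSignaling        = 10,
    OrderedGreaterThanOrEqualSignaling      = 13,
    OrderedGreaterThanSignaling             = 14,
};

using CM = ComparisonMode;

// Returns the mode that gives the same result once the two operands are swapped.
inline CM lookupModeForSwappedArgs(CM mode)
{
    switch (mode)
    {
        case CM::OrderedGreaterThanSignaling:             return CM::OrderedLessThanSignaling;
        case CM::OrderedGreaterThanOrEqualSignaling:      return CM::OrderedLessThanOrEqualSignaling;
        case CM::UnorderedNotGreaterThanSignaling:        return CM::UnorderedNotLessThanSignaling;
        case CM::UnorderedNotGreaterThanOrEqualSignaling: return CM::UnorderedNotLessThanOrEqualSignaling;
        case CM::OrderedLessThanSignaling:                return CM::OrderedGreaterThanSignaling;
        case CM::OrderedLessThanOrEqualSignaling:         return CM::OrderedGreaterThanOrEqualSignaling;
        case CM::UnorderedNotLessThanSignaling:           return CM::UnorderedNotGreaterThanSignaling;
        case CM::UnorderedNotLessThanOrEqualSignaling:    return CM::UnorderedNotGreaterThanOrEqualSignaling;
    }
    return mode;
}

// Scalar model of each mode; "Unordered" modes are true when either input is NaN.
inline bool evaluate(CM mode, float a, float b)
{
    switch (mode)
    {
        case CM::OrderedLessThanSignaling:                return a < b;
        case CM::OrderedLessThanOrEqualSignaling:         return a <= b;
        case CM::UnorderedNotLessThanSignaling:           return !(a < b);
        case CM::UnorderedNotLessThanOrEqualSignaling:    return !(a <= b);
        case CM::UnorderedNotGreaterThanOrEqualSignaling: return !(a >= b);
        case CM::UnorderedNotGreaterThanSignaling:        return !(a > b);
        case CM::OrderedGreaterThanOrEqualSignaling:      return a >= b;
        case CM::OrderedGreaterThanSignaling:             return a > b;
    }
    return false;
}

// Checks that swapping both the mode and the operands never changes the result.
inline void checkSwappedArgs()
{
    const float values[] = {-1.0f, 0.0f, 1.0f, std::numeric_limits<float>::quiet_NaN()};
    const CM modes[] = {CM::OrderedLessThanSignaling, CM::OrderedLessThanOrEqualSignaling,
                        CM::UnorderedNotLessThanSignaling, CM::UnorderedNotLessThanOrEqualSignaling,
                        CM::UnorderedNotGreaterThanOrEqualSignaling, CM::UnorderedNotGreaterThanSignaling,
                        CM::OrderedGreaterThanOrEqualSignaling, CM::OrderedGreaterThanSignaling};
    for (CM mode : modes)
        for (float a : values)
            for (float b : values)
                assert(evaluate(mode, a, b) == evaluate(lookupModeForSwappedArgs(mode), b, a));
}
```

Calling `checkSwappedArgs()` from a `main` function runs the exhaustive check over the sample values, including NaN on either side.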

BruceForstall · 2020-04-14T23:12:42Z

You can disable the formatter around this section if that makes the most sense.

src/coreclr/src/jit/hwintrinsic.h

echesakov

LGTM (minus comment about C++ comments :) )

Dotnet-GitSync-Bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Mar 27, 2020

tannergooding added the breaking-change label (Issue or PR that represents a breaking API or functional change over a prerelease) Mar 27, 2020

tannergooding force-pushed the fix-34094 branch from feb6754 to 2774dc6 on April 7, 2020 18:02

tannergooding force-pushed the fix-34094 branch from 2774dc6 to 0e2130f on April 10, 2020 18:09

tannergooding mentioned this pull request Apr 13, 2020

Implement Vector{Size}<T>.AllBitsSet #33924

Merged

tannergooding force-pushed the fix-34094 branch from 0e2130f to 3991a2c on April 13, 2020 20:19

tannergooding requested review from CarolEidt and echesakov April 14, 2020 16:16

tannergooding added 4 commits April 14, 2020 10:45

Fixing several of the Sse/Sse2.Compare* intrinsics to account for NaN inputs

deb4f61

Applying format patch

545ee75

Switch to using compOpportunisticallyDependsOn

0311747

Use the named intrinsic comparison macros rather than magic numbers

e68d7ce

tannergooding force-pushed the fix-34094 branch from 3991a2c to e68d7ce on April 14, 2020 18:18

Applying formatting patch

c8e1678

CarolEidt approved these changes Apr 14, 2020

tannergooding added 3 commits April 14, 2020 13:40

Define the _CMP hwintrinsic macros since they aren't available by default on Unix

d5dcd81

Simplifying the special importation logic for Sse/Sse2 compare GreaterThan functions

feb4fcd

Applying formatting patch

a7467da

tannergooding force-pushed the fix-34094 branch from fc167f4 to a7467da on April 14, 2020 23:02

BruceForstall reviewed Apr 14, 2020

tannergooding added 3 commits April 14, 2020 16:23

Add a comment explaining the naming of the _CMP_* macros used by the x86 HWIntrinsics

1d43747

Switch to mirroring the FloatComparisonMode enum rather than using the _CMP_* macros

cbf0ee9

Apply formatting patch

646229d

echesakov approved these changes Apr 16, 2020

Don't use XML style doc comments for the FloatComparisonMode mirror on the C++ side

cbb239e

tannergooding closed this Apr 16, 2020

tannergooding reopened this Apr 16, 2020

jaredpar mentioned this pull request Apr 16, 2020

Restore failing on our projects #35057

Closed

tannergooding added 2 commits April 17, 2020 14:29

Ensure the base type is set before it is checked

0e6a3bd

Applying formatting patch

181d73d

tannergooding merged commit 2bd14de into dotnet:master Apr 18, 2020

This was referenced Apr 21, 2020

Builds legs getting abandoned due to agent notification issues #35223

Closed

OSX machines are de-provisioned during CI / PR runs leading to failures #34472

Closed

jeffhandley mentioned this pull request May 4, 2020

Vector2/4.Lerp do not always return value2 when amount is 1 #35529

Closed

ghost locked as resolved and limited conversation to collaborators Dec 10, 2020

tannergooding deleted the fix-34094 branch November 11, 2022 15:30
