Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need infrastructure to run tests in varying hardware intrinsic modes #950

Open
danmoseley opened this issue Mar 18, 2019 · 13 comments
Open
Labels
area-Infrastructure-libraries help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@danmoseley
Copy link
Member

Increasingly we are taking changes that use hardware intrinsics to accelerate parts of CoreFX. Without special care, our testing will only ever cover the AVX2 (or AVX) path, not the software path and not any SSE path.

We need a way to easily cause selected tests to re-run in each mode so every codepath is exercised.

In the ML.NET repo they repeat relevant tests under each of regular, no AVX, and no AVX/SSE using RemoteExec:
https://github.com/dotnet/machinelearning/blob/d65af0f755b9cd5739693004f1a8a8964ce23d8a/test/Microsoft.ML.CpuMath.UnitTests.netcoreapp/UnitTests.cs#L140

We can do something similar, but centralize the magic environment variables and offer a flag on RemoteExec that tests can set to enumerate over them.

cc @tannergooding @jkotas @Anipik

@ViktorHofer
Copy link
Member

Moving to future as this would be nice to have but not necessarily important for 3.0. That said I will prioritize it correctly and try to work on it as soon as other high prio items are done. I'll try to do at least a manual run asap to get some data.

@gfoidl
Copy link
Member

gfoidl commented Nov 7, 2019

It would be nice to write tests as

[Fact]
[HwIntrinsic(Intrinsic.Avx2 | Intrinsic.Ssse3)]
public void MySuperTest()
{
    // ...
}

and then the test-runtime / CI executes the tests with environment variables set (i.e. that the noted intrinsics are disabled) so that the code-path can be verified.

The approach with RemoteExec is possible, but it forces us to write some code that could belong to the infrastructure.

@maryamariyan maryamariyan transferred this issue from dotnet/corefx Dec 16, 2019
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Dec 16, 2019
@maryamariyan maryamariyan added area-Infrastructure-libraries help wanted [up-for-grabs] Good issue for external contributors labels Dec 16, 2019
@maryamariyan maryamariyan added this to the Future milestone Dec 16, 2019
@danmoseley
Copy link
Member Author

Both approaches above assume it's fairly well defined which tests we need to run this way. As more and more of our code uses intrinsics and particularly if we want some indirect test coverage we might instead want to execute the entire test suite periodically with the various environment variables.

Perhaps it could be added pairwise into our test matrix, eg., on Centos and Windows Server always go through software mode, etc - if we're confident it's an orthogonal parameter.

@ViktorHofer ViktorHofer removed the untriaged New issue has not been triaged by the area owner label Jun 23, 2020
MichalStrehovsky added a commit to MichalStrehovsky/runtime that referenced this issue Jul 1, 2021
@danmoseley
Copy link
Member Author

linking to discussion of what the knobs are/should be: #11701

@danmoseley
Copy link
Member Author

@stephentoub @jkotas now that we're being more thoughtful about our matrix, and more of it is post-commit, would it be reasonable to pairwise one or more of these flags into the mix? Eg., if we test Windows 10 x64 and Windows 11 x64 we could enable COMPlus_EnableHWIntrinsic=0 permanently on one of those. It seems to me vanishingly unlikely that that would mask any OS-specific bugs.

It's not entirely clear which switch/es to set though. COMPlus_EnableHWIntrinsic=0 will force the software path, which in some cases is (still?) exercised for free on Arm64 tests. It may be better to force the SSE2 path, eg.

cc @safern @tannergooding @BruceForstall

@danmoseley
Copy link
Member Author

I just noticed your offer @tannergooding to drive thinking around this. Sounds good to me.

@safern
Copy link
Member

safern commented Oct 6, 2021

I also can work with @tannergooding on getting this set up (whatever pieces we need to move).

@BruceForstall
Copy link
Member

Note that we do exhaustive hardware intrinsic switch testing (to force various environments, in combination with JIT stress modes) of the CoreCLR Pri-1 tests for x86 (https://dev.azure.com/dnceng/public/_build/results?buildId=1399897&view=results) and arm64 (https://dev.azure.com/dnceng/public/_build/results?buildId=1399881&view=results).

Adding a few modes (for x86, x64, and arm64) for libraries tests makes sense.

@tannergooding
Copy link
Member

tannergooding commented Oct 6, 2021

we could enable COMPlus_EnableHWIntrinsic=0 permanently on one of those

@danmoseley, I don't think that one in particular is interesting and its somewhat covered already by x86 Linux and ARM32 (where hardware intrinsics aren't supported).

Namely, the most interesting setups are probably AVX=0 and SSE3=0 given that we require SSE2 support on x86/x64 and AdvSimd support on ARM64.
However, this is also a general case that there's no real benefit to running these tests if the code hasn't changed and so running them as part of a weekly or monthly outerloop would likely catch any issues that might crop up.

Users adding new or touching existing intrinsic code can (and should) explicitly trigger the outerloop jobs against their PR.

@jkotas
Copy link
Member

jkotas commented Oct 7, 2021

now that we're being more thoughtful about our matrix, and more of it is post-commit, would it be reasonable to pairwise one or more of these flags into the mix?

What is the total size of the matrix that would give us full coverage for these?

I am wondering whether we should set these randomly. It means that the failures won't be deterministic, but it is not that different from what we are doing to stress the product in general (stress failures are never deterministic).

@jkotas
Copy link
Member

jkotas commented Oct 7, 2021

We can also use the same randomized approach for other variables with similar characteristics, like tiered compilation on/off.

@tannergooding
Copy link
Member

What is the total size of the matrix that would give us full coverage for these?

The JIT currently tests with these modes for the outerloop:

<TestEnvironment Include="jitstress_isas_incompletehwintrinsic" EnableIncompleteISAClass="1" />
<TestEnvironment Include="jitstress_isas_nohwintrinsic" EnableHWIntrinsic="0" />
<TestEnvironment Include="jitstress_isas_nohwintrinsic_nosimd" EnableHWIntrinsic="0" FeatureSIMD="0" />
<TestEnvironment Include="jitstress_isas_nosimd" FeatureSIMD="0" />
<TestEnvironment Include="jitstress_isas_x86_noaes" EnableAES="0" /> <!-- Depends on SSE2 -->
<TestEnvironment Include="jitstress_isas_x86_noavx" EnableAVX="0" /> <!-- Depends on SSE42 -->
<TestEnvironment Include="jitstress_isas_x86_noavx2" EnableAVX2="0" /> <!-- Depends on AVX -->
<TestEnvironment Include="jitstress_isas_x86_nobmi1" EnableBMI1="0" /> <!-- No dependencies -->
<TestEnvironment Include="jitstress_isas_x86_nobmi2" EnableBMI2="0" /> <!-- No dependencies -->
<TestEnvironment Include="jitstress_isas_x86_nofma" EnableFMA="0" /> <!-- Depends on AVX -->
<TestEnvironment Include="jitstress_isas_x86_nohwintrinsic" EnableBMI1="0" EnableBMI2="0" EnableLZCNT="0" EnableSSE="0" />
<TestEnvironment Include="jitstress_isas_x86_nolzcnt" EnableLZCNT="0" /> <!-- No dependencies -->
<TestEnvironment Include="jitstress_isas_x86_nopclmulqdq" EnablePCLMULQDQ="0" /> <!-- Depends on SSE2 -->
<TestEnvironment Include="jitstress_isas_x86_nopopcnt" EnablePOPCNT="0" /> <!-- Depends on SSE42 -->
<TestEnvironment Include="jitstress_isas_x86_nosse" EnableSSE="0" /> <!-- No dependencies -->
<TestEnvironment Include="jitstress_isas_x86_nosse2" EnableSSE2="0" /> <!-- Depends on SSE -->
<TestEnvironment Include="jitstress_isas_x86_nosse3" EnableSSE3="0" /> <!-- Depends on SSE2 -->
<TestEnvironment Include="jitstress_isas_x86_nosse3_4" EnableSSE3_4="0" /> <!-- Depends on SSE2 -->
<TestEnvironment Include="jitstress_isas_x86_nosse41" EnableSSE41="0" /> <!-- Depends on SSSE3 and SSE3_4 -->
<TestEnvironment Include="jitstress_isas_x86_nosse42" EnableSSE42="0" /> <!-- Depends on SSE41 -->
<TestEnvironment Include="jitstress_isas_x86_nossse3" EnableSSSE3="0" /> <!-- Depends on SSE3 -->

There are additional variants that combine with the other JitStress modes as well, but those probably aren't as relevant for the library side tests.

In particular today, the library code has code paths based on:

  • AVX
  • AVX2
  • BMI1
  • BMI2
  • FMA
  • LZCNT
  • POPCNT
  • SSE
  • SSE2
  • SSSE3
  • SSE41

And for ARM:

  • AdvSIMD

I don't think we need to do anything like randomly set the switches. Simply having the same matrix as the JIT as outerloop jobs should be sufficient. PRs that are adding new intrinsic code paths can then explicitly trigger these jobs, just as we do when adding new JIT support via azp run runtime-coreclr jitstress-isas-x86, runtime-coreclr jitstress-isas-arm.

The outerloop jobs should be plenty sufficient for catching any oddities that come about otherwise, particularly given that the code is rarely touched and any real issues should likely get caught by the JIT side tests first.

@danmoseley
Copy link
Member Author

Linking this interesting example demonstrating that library authors need these flags for their own testing, and of course that means they expect us to run clean. #60035 (comment)

@ViktorHofer ViktorHofer removed their assignment Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Infrastructure-libraries help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

No branches or pull requests

9 participants