Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some linux-arm test coverage #98022

Merged
merged 8 commits into from
Feb 22, 2024
Merged

Conversation

MichalStrehovsky
Copy link
Member

Add libraries testing and pri-0 testing as part of runtime-nativeaot-outerloop.

Cc @dotnet/ilc-contrib

@ghost
Copy link

ghost commented Feb 6, 2024

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

Issue Details

Add libraries testing and pri-0 testing as part of runtime-nativeaot-outerloop.

Cc @dotnet/ilc-contrib

Author: MichalStrehovsky
Assignees: MichalStrehovsky
Labels:

area-Infrastructure-coreclr

Milestone: -

@MichalStrehovsky
Copy link
Member Author

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@filipnavara
Copy link
Member

Thanks, I will look into the build failures.

@filipnavara
Copy link
Member

The checked build failure reproduces locally:

ILC: /home/navara/runtime/src/coreclr/jit/importer.cpp:9382
ILC: Assertion failed 'genActualTypeIsIntOrI(op2) : Possibly bad IL with CEE_newarr at offset 002Bh (op1=NULL op2=long stkDepth=0)' in 'SizeParamIndex.PInvoke.PassingByOutTest:MarshalCStyleArrayLong_AsByOut_AsSizeParamIndex(byref,byref):ubyte' during 'Importation' (IL size 147; hash 0x8bc3ebdf; FullOpts)

ILC: /home/navara/runtime/src/coreclr/jit/importer.cpp:9382
ILC: Assertion failed 'genActualTypeIsIntOrI(op2) : Possibly bad IL with CEE_newarr at offset 000Ch (op1=NULL op2=long stkDepth=0)' in 'Internal.CompilerGenerated.<Module>:ReverseDelegateStub__DelUlongArrByRefAsCdeclCaller(uint,uint):int' during 'Importation' (IL size 204; hash 0x41204394; FullOpts)

The Release build failure seems to be issue with the pipeline - not selecting the right Helix queue.

@MichalStrehovsky
Copy link
Member Author

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky
Copy link
Member Author

The pri0 suite failure is:

2024-02-14T08:23:44.0403671Z     Process terminated. Assertion failed.
2024-02-14T08:23:44.0405627Z        at ILCompiler.DependencyAnalysis.ARM.ARMEmitter.EmitLDR(Register destination, Register source, Int32 offset) in /_/src/coreclr/tools/Common/Compiler/DependencyAnalysis/Target_ARM/ARMEmitter.cs:line 127
2024-02-14T08:23:44.0406283Z        at ILCompiler.DependencyAnalysis.ReadyToRunHelperNode.EmitCode(NodeFactory factory, ARMEmitter& encoder, Boolean relocsOnly) in /_/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/Target_ARM/ARMReadyToRunHelperNode.cs:line 36
2024-02-14T08:23:44.0406921Z        at ILCompiler.DependencyAnalysis.AssemblyStubNode.GetData(NodeFactory factory, Boolean relocsOnly) in /_/src/coreclr/tools/Common/Compiler/DependencyAnalysis/AssemblyStubNode.cs:line 59
2024-02-14T08:23:44.0407567Z        at ILCompiler.ObjectWriter.ObjectWriter.EmitObject(String objectFilePath, IReadOnlyCollection`1 nodes, IObjectDumper dumper, Logger logger) in /_/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/ObjectWriter/ObjectWriter.cs:line 391
2024-02-14T08:23:44.0420336Z        at ILCompiler.ObjectWriter.ObjectWriter.EmitObject(String objectFilePath, IReadOnlyCollection`1 nodes, NodeFactory factory, ObjectWritingOptions options, IObjectDumper dumper, Logger logger) in /_/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/ObjectWriter/ObjectWriter.cs:line 542
2024-02-14T08:23:44.0421003Z        at ILCompiler.Compilation.ILCompiler.ICompilation.Compile(String outputFile, ObjectDumper dumper) in /_/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/Compilation.cs:line 525
2024-02-14T08:23:44.0421421Z        at ILCompiler.Program.Run() in /_/src/coreclr/tools/aot/ILCompiler/Program.cs:line 585
2024-02-14T08:23:44.0421751Z        at ILCompiler.ILCompilerRootCommand.<>c__DisplayClass236_0.<.ctor>b__0(ParseResult result) in /_/src/coreclr/tools/aot/ILCompiler/ILCompilerRootCommand.cs:line 292
2024-02-14T08:23:44.0422100Z        at System.CommandLine.Invocation.InvocationPipeline.Invoke(ParseResult parseResult)
2024-02-14T08:23:44.0422354Z        at System.CommandLine.ParseResult.Invoke()
2024-02-14T08:23:44.0422610Z        at ILCompiler.Program.Main(String[] args) in /_/src/coreclr/tools/aot/ILCompiler/Program.cs:line 753
2024-02-14T08:23:44.4226529Z /__w/1/s/artifacts/bin/coreclr/linux.arm.Checked/build/Microsoft.NETCore.Native.targets(310,5): error MSB3073: The command ""/__w/1/s/artifacts/bin/coreclr/linux.arm.Checked/x64/ilc/ilc" @"/__w/1/s/artifacts/tests/coreclr/obj/linux.arm.Checked/Managed/JIT/opt/virtualstubdispatch/bigvtbl/bigvtbl_cs_r/native/bigvtbl_cs_r.ilc.rsp"" exited with code 134.

The libraries test failures are to the tune of:

[FAIL] System.Linq.Expressions.Tests.BinaryDivideTests.CheckIntDivideTest(useInterpreter: True)
Assert.Throws() Failure: Exception type was not an exact match
Expected: typeof(System.OverflowException)
Actual:   typeof(System.ArithmeticException)
---- System.ArithmeticException : Overflow or underflow in the arithmetic operation.
   at System.Linq.Expressions.Tests.BinaryDivideTests.VerifyIntDivide(Int32 a, Int32 b, Boolean useInterpreter) + 0x10f
   at System.Linq.Expressions.Tests.BinaryDivideTests.CheckIntDivideTest(Boolean useInterpreter) + 0x47

We have an option to just start disabling these on bugs or if this looks like obvious fixes, we can just fix.

The Preview 2 cutoff will be at the end of next week. I think think this is close enough that we could start enabling this in the product (producing official builds, etc.). @filipnavara let me know if you'd like to do that or if you'd prefer me to do it.

@filipnavara
Copy link
Member

We have an option to just start disabling these on bugs or if this looks like obvious fixes, we can just fix.

These should be obvious fixes, I think. I'll have a look tomorrow morning.

let me know if you'd like to do that or if you'd prefer me to do it.

I'm particularly bad at the YAML and build things, so I would prefer you to do it if it's not too much work.

@filipnavara
Copy link
Member

The div/mul tests should be fixed by #98416.

I briefly looked at the EmitLDR code gen. We could easily emit this for large immediate offsets:

add rDEST, rSRC, #imm
ldr rDEST, [rDEST, #imm12]

instead of

ldr rDEST, [rSRC, #imm12]

I just need to write the code to emit the correct instruction encoding.

@filipnavara
Copy link
Member

Can I get a re-run with the merged fixes, please? Thx

@MichalStrehovsky
Copy link
Member Author

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@filipnavara
Copy link
Member

There's still a few Pri0 failures but overall it looks quite manageable. Some of them look like issues that I have discovered previously and that should be easy to fix. I'll go through each of them today to get a better assessment.

@filipnavara
Copy link
Member

Bulk of the remaining failures is fixed by #98474, tracking the rest in #97729 (comment).

@MichalStrehovsky
Copy link
Member Author

For the determinism test failure, you can grab the baseline/compare object files with runfo:

https://github.com/dotnet/runtime/blob/main/eng/testing/debug-dump-template.md

The command line is:

runfo get-helix-payload -j 656fde5e-5baa-4428-974c-b207ffbce0fd -w nativeaot.SmokeTests -o PUT_OUTPUT_PATH_HERE

This will download the helix payload (and crashdump, but that one is not very useful) into PUT_OUTPUT_PATH_HERE.

There's a baseline.object and compare.object in the ZIP. readelf is saying the diff is in .debug_info section, but I didn't investigate further because it looks like you're going to have more expertise anyway.

@filipnavara
Copy link
Member

filipnavara commented Feb 15, 2024

Thanks for the hint. The differences in the .debug_info definitely look like a bug, possibly even some uninitialized memory.

Update: Yeah, there's a BREGx opcode where it should be REGx opcode. Not sure yet how it happens but should be easy to track down. While looking at the code I found bug in DWARF info for VLT_STK_REG too.

@filipnavara
Copy link
Member

filipnavara commented Feb 15, 2024

I submitted PRs for the easy issues. Of the two remaining ones, one seems to be JIT bug where it optimizes out fat pointer check, and other one is JIT/helper issue for inlined memcpy with null values.

@filipnavara
Copy link
Member

filipnavara commented Feb 15, 2024

PRs for the test failures: #98479 #98481
Exclusions for filed issues: filipnavara@fccc70a

@MichalStrehovsky
Copy link
Member Author

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Comment on lines 1171 to 1176
<ExcludeList Include="$(XunitTestBinBase)/Loader/classloader/Casting/punning*">
<Issue>https://github.com/dotnet/runtime/issues/98493</Issue>
</ExcludeList>
<ExcludeList Include="$(XunitTestBinBase)/JIT/Directed/nullabletypes/Desktop/boxunboxvaluetype_*">
<Issue>https://github.com/dotnet/runtime/issues/95517</Issue>
</ExcludeList>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like I made some mistake and the exclusions don't work as intended. :-/

I submitted a fix for the second one (#98547).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and for the first one (#98561). Although I very much doubt it will go through without scrutiny from the JIT team.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove the exclusions and do another test run :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I submitted a fix for the second one (#98547).

#98547 fix is not correct as discovered in #98623 (comment) . I expect that the test will keep failing.

Also, there may be unforeseen perf issues due to #98623 (comment) . I am thinking that we should revert it and wait for the proper fix from @EgorBo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect that the test will keep failing.

The test doesn’t fail. I checked that locally before submitting the fix. However, it doesn’t fail because the size of the block is small enough to take the managed code path which happens to trigger NRE correctly. It would still fail for big sizes but that’s not the case in the tested unboxing scenario.

@MichalStrehovsky
Copy link
Member Author

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky
Copy link
Member Author

The failure in the System.Globalization.Calendars.Tests looks real, this was also crashing in the previous run. I pulled down the dump but it's bad corruption (stack is not usable).

Pulling down the dump also pulls down the binaries and symbols, so I just tried running it on my Pi 4. It doesn't crash this way, but I do get a failing test sometimes (filed #98795 on that).

The dump/binary can be pulled down with runfo.

runfo get-helix-payload -j ec8e78d3-6418-4863-905d-c1f5d003ca7e -w System.Globalization.Calendars.Tests -o ~/helix_out/

I'll do some exclusions and try to get this merged.

@MichalStrehovsky
Copy link
Member Author

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky
Copy link
Member Author

@dotnet/ilc-contrib could someone review this infra-only change to stand up ARM32?

I also added testing for musl-arm64 since I finally figured out includeAllPlatforms: true is the magic juju to allow testing of some of the platforms. The failure on musl-arm64 is a variation of #94653 (intermittent issue). This test has some problems that are specific to MUSL in general and none of the failure modes generate dumps.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@MichalStrehovsky MichalStrehovsky merged commit a98016b into main Feb 22, 2024
176 of 180 checks passed
@MichalStrehovsky MichalStrehovsky deleted the MichalStrehovsky-patch-1 branch February 22, 2024 23:53
@filipnavara
Copy link
Member

Thanks a lot! 🎉

@github-actions github-actions bot locked and limited conversation to collaborators Mar 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants