JIT: change profile slop assert to a jitdump note #81377

Merged


AndyAyersMS
Member

Stop asserting if we see unusually large discrepancies in the outgoing profile flow from a block. Instead just make a note in the jit dump.

Fixes #77450.
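
For readers unfamiliar with the check, the change described above can be pictured with a minimal sketch. This is not the JIT's actual code: `weight_t`, `checkOutgoingFlow`, and the `dumpNotes` vector are hypothetical stand-ins, and a plain string list takes the place of the real jit dump. Only the comparison `((-slop) <= diff) && (diff <= slop)` is taken from the assertion text in the linked issue.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for the JIT's profile weight type.
using weight_t = double;

// Compares a block's weight against the sum of its outgoing edge weights.
// Returns true when the difference is within `slop`. On a mismatch it
// records a note (standing in for a jit dump message) instead of asserting.
bool checkOutgoingFlow(weight_t                     blockWeight,
                       const std::vector<weight_t>& edgeWeights,
                       weight_t                     slop,
                       std::vector<std::string>&    dumpNotes)
{
    weight_t outgoing = 0;
    for (weight_t w : edgeWeights)
    {
        outgoing += w;
    }

    const weight_t diff       = outgoing - blockWeight;
    const bool     withinSlop = ((-slop) <= diff) && (diff <= slop);

    if (!withinSlop)
    {
        // Previously this condition would have been an assert; now it is
        // only reported for diagnostic purposes.
        char buf[128];
        std::snprintf(buf, sizeof(buf),
                      "Profile slop exceeded: diff=%g, slop=%g", diff, slop);
        dumpNotes.emplace_back(buf);
    }

    return withinSlop;
}
```

Under these assumptions, a caller can keep compiling after a large discrepancy and inspect `dumpNotes` afterwards, which is the behavior this PR aims for in the real dump output.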

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) label Jan 30, 2023
@ghost ghost assigned AndyAyersMS Jan 30, 2023
@ghost

ghost commented Jan 30, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS
Member Author

@EgorBo PTAL
cc @dotnet/jit-contrib

Note edge profile weights are slated to be revised during .NET 8, so this is just an interim fix to stop this assert from firing.

Member

@EgorBo EgorBo left a comment

I wonder if we should have a special format for these sorts of warnings in the dump.

@AndyAyersMS
Member Author

AndyAyersMS commented Jan 31, 2023

Failure is a crash in ILC building crossgen2. Likely unrelated since this PR is only changing an assert into a dump.

  crossgen2 -> /__w/1/s/artifacts/bin/coreclr/linux.arm64.Debug/crossgen2/crossgen2.dll
  Generating native code
  Segmentation fault (core dumped)
/__w/1/s/artifacts/transport/coreclr/build/Microsoft.NETCore.Native.targets(281,5): error MSB3073: The command ""/__w/1/s/artifacts/transport/coreclr/x64/ilc/ilc" @"/__w/1/s/artifacts/obj/coreclr/crossgen2/arm64/Debug/native/crossgen2.ilc.rsp"" exited with code 139. [/__w/1/s/src/coreclr/tools/aot/crossgen2/crossgen2.csproj]
##[error]artifacts/transport/coreclr/build/Microsoft.NETCore.Native.targets(281,5): error MSB3073: (NETCORE_ENGINEERING_TELEMETRY=Build) The command ""/__w/1/s/artifacts/transport/coreclr/x64/ilc/ilc" @"/__w/1/s/artifacts/obj/coreclr/crossgen2/arm64/Debug/native/crossgen2.ilc.rsp"" exited with code 139.

@AndyAyersMS
Member Author

Hmm, the failure is persistent. I am going to see if I can get a local repro.

@MichalStrehovsky is there any way to get crash dumps from CI when ILC fails like this? Or a stack trace?

@MichalStrehovsky
Member

> @MichalStrehovsky is there any way to get crash dumps from CI when ILC fails like this? Or a stack trace?

That would be a question for @dotnet/runtime-infrastructure - I don't think we are able to collect crash dumps in build legs. This is using live-built ILC with live-built JIT, hosted on top of whatever CoreCLR was used to build the repo. Such capability would have been useful in the past, and not just when using live-built tools.

I'll trigger a more thorough NativeAOT testing run just in case we can hit this on a less obscure configuration.

@MichalStrehovsky
Member

/azp run runtime-extra-platforms

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Member Author

> I'll trigger a more thorough NativeAOT testing run just in case we can hit this on a less obscure configuration.

Seems like lots of issues?

This change only affects jit internal diagnostics and is changing an assert into something that won't assert. So hard for me to see how it could cause any sort of failures.

@MichalStrehovsky
Member

> > I'll trigger a more thorough NativeAOT testing run just in case we can hit this on a less obscure configuration.
>
> Seems like lots of issues?

Yes, we had a regression (that is now fixed), but this would show up as a build break, not a test break, and those are easy to see. The only build break I see is in the Build windows-x64 Release NativeAOT_Checked_Libs leg and that one is #81460.

> This change only affects jit internal diagnostics and is changing an assert into something that won't assert. So hard for me to see how it could cause any sort of failures.

I'm a bit worried if you say this happened on repeated runs. I'm not aware of seeing this in other PRs. I guess we could merge this and if this starts happening everywhere, we can start suspecting something bad like a C++ codegen bug.

@AndyAyersMS
Member Author

Let me bounce this and see if it still repros with merged-up bits.

@AndyAyersMS AndyAyersMS closed this Feb 1, 2023
@AndyAyersMS AndyAyersMS reopened this Feb 1, 2023
@AndyAyersMS
Member Author

arm64 linux CI machines seem to be mis-provisioned or something?

CMake Error at /usr/share/cmake-3.25/Modules/CMakeTestCCompiler.cmake:70 (message):
  The C compiler

    "/usr/bin/clang-9"

  is not able to compile a simple test program.

@AndyAyersMS
Member Author

Retry didn't fix the build issues, so I bounced the PR once more.

@AndyAyersMS
Member Author

ILC failure didn't repro. Extra platform failures are unrelated (and mostly fixed elsewhere).

@AndyAyersMS AndyAyersMS merged commit 6c9190c into dotnet:main Feb 2, 2023
@MichalStrehovsky
Member

MichalStrehovsky commented Feb 2, 2023

#81476 now hit the crossgen2 musl arm64 crossbuild crash that we observed here. The run started after this PR merged to main so it's still inconclusive as to whether this change triggered it. I've not seen this failure before. Cc @ivanpovazan

@AndyAyersMS
Member Author

Hopefully we can figure this out; let me know if I can help somehow.

@ivanpovazan
Member

> #81476 now hit the crossgen2 musl arm64 crossbuild crash that we observed here. The run started after this PR merged to main so it's still inconclusive as to whether this change triggered it. I've not seen this failure before. Cc @ivanpovazan

After a rerun, the failure was not observed.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 5, 2023
Development

Successfully merging this pull request may close these issues.

[pgo] Assertion failed '((-slop) <= diff) && (diff <= slop)' during 'Compute edge weights (2, false)'