System.IO.Tests crash in CI (Linux arm64) #100441

Closed
jkotas opened this issue Mar 29, 2024 · 9 comments
Labels
area-System.IO Known Build Error Use this to report build issues in the .NET Helix tab

@jkotas
Member

jkotas commented Mar 29, 2024

  Discovering: System.IO.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.IO.Tests (found 736 of 744 test cases)
  Starting:    System.IO.Tests (parallel test collections = on [2 threads], stop on fail = off)
./RunTests.sh: line 180:    20 Killed                  "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.IO.Tests.runtimeconfig.json --depsfile System.IO.Tests.deps.json xunit.console.dll System.IO.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
/root/helix/work/workitem/e
----- end Fri Mar 29 11:20:29 UTC 2024 ----- exit code 137 ----------------------------------------------------------

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=623676
Build error leg or test failing: System.IO.Tests.WorkItemExecution
Pull request: #100433

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": ["arm64", "System.IO.Tests", "Killed", "-- exit code 137 --"],
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}
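
As a rough illustration of how a list-valued "ErrorMessage" like the one above could be checked against a console log, here is a minimal sketch. It assumes the list matches when every string is found in order, each on a later log line than the previous one; that reading of the matching rule is an assumption and is not confirmed in this thread. The function name `matches_in_order` is hypothetical, and the sample log is abridged from the lines quoted in this issue, with an illustrative first line added so that "arm64" has something to match.

```shell
#!/bin/sh
# Hypothetical helper: succeed if every string appears in the log, in order,
# each one on a later line than the previous match (assumed semantics).
matches_in_order() {
    log_file=$1
    shift
    start=1
    for needle in "$@"; do
        # Line number, relative to $start, of the first line containing $needle.
        # -F treats the string literally; -e protects needles starting with "-".
        offset=$(tail -n "+$start" "$log_file" | grep -n -F -e "$needle" | head -n 1 | cut -d: -f1)
        [ -n "$offset" ] || return 1
        # Resume the search on the line after this match.
        start=$((start + offset))
    done
    return 0
}

# Abridged sample log. The first line is illustrative, not from the real log.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
Queue: Ubuntu.2004.Arm64.Open (linux arm64)
  Starting:    System.IO.Tests (parallel test collections = on [2 threads], stop on fail = off)
./RunTests.sh: line 180:    20 Killed                  "$RUNTIME_PATH/dotnet" exec ...
----- end Fri Mar 29 11:20:29 UTC 2024 ----- exit code 137 -----
EOF

if matches_in_order "$sample_log" "arm64" "System.IO.Tests" "Killed" "-- exit code 137 --"; then
    echo "known issue matched"
else
    echo "no match"
fi
rm -f "$sample_log"
```

The four strings are deliberately generic, so under this reading the entry would match any arm64 System.IO.Tests run that was killed with exit code 137, whatever the underlying cause.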

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=623676
Error message validated: [arm64 System.IO.Tests Killed -- exit code 137 --]
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 3/29/2024 2:52:42 PM UTC

Report

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 0 | 0 | 0 |
@jkotas jkotas added blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' Known Build Error Use this to report build issues in the .NET Helix tab labels Mar 29, 2024
Contributor

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Mar 29, 2024
@jozkee jozkee added this to the 9.0.0 milestone Jul 3, 2024
@jozkee jozkee removed the untriaged New issue has not been triaged by the area owner label Jul 3, 2024
@adamsitnik
Member

Exit code 137 (128 + 9, i.e. SIGKILL) typically means the process was killed for running out of memory. The tests started to fail not only in main but also in older branches where we have not touched the code at all: #100558

@dotnet/area-infrastructure-libraries Is it possible that the test VMs simply have less memory available now?
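
For reference, the 137 status can be decoded with a few lines of shell. This is a minimal sketch of the usual convention that a status above 128 means "terminated by signal (status - 128)"; the function name `describe_exit_code` is made up for illustration. Note that SIGKILL only says the process was killed from outside. The kernel OOM killer and a container memory limit are common senders, but the status alone does not prove which one it was.

```shell
#!/bin/sh
# Hypothetical helper: explain a shell exit status.
describe_exit_code() {
    code=$1
    if [ "$code" -gt 128 ]; then
        # By convention, status = 128 + signal number for signal-terminated processes.
        sig=$((code - 128))
        # `kill -l N` prints the name of signal number N; fall back if unsupported.
        name=$(kill -l "$sig" 2>/dev/null || echo "unknown")
        echo "exit code $code: killed by signal $sig (SIG$name)"
    else
        echo "exit code $code: process exited on its own"
    fi
}

describe_exit_code 137
```

On a Linux Helix machine, the kernel log (for example via `dmesg`, where the container allows reading it) is the usual place to confirm whether the OOM killer was involved.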

@ViktorHofer
Member

ViktorHofer commented Jul 18, 2024

I don't think that we have access to that information for a Helix test client. Might make sense to print some diagnostics in the RunTests.sh/cmd script, i.e. available RAM and disk space.
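
A minimal sketch of what such diagnostics could look like, assuming a Linux test client. The function name `print_machine_diagnostics` is hypothetical and nothing like it exists in the real RunTests.sh as far as this thread shows. The cgroup paths are the standard v2 and v1 locations; whether they are readable inside the Helix container is an assumption, so each read is guarded.

```shell
#!/bin/sh
# Hypothetical helper: print_machine_diagnostics is an illustrative name,
# not something the real RunTests.sh defines.
print_machine_diagnostics() {
    echo "----- machine diagnostics -----"

    # Memory as seen by the kernel (Linux only, so guard the read).
    if [ -r /proc/meminfo ]; then
        grep -E '^(MemTotal|MemAvailable|SwapTotal|SwapFree):' /proc/meminfo
    else
        echo "memory: /proc/meminfo not readable on this system"
    fi

    # Inside a container the cgroup limit can be lower than MemTotal.
    # cgroup v2 and v1 expose it under different paths.
    for limit_file in /sys/fs/cgroup/memory.max \
                      /sys/fs/cgroup/memory/memory.limit_in_bytes; do
        if [ -r "$limit_file" ]; then
            echo "cgroup memory limit ($limit_file): $(cat "$limit_file")"
        fi
    done

    # Free disk space (1024-byte blocks) on the filesystem holding the work directory.
    df -k .

    # Per-process open file limit, relevant if a test leaks handles.
    echo "open file limit: $(ulimit -n)"
    echo "-------------------------------"
}

print_machine_diagnostics
```

Calling this once before the test run and once after it would give two snapshots to compare across passing and failing work items.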

@carlossanlop
Member

carlossanlop commented Jul 18, 2024

> Is it possible that the test VMs simply have less memory available now?

@adamsitnik I'd be surprised if something like that happened, but we can double check: @dotnet/dnceng do you know?

The thing is, this OOM failure is only happening in System.IO and System.IO.Net5Compat. I am pretty sure I don't see it anywhere else.

One thing that could help you is that this failure is also happening in 6.0 and 8.0, meaning something got backported, so that could help you narrow down the checkins, as we don't modify System.IO often. Never mind, you already answered that above.

@carlossanlop
Member

This is an intermittent issue, so maybe widen the date range a bit more? When was the last time a System.IO change happened in servicing before April?

@jkotas
Member Author

jkotas commented Jul 18, 2024

The failure was most likely triggered by a Linux kernel update, a Docker container update, or a test infra update. These updates are rolled out regularly in the background. I do not think it is a good use of time to try to find the exact update that triggered this failure months ago. We won't be able to do much with that information.

The failure is likely caused by a test that consumes too many resources. It does not have to be direct memory use. For example, a test that creates too many file handles can also manifest as exit code 137. I think we should try to find the offending test or tests, e.g. by trying to reproduce the failure with verbose logging.
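
One way to narrow this down, sketched under the assumption of a Linux machine with /proc available: sample the resident memory and open file descriptor count of the test process at intervals, so that the last samples before the kill show which resource was growing. The function names below are hypothetical and this is not part of the actual test infrastructure.

```shell
#!/bin/sh
# Count open file descriptors of a process via /proc (Linux only).
count_open_fds() {
    if [ -d "/proc/$1/fd" ]; then
        ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
    else
        echo "n/a"
    fi
}

# Resident set size in kB, read from the VmRSS line of /proc/<pid>/status.
resident_kb() {
    if [ -r "/proc/$1/status" ]; then
        awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
    else
        echo "n/a"
    fi
}

# Print a single timestamped sample for the given pid.
sample_process() {
    echo "$(date -u +%H:%M:%S) pid=$1 rss_kb=$(resident_kb "$1") fds=$(count_open_fds "$1")"
}

# Keep sampling every <interval> seconds while the process is still alive.
monitor_process() {
    pid=$1
    interval=${2:-5}
    while kill -0 "$pid" 2>/dev/null; do
        sample_process "$pid"
        sleep "$interval"
    done
}

# Demo: take one sample of the current shell.
sample_process $$

# Usage sketch (not run here): start the test command in the background,
# then call `monitor_process $! 5` to log a sample every 5 seconds.
```

Lining up the sample timestamps with verbose test output would show which tests were running while memory or handle counts climbed.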

@jeffhandley jeffhandley modified the milestones: 9.0.0, 10.0.0 Aug 6, 2024
@JulieLeeMSFT
Member

Failed for the leg below in runtime-coreclr libraries-pgo/20240810.1:

 net9.0-linux-Release-arm64-fullpgo_random_gdv-(Ubuntu.2004.Arm64.Open)Ubuntu.2004.Armarch.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-20.04-helix-arm64v8
  Starting:    System.IO.Tests (parallel test collections = on [2 threads], stop on fail = off)
./RunTests.sh: line 182:    26 Killed                  "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.IO.Tests.runtimeconfig.json --depsfile System.IO.Tests.deps.json xunit.console.dll System.IO.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE

@jeffschwMSFT jeffschwMSFT removed the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label Oct 8, 2024
@jeffschwMSFT
Member

Removing the 'blocking-clean-ci' label as it has not failed in 30 days.

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 0 | 0 | 0 |

@jkotas
Member Author

jkotas commented Oct 8, 2024

Fixed by #107163

@jkotas jkotas closed this as completed Oct 8, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Nov 8, 2024
8 participants