Slow "Mono llvmfullaot Pri0 Runtime Tests Run Linux arm64 release" #65626

Open
EgorBo opened this issue Feb 20, 2022 · 18 comments

Comments

@EgorBo
Member

EgorBo commented Feb 20, 2022

Mono llvmfullaot Pri0 Runtime Tests Run Linux arm64 release takes around 2.5 hours to finish.

There are some interesting anomalies in the logs, e.g.:
image
(I checked various runs)

It says that prejitting of a single managed assembly Microsoft.Win32.SystemEvents.dll takes almost 10 minutes 😮 (mostly in LLVM's opt+llc)

I parsed the output into an Excel table:
image

Is it possible to move some libs/tests to the outerloop, e.g. the JIT/Methodical/MDArray/GaussJordan/classarr_cs_do/classarr_cs_do test? I guess we also need to figure out what exactly makes Microsoft.Win32.SystemEvents.dll take so long to prejit; there isn't much in it.

cc @akoeplinger @vargaz @steveisok

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 20, 2022
@ghost

ghost commented Feb 20, 2022

Tagging subscribers to this area: @directhex
See info in area-owners.md if you want to be subscribed.

Author: EgorBo
Assignees: -
Labels: untriaged, area-Infrastructure-mono

Milestone: -

@steveisok
Member

Adding @SamMonoRT

@vargaz
Contributor

vargaz commented Feb 20, 2022

Yes, these are very slow; they run opt+llc on unlinked assemblies.

@agocke
Member

agocke commented Mar 4, 2022

This test run is pretty regularly timing out -- can we get someone to investigate if the slowness is a bug, or if we need to adjust the timeout?

@marek-safar marek-safar removed the untriaged New issue has not been triaged by the area owner label Mar 7, 2022
@marek-safar marek-safar added this to the 7.0.0 milestone Mar 7, 2022
@SamMonoRT
Member

SamMonoRT commented Mar 7, 2022

> This test run is pretty regularly timing out -- can we get someone to investigate if the slowness is a bug, or if we need to adjust the timeout?

This PR (#66157) should help ease the timeouts seen in the last couple of weeks. Even with that fix, the lane takes 2.5+ hours. We're still discussing this, but we might want to (1) exclude certain long-running tests from PR runs in this lane and (2) extend the timeout to stabilize CI in the short term.

@agocke
Member

agocke commented Mar 7, 2022

@SamMonoRT which PR?

@SamMonoRT
Member

#66157

@agocke
Member

agocke commented Mar 7, 2022

Looks like that resolved the problem. I'm going to close this out for now.

@agocke agocke closed this as completed Mar 7, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Apr 7, 2022
@EgorBo
Member Author

EgorBo commented Jan 24, 2023

It doesn't look fixed to me: every time this job is triggered it takes 4-5 hours, e.g. https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_apis/build/builds/146722/logs/1353 (from #81094)

Since it's not an optional pipeline, I think either it has to be made optional or not all of the tests should be precompiled with AOT.

@EgorBo
Member Author

EgorBo commented Jan 24, 2023

I've written a quick parser for the output (for today's PR ^) and sorted the assemblies by the time it takes to run LLVM (opt and llc) on them:

image
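
For anyone who wants to reproduce this kind of ranking, a minimal sketch follows. It is not the parser used for the table above, and the log line format it matches (`<assembly>.dll: opt took 12.3s`) is a hypothetical stand-in; the regex would need to be adapted to whatever the real AOT log prints. The function name `llvm_time_per_assembly` is likewise made up for illustration.

```python
import re
from collections import defaultdict

# ASSUMPTION: hypothetical line format, not the real Mono AOT log format:
#   "<assembly>.dll: <opt|llc> took <seconds>s"
LINE_RE = re.compile(
    r"^(?P<assembly>\S+\.dll): (?P<tool>opt|llc) took (?P<seconds>\d+(?:\.\d+)?)s$"
)


def llvm_time_per_assembly(lines):
    """Sum opt and llc time per assembly and return (assembly, seconds)
    pairs sorted from slowest to fastest. Non-matching lines are ignored."""
    totals = defaultdict(float)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            totals[match.group("assembly")] += float(match.group("seconds"))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample input, only to show the shape of the result.
    sample = [
        "A.dll: opt took 10.5s",
        "A.dll: llc took 4.5s",
        "unrelated log noise",
        "B.dll: opt took 100s",
        "B.dll: llc took 20s",
    ]
    for assembly, seconds in llvm_time_per_assembly(sample):
        print(f"{assembly}\t{seconds:.1f}s")
```

Summing opt and llc together matches how the table is sorted (total LLVM time per assembly); keeping the two tools separate would only need a second key in the dictionary.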

@EgorBo
Member Author

EgorBo commented Jan 24, 2023

E.g. just by moving AdvSimd tests alone to an outerloop pipeline we can save ~30 minutes (4 dlls)

@steveisok
Member

> E.g. just by moving AdvSimd tests alone to an outerloop pipeline we can save ~30 minutes (4 dlls)

I think I'd rather move the whole thing out and then analyze what we can run per PR.

@dotnet dotnet unlocked this conversation Jan 24, 2023
@steveisok
Member

@EgorBo thanks for putting together the updated list!

@SingleAccretion
Contributor

SingleAccretion commented Jan 24, 2023

Wanted to mention that we should be careful to leave enough testing on PRs to reliably catch failures introduced by adding new Jit tests. In my experience these are not uncommon.

steveisok added a commit that referenced this issue Jan 25, 2023
Since it takes quite a while to complete this leg, we should move it off of running every PR.

Addresses #65626
@akoeplinger akoeplinger removed this from the 7.0.0 milestone Dec 6, 2023
@akoeplinger akoeplinger added this to the 9.0.0 milestone Dec 6, 2023
@SamMonoRT
Member

@kotlarmilos @vitek-karas - I'm not sure whether this is something your team owns now, or what more remains here. Could you please re-assign as appropriate?

@steveisok
Member

I'll take this as it likely has to do w/ the aot compiler performance itself.

@steveisok steveisok assigned steveisok and unassigned SamMonoRT Feb 8, 2024
@agocke
Member

agocke commented Feb 8, 2024

My general philosophy is "PR is for fast, reliable tests", so I agree with the theory of moving everything out, then moving back in the things that meet those criteria. Ideally we can find the sweet spot of fast + high confidence in finding bugs.

@steveisok steveisok removed their assignment Aug 6, 2024
@steveisok steveisok modified the milestones: 9.0.0, Future Aug 6, 2024